author     Andrew Ash <andrew@andrewash.com>  2014-04-10 14:59:58 -0700
committer  Reynold Xin <rxin@apache.org>  2014-04-10 14:59:58 -0700
commit     f0466625200842f3cc486e9aa1caa417586be533
tree       6f1b6fa26766b66408531a2325899bea1634065b /docs/tuning.md
parent     7b52b66312994d4dbf243eadb6d27eb06350a81f
Update tuning.md

http://stackoverflow.com/questions/9699071/what-is-the-javas-internal-represention-for-string-modified-utf-8-utf-16

Author: Andrew Ash <andrew@andrewash.com>

Closes #384 from ash211/patch-2 and squashes the following commits:

da1b0be [Andrew Ash] Update tuning.md
Diffstat (limited to 'docs/tuning.md')
-rw-r--r-- docs/tuning.md | 5
1 files changed, 3 insertions, 2 deletions
diff --git a/docs/tuning.md b/docs/tuning.md
index 093df3187a..cc069f0e84 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -90,9 +90,10 @@ than the "raw" data inside their fields. This is due to several reasons:
* Each distinct Java object has an "object header", which is about 16 bytes and contains information
such as a pointer to its class. For an object with very little data in it (say one `Int` field), this
can be bigger than the data.
-* Java Strings have about 40 bytes of overhead over the raw string data (since they store it in an
+* Java `String`s have about 40 bytes of overhead over the raw string data (since they store it in an
array of `Char`s and keep extra data such as the length), and store each character
- as *two* bytes due to Unicode. Thus a 10-character string can easily consume 60 bytes.
+ as *two* bytes due to `String`'s internal usage of UTF-16 encoding. Thus a 10-character string can
+ easily consume 60 bytes.
* Common collection classes, such as `HashMap` and `LinkedList`, use linked data structures, where
there is a "wrapper" object for each entry (e.g. `Map.Entry`). This object not only has a header,
but also pointers (typically 8 bytes each) to the next object in the list.
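To make the arithmetic in the updated `String` bullet concrete, here is a minimal Java sketch that assumes the rule-of-thumb figures from the text: about 40 bytes of fixed overhead plus two bytes per `char` for the UTF-16 payload. The class `StringFootprint` and the helper `estimateStringBytes` are hypothetical names used only for illustration, and the result is an estimate, not a measured heap size. Real overhead varies with the JVM, compressed pointers and object alignment. JDK 9 and later also use compact strings (JEP 254), which store Latin-1-only strings at one byte per character, so this model follows the UTF-16 layout described in the patch rather than every modern JVM.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class illustrating the rule of thumb from tuning.md.
class StringFootprint {
    // Approximate fixed overhead per String, taken from the text (not measured).
    static final int STRING_OVERHEAD_BYTES = 40;
    // UTF-16 stores each char (code unit) in two bytes.
    static final int BYTES_PER_CHAR = 2;

    // Rough heap-size estimate for a String under the doc's model.
    static int estimateStringBytes(String s) {
        return STRING_OVERHEAD_BYTES + BYTES_PER_CHAR * s.length();
    }

    public static void main(String[] args) {
        String s = "abcdefghij"; // 10 characters

        // UTF_16BE writes no byte-order mark, so this is the raw UTF-16 payload size.
        int utf16Bytes = s.getBytes(StandardCharsets.UTF_16BE).length;

        System.out.println("UTF-16 payload: " + utf16Bytes + " bytes");          // 20
        System.out.println("Estimated total: " + estimateStringBytes(s) + " bytes"); // 60
    }
}
```

`StandardCharsets.UTF_16BE` is used instead of `UTF_16` because the latter prepends a 2-byte byte-order mark when encoding. For this ASCII string it yields exactly two bytes per `char` (20 bytes), and adding the assumed 40-byte overhead gives the roughly 60 bytes quoted in the text.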