author | Andrew Ash <andrew@andrewash.com> | 2014-04-10 14:59:58 -0700 |
---|---|---|
committer | Patrick Wendell <pwendell@gmail.com> | 2014-04-10 15:11:03 -0700 |
commit | 4c9906d85b60f34e5bc0e23409a848746fe5cf96 (patch) | |
tree | 6f1b6fa26766b66408531a2325899bea1634065b | |
parent | 1e2cdbca50c49af6457a3dd7de7da9dfd75461c7 (diff) | |
Update tuning.md
http://stackoverflow.com/questions/9699071/what-is-the-javas-internal-represention-for-string-modified-utf-8-utf-16
Author: Andrew Ash <andrew@andrewash.com>
Closes #384 from ash211/patch-2 and squashes the following commits:
da1b0be [Andrew Ash] Update tuning.md
-rw-r--r-- | docs/tuning.md | 5 |
1 files changed, 3 insertions, 2 deletions
diff --git a/docs/tuning.md b/docs/tuning.md
index 093df3187a..cc069f0e84 100644
--- a/docs/tuning.md
+++ b/docs/tuning.md
@@ -90,9 +90,10 @@ than the "raw" data inside their fields. This is due to several reasons:
 * Each distinct Java object has an "object header", which is about 16 bytes and contains information
   such as a pointer to its class. For an object with very little data in it (say one `Int` field), this
   can be bigger than the data.
-* Java Strings have about 40 bytes of overhead over the raw string data (since they store it in an
+* Java `String`s have about 40 bytes of overhead over the raw string data (since they store it in an
   array of `Char`s and keep extra data such as the length), and store each character
-  as *two* bytes due to Unicode. Thus a 10-character string can easily consume 60 bytes.
+  as *two* bytes due to `String`'s internal usage of UTF-16 encoding. Thus a 10-character string can
+  easily consume 60 bytes.
 * Common collection classes, such as `HashMap` and `LinkedList`, use linked data structures, where
   there is a "wrapper" object for each entry (e.g. `Map.Entry`). This object not only has
   a header, but also pointers (typically 8 bytes each) to the next object in the list.
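The 60-byte figure in the changed bullet is simple arithmetic: roughly 40 bytes of per-`String` overhead plus two bytes for each UTF-16 code unit. The sketch below reproduces that estimate in Python. It is not part of the commit. The helper names (`utf16_code_units`, `estimated_string_bytes`) are hypothetical, and the constants are the approximate values quoted in the doc text, not measurements from a real JVM.

```python
# Rough estimate only: both constants come from the doc text in the diff,
# not from measuring an actual JVM heap.
STRING_OVERHEAD_BYTES = 40       # "about 40 bytes of overhead" per the text
BYTES_PER_UTF16_CODE_UNIT = 2    # each Java `char` is one 16-bit code unit


def utf16_code_units(text: str) -> int:
    """Number of UTF-16 code units (Java `char`s) needed to store `text`."""
    # "utf-16-be" emits no byte-order mark, so the byte count is exactly 2 per unit.
    return len(text.encode("utf-16-be")) // BYTES_PER_UTF16_CODE_UNIT


def estimated_string_bytes(text: str) -> int:
    """Hypothetical helper: fixed overhead plus two bytes per UTF-16 code unit."""
    return STRING_OVERHEAD_BYTES + BYTES_PER_UTF16_CODE_UNIT * utf16_code_units(text)


print(estimated_string_bytes("0123456789"))  # 10 chars: 40 + 2 * 10 = 60
```

Counting code units rather than characters matters for text outside the Basic Multilingual Plane, where a single character occupies two code units (a surrogate pair) and therefore four bytes.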