author | Davies Liu <davies.liu@gmail.com> | 2014-09-11 11:50:36 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@apache.org> | 2014-09-11 11:50:36 -0700 |
commit | 1ef656ea85b4b93c7b0f3cf8042b63a0de0901cb (patch) | |
tree | f0480c59cad1ab80cf3e03edd6d2d423c6e037b3 /bin | |
parent | ed1980ffa9ccb87d76694ba910ef22df034bca49 (diff) | |
download | spark-1ef656ea85b4b93c7b0f3cf8042b63a0de0901cb.tar.gz spark-1ef656ea85b4b93c7b0f3cf8042b63a0de0901cb.tar.bz2 spark-1ef656ea85b4b93c7b0f3cf8042b63a0de0901cb.zip |
[SPARK-3047] [PySpark] add an option to use str in textFileRDD
str is much more efficient than unicode (in both CPU and memory), so it's better to use str in textFileRDD. To keep compatibility, unicode is still used by default (this may change in the future).
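The trade-off can be sketched outside Spark. This is only an illustration, not the code from this commit: `count_lines` is a hypothetical helper, and it is written for Python 3, where `bytes` plays the role of Python 2 `str` and `str` plays the role of Python 2 `unicode`. In PySpark itself the option is passed as `sc.textFile(path, use_unicode=False)`.

```python
from collections import Counter


def count_lines(raw_lines, use_unicode=True):
    """Count identical lines (hypothetical helper, not part of PySpark).

    With use_unicode=True every raw line is decoded from UTF-8 first,
    which costs CPU time and allocates a second object per line.
    With use_unicode=False the raw byte strings are used directly as keys.
    """
    if use_unicode:
        keys = (line.decode("utf-8") for line in raw_lines)
    else:
        keys = raw_lines
    return Counter(keys)


# Made-up input standing in for lines read from a text file.
raw = [b"spark", b"pyspark", b"spark"]

as_text = count_lines(raw, use_unicode=True)    # keys are decoded text
as_bytes = count_lines(raw, use_unicode=False)  # keys stay as raw bytes
```

Both calls produce the same counts; only the key type differs, so skipping the decode is safe whenever downstream code does not need text semantics.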
use_unicode=True:
daviesliudm:~/work/spark$ time python wc.py
(u'./universe/spark/sql/core/target/java/org/apache/spark/sql/execution/ExplainCommand$.java', 7776)
real 2m8.298s
user 0m0.185s
sys 0m0.064s
use_unicode=False:
daviesliudm:~/work/spark$ time python wc.py
('./universe/spark/sql/core/target/java/org/apache/spark/sql/execution/ExplainCommand$.java', 7776)
real 1m26.402s
user 0m0.182s
sys 0m0.062s
Wall-clock time drops from 2m8.298s to 1m26.402s, an improvement of about 32%!
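The improvement can be re-derived from the two `real` timings above. A small sketch of the arithmetic follows; the helper name `to_seconds` is made up for illustration, and the result is only as precise as a single run of each configuration allows.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style reading such as 2m8.298s into seconds."""
    return minutes * 60 + seconds


unicode_time = to_seconds(2, 8.298)   # use_unicode=True  (real 2m8.298s)
str_time = to_seconds(1, 26.402)      # use_unicode=False (real 1m26.402s)

# Fraction of wall-clock time saved by switching to str.
improvement = (unicode_time - str_time) / unicode_time

# Equivalent speedup factor (old time divided by new time).
speedup = unicode_time / str_time

print("time saved: {:.1%}, speedup: {:.2f}x".format(improvement, speedup))
```

The saved fraction comes out between 32% and 33%, in line with the figure quoted in the commit message.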
Author: Davies Liu <davies.liu@gmail.com>
Closes #1951 from davies/unicode and squashes the following commits:
8352d57 [Davies Liu] update version number
a286f2f [Davies Liu] rollback loads()
85246e5 [Davies Liu] add docs for use_unicode
a0295e1 [Davies Liu] add an option to use str in textFile()
Diffstat (limited to 'bin')
0 files changed, 0 insertions, 0 deletions