author | Yuhao Yang <hhbyyh@gmail.com> | 2015-03-10 10:51:44 +0000 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2015-03-10 10:52:21 +0000 |
commit | 9a0272fbb322042788f14e9cd99e2db86b456225 (patch) | |
tree | 107e0b9d88ae45ba2efaaba2f77941f9e681a982 /ec2/spark_ec2.py | |
parent | 8767565cef01d847f57b7293d8b63b2422009b90 (diff) | |
[SPARK-6177][MLlib]Add note in LDA example to remind possible coalesce
JIRA: https://issues.apache.org/jira/browse/SPARK-6177
Add a comment to the LDA example that introduces `coalesce`, to avoid the potentially massive number of partitions produced by `sc.textFile`.
`sc.textFile` creates an RDD with at least one partition per input file, and a very large number of partitions degrades LDA performance.
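As a hedged illustration of the note described above, here is a minimal Scala sketch. It assumes Spark's RDD API (`SparkContext.textFile`, `RDD.coalesce`, `RDD.partitions`). The names `CorpusLoading`, `loadCorpus`, and `numPartitions` are hypothetical and do not come from the actual LDAExample change. The sketch reads a directory of documents and merges the resulting partitions down to a chosen upper bound before the data would be handed to LDA.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper for illustration only; not part of LDAExample.
object CorpusLoading {
  /**
   * Reads text input and merges its partitions down to at most `numPartitions`.
   *
   * sc.textFile yields at least one partition per input file, so a directory of
   * many small documents can produce far more partitions than there are cores.
   * coalesce without a shuffle only merges existing partitions: it never
   * increases the partition count and avoids a full data shuffle.
   */
  def loadCorpus(sc: SparkContext, path: String, numPartitions: Int): RDD[String] = {
    require(numPartitions > 0, "numPartitions must be positive")
    sc.textFile(path).coalesce(numPartitions)
  }
}
```

Using `coalesce` rather than `repartition` reflects the intent of the note: merging many small partitions into fewer ones does not require redistributing records across the cluster. `repartition` always performs a shuffle, although it can balance skewed partitions better.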
Author: Yuhao Yang <hhbyyh@gmail.com>
Closes #4899 from hhbyyh/adjustPartition and squashes the following commits:
a499630 [Yuhao Yang] update comment
9a2d7b6 [Yuhao Yang] move to comment
f7fd5d4 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into adjustPartition
26a564a [Yuhao Yang] add coalesce to LDAExample
Diffstat (limited to 'ec2/spark_ec2.py')
0 files changed, 0 insertions, 0 deletions