author | Reynold Xin <rxin@databricks.com> | 2015-09-23 16:43:21 -0700 |
---|---|---|
committer | Reynold Xin <rxin@databricks.com> | 2015-09-23 16:43:21 -0700 |
commit | 9952217749118ae78fe794ca11e1c4a87a4ae8ba (patch) | |
tree | cf71cc84eb34acdeade45cc8be3642db4faa8d54 /yarn | |
parent | 067afb4e9bb227f159bcbc2aafafce9693303ea9 (diff) | |
[SPARK-10731] [SQL] Delegate to Scala's DataFrame.take implementation in Python DataFrame.
Python DataFrame.head/take currently require scanning all the partitions. This pull request changes them to delegate the actual implementation to the Scala DataFrame (by calling DataFrame.take).
This is more of a hack to fix the issue in 1.5.1. A more proper fix would be to change executeCollect and executeTake to return InternalRow rather than Row, eliminating the extra round-trip conversion.
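To illustrate why scanning every partition matters for head/take, here is a toy pure-Python model. It is not Spark code and not the change made in this commit: `ToyFrame`, `take_scan_all`, and `take_incremental` are hypothetical names, and the early-stopping loop is a simplification of how a take that reads partitions only as needed might behave.

```python
class ToyFrame:
    """Hypothetical stand-in for a partitioned dataset (not a Spark class)."""

    def __init__(self, partitions):
        self.partitions = partitions
        self.scanned = 0  # how many partitions have been read so far

    def _read(self, index):
        # Reading a partition is the expensive step we want to count.
        self.scanned += 1
        return list(self.partitions[index])

    def take_scan_all(self, n):
        # Reads every partition, then keeps only the first n rows.
        rows = []
        for i in range(len(self.partitions)):
            rows.extend(self._read(i))
        return rows[:n]

    def take_incremental(self, n):
        # Stops reading as soon as n rows have been gathered.
        rows = []
        for i in range(len(self.partitions)):
            if len(rows) >= n:
                break
            rows.extend(self._read(i))
        return rows[:n]


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

    slow = ToyFrame(parts)
    print(slow.take_scan_all(2), "partitions read:", slow.scanned)

    fast = ToyFrame(parts)
    print(fast.take_incremental(2), "partitions read:", fast.scanned)
```

Both methods return the same rows; only the number of partitions read differs, which is the cost this commit is concerned with.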
Author: Reynold Xin <rxin@databricks.com>
Closes #8876 from rxin/SPARK-10731.
Diffstat (limited to 'yarn')
0 files changed, 0 insertions, 0 deletions