path: root/README.md
author Aaron Staple <aaron.staple@gmail.com> 2014-09-16 11:45:35 -0700
committer Michael Armbrust <michael@databricks.com> 2014-09-16 11:45:35 -0700
commit 8e7ae477ba40a064d27cf149aa211ff6108fe239 (patch)
tree 3f30546913a7e1ca882e65fcbac721ed0ad36258 /README.md
parent 30f288ae34a67307aa45b7aecbd0d02a0a14fe69 (diff)
[SPARK-2314][SQL] Override collect and take in python library, and count in java library, with optimized versions.
SchemaRDD overrides RDD functions, including collect, count, and take, with optimized versions making use of the query optimizer. The java and python interface classes wrapping SchemaRDD need to ensure the optimized versions are called as well. This patch overrides relevant calls in the python and java interfaces with optimized versions.

Adds a new Row serialization pathway between python and java, based on JList[Array[Byte]] versus the existing RDD[Array[Byte]]. I wasn’t overjoyed about doing this, but I noticed that some QueryPlans implement optimizations in executeCollect(), which outputs an Array[Row] rather than the typical RDD[Row] that can be shipped to python using the existing serialization code. To me it made sense to ship the Array[Row] over to python directly instead of converting it back to an RDD[Row] just for the purpose of sending the Rows to python using the existing serialization code.

Author: Aaron Staple <aaron.staple@gmail.com>

Closes #1592 from staple/SPARK-2314 and squashes the following commits:

89ff550 [Aaron Staple] Merge with master.
6bb7b6c [Aaron Staple] Fix typo.
b56d0ac [Aaron Staple] [SPARK-2314][SQL] Override count in JavaSchemaRDD, forwarding to SchemaRDD's count.
0fc9d40 [Aaron Staple] Fix comment typos.
f03cdfa [Aaron Staple] [SPARK-2314][SQL] Override collect and take in sql.py, forwarding to SchemaRDD's collect.
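
The forwarding idea in the message above can be sketched in isolation. This is a minimal illustration only, not the patch itself and not Spark's API: GenericRDD, OptimizedRDD, NaiveWrapper, ForwardingWrapper, and the rows_scanned counter are all hypothetical names invented for this sketch. It assumes a wrapper class that inherits generic actions and shows why it has to override them explicitly for the wrapped object's cheaper versions to run.

```python
import itertools


class GenericRDD:
    """Hypothetical stand-in for a plain RDD: every action scans all rows."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.rows_scanned = 0  # instrumentation for this sketch only

    def _scan(self):
        for row in self._rows:
            self.rows_scanned += 1
            yield row

    def collect(self):
        return list(self._scan())

    def count(self):
        return sum(1 for _ in self._scan())

    def take(self, n):
        # Generic version: materialize everything, then slice.
        return self.collect()[:n]


class OptimizedRDD(GenericRDD):
    """Hypothetical stand-in for a class with optimized actions."""

    def count(self):
        # Assumption: the count can be answered without scanning rows.
        return len(self._rows)

    def take(self, n):
        # Stop scanning as soon as n rows have been produced.
        return list(itertools.islice(self._scan(), n))


class NaiveWrapper(GenericRDD):
    """Wrapper that inherits the generic actions and ignores the inner ones."""

    def __init__(self, inner):
        super().__init__(inner._rows)
        self.inner = inner


class ForwardingWrapper(NaiveWrapper):
    """Wrapper that overrides each action to forward to the inner object."""

    def collect(self):
        return self.inner.collect()

    def count(self):
        return self.inner.count()

    def take(self, n):
        return self.inner.take(n)


naive = NaiveWrapper(OptimizedRDD(range(100)))
naive_result = naive.take(2)          # falls back to the generic full scan

fwd = ForwardingWrapper(OptimizedRDD(range(100)))
fwd_result = fwd.take(2)              # forwarded: scan stops after two rows
fwd_count = fwd.count()               # forwarded: no scan in this sketch
```

Both wrappers return the same rows; the difference is only in how much work is done, which is the reason the wrapper needs its own overrides rather than relying on inheritance.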
Diffstat (limited to 'README.md')
0 files changed, 0 insertions, 0 deletions