[SPARK-13250] [SQL] Update PhysicallRDD to convert to UnsafeRow if using the vectorized scanner.

Some parts of the engine rely on UnsafeRow which the vectorized parquet scanner does not want to produce. This add a conversion in Physical RDD. In the case where codegen is used (and the scan is the start of the pipeline), there is no requirement to use UnsafeRow. This patch adds update PhysicallRDD to support codegen, which eliminates the need for the UnsafeRow conversion in all cases. The result of these changes for TPCDS-Q19 at the 10gb sf reduces the query time from 9.5 seconds to 6.5 seconds. Author: Nong Li <nong@databricks.com> Closes #11141 from nongli/spark-13250.
author: Nong Li <nong@databricks.com> 2016-02-24 17:16:45 -0800
committer: Davies Liu <davies.liu@gmail.com> 2016-02-24 17:16:45 -0800
commit: 5a7af9e7ac85e04aa4a420bc2887207bfa18f792 (patch)
tree: 26537c528dcec8d989eda1f7640670fdac7feb86 /python
parent: cbb0b65ad53f2642fab8aad3ea115375c53b6eed (diff)
download: spark-5a7af9e7ac85e04aa4a420bc2887207bfa18f792.tar.gz
spark-5a7af9e7ac85e04aa4a420bc2887207bfa18f792.tar.bz2
spark-5a7af9e7ac85e04aa4a420bc2887207bfa18f792.zip
1 files changed, 2 insertions, 1 deletions
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index bf43452e08..7275e69353 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -173,7 +173,8 @@ class DataFrame(object):
 
         >>> df.explain()
         == Physical Plan ==
-        Scan ExistingRDD[age#0,name#1]
+        WholeStageCodegen
+        :  +- Scan ExistingRDD[age#0,name#1]
 
         >>> df.explain(True)
         == Parsed Logical Plan ==
author	Nong Li <nong@databricks.com>	2016-02-24 17:16:45 -0800
committer	Davies Liu <davies.liu@gmail.com>	2016-02-24 17:16:45 -0800
commit	5a7af9e7ac85e04aa4a420bc2887207bfa18f792 (patch)
tree	26537c528dcec8d989eda1f7640670fdac7feb86 /python
parent	cbb0b65ad53f2642fab8aad3ea115375c53b6eed (diff)
download	spark-5a7af9e7ac85e04aa4a420bc2887207bfa18f792.tar.gz spark-5a7af9e7ac85e04aa4a420bc2887207bfa18f792.tar.bz2 spark-5a7af9e7ac85e04aa4a420bc2887207bfa18f792.zip