path: root/python/pyspark/ml/pipeline.py
author: gatorsmile <gatorsmile@gmail.com>, 2016-03-16 13:11:11 -0700
committer: Yin Huai <yhuai@databricks.com>, 2016-03-16 13:11:11 -0700
commit: c4bd57602c0b14188d364bb475631bf473d25082
tree: d5c081e53719b8305f1fcb0061b2454462fb3d25 /python/pyspark/ml/pipeline.py
parent: 1d1de28a3c3c3a4bc37dc7565b9178a712df493a
[SPARK-12721][SQL] SQL Generation for Script Transformation
#### What changes were proposed in this pull request?

This PR converts analyzed logical plans that contain the `ScriptTransformation` operator back to SQL. For example, the following SQL uses `TRANSFORM`:

```
SELECT TRANSFORM (a, b, c, d) USING 'cat' FROM parquet_t2
```

Its logical plan looks like this:

```
ScriptTransformation [a#210L,b#211L,c#212L,d#213L], cat, [key#208,value#209], HiveScriptIOSchema(List(),List(),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),Some(org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe),List((field.delim, )),List((field.delim, )),Some(org.apache.hadoop.hive.ql.exec.TextRecordReader),Some(org.apache.hadoop.hive.ql.exec.TextRecordWriter),true)
+- SubqueryAlias parquet_t2
   +- Relation[a#210L,b#211L,c#212L,d#213L] ParquetRelation
```

The generated SQL looks like this:

```
SELECT TRANSFORM (`parquet_t2`.`a`, `parquet_t2`.`b`, `parquet_t2`.`c`, `parquet_t2`.`d`)
USING 'cat' AS (`key` string, `value` string)
FROM `default`.`parquet_t2`
```

#### How was this patch tested?

Seven test cases are added to `LogicalPlanToSQLSuite`.

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #11503 from gatorsmile/transformToSQL.
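As a rough illustration of the shape of the generated statement, the Python sketch below assembles the same SQL text from the pieces visible in the example: the table, the input columns, the script, and the output schema. This is not the Scala code from this PR. The helpers `quote_identifier` and `transform_to_sql` are hypothetical names introduced here, and the sketch assumes the script string contains no single quotes and that the default SerDe and record reader/writer are used, so no `ROW FORMAT` clauses are emitted.

```python
def quote_identifier(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def transform_to_sql(database, table, input_cols, script, output_cols):
    """Build a SELECT TRANSFORM statement as a single-line string.

    Hypothetical helper for illustration only; `output_cols` is a list of
    (name, type) pairs, and `script` is assumed to contain no single quotes.
    """
    # Input columns are qualified with the table alias, as in the example.
    inputs = ", ".join(
        quote_identifier(table) + "." + quote_identifier(col)
        for col in input_cols
    )
    # Output columns carry an explicit type in the AS clause.
    outputs = ", ".join(
        quote_identifier(name) + " " + col_type
        for name, col_type in output_cols
    )
    return (
        "SELECT TRANSFORM (" + inputs + ") "
        "USING '" + script + "' "
        "AS (" + outputs + ") "
        "FROM " + quote_identifier(database) + "." + quote_identifier(table)
    )


sql = transform_to_sql(
    database="default",
    table="parquet_t2",
    input_cols=["a", "b", "c", "d"],
    script="cat",
    output_cols=[("key", "string"), ("value", "string")],
)
print(sql)
```

Called with the values from the example, the sketch produces the same statement as the generated SQL in the commit message, written on one line.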
Diffstat (limited to 'python/pyspark/ml/pipeline.py')
0 files changed, 0 insertions, 0 deletions