author | Cheng Lian <lian@databricks.com> | 2014-11-03 13:20:33 -0800 |
---|---|---|
committer | Michael Armbrust <michael@databricks.com> | 2014-11-03 13:20:33 -0800 |
commit | c238fb423d1011bd1b1e6201d769b72e52664fc6 (patch) | |
tree | a1d4de68b51efcd5f0d0c29c7732545f45edee96 /docs | |
parent | 24544fbce05665ab4999a1fe5aac434d29cd912c (diff) | |
[SPARK-4202][SQL] Simple DSL support for Scala UDF
This feature is based on an offline discussion with mengxr and will hopefully be useful for the new MLlib pipeline API.
For the following test snippet
```scala
case class KeyValue(key: Int, value: String)
val testData = sc.parallelize(1 to 10).map(i => KeyValue(i, i.toString)).toSchemaRDD
// A function value (not a method), so that `.call` can be invoked on it
val foo = (a: Int, b: String) => a.toString + b
```
the newly introduced DSL enables the following syntax
```scala
import org.apache.spark.sql.catalyst.dsl._
testData.select(Star(None), foo.call('key, 'value) as 'result)
```
which is equivalent to
```scala
testData.registerTempTable("testData")
sqlContext.registerFunction("foo", foo)
sql("SELECT *, foo(key, value) AS result FROM testData")
```
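For readers unfamiliar with how a plain function value can gain a `.call` method, the following is a minimal, Spark-free sketch of the general technique (an implicit wrapper that turns a function plus column expressions into an expression node). It is not the code in this patch: `Expr`, `Attr`, `Udf2`, `UdfDsl`, and `Demo` are hypothetical names invented for illustration, a row is modelled as a plain `Map`, and columns are referenced with `Attr("key")` rather than symbol literals.

```scala
// Hypothetical, simplified stand-ins for expression nodes; NOT Spark's real classes.
sealed trait Expr {
  def eval(row: Map[String, Any]): Any
}

// A reference to a column by name.
case class Attr(name: String) extends Expr {
  def eval(row: Map[String, Any]): Any = row(name)
}

// A UDF node: evaluates both child expressions, then applies the wrapped function.
case class Udf2[A, B, R](f: (A, B) => R, left: Expr, right: Expr) extends Expr {
  def eval(row: Map[String, Any]): Any =
    f(left.eval(row).asInstanceOf[A], right.eval(row).asInstanceOf[B])
}

object UdfDsl {
  // Adds `.call` to any two-argument function value via an implicit wrapper.
  implicit class Function2Udf[A, B, R](val f: (A, B) => R) extends AnyVal {
    def call(left: Expr, right: Expr): Expr = Udf2(f, left, right)
  }
}

object Demo {
  import UdfDsl._

  // Same shape as the `foo` in the test snippet.
  val foo = (a: Int, b: String) => a.toString + b

  // Build an expression tree instead of calling `foo` directly.
  val expr: Expr = foo.call(Attr("key"), Attr("value"))

  // Evaluate the tree against one hypothetical row.
  val result: Any = expr.eval(Map("key" -> 1, "value" -> "1"))
}
```

The point of the sketch is that `foo.call(...)` does not run `foo`; it only records the function and its argument expressions, and evaluation happens later, once per row. This is also why `foo` has to be a function value rather than a method.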
Author: Cheng Lian <lian@databricks.com>
Closes #3067 from liancheng/udf-dsl and squashes the following commits:
f132818 [Cheng Lian] Adds DSL support for Scala UDF
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions