author | Yanbo Liang <ybliang8@gmail.com> | 2015-12-07 23:50:57 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-12-07 23:50:57 -0800 |
commit | 4a39b5a1bee28cec792d509654f6236390cafdcb (patch) | |
tree | 1637657b13ee5294d74abf8f3f2f4c3f5bf9ba86 /docs | |
parent | 7d05a624510f7299b3dd07f87c203db1ff7caa3e (diff) | |
[SPARK-11958][SPARK-11957][ML][DOC] SQLTransformer user guide and example code
Add ```SQLTransformer``` user guide, example code and make Scala API doc more clear.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #10006 from yanboliang/spark-11958.
Diffstat (limited to 'docs')
-rw-r--r-- | docs/ml-features.md | 59 |
1 files changed, 59 insertions, 0 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 5105a948fe..f85e0d56d2 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -756,6 +756,65 @@ for more details on the API.
 </div>
 </div>
 
+## SQLTransformer
+
+`SQLTransformer` implements the transformations which are defined by SQL statement.
+Currently we only support SQL syntax like `"SELECT ... FROM __THIS__ ..."`
+where `"__THIS__"` represents the underlying table of the input dataset.
+The select clause specifies the fields, constants, and expressions to display in
+the output, it can be any select clause that Spark SQL supports. Users can also
+use Spark SQL built-in function and UDFs to operate on these selected columns.
+For example, `SQLTransformer` supports statements like:
+
+* `SELECT a, a + b AS a_b FROM __THIS__`
+* `SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5`
+* `SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b`
+
+**Examples**
+
+Assume that we have the following DataFrame with columns `id`, `v1` and `v2`:
+
+~~~~
+ id |  v1 |  v2
+----|-----|-----
+ 0  | 1.0 | 3.0
+ 2  | 2.0 | 5.0
+~~~~
+
+This is the output of the `SQLTransformer` with statement `"SELECT *, (v1 + v2) AS v3, (v1 * v2) AS v4 FROM __THIS__"`:
+
+~~~~
+ id |  v1 |  v2 |  v3 |  v4
+----|-----|-----|-----|-----
+ 0  | 1.0 | 3.0 | 4.0 | 3.0
+ 2  | 2.0 | 5.0 | 7.0 |10.0
+~~~~
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+Refer to the [SQLTransformer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.SQLTransformer)
+for more details on the API.
+
+{% include_example scala/org/apache/spark/examples/ml/SQLTransformerExample.scala %}
+</div>
+
+<div data-lang="java" markdown="1">
+
+Refer to the [SQLTransformer Java docs](api/java/org/apache/spark/ml/feature/SQLTransformer.html)
+for more details on the API.
+
+{% include_example java/org/apache/spark/examples/ml/JavaSQLTransformerExample.java %}
+</div>
+
+<div data-lang="python" markdown="1">
+
+Refer to the [SQLTransformer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.SQLTransformer) for more details on the API.
+
+{% include_example python/ml/sql_transformer.py %}
+</div>
+</div>
+
 ## VectorAssembler
 
 `VectorAssembler` is a transformer that combines a given list of columns into a single vector
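Reviewer note: the `__THIS__` placeholder described in the added guide text can be illustrated without a Spark cluster. The sketch below is not the code shipped in this commit and does not use Spark at all; it swaps in Python's standard-library `sqlite3` to mimic the idea of substituting `__THIS__` with a concrete table name before running the statement. The helper `run_sql_transform` and the table name `input_data` are hypothetical names chosen for illustration, and SQLite's SQL dialect is only assumed to agree with Spark SQL for this simple statement.

```python
import sqlite3


# Hypothetical helper, not part of Spark: mimics the idea of resolving the
# __THIS__ placeholder to the name of a concrete input table.
def run_sql_transform(conn, table_name, statement):
    """Run `statement` against `table_name`, treating __THIS__ as its alias."""
    resolved = statement.replace("__THIS__", table_name)
    return conn.execute(resolved).fetchall()


# Build an in-memory table holding the same rows as the guide's input DataFrame.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (id INTEGER, v1 REAL, v2 REAL)")
conn.executemany(
    "INSERT INTO input_data VALUES (?, ?, ?)",
    [(0, 1.0, 3.0), (2, 2.0, 5.0)],
)

# The statement from the guide: keep all columns and append v3 and v4.
statement = "SELECT *, (v1 + v2) AS v3, (v1 * v2) AS v4 FROM __THIS__"
rows = run_sql_transform(conn, "input_data", statement)

for row in sorted(rows):
    print(row)
```

With the two input rows from the guide, the computed `v3` and `v4` values match the output table in the diff (4.0 and 3.0 for `id` 0, 7.0 and 10.0 for `id` 2).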