author | Xiangrui Meng <meng@databricks.com> | 2014-11-03 22:29:48 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-11-03 22:31:43 -0800 |
commit | 8395e8fbdf23bef286ec68a4bbadcc448b504c2c (patch) | |
tree | 8bea8ca2bba38d861a8e428b9e295bb8782d8d85 /sql | |
parent | 42d02db86cd973cf31ceeede0c5a723238bbe746 (diff) | |
[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD
Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.
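As a rough illustration of what registering a vector as a UDT involves, here is a minimal pure-Python sketch that does not use Spark. The class names (`DenseVector`, `SparseVector`, `VectorUDTSketch`), the type-tag values, and the four-field row layout are assumptions made for this example and are not taken from this commit. The idea shown is only the round trip: a vector is serialized into plain SQL-friendly fields, stored as a column next to the label, and deserialized when the column is read back.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class DenseVector:
    values: List[float]


@dataclass
class SparseVector:
    size: int
    indices: List[int]
    values: List[float]


Vector = Union[DenseVector, SparseVector]
# Hypothetical row layout: (type tag, size, indices, values).
Row = Tuple[int, Optional[int], Optional[List[int]], List[float]]


class VectorUDTSketch:
    """Hypothetical stand-in for a vector UDT; not Spark's actual class."""

    def serialize(self, v: Vector) -> Row:
        # Tag 0 marks a sparse vector, tag 1 a dense one (assumed values).
        if isinstance(v, SparseVector):
            return (0, v.size, list(v.indices), list(v.values))
        return (1, None, None, list(v.values))

    def deserialize(self, row: Row) -> Vector:
        tag, size, indices, values = row
        if tag == 0:
            return SparseVector(size, list(indices), list(values))
        if tag == 1:
            return DenseVector(list(values))
        raise ValueError(f"unknown vector type tag: {tag}")


udt = VectorUDTSketch()
labeled_points = [
    (1.0, DenseVector([0.5, 2.0])),
    (0.0, SparseVector(3, [1], [4.0])),
]

# Each record becomes a row with a plain "label" column and a
# struct-like "features" column built from primitive fields.
rows = [
    {"label": label, "features": udt.serialize(vec)}
    for label, vec in labeled_points
]

# Selecting the "features" column and deserializing recovers the vectors.
restored = [udt.deserialize(r["features"]) for r in rows]
```

Because every field in the serialized row is a primitive or a list of primitives, a columnar format such as Parquet can store it without knowing anything about vectors; only the UDT needs to know how to rebuild the object.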
~~This PR contains the changes from #3068 . I will rebase after #3068 is merged.~~
marmbrus jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:
3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
(cherry picked from commit 1a9c6cddadebdc53d083ac3e0da276ce979b5d1f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions