[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD - spark

author	Xiangrui Meng <meng@databricks.com>	2014-11-03 22:29:48 -0800
committer	Xiangrui Meng <meng@databricks.com>	2014-11-03 22:29:48 -0800
commit	1a9c6cddadebdc53d083ac3e0da276ce979b5d1f (patch)
tree	b485818ba52a9287ae7124e57ef55f1d974f3a1f /streaming
parent	04450d11548cfb25d4fb77d4a33e3a7cd4254183 (diff)

[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD

Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.

~~This PR contains the changes from #3068. I will rebase after #3068 is merged.~~

marmbrus jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:

3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
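
The idea behind a user-defined type can be sketched without Spark: a UDT pairs a custom class with a row of built-in column types, plus functions to convert in both directions. The snippet below is a minimal, standalone illustration of that round trip in plain Python. The class names, the `serialize`/`deserialize` helpers, and the four-field row layout (type tag, size, indices, values) are assumptions made for this sketch; it does not reproduce the code in this commit or call any Spark API.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical type tags for the illustrative row layout.
SPARSE, DENSE = 0, 1


@dataclass(frozen=True)
class DenseVector:
    """Stand-in for a dense vector: every entry is stored."""
    values: Tuple[float, ...]


@dataclass(frozen=True)
class SparseVector:
    """Stand-in for a sparse vector: only non-zero entries are stored."""
    size: int
    indices: Tuple[int, ...]
    values: Tuple[float, ...]


def serialize(vector):
    """Convert a vector into a row made only of built-in types.

    Assumed row layout: (type_tag, size, indices, values).
    Dense vectors leave size and indices empty (None).
    """
    if isinstance(vector, SparseVector):
        return (SPARSE, vector.size, list(vector.indices), list(vector.values))
    if isinstance(vector, DenseVector):
        return (DENSE, None, None, list(vector.values))
    raise TypeError("unsupported vector type: %r" % type(vector))


def deserialize(row):
    """Rebuild the vector object from a row produced by serialize()."""
    tag, size, indices, values = row
    if tag == SPARSE:
        return SparseVector(size, tuple(indices), tuple(values))
    if tag == DENSE:
        return DenseVector(tuple(values))
    raise ValueError("unknown type tag: %r" % tag)


if __name__ == "__main__":
    dense = DenseVector((1.0, 0.0, 3.0))
    sparse = SparseVector(3, (0, 2), (1.0, 3.0))
    for v in (dense, sparse):
        # Each vector survives the trip through the row representation.
        assert deserialize(serialize(v)) == v
```

Because the row contains only integers, floats, and lists, a columnar store can hold it without knowing anything about the vector classes; that is the property that makes selecting columns or writing to a file format such as Parquet possible once a type is registered.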

Diffstat (limited to 'streaming')

0 files changed, 0 insertions, 0 deletions

