author Xiangrui Meng <meng@databricks.com> 2014-11-03 22:29:48 -0800
committer Xiangrui Meng <meng@databricks.com> 2014-11-03 22:31:43 -0800
commit 8395e8fbdf23bef286ec68a4bbadcc448b504c2c
tree 8bea8ca2bba38d861a8e428b9e295bb8782d8d85
parent 42d02db86cd973cf31ceeede0c5a723238bbe746
[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD
Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.

~~This PR contains the changes from #3068. I will rebase after #3068 is merged.~~

marmbrus jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:

3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples

(cherry picked from commit 1a9c6cddadebdc53d083ac3e0da276ce979b5d1f)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
Diffstat (limited to 'mllib/pom.xml')
-rw-r--r-- mllib/pom.xml 5
1 file changed, 5 insertions, 0 deletions
diff --git a/mllib/pom.xml b/mllib/pom.xml
index fb7239e779..87a7ddaba9 100644
--- a/mllib/pom.xml
+++ b/mllib/pom.xml
@@ -46,6 +46,11 @@
       <version>${project.version}</version>
     </dependency>
     <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-sql_${scala.binary.version}</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+    <dependency>
       <groupId>org.eclipse.jetty</groupId>
       <artifactId>jetty-server</artifactId>
     </dependency>