author | Xiangrui Meng <meng@databricks.com> | 2014-11-03 22:29:48 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-11-03 22:29:48 -0800 |
commit | 1a9c6cddadebdc53d083ac3e0da276ce979b5d1f (patch) | |
tree | b485818ba52a9287ae7124e57ef55f1d974f3a1f /dev/run-tests | |
parent | 04450d11548cfb25d4fb77d4a33e3a7cd4254183 (diff) | |
[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD
Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.
~~This PR contains the changes from #3068 . I will rebase after #3068 is merged.~~
marmbrus jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:
3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
Diffstat (limited to 'dev/run-tests')
-rwxr-xr-x | dev/run-tests | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/dev/run-tests b/dev/run-tests
index 0e9eefa76a..de607e4344 100755
--- a/dev/run-tests
+++ b/dev/run-tests
@@ -180,7 +180,7 @@ CURRENT_BLOCK=$BLOCK_SPARK_UNIT_TESTS
   if [ -n "$_SQL_TESTS_ONLY" ]; then
     # This must be an array of individual arguments. Otherwise, having one long string
     #+ will be interpreted as a single test, which doesn't work.
-    SBT_MAVEN_TEST_ARGS=("catalyst/test" "sql/test" "hive/test")
+    SBT_MAVEN_TEST_ARGS=("catalyst/test" "sql/test" "hive/test" "mllib/test")
   else
     SBT_MAVEN_TEST_ARGS=("test")
   fi
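The comment in the hunk above notes that the test targets must be an array of individual arguments rather than one long string. A minimal bash sketch of why this matters follows. The `count_args` helper and the variable names are hypothetical and are not part of dev/run-tests; the sketch only shows how a quoted array expansion differs from a quoted string when passed to a command.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in dev/run-tests): prints how many arguments it received.
count_args() {
  echo "$#"
}

# Array form, as in the patched line: each element becomes its own argument.
TEST_ARGS_ARRAY=("catalyst/test" "sql/test" "hive/test" "mllib/test")

# String form: when quoted, the whole string arrives as a single argument,
# which a build tool would then treat as one (nonexistent) test target.
TEST_ARGS_STRING="catalyst/test sql/test hive/test mllib/test"

array_count=$(count_args "${TEST_ARGS_ARRAY[@]}")
string_count=$(count_args "$TEST_ARGS_STRING")

echo "array form:  $array_count argument(s)"
echo "string form: $string_count argument(s)"
```

Expanding with `"${TEST_ARGS_ARRAY[@]}"` keeps each target as a separate word even if an element were to contain spaces, which is presumably why the script builds the list this way.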