[SPARK-5089][PYSPARK][MLLIB] Fix vector convert - spark


author	freeman <the.freeman.lab@gmail.com>	2015-01-05 13:10:59 -0800
committer	Xiangrui Meng <meng@databricks.com>	2015-01-05 13:11:47 -0800
commit	cf55a2b0e14649295b79d0bed365fb87df844361 (patch)
tree	67a1533f07b25df0847107991bdf482855a2b94b /streaming
parent	f979205c1ca87eb7834a7a81381bd32ee0e3095a (diff)
download	spark-cf55a2b0e14649295b79d0bed365fb87df844361.tar.gz spark-cf55a2b0e14649295b79d0bed365fb87df844361.tar.bz2 spark-cf55a2b0e14649295b79d0bed365fb87df844361.zip

[SPARK-5089][PYSPARK][MLLIB] Fix vector convert

This is a small change addressing a potentially significant bug in how PySpark + MLlib handles non-float64 numpy arrays. The automatic conversion to `DenseVector` that occurs when passing RDDs to MLlib algorithms in PySpark should upcast to float64, but this was not actually happening. As a result, non-float64 arrays were silently parsed incorrectly during SerDe, yielding erroneous results when running, for example, KMeans.

The PR includes the fix, as well as a new test for the correct conversion behavior.

davies

Author: freeman <the.freeman.lab@gmail.com>

Closes #3902 from freeman-lab/fix-vector-convert and squashes the following commits:

764db47 [freeman] Add a test for proper conversion behavior
704f97e [freeman] Return array after changing type

(cherry picked from commit 6c6f32574023b8e43a24f2081ff17e6e446de2f3)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
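
To illustrate why a missing upcast matters, here is a minimal standard-library sketch. It is not PySpark's actual SerDe code: the helpers `serialize_as_float64` and `deserialize_float64` are hypothetical, and it assumes the receiving side reads the byte buffer as float64. Under that assumption, int32 bytes that are shipped without conversion come back as the wrong number of meaningless values, while upcasting first round-trips cleanly.

```python
import struct


def serialize_as_float64(values):
    """Hypothetical helper: upcast each element to float, then pack as little-endian float64."""
    return struct.pack("<%dd" % len(values), *(float(v) for v in values))


def deserialize_float64(payload):
    """Hypothetical helper: read a byte buffer back as little-endian float64 values."""
    count = len(payload) // 8
    return list(struct.unpack("<%dd" % count, payload[:count * 8]))


ints = [1, 2, 3, 4]

# Buggy path (assumed for illustration): the int32 bytes are sent unchanged
# and the reader interprets them as float64.
raw_int32 = struct.pack("<%di" % len(ints), *ints)
misparsed = deserialize_float64(raw_int32)

# Fixed path: upcast to float64 before serializing.
roundtrip = deserialize_float64(serialize_as_float64(ints))

print("misparsed:", misparsed)  # two values, unrelated to the input
print("roundtrip:", roundtrip)  # the four original values as floats
```

Nothing in the buggy path raises an error, which is consistent with the silent failure described above: an algorithm such as KMeans would simply run on the wrong numbers.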

Diffstat (limited to 'streaming')

0 files changed, 0 insertions, 0 deletions

