author | freeman <the.freeman.lab@gmail.com> | 2015-01-05 13:10:59 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-01-05 13:10:59 -0800 |
commit | 6c6f32574023b8e43a24f2081ff17e6e446de2f3 (patch) | |
tree | 01940cc05e61712eb4e3e383f6a4ae12c8209c28 /network | |
parent | 1c0e7ce056c79e1db96f85b8c56a479b8b043970 (diff) | |
[SPARK-5089][PYSPARK][MLLIB] Fix vector convert
This is a small change addressing a potentially significant bug in how PySpark + MLlib handles non-float64 numpy arrays. The automatic conversion to `DenseVector` that occurs when passing RDDs to MLlib algorithms in PySpark should upcast to float64, but this wasn't actually happening. As a result, non-float64 arrays were silently misparsed during SerDe, yielding erroneous results when running, for example, KMeans.
The PR includes the fix, as well as a new test for the correct conversion behavior.
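A rough illustration of the failure mode described above, using only the Python standard library rather than the actual PySpark SerDe code. The helper names `pack_as_float64` and `read_float64` are hypothetical stand-ins, and the int32 input is an assumed example. The sketch shows what happens when a buffer of 4-byte integers is read as if it held 8-byte floats, and how upcasting before serialization avoids it.

```python
import struct


def pack_as_float64(values):
    """Serialize values as little-endian float64 (8 bytes per element)."""
    return struct.pack("<%dd" % len(values), *values)


def read_float64(buf):
    """Deserialize a buffer, assuming it holds little-endian float64 values."""
    return list(struct.unpack("<%dd" % (len(buf) // 8), buf))


ints = [1, 2, 3, 4]

# Buggy path: the raw int32 bytes (4 bytes each) are passed through unchanged,
# but the reader assumes float64 (8 bytes each), so pairs of integers are
# fused into meaningless floating-point values.
raw_int32 = struct.pack("<%di" % len(ints), *ints)
wrong = read_float64(raw_int32)

# Fixed path: upcast every element to float before serializing.
right = read_float64(pack_as_float64([float(v) for v in ints]))

print("misread:", wrong)   # two garbage values instead of four elements
print("upcast: ", right)
```

Nothing raises an error on the buggy path; the values are simply wrong and the element count changes, which is why the problem can go unnoticed until an algorithm produces odd results. In NumPy terms, the upcast corresponds to converting the array to float64 before its bytes are read.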
davies
Author: freeman <the.freeman.lab@gmail.com>
Closes #3902 from freeman-lab/fix-vector-convert and squashes the following commits:
764db47 [freeman] Add a test for proper conversion behavior
704f97e [freeman] Return array after changing type
Diffstat (limited to 'network')
0 files changed, 0 insertions, 0 deletions