author | Liang-Chi Hsieh <viirya@gmail.com> | 2017-04-05 17:46:44 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2017-04-05 17:46:44 -0700 |
commit | 12206058e8780e202c208b92774df3773eff36ae (patch) | |
tree | 363db4aa846ad9e7a57285fd9ba57d5921bb7039 /external | |
parent | 9d68c67235481fa33983afb766916b791ca8212a (diff) | |
[SPARK-20214][ML] Make sure converted csc matrix has sorted indices
## What changes were proposed in this pull request?
`_convert_to_vector` converts a scipy sparse matrix to a csc matrix in order to initialize a `SparseVector`. However, it does not guarantee that the converted csc matrix has sorted indices, so a call like the following fails:
    from scipy.sparse import lil_matrix
    from pyspark.mllib.linalg import _convert_to_vector
    lil = lil_matrix((4, 1))
    lil[1, 0] = 1
    lil[3, 0] = 2
    _convert_to_vector(lil.todok())
    File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 78, in _convert_to_vector
      return SparseVector(l.shape[0], csc.indices, csc.data)
    File "/home/jenkins/workspace/python/pyspark/mllib/linalg/__init__.py", line 556, in __init__
      % (self.indices[i], self.indices[i + 1]))
    TypeError: Indices 3 and 1 are not strictly increasing
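To make the failure condition concrete, here is a minimal sketch of the kind of check that produces this error. The helper name `check_strictly_increasing` is hypothetical and this is not the actual `SparseVector` code; it only mirrors the message format visible in the traceback.

```python
def check_strictly_increasing(indices):
    """Hypothetical helper: raise TypeError unless every index is
    greater than the one before it."""
    for i in range(len(indices) - 1):
        if indices[i] >= indices[i + 1]:
            raise TypeError(
                "Indices %s and %s are not strictly increasing"
                % (indices[i], indices[i + 1])
            )


# The unsorted indices from the failing conversion are rejected.
try:
    check_strictly_increasing([3, 1])
    message = None
except TypeError as exc:
    message = str(exc)

# The same indices in sorted order pass without raising.
check_strictly_increasing([1, 3])
```

Under this sketch, any index array handed over in unsorted order is rejected, which is why the conversion step has to produce sorted indices.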
A simple test confirms that `dok_matrix.tocsc()` does not guarantee sorted indices:
    >>> from scipy.sparse import lil_matrix
    >>> lil = lil_matrix((4, 1))
    >>> lil[1, 0] = 1
    >>> lil[3, 0] = 2
    >>> dok = lil.todok()
    >>> csc = dok.tocsc()
    >>> csc.has_sorted_indices
    0
    >>> csc.indices
    array([3, 1], dtype=int32)
I checked the scipy source code. The only conversions that guarantee sorted indices are `csc_matrix.tocsr()` and `csr_matrix.tocsc()`.
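As an illustration, the sketch below shows two ways to end up with a csc matrix whose indices are sorted. It is not necessarily what this patch does. The matrix is built directly from deliberately unsorted arrays (an assumption made for reproducibility, since whether `dok_matrix.tocsc()` returns unsorted indices may depend on the scipy version). Option 2 relies on scipy's in-place `sort_indices()` method rather than the conversions named above.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Build a 4x1 CSC matrix whose row indices are deliberately out of order,
# mirroring the [3, 1] layout from the session above.
unsorted = csc_matrix(
    (np.array([2.0, 1.0]), np.array([3, 1]), np.array([0, 2])),
    shape=(4, 1),
)

# Option 1: round-trip through CSR, using the two conversions named above.
roundtrip = unsorted.tocsr().tocsc()

# Option 2: sort a copy in place, leaving the original matrix untouched.
in_place = unsorted.copy()
in_place.sort_indices()
```

Both results hold the same entries in ascending index order, so either could be passed on as the `indices` and `data` of a sparse vector. The round trip allocates an intermediate csr matrix, while `sort_indices()` works on the existing arrays.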
## How was this patch tested?
Existing tests.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #17532 from viirya/make-sure-sorted-indices.