author | DB Tsai <dbtsai@alpinenow.com> | 2014-12-02 11:40:43 +0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2014-12-02 11:40:43 +0800 |
commit | 64f3175bf976f5a28e691cedc7a4b333709e0c58 | |
tree | 5e9f414bb51f79f7de184909c82fbc7c90e5d2ae /sbin/spark-config.sh | |
parent | b0a46d899541ec17db090aac6f9ea1b287ee9331 | |
[SPARK-4611][MLlib] Implement the efficient vector norm
The vector norm in Breeze is implemented with `activeIterator`, which is known to be very slow.
In this PR, an efficient vector norm is implemented. With this API, `Normalizer` becomes roughly 5x to 6x faster. `k-means` also improves, though more modestly, with about 4% to 16% less running time on the benchmark below.
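The core idea can be sketched outside Spark. Instead of walking an iterator over active entries, the norm loops directly over the vector's backing `values` array. For a sparse vector, the implicit zeros contribute nothing to any p-norm with p >= 1, so the stored values alone are enough. The Python sketch below is not the Scala code from this PR. The `DenseVec`/`SparseVec` classes and the `norm`/`normalize` helpers are hypothetical, and the special-casing of p = 1, 2 and infinity is an assumed optimization.

```python
import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class DenseVec:
    # Hypothetical dense vector: every entry is stored.
    values: List[float]


@dataclass
class SparseVec:
    # Hypothetical sparse vector: only the nonzero entries are stored.
    size: int
    indices: List[int]
    values: List[float]


def norm(vec: Union[DenseVec, SparseVec], p: float) -> float:
    """p-norm computed by looping directly over the stored values array."""
    if p < 1.0:
        raise ValueError("p must be >= 1")
    values = vec.values  # same code path for dense and sparse storage
    if p == 1.0:
        return sum(abs(v) for v in values)
    if p == 2.0:
        return math.sqrt(sum(v * v for v in values))
    if math.isinf(p):
        return max((abs(v) for v in values), default=0.0)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


def normalize(vec: Union[DenseVec, SparseVec], p: float = 2.0):
    """Scale a vector to unit p-norm, as a Normalizer-style transform might."""
    n = norm(vec, p)
    if n == 0.0:
        return vec  # leave an all-zero vector unchanged
    scaled = [v / n for v in vec.values]
    if isinstance(vec, SparseVec):
        # Sparsity pattern is preserved; only the stored values change.
        return SparseVec(vec.size, list(vec.indices), scaled)
    return DenseVec(scaled)
```

For example, `norm(DenseVec([3.0, 0.0, -4.0]), 2)` and `norm(SparseVec(5, [0, 4], [3.0, -4.0]), 2)` both give 5.0. The sparse call touches only two stored values, not five.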
Benchmark results on the mnist8m dataset:
| Benchmark | Vector type | Before | With this PR |
|---|---|---|---|
| `Normalizer` | DenseVector | 68.25 secs | 12.71 secs |
| `Normalizer` | SparseVector | 17.01 secs | 2.73 secs |
| `k-means` | DenseVector | 83.46 secs | 70.04 secs |
| `k-means` | SparseVector | 61.60 secs | 59.05 secs |
Author: DB Tsai <dbtsai@alpinenow.com>
Closes #3462 from dbtsai/norm and squashes the following commits:
63c7165 [DB Tsai] typo
0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
6fa616c [DB Tsai] address feedback
9b7cb56 [DB Tsai] move norm to static method
0b632e6 [DB Tsai] kmeans
dbed124 [DB Tsai] style
c1a877c [DB Tsai] first commit
Diffstat (limited to 'sbin/spark-config.sh')
0 files changed, 0 insertions, 0 deletions