author DB Tsai <dbtsai@alpinenow.com> 2014-12-02 11:40:43 +0800
committer Xiangrui Meng <meng@databricks.com> 2014-12-02 11:41:06 +0800
commit 3783e15f0dc36f966b449227668e232707d6696b
tree e71e367b1e7566d2415d11a36293a5443654f655 /docs
parent 445fc9550863bb8616acd6675d57077789177c03
[SPARK-4611][MLlib] Implement the efficient vector norm
The vector norm in breeze is implemented with `activeIterator`, which is known to be very slow. In this PR, an efficient vector norm is implemented, and with this API, `Normalizer` and `k-means` show large performance improvements.

Here is the benchmark against the mnist8m dataset.

a) `Normalizer`
Before
DenseVector: 68.25secs
SparseVector: 17.01secs
With this PR
DenseVector: 12.71secs
SparseVector: 2.73secs

b) `k-means`
Before
DenseVector: 83.46secs
SparseVector: 61.60secs
With this PR
DenseVector: 70.04secs
SparseVector: 59.05secs

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #3462 from dbtsai/norm and squashes the following commits:

63c7165 [DB Tsai] typo
0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
6fa616c [DB Tsai] address feedback
9b7cb56 [DB Tsai] move norm to static method
0b632e6 [DB Tsai] kmeans
dbed124 [DB Tsai] style
c1a877c [DB Tsai] first commit

(cherry picked from commit 64f3175bf976f5a28e691cedc7a4b333709e0c58)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
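
The idea described in the commit message can be illustrated with a minimal sketch. It is written in plain Python rather than the Scala of the actual change, and it is not the code from this PR: the function name `vector_norm`, its parameters, and the choice of special-cased values of `p` are assumptions made for illustration. The point it shows is that a p-norm can be computed in one pass over the stored values (the backing array of a dense vector, or only the non-zero entries of a sparse vector), without going through a generic iterator of (index, value) pairs.

```python
import math
from typing import Sequence


def vector_norm(values: Sequence[float], p: float) -> float:
    """Compute the p-norm directly over a vector's stored values.

    Hypothetical helper for illustration only. `values` is assumed to be
    the backing array of a dense vector, or just the non-zero entries of
    a sparse vector (zeros contribute nothing to the norm for p >= 1).
    """
    if p < 1.0:
        raise ValueError("p must be >= 1")
    if p == 1.0:
        # L1 norm: sum of absolute values.
        return sum(abs(v) for v in values)
    if p == 2.0:
        # L2 norm: square root of the sum of squares.
        return math.sqrt(sum(v * v for v in values))
    if p == math.inf:
        # Infinity norm: largest absolute value (0.0 for an empty vector).
        return max((abs(v) for v in values), default=0.0)
    # General case: (sum |v|^p)^(1/p).
    return sum(abs(v) ** p for v in values) ** (1.0 / p)
```

For a sparse vector, passing only the stored non-zero values gives the same result as passing the full dense array, which is why skipping per-element index handling is safe here.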
Diffstat (limited to 'docs')
0 files changed, 0 insertions, 0 deletions