author | unknown <ulanov@ULANOV3.americas.hpqcorp.net> | 2015-11-10 14:25:06 -0800 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2015-11-10 14:25:06 -0800 |
commit | dba1a62cf1baa9ae1ee665d592e01dfad78331a2 (patch) | |
tree | a8a89a2d9c39f901454460859b550aa88369318c /project/MimaBuild.scala | |
parent | 18350a57004eb87cafa9504ff73affab4b818e06 (diff) | |
[SPARK-7316][MLLIB] RDD sliding window with step
Implementation of a step parameter for the sliding-window function in MLlib's RDD functions.
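As a rough illustration of the intended semantics, the sketch below mimics a stepped sliding window on a local Scala collection using the standard library's `Iterator.sliding(size, step)` with `withPartial(false)`. It also checks that this yields the same windows as taking a step-1 sliding window and keeping every `step`-th one. This is not the MLlib implementation: the names `SlidingSemantics`, `slidingWithStep` and `slidingThenFilter` are hypothetical, and dropping a trailing partial window is an assumption about how incomplete windows are handled.

```scala
object SlidingSemantics {
  // Local analogue of a stepped sliding window: windows of `window` elements,
  // starting every `step` elements. A trailing partial window is dropped
  // (assumption, not taken from the MLlib code).
  def slidingWithStep[T](xs: Seq[T], window: Int, step: Int): List[List[T]] =
    xs.iterator.sliding(window, step).withPartial(false).map(_.toList).toList

  // The slower alternative: slide with step 1, then keep every `step`-th window.
  def slidingThenFilter[T](xs: Seq[T], window: Int, step: Int): List[List[T]] =
    xs.iterator.sliding(window, 1).withPartial(false).map(_.toList)
      .zipWithIndex
      .collect { case (w, i) if i % step == 0 => w }
      .toList

  def main(args: Array[String]): Unit = {
    // Windows of size 3 starting at elements 1, 3 and 5.
    println(slidingWithStep(1 to 7, 3, 2))
    // Both approaches produce the same windows.
    println(slidingWithStep(1 to 7, 3, 2) == slidingThenFilter(1 to 7, 3, 2))
  }
}
```

The filter-based version has to build every step-1 window before discarding most of them, which is the cost a native step parameter avoids.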
Although one can use the current sliding window with step 1 and then keep only every Nth window, doing so takes more time and space: it materializes roughly N times as many windows as needed. For example, below are the timings for various window sizes and steps on 10M data points:
Window size | Step | Time (s) | Windows produced
------------ | ------------- | ---------- | ----------
128 | 1 | 6.38 | 9999873
128 | 10 | 0.9 | 999988
128 | 100 | 0.41 | 99999
1024 | 1 | 44.67 | 9998977
1024 | 10 | 4.74 | 999898
1024 | 100 | 0.78 | 99990
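The "Windows produced" column follows from simple arithmetic: with n elements, window size w and step s, there are floor((n - w) / s) + 1 full windows when n >= w (and none otherwise). The sketch below, using a hypothetical `WindowCounts` object, recomputes that column for n = 10,000,000 and also prints how many step-1 windows a filter-based approach would materialize relative to the windows it keeps; that ratio comes out close to s.

```scala
object WindowCounts {
  // Number of full windows of size `window` starting every `step` elements
  // over `n` elements: floor((n - window) / step) + 1, or 0 if n < window.
  def windowCount(n: Long, window: Int, step: Int): Long =
    if (n < window) 0L else (n - window) / step + 1

  def main(args: Array[String]): Unit = {
    val n = 10000000L
    for (window <- Seq(128, 1024); step <- Seq(1, 10, 100)) {
      val kept = windowCount(n, window, step)
      // A step-1-then-filter approach materializes every step-1 window first.
      val materialized = windowCount(n, window, 1)
      println(f"window=$window%d step=$step%d windows=$kept%d ratio=${materialized.toDouble / kept}%.1f")
    }
  }
}
```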
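```
// Run in spark-shell, where `sc` is the SparkContext.
import org.apache.spark.mllib.rdd.RDDFunctions._

val rdd = sc.parallelize(1 to 10000000, 10)
rdd.count // warm-up pass over the input before timing

val window = 1024
val step = 1

val t = System.nanoTime()
val windows = rdd.sliding(window, step)
println(windows.count)                 // number of windows produced
println((System.nanoTime() - t) / 1e9) // elapsed time in seconds
```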
Author: unknown <ulanov@ULANOV3.americas.hpqcorp.net>
Author: Alexander Ulanov <nashb@yandex.ru>
Author: Xiangrui Meng <meng@databricks.com>
Closes #5855 from avulanov/SPARK-7316-sliding.