author | Xusen Yin <yinxusen@gmail.com> | 2015-05-11 18:41:22 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-05-11 18:41:22 -0700 |
commit | 35fb42a0b01d3043b7d5e27256d1b45a08583aab (patch) | |
tree | ec2502ed23ffea8e38c708907d2cee1769fd4525 /streaming | |
parent | 87229c95c6b597f5b84e36d518b9830e3ba63424 (diff) | |
[SPARK-5893] [ML] Add bucketizer
JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-5893).
One thing to make clear: the `buckets` parameter, an array of `Double`, holds the split points. For example,
```scala
buckets = Array(-0.5, 0.0, 0.5)
```
splits the real line into 4 ranges, (-inf, -0.5], (-0.5, 0.0], (0.0, 0.5], (0.5, +inf), which are encoded as 0, 1, 2, 3 respectively.
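To make the encoding concrete, here is a minimal sketch of how a value could be mapped to its range index with a binary search over the split points. It follows the right-closed interval convention described in this message and is not the `Bucketizer` code from this patch; the names `BucketSketch` and `bucketIndex` are hypothetical, and the input array is assumed to be sorted in ascending order.

```scala
object BucketSketch {
  /** Returns the index of the range containing `x`, given sorted split points.
    * Ranges are closed on the right: (-inf, s0], (s0, s1], ..., (sLast, +inf).
    * Hypothetical helper for illustration only.
    */
  def bucketIndex(splits: Array[Double], x: Double): Int = {
    // Lower-bound binary search: find the first position whose split is >= x.
    // That position equals the number of splits strictly less than x,
    // which is exactly the index of the range that x falls into.
    var lo = 0
    var hi = splits.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (splits(mid) < x) lo = mid + 1 else hi = mid
    }
    lo
  }
}
```

With `buckets = Array(-0.5, 0.0, 0.5)`, the values -0.7 and -0.5 both map to 0, 0.2 and 0.5 both map to 2, and 0.9 maps to 3. A value equal to a split point lands in the range below it, matching the `(a, b]` intervals listed above.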
Author: Xusen Yin <yinxusen@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #5980 from yinxusen/SPARK-5893 and squashes the following commits:
dc8c843 [Xusen Yin] Merge pull request #4 from jkbradley/yinxusen-SPARK-5893
1ca973a [Joseph K. Bradley] one more bucketizer test
34f124a [Joseph K. Bradley] Removed lowerInclusive, upperInclusive params from Bucketizer, and used splits instead.
eacfcfa [Xusen Yin] change ML attribute from splits into buckets
c3cc770 [Xusen Yin] add more unit test for binary search
3a16cc2 [Xusen Yin] refine comments and names
ac77859 [Xusen Yin] fix style error
fb30d79 [Xusen Yin] fix and test binary search
2466322 [Xusen Yin] refactor Bucketizer
11fb00a [Xusen Yin] change it into an Estimator
998bc87 [Xusen Yin] check buckets
4024cf1 [Xusen Yin] add test suite
5fe190e [Xusen Yin] add bucketizer
Diffstat (limited to 'streaming')
0 files changed, 0 insertions, 0 deletions