[SPARK-5893] [ML] Add bucketizer - spark

author	Xusen Yin <yinxusen@gmail.com>	2015-05-11 18:41:22 -0700
committer	Joseph K. Bradley <joseph@databricks.com>	2015-05-11 18:41:22 -0700
commit	35fb42a0b01d3043b7d5e27256d1b45a08583aab (patch)
tree	ec2502ed23ffea8e38c708907d2cee1769fd4525 /streaming
parent	87229c95c6b597f5b84e36d518b9830e3ba63424 (diff)
download	spark-35fb42a0b01d3043b7d5e27256d1b45a08583aab.tar.gz spark-35fb42a0b01d3043b7d5e27256d1b45a08583aab.tar.bz2 spark-35fb42a0b01d3043b7d5e27256d1b45a08583aab.zip

[SPARK-5893] [ML] Add bucketizer

JIRA issue [here](https://issues.apache.org/jira/browse/SPARK-5893).

One thing to make clear: the `buckets` parameter, an array of `Double`, serves as the list of split points. For example,

```scala
buckets = Array(-0.5, 0.0, 0.5)
```

splits the real numbers into 4 ranges, (-inf, -0.5], (-0.5, 0.0], (0.0, 0.5], (0.5, +inf), which are encoded as 0, 1, 2, 3.

Author: Xusen Yin <yinxusen@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #5980 from yinxusen/SPARK-5893 and squashes the following commits:

dc8c843 [Xusen Yin] Merge pull request #4 from jkbradley/yinxusen-SPARK-5893
1ca973a [Joseph K. Bradley] one more bucketizer test
34f124a [Joseph K. Bradley] Removed lowerInclusive, upperInclusive params from Bucketizer, and used splits instead.
eacfcfa [Xusen Yin] change ML attribute from splits into buckets
c3cc770 [Xusen Yin] add more unit test for binary search
3a16cc2 [Xusen Yin] refine comments and names
ac77859 [Xusen Yin] fix style error
fb30d79 [Xusen Yin] fix and test binary search
2466322 [Xusen Yin] refactor Bucketizer
11fb00a [Xusen Yin] change it into an Estimator
998bc87 [Xusen Yin] check buckets
4024cf1 [Xusen Yin] add test suite
5fe190e [Xusen Yin] add bucketizer
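The range encoding described in the commit message can be illustrated with a small standalone sketch. It does not use the Spark API and is not the code merged in this commit; `BucketSketch` and `bucketIndex` are hypothetical names, and the right-closed interval convention is taken from the message text only. The squashed commits mention a binary search, so the sketch uses one too, on the assumption that the split points are strictly increasing.

```scala
object BucketSketch {
  // Hypothetical helper (not part of Spark): returns the index of the range
  // that contains x, where the ranges are
  // (-inf, s0], (s0, s1], ..., (s_{n-1}, +inf) as described in the commit message.
  def bucketIndex(splits: Array[Double], x: Double): Int = {
    // Assumption: split points are strictly increasing.
    require(
      splits.sliding(2).forall(p => p.length < 2 || p(0) < p(1)),
      "splits must be strictly increasing")

    // Binary search for the number of split points strictly less than x.
    // That count is exactly the index of the right-closed range holding x.
    var lo = 0
    var hi = splits.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (splits(mid) < x) lo = mid + 1 else hi = mid
    }
    lo
  }

  def main(args: Array[String]): Unit = {
    val buckets = Array(-0.5, 0.0, 0.5)
    val samples = Seq(-1.0, -0.5, -0.2, 0.0, 0.3, 0.5, 2.0)
    // Print each sample value next to its encoded range index.
    samples.foreach(x => println(s"$x -> ${bucketIndex(buckets, x)}"))
  }
}
```

With `buckets = Array(-0.5, 0.0, 0.5)`, a value equal to a split point falls into the range that ends at that point, so `-0.5` maps to 0 and `0.5` maps to 2, while anything above `0.5` maps to 3.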

Diffstat (limited to 'streaming')

0 files changed, 0 insertions, 0 deletions

