path: root/python/test_support
author: Yanbo Liang <ybliang8@gmail.com> 2016-09-26 09:45:33 +0100
committer: Sean Owen <sowen@cloudera.com> 2016-09-26 09:45:33 +0100
commit: ac65139be96dbf87402b9a85729a93afd3c6ff17
tree: 6b9580267acc710567fe5509fc66ba10fa01ec29 /python/test_support
parent: 59d87d24079bc633e63ce032f0a5ddd18a3b02cb
[SPARK-17017][FOLLOW-UP][ML] Refactor of ChiSqSelector and add ML Python API.
## What changes were proposed in this pull request?

#14597 modified ```ChiSqSelector``` to support the ```fpr``` selector type, but it left some issues that need to be addressed:

* We should allow users to set the selector type explicitly rather than switching it implicitly through different setter functions, because the result then depends on the order of the setter calls. For example, if users set both ```numTopFeatures``` and ```percentile```, a ```kbest``` or a ```percentile``` model is trained depending on which one was set last (the later setting wins). This confuses users, so we should allow them to set the selector type explicitly. We handle similar issues elsewhere in the ML code base, such as in ```GeneralizedLinearRegression``` and ```LogisticRegression```.
* Meanwhile, if more than one parameter besides ```alpha``` can be set for the ```fpr``` model, we cannot handle it elegantly in the existing framework. The same applies to the ```kbest``` and ```percentile``` models. Setting the selector type explicitly also solves this issue.
* If users are allowed to set the selector type explicitly, we should handle param interactions. For example, if users set ```selectorType = percentile``` and ```alpha = 0.1```, we should notify them that the parameter ```alpha``` will have no effect. We should handle complex parameter interaction checks in ```transformSchema```. (FYI #11620)
* We should use lower-case selector type names to follow the MLlib convention.
* Add ML Python API.

## How was this patch tested?

Unit test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15214 from yanboliang/spark-17017.
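
The parameter-interaction check described in the commit message can be illustrated with a minimal, framework-free sketch. This is not the code from the patch: the helper `check_selector_params` and the `SELECTOR_PARAMS` table are hypothetical names introduced here, and only the selector type names (`kbest`, `percentile`, `fpr`) and parameter names (`numTopFeatures`, `percentile`, `alpha`) come from the message itself. The idea is that an explicit selector type determines which single parameter is read, so any other selector parameter the user set can be reported as having no effect, regardless of the order in which it was set.

```python
import warnings

# Hypothetical table: which parameter each selector type actually reads.
# The names follow the commit message; the structure is an assumption.
SELECTOR_PARAMS = {
    "kbest": "numTopFeatures",
    "percentile": "percentile",
    "fpr": "alpha",
}


def check_selector_params(selector_type, explicitly_set):
    """Return the explicitly set selector params that the chosen type ignores.

    `explicitly_set` maps param names to the values the user supplied.
    A warning is issued for each ignored param.
    """
    if selector_type not in SELECTOR_PARAMS:
        # Selector type names are lower case, so "KBest" is rejected here.
        raise ValueError(
            "Unknown selector type %r; expected one of %s"
            % (selector_type, sorted(SELECTOR_PARAMS))
        )
    used = SELECTOR_PARAMS[selector_type]
    ignored = sorted(
        name
        for name in explicitly_set
        if name in SELECTOR_PARAMS.values() and name != used
    )
    for name in ignored:
        warnings.warn(
            "Param %s has no effect when selectorType = %s" % (name, selector_type)
        )
    return ignored


if __name__ == "__main__":
    # The user chose "percentile" but also set alpha; alpha is reported as ignored.
    print(check_selector_params("percentile", {"percentile": 0.2, "alpha": 0.1}))
```

Because the selector type is passed explicitly, the result does not depend on the order in which the other parameters were set, which is the ambiguity the first bullet describes.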
Diffstat (limited to 'python/test_support')
0 files changed, 0 insertions, 0 deletions