author: sethah <seth.hendrickson16@gmail.com> 2016-12-28 07:01:14 -0800
committer: Yanbo Liang <ybliang8@gmail.com> 2016-12-28 07:01:14 -0800
commit: 6a475ae466a7ce28d507244bf6db91be06ed81ef
tree: 851629d08f67b4e9f5c647368b286cf068ae043c /sql/core
parent: d7bce3bd31ec193274718042dc017706989d7563
[SPARK-17772][ML][TEST] Add test functions for ML sample weights
## What changes were proposed in this pull request?

More and more ML algos are accepting sample weights, and they have been tested rather heterogeneously and with code duplication. This patch adds extensible helper methods to `MLTestingUtils` that can be reused by various algorithms accepting sample weights. Up to now, there seem to be a few tests that have been implemented commonly:

* Check that oversampling is the same as giving the instances sample weights proportional to the number of samples
* Check that outliers with tiny sample weights do not affect the algorithm's performance

This patch adds an additional test:

* Check that algorithms are invariant to constant scaling of the sample weights, i.e. uniform sample weights with `w_i = 1.0` are effectively the same as uniform sample weights with `w_i = 10000` or `w_i = 0.0001`

The instances of these tests occurred in LinearRegression, NaiveBayes, and LogisticRegression. Those tests have been removed/modified to use the new helper methods. These helper functions will be of use when [SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478) is implemented.

## How was this patch tested?

This patch only involves modifying test suites.

## Other notes

Both IsotonicRegression and GeneralizedLinearRegression also extend `HasWeightCol`. I did not modify these test suites because leaving them alone makes this patch easier to review, and because they did not duplicate the same tests as the three suites that were modified. If we want to change them later, we can create a JIRA for it now, but it's open for debate.

Author: sethah <seth.hendrickson16@gmail.com>

Closes #15721 from sethah/SPARK-17772.
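
The three sample-weight checks described in the message can be illustrated with a small sketch. This is not the `MLTestingUtils` code from the patch and does not use Spark. It assumes a toy closed-form weighted least-squares fit in plain Python as a stand-in for an estimator that accepts sample weights. All names here (`fit_weighted_line`, `close`, the sample data, and the tolerances) are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, weight); hypothetical layout


def fit_weighted_line(samples: List[Sample]) -> Tuple[float, float]:
    """Closed-form weighted least squares for y = slope * x + intercept."""
    w_sum = sum(w for _, _, w in samples)
    x_bar = sum(w * x for x, _, w in samples) / w_sum
    y_bar = sum(w * y for _, y, w in samples) / w_sum
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in samples)
    s_xx = sum(w * (x - x_bar) ** 2 for x, _, w in samples)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


def close(a: Sequence[float], b: Sequence[float], tol: float = 1e-6) -> bool:
    """True when two coefficient tuples agree within an absolute tolerance."""
    return all(abs(p - q) <= tol for p, q in zip(a, b))


# Illustrative data, not taken from the Spark test suites.
points = [(0.0, 1.0), (1.0, 3.1), (2.0, 4.9), (3.0, 7.2)]
counts = [1, 3, 2, 1]

# Check 1: repeating a row c times should match giving it weight c.
weighted = [(x, y, float(c)) for (x, y), c in zip(points, counts)]
oversampled = [(x, y, 1.0) for (x, y), c in zip(points, counts) for _ in range(c)]
oversampling_ok = close(fit_weighted_line(weighted), fit_weighted_line(oversampled))

# Check 2: outliers with tiny weights should barely move the fit.
clean = [(x, y, 1.0) for x, y in points]
with_outliers = clean + [(1.5, 1000.0, 1e-12), (2.5, -1000.0, 1e-12)]
outliers_ok = close(fit_weighted_line(clean), fit_weighted_line(with_outliers))

# Check 3: uniform weights of 1.0, 10000, or 0.0001 should give the same fit.
scaling_ok = all(
    close(fit_weighted_line(clean),
          fit_weighted_line([(x, y, c) for x, y, _ in clean]))
    for c in (1e4, 1e-4)
)

print(oversampling_ok, outliers_ok, scaling_ok)
```

A helper for a real estimator would likely compare fitted models through a caller-supplied comparison function rather than raw coefficients, since not every algorithm exposes coefficients in the same form.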
Diffstat (limited to 'sql/core')
0 files changed, 0 insertions, 0 deletions