author: Nick Pentreath <nickp@za.ibm.com> 2017-02-28 16:17:35 +0200
committer: Nick Pentreath <nickp@za.ibm.com> 2017-02-28 16:17:35 +0200
commit: b405466513bcc02cadf1477b6b682ace95d81658
tree: 5f1d0b2e6ebe9b8c463010bca8bea4074ad5ef86 /sql/core/src
parent: 9b8eca65dcf68129470ead39362ce870ffb0bb1d
[SPARK-14489][ML][PYSPARK] ALS unknown user/item prediction strategy
This PR adds a param to `ALS`/`ALSModel` to set the strategy used when encountering unknown users or items at prediction time in `transform`. This can occur in 2 scenarios: (a) production scoring, and (b) cross-validation & evaluation.

The current behavior returns `NaN` if a user/item is unknown. In scenario (b), this can easily occur when using `CrossValidator` or `TrainValidationSplit` since some users/items may only occur in the test set and not in the training set. In this case, the evaluator returns `NaN` for all metrics, making model selection impossible.

The new param, `coldStartStrategy`, defaults to `nan` (the current behavior). The other option supported initially is `drop`, which drops all rows with `NaN` predictions. This flag allows users to use `ALS` in cross-validation settings. It is made an `expertParam`. The param is made a string so that the set of strategies can be extended in future (some options are discussed in [SPARK-14489](https://issues.apache.org/jira/browse/SPARK-14489)).

## How was this patch tested?

New unit tests, and manual "before and after" tests for Scala & Python using MovieLens `ml-latest-small` as example data. Here, using `CrossValidator` or `TrainValidationSplit` with the default param setting results in metrics that are all `NaN`, while setting `coldStartStrategy` to `drop` results in valid metrics.

Author: Nick Pentreath <nickp@za.ibm.com>

Closes #12896 from MLnick/SPARK-14489-als-nan.
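
The effect of the two strategies on an evaluation metric can be illustrated with a minimal plain-Python sketch. This is not Spark code and not the implementation in this patch: the helper names `rmse` and `apply_cold_start` and the sample data are hypothetical, chosen only to show why a single `NaN` prediction makes the metric `NaN` and how dropping those rows restores a usable value.

```python
import math


def rmse(pairs):
    """Root-mean-square error over (label, prediction) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


def apply_cold_start(pairs, strategy="nan"):
    """Hypothetical stand-in for the two strategies described above.

    "nan"  keeps every row, including rows whose prediction is NaN.
    "drop" removes rows whose prediction is NaN.
    """
    if strategy == "nan":
        return list(pairs)
    if strategy == "drop":
        return [(y, p) for y, p in pairs if not math.isnan(p)]
    raise ValueError("unsupported strategy: " + strategy)


# Made-up (label, prediction) rows; the last one models a user or item
# that never appeared in the training split.
scored = [(4.0, 3.5), (2.0, 2.5), (5.0, float("nan"))]

rmse_nan = rmse(apply_cold_start(scored, "nan"))    # NaN propagates through the sum
rmse_drop = rmse(apply_cold_start(scored, "drop"))  # computed over the two known rows

print(rmse_nan, rmse_drop)
```

Note that with `drop` the metric is computed over fewer rows, so it describes only the users and items seen during training.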
Diffstat (limited to 'sql/core/src')
0 files changed, 0 insertions, 0 deletions