author | Xiangrui Meng <meng@databricks.com> | 2016-04-11 09:28:28 -0700 |
---|---|---|
committer | Xiangrui Meng <meng@databricks.com> | 2016-04-11 09:28:28 -0700 |
commit | 1c751fcf488189e5176546fe0d00f560ffcf1cec (patch) | |
tree | 40863cdb5ac52b6fdc74a22d64853ea07826e6be /dev/change-scala-version.sh | |
parent | e82d95bf63f57cefa02dc545ceb451ecdeedce28 (diff) | |
[SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIs
## What changes were proposed in this pull request?
This PR updates MLlib APIs to accept `Dataset[_]` as input where `DataFrame` was the input type. This PR doesn't change the output type. In Java, `Dataset[_]` maps to `Dataset<?>`, which includes `Dataset<Row>`. Some implementations were changed in order to return `DataFrame`. Tests and examples were updated. Note that this is a breaking change for subclasses of Transformer/Estimator.
Luckily, we don't have to rename the input argument, which has been `dataset` since Spark 1.2.
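To illustrate what the breaking change means for subclasses, here is a minimal sketch of a custom `Transformer` written against the new signature. The class name `PassThrough` and its uid prefix are hypothetical and not part of this patch; the sketch assumes a Spark build that includes this change.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types.StructType

// Hypothetical no-op transformer, used only to show the updated signature.
class PassThrough(override val uid: String) extends Transformer {

  def this() = this(Identifiable.randomUID("passThrough"))

  // Input is now Dataset[_]; the argument name stays `dataset`.
  // The output type is still DataFrame, so convert explicitly with toDF().
  override def transform(dataset: Dataset[_]): DataFrame = dataset.toDF()

  // This transformer leaves the schema unchanged.
  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): PassThrough = defaultCopy(extra)
}
```

Because `DataFrame` is an alias for `Dataset[Row]`, callers that pass a `DataFrame` to `transform` should not need changes; only classes that override `transform` or `fit` with the old `DataFrame` parameter type need to be updated.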
TODOs:
- [x] update MiMaExcludes (seems all covered by explicit filters from SPARK-13920)
- [x] Python
- [x] add a new test to accept Dataset[LabeledPoint]
- [x] remove unused imports of Dataset
## How was this patch tested?
Existing unit tests with some modifications.
cc: rxin jkbradley
Author: Xiangrui Meng <meng@databricks.com>
Closes #12274 from mengxr/SPARK-14500.