author | Joseph K. Bradley <joseph@databricks.com> | 2015-11-10 16:20:10 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-11-10 16:20:10 -0800 |
commit | e281b87398f1298cc3df8e0409c7040acdddce03 (patch) | |
tree | 0b3c9361181479c47bc61e1000e103c831d52f72 /pylintrc | |
parent | 1dde39d796bbf42336051a86bedf871c7fddd513 (diff) | |
[SPARK-5565][ML] LDA wrapper for Pipelines API
This adds LDA to spark.ml, the Pipelines API. It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change:
* I eliminated doc IDs. These are not necessary with DataFrames since the user can add an ID column as needed.
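
As a rough illustration of that point (a sketch only, not code from this patch), a user who wants document IDs could attach them before fitting. The snippet assumes a newer Spark release with `SparkSession` and `org.apache.spark.ml.linalg`; the object name `LdaWithDocIds`, the column name `docId`, and the toy term-count vectors are made up for the example.

```scala
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

// Hypothetical example object; not part of the patch.
object LdaWithDocIds {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lda-doc-ids")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy term-count vectors over a 4-word vocabulary (made-up data).
    val docs = Seq(
      Vectors.dense(1.0, 0.0, 3.0, 0.0),
      Vectors.dense(0.0, 2.0, 0.0, 1.0),
      Vectors.dense(2.0, 0.0, 1.0, 0.0)
    ).map(Tuple1.apply).toDF("features")

    // The user adds an ID column; LDA itself only reads the features column.
    val withIds = docs.withColumn("docId", monotonically_increasing_id())

    val model = new LDA()
      .setK(2)
      .setMaxIter(10)
      .setFeaturesCol("features")
      .fit(withIds)

    // transform() keeps existing columns, so each docId stays next to
    // its inferred topic distribution.
    model.transform(withIds).select("docId", "topicDistribution").show(false)

    spark.stop()
  }
}
```

The IDs produced by `monotonically_increasing_id` are unique and increasing but not consecutive, which is enough to join topic distributions back to the source documents.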
Note: This will conflict with [https://github.com/apache/spark/pull/9484], but I'll try to merge that PR first and then rebase this one.
CC: hhbyyh, feynmanliang: if you have a chance to make a pass, that'd be really helpful. Thanks! Now that I'm done traveling and this PR is almost ready, I'll see about reviewing other PRs critical for 1.6.
CC: mengxr
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #9513 from jkbradley/lda-pipelines.