author: Joseph K. Bradley <joseph@databricks.com> 2015-11-10 16:20:10 -0800
committer: Joseph K. Bradley <joseph@databricks.com> 2015-11-10 16:20:10 -0800
commit: e281b87398f1298cc3df8e0409c7040acdddce03
tree: 0b3c9361181479c47bc61e1000e103c831d52f72 /sql
parent: 1dde39d796bbf42336051a86bedf871c7fddd513
[SPARK-5565][ML] LDA wrapper for Pipelines API
This adds LDA to spark.ml, the Pipelines API. It follows the design doc in the JIRA: [https://issues.apache.org/jira/browse/SPARK-5565], with one major change:
* I eliminated doc IDs. These are not necessary with DataFrames since the user can add an ID column as needed.

Note: This will conflict with [https://github.com/apache/spark/pull/9484], but I'll try to merge [https://github.com/apache/spark/pull/9484] first and then rebase this PR.

CC: hhbyyh feynmanliang If you have a chance to make a pass, that'd be really helpful--thanks! Now that I'm done traveling & this PR is almost ready, I'll see about reviewing other PRs critical for 1.6.

CC: mengxr

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #9513 from jkbradley/lda-pipelines.
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions