| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
## What changes were proposed in this pull request?
1,create a libsvm-type dataset for lda: `data/mllib/sample_lda_libsvm_data.txt`
2,add python example
3,directly read the datafile in examples
4,BTW, change to `SparkSession` in `aft_survival_regression.py`
## How was this patch tested?
manual tests
`./bin/spark-submit examples/src/main/python/ml/lda_example.py`
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #12927 from zhengruifeng/lda_pe.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ml.BisectingKMeans
## What changes were proposed in this pull request?
1, add BisectingKMeans to ml-clustering.md
2, add the missing Scala BisectingKMeansExample
3, create a new datafile `data/mllib/sample_kmeans_data.txt`
## How was this patch tested?
manual tests
Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #11844 from zhengruifeng/doc_bkm.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
include_example
Replace example code in mllib-clustering.md using include_example
https://issues.apache.org/jira/browse/SPARK-13013
The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.
Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
`{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}`
Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala` and pick code blocks marked "example" and replace code block in
`{% highlight %}`
in the markdown.
See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337
Author: Xin Ren <iamshrek@126.com>
Closes #11116 from keypointt/SPARK-13013.
|
|
|
|
|
|
|
|
|
|
| |
filtering in general
This documents the implementation of ALS in `spark.ml` with example code in scala, java and python.
Author: BenFradet <benjamin.fradet@gmail.com>
Closes #10411 from BenFradet/SPARK-12247.
|
|
|
|
|
|
|
|
|
|
| |
sentiment values
Example of joining a static RDD of word sentiments to a streaming RDD of Tweets in order to demo the usage of the transform() method.
Author: Jeff L <sha0lin@alumni.carnegiemellon.edu>
Closes #8431 from Agent007/SPARK-9057.
|
|
|
|
|
|
|
|
|
|
| |
Previous seed resulted in empty test data set.
Author: Paweł Kozikowski <mupakoz@gmail.com>
Closes #7477 from mupakoz/patch-1 and squashes the following commits:
f5d41ee [Paweł Kozikowski] Mllib Naive Bayes example data set enlarged
|
|
|
|
|
|
|
|
|
|
|
| |
Add Python user guide for PowerIterationClustering
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #7155 from yanboliang/spark-8758 and squashes the following commits:
18d803b [Yanbo Liang] address comments
dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Including Iris Dataset (after shuffling and relabeling 3 -> 0 to confirm to 0 -> numClasses-1 labeling). Could not find an existing dataset in data/mllib for multiclass classification.
Author: Ram Sriharsha <rsriharsha@hw11853.local>
Closes #6296 from harsha2010/SPARK-7574 and squashes the following commits:
645427c [Ram Sriharsha] cleanup
46c41b1 [Ram Sriharsha] cleanup
2f76295 [Ram Sriharsha] Code Review Fixes
ebdf103 [Ram Sriharsha] Java Example
c026613 [Ram Sriharsha] Code Review fixes
4b7d1a6 [Ram Sriharsha] minor cleanup
13bed9c [Ram Sriharsha] add wikipedia link
bb9dbfa [Ram Sriharsha] Clean up naming
6f90db1 [Ram Sriharsha] [SPARK-7574][ml][doc] User guide for OneVsRest
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add parameter parsing in FPGrowth example app in Scala and Java
And a sample data file is added in data/mllib folder
Author: Jacky Li <jacky.likun@huawei.com>
Closes #4714 from jackylk/parameter and squashes the following commits:
8c478b3 [Jacky Li] fix according to comments
3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters
f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For SPARK-5867:
* The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API.
* It should also include Python examples now.
For SPARK-5892:
* Fix Python docs
* Various other cleanups
BTW, I accidentally merged this with master. If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check]
CC: mengxr (ML), davies (Python docs)
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits:
f191bb0 [Joseph K. Bradley] small cleanups
e786efa [Joseph K. Bradley] small doc corrections
6b1ab4a [Joseph K. Bradley] fixed python lint test
946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example. Changed spark.ml Java examples to use DataFrames API instead of sql()
da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3
629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python
b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation
34b067f [Joseph K. Bradley] small doc correction
da16aef [Joseph K. Bradley] Fixed python mllib docs
8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc
695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs
a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs
b05a80d [Joseph K. Bradley] organize imports. doc cleanups
e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java.
Author: martinzapletal <zapletal-martin@email.cz>
Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits:
67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms
80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples
7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the LDA user guide from jkbradley with Java and Scala code example.
Author: Xiangrui Meng <meng@databricks.com>
Author: Joseph K. Bradley <joseph@databricks.com>
Closes #4465 from mengxr/lda-guide and squashes the following commits:
6dcb7d1 [Xiangrui Meng] update java example in the user guide
76169ff [Xiangrui Meng] update java example
36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide
c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
GaussianMixture
Simple description and code samples (and sample data) for GaussianMixture
Author: Travis Galoppo <tjg2107@columbia.edu>
Closes #4401 from tgaloppo/spark-5013 and squashes the following commits:
c9ff9a5 [Travis Galoppo] Fixed link in mllib-clustering.md Added Gaussian mixture and power iteration as available clustering techniques in mllib-guide
2368690 [Travis Galoppo] Minor fixes
3eb41fa [Travis Galoppo] [SPARK-5013] Added documentation and sample data file for GaussianMixture
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(Just made a PR for this, mengxr was the reporter of:)
MLlib has sample data under serveral folders:
1) data/mllib
2) data/
3) mllib/data/*
Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files.
Author: Sean Owen <sowen@cloudera.com>
Closes #1394 from srowen/SPARK-2363 and squashes the following commits:
54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`.
2. Embedded instructions in the help message of those example apps.
Per discussion with Matei on the JIRA page, new example data is under `data/mllib`.
Author: Xiangrui Meng <meng@databricks.com>
Closes #833 from mengxr/mllib-sample-data and squashes the following commits:
59f0a18 [Xiangrui Meng] add sample binary classification data
3c2f92f [Xiangrui Meng] add linear regression data
050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
|
|
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
|