spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-16421][EXAMPLES][ML] Improve ML Example Outputs	Bryan Cutler	2016-08-05	3	-2569/+0
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Improve example outputs to better reflect the functionality that is being presented. This mostly consisted of modifying what was printed at the end of the example, such as calling show() with truncate=False, but sometimes required minor tweaks in the example data to get relevant output. Explicitly set parameters when they are used as part of the example. Fixed Java examples that failed to run because of using old-style MLlib Vectors or problem with schema. Synced examples between different APIs. ## How was this patch tested? Ran each example for Scala, Python, and Java and made sure output was legible on a terminal of width 100. Author: Bryan Cutler <cutlerb@gmail.com> Closes #14308 from BryanCutler/ml-examples-improve-output-SPARK-16260.
*	[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document	WeichenXu	2016-07-02	2	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? There are two test data files used for graphx examples existing in directory "graphx/data" I move it into "data/" directory because the "graphx" directory is used for code files and other test data files (such as mllib, streaming test data) are all in there. I also update the graphx document where reference the data files which I move place. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #14010 from WeichenXu123/move_graphx_data_dir.
*	[SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic ↵	WeichenXu	2016-06-16	2	-100/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	regression ## What changes were proposed in this pull request? add ml doc for ml isotonic regression add scala example for ml isotonic regression add java example for ml isotonic regression add python example for ml isotonic regression modify scala example for mllib isotonic regression modify java example for mllib isotonic regression modify python example for mllib isotonic regression add data/mllib/sample_isotonic_regression_libsvm_data.txt delete data/mllib/sample_isotonic_regression_data.txt ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
*	[SPARK-15449][MLLIB][EXAMPLE] Wrong Data Format - Documentation Issue	wm624@hotmail.com	2016-05-27	1	-12/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) In the MLLib naivebayes example, scala and python example doesn't use libsvm data, but Java does. I make changes in scala and python example to use the libsvm data as the same as Java example. ## How was this patch tested? Manual tests Author: wm624@hotmail.com <wm624@hotmail.com> Closes #13301 from wangmiao1981/example.
*	[SPARK-15150][EXAMPLE][DOC] Update LDA examples	Zheng RuiFeng	2016-05-11	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? 1,create a libsvm-type dataset for lda: `data/mllib/sample_lda_libsvm_data.txt` 2,add python example 3,directly read the datafile in examples 4,BTW, change to `SparkSession` in `aft_survival_regression.py` ## How was this patch tested? manual tests `./bin/spark-submit examples/src/main/python/ml/lda_example.py` Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12927 from zhengruifeng/lda_pe.
*	[SPARK-14340][EXAMPLE][DOC] Update Examples and User Guide for ↵	Zheng RuiFeng	2016-05-11	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ml.BisectingKMeans ## What changes were proposed in this pull request? 1, add BisectingKMeans to ml-clustering.md 2, add the missing Scala BisectingKMeansExample 3, create a new datafile `data/mllib/sample_kmeans_data.txt` ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11844 from zhengruifeng/doc_bkm.
*	[SPARK-13013][DOCS] Replace example code in mllib-clustering.md using ↵	Xin Ren	2016-03-03	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	include_example Replace example code in mllib-clustering.md using include_example https://issues.apache.org/jira/browse/SPARK-13013 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11116 from keypointt/SPARK-13013.
*	[SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative ↵	BenFradet	2016-02-16	1	-100/+0
\| \| \| \| \| \| \| \| \| \|	filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.
*	[SPARK-9057][STREAMING] Twitter example joining to static RDD of word ↵	Jeff L	2015-12-18	1	-0/+2477
\| \| \| \| \| \| \| \| \| \|	sentiment values Example of joining a static RDD of word sentiments to a streaming RDD of Tweets in order to demo the usage of the transform() method. Author: Jeff L <sha0lin@alumni.carnegiemellon.edu> Closes #8431 from Agent007/SPARK-9057.
*	[MLLIB] [DOC] Seed fix in mllib naive bayes example	Paweł Kozikowski	2015-07-18	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	Previous seed resulted in empty test data set. Author: Paweł Kozikowski <mupakoz@gmail.com> Closes #7477 from mupakoz/patch-1 and squashes the following commits: f5d41ee [Paweł Kozikowski] Mllib Naive Bayes example data set enlarged
*	[SPARK-8758] [MLLIB] Add Python user guide for PowerIterationClustering	Yanbo Liang	2015-07-02	1	-0/+19
\| \| \| \| \| \| \| \| \| \| \|	Add Python user guide for PowerIterationClustering Author: Yanbo Liang <ybliang8@gmail.com> Closes #7155 from yanboliang/spark-8758 and squashes the following commits: 18d803b [Yanbo Liang] address comments dd29577 [Yanbo Liang] Add Python user guide for PowerIterationClustering
*	[SPARK-7574] [ML] [DOC] User guide for OneVsRest	Ram Sriharsha	2015-05-22	1	-0/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Including Iris Dataset (after shuffling and relabeling 3 -> 0 to confirm to 0 -> numClasses-1 labeling). Could not find an existing dataset in data/mllib for multiclass classification. Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #6296 from harsha2010/SPARK-7574 and squashes the following commits: 645427c [Ram Sriharsha] cleanup 46c41b1 [Ram Sriharsha] cleanup 2f76295 [Ram Sriharsha] Code Review Fixes ebdf103 [Ram Sriharsha] Java Example c026613 [Ram Sriharsha] Code Review fixes 4b7d1a6 [Ram Sriharsha] minor cleanup 13bed9c [Ram Sriharsha] add wikipedia link bb9dbfa [Ram Sriharsha] Clean up naming 6f90db1 [Ram Sriharsha] [SPARK-7574][ml][doc] User guide for OneVsRest
*	[SPARK-5939][MLLib] make FPGrowth example app take parameters	Jacky Li	2015-02-23	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add parameter parsing in FPGrowth example app in Scala and Java And a sample data file is added in data/mllib folder Author: Jacky Li <jacky.likun@huawei.com> Closes #4714 from jackylk/parameter and squashes the following commits: 8c478b3 [Jacky Li] fix according to comments 3bb74f6 [Jacky Li] make FPGrowth exampl app take parameters f0e4d10 [Jacky Li] make FPGrowth exampl app take parameters
*	[SPARK-5867] [SPARK-5892] [doc] [ml] [mllib] Doc cleanups for 1.3 release	Joseph K. Bradley	2015-02-20	2	-0/+1601
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For SPARK-5867: * The spark.ml programming guide needs to be updated to use the new SQL DataFrame API instead of the old SchemaRDD API. * It should also include Python examples now. For SPARK-5892: * Fix Python docs * Various other cleanups BTW, I accidentally merged this with master. If you want to compile it on your own, use this branch which is based on spark/branch-1.3 and cherry-picks the commits from this PR: [https://github.com/jkbradley/spark/tree/doc-review-1.3-check] CC: mengxr (ML), davies (Python docs) Author: Joseph K. Bradley <joseph@databricks.com> Closes #4675 from jkbradley/doc-review-1.3 and squashes the following commits: f191bb0 [Joseph K. Bradley] small cleanups e786efa [Joseph K. Bradley] small doc corrections 6b1ab4a [Joseph K. Bradley] fixed python lint test 946affa [Joseph K. Bradley] Added sample data for ml.MovieLensALS example. Changed spark.ml Java examples to use DataFrames API instead of sql() da81558 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into doc-review-1.3 629dbf5 [Joseph K. Bradley] Updated based on code review: * made new page for old migration guides * small fixes * moved inherit_doc in python b9df7c4 [Joseph K. Bradley] Small cleanups: toDF to toDF(), adding s for string interpolation 34b067f [Joseph K. Bradley] small doc correction da16aef [Joseph K. Bradley] Fixed python mllib docs 8cce91c [Joseph K. Bradley] GMM: removed old imports, added some doc 695f3f6 [Joseph K. Bradley] partly done trying to fix inherit_doc for class hierarchies in python docs a72c018 [Joseph K. Bradley] made ChiSqTestResult appear in python docs b05a80d [Joseph K. Bradley] organize imports. doc cleanups e572827 [Joseph K. Bradley] updated programming guide for ml and mllib
*	[MLLIB][SPARK-5502] User guide for isotonic regression	martinzapletal	2015-02-15	1	-0/+100
\| \| \| \| \| \| \| \| \| \| \| \| \|	User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java. Author: martinzapletal <zapletal-martin@email.cz> Closes #4536 from zapletal-martin/SPARK-5502 and squashes the following commits: 67fe773 [martinzapletal] SPARK-5502 reworded model prediction rules to use more general language rather than the code/implementation specific terms 80bd4c3 [martinzapletal] SPARK-5502 created docs page for isotonic regression, added links to the page, updated data and examples 7d8136e [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java 504b5c3 [martinzapletal] SPARK-5502 Added documentation for Isotonic regression including examples for Scala and Java
*	[SPARK-5539][MLLIB] LDA guide	Xiangrui Meng	2015-02-08	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the LDA user guide from jkbradley with Java and Scala code example. Author: Xiangrui Meng <meng@databricks.com> Author: Joseph K. Bradley <joseph@databricks.com> Closes #4465 from mengxr/lda-guide and squashes the following commits: 6dcb7d1 [Xiangrui Meng] update java example in the user guide 76169ff [Xiangrui Meng] update java example 36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed).
*	[SPARK-5013] [MLlib] Added documentation and sample data file for ↵	Travis Galoppo	2015-02-06	1	-0/+2000
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	GaussianMixture Simple description and code samples (and sample data) for GaussianMixture Author: Travis Galoppo <tjg2107@columbia.edu> Closes #4401 from tgaloppo/spark-5013 and squashes the following commits: c9ff9a5 [Travis Galoppo] Fixed link in mllib-clustering.md Added Gaussian mixture and power iteration as available clustering techniques in mllib-guide 2368690 [Travis Galoppo] Minor fixes 3eb41fa [Travis Galoppo] [SPARK-5013] Added documentation and sample data file for GaussianMixture
*	SPARK-2363. Clean MLlib's sample data files	Sean Owen	2014-07-13	10	-0/+2080
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Just made a PR for this, mengxr was the reporter of:) MLlib has sample data under serveral folders: 1) data/mllib 2) data/ 3) mllib/data/* Per previous discussion with Matei Zaharia, we want to put them under `data/mllib` and clean outdated files. Author: Sean Owen <sowen@cloudera.com> Closes #1394 from srowen/SPARK-2363 and squashes the following commits: 54313dd [Sean Owen] Move ML example data from /mllib/data/ and /data/ into /data/mllib/
*	[SPARK-1874][MLLIB] Clean up MLlib sample data	Xiangrui Meng	2014-05-19	3	-0/+2102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1. Added synthetic datasets for `MovieLensALS`, `LinearRegression`, `BinaryClassification`. 2. Embedded instructions in the help message of those example apps. Per discussion with Matei on the JIRA page, new example data is under `data/mllib`. Author: Xiangrui Meng <meng@databricks.com> Closes #833 from mengxr/mllib-sample-data and squashes the following commits: 59f0a18 [Xiangrui Meng] add sample binary classification data 3c2f92f [Xiangrui Meng] add linear regression data 050f1ca [Xiangrui Meng] add a sample dataset for MovieLensALS example
*	moved user scripts to bin folder	shane-huang	2013-09-23	3	-0/+1012
	Signed-off-by: shane-huang <shengsheng.huang@intel.com>