spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT ↵	Nirman Narang	2016-06-15	1	-0/+19
\| \| \| \| \| \| \| \| \| \|	POINTS.] Updated the SparkStreaming Doc with some important points. Author: Nirman Narang <narang@us.ibm.com> Closes #11114 from nirmannarang/SPARK-7848.
*	[DOCUMENTATION] fixed typos in python programming guide	Mortada Mehyar	2016-06-14	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? minor typo ## How was this patch tested? minor typo in the doc, should be self explanatory Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #13639 from mortada/typo.
*	[SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API	Sean Owen	2016-06-12	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? - Deprecate old Java accumulator API; should use Scala now - Update Java tests and examples - Don't bother testing old accumulator API in Java 8 (too) - (fix a misspelling too) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #13606 from srowen/SPARK-15086.
*	[SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP	bomeng	2016-06-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala. ## How was this patch tested? Manually verified. Author: bomeng <bmeng@us.ibm.com> Closes #13543 from bomeng/SPARK-15806.
*	[SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc	bomeng	2016-06-12	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the document for `SPARK_WORKER_INSTANCES` to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before. ## How was this patch tested? Manually tested. Author: bomeng <bmeng@us.ibm.com> Closes #13533 from bomeng/SPARK-15781.
*	[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents	Dongjoon Hyun	2016-06-11	8	-29/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change. Fix broken links * mllib-data-types.md * mllib-decision-tree.md * mllib-ensembles.md * mllib-feature-extraction.md * mllib-pmml-model-export.md * mllib-statistics.md Fix malformed section header and scala coding style * mllib-linear-methods.md Replace indirect forward links with direct one * ml-classification-regression.md ## How was this patch tested? Manual tests (with `cd docs; jekyll build`.) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13608 from dongjoon-hyun/SPARK-15883.
*	[SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"	Sean Owen	2016-06-11	5	-0/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files. ## How was this patch tested? Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used. Author: Sean Owen <sowen@cloudera.com> Closes #13609 from srowen/SPARK-15879.
*	[DOCUMENTATION] fixed groupby aggregation example for pyspark	Mortada Mehyar	2016-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? fixing documentation for the groupby/agg example in python ## How was this patch tested? the existing example in the documentation dose not contain valid syntax (missing parenthesis) and is not using `Column` in the expression for `agg()` after the fix here's how I tested it: ``` In [1]: from pyspark.sql import Row In [2]: import pyspark.sql.functions as func In [3]: %cpaste Pasting code; enter '--' alone on the line to stop or use Ctrl-D. :records = [{'age': 19, 'department': 1, 'expense': 100}, : {'age': 20, 'department': 1, 'expense': 200}, : {'age': 21, 'department': 2, 'expense': 300}, : {'age': 22, 'department': 2, 'expense': 300}, : {'age': 23, 'department': 3, 'expense': 300}] :-- In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records]) In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show() +----------+----------+--------+------------+ \|department\|department\|max(age)\|sum(expense)\| +----------+----------+--------+------------+ \| 1\| 1\| 20\| 300\| \| 2\| 2\| 22\| 600\| \| 3\| 3\| 23\| 300\| +----------+----------+--------+------------+ Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #13587 from mortada/groupby_agg_doc_fix.
*	[DOCUMENTATION] Fixed target JAR path	prabs	2016-06-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Mentioned Scala version in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar` ## How was this patch tested? n/a Author: prabs <prabsmails@gmail.com> Author: Prabeesh K <prabsmails@gmail.com> Closes #13554 from prabeesh/master.
*	[SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression ↵	Yanbo Liang	2016-06-07	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	behavior difference ## What changes were proposed in this pull request? When fitting ```LinearRegressionModel```(by "l-bfgs" solver) and ```LogisticRegressionModel``` w/o intercept on dataset with constant nonzero column, spark.ml produce same model as R glmnet but different from LIBSVM. When fitting ```AFTSurvivalRegressionModel``` w/o intercept on dataset with constant nonzero column, spark.ml produce different model compared with R survival::survreg. We should output a warning message and clarify in document for this condition. ## How was this patch tested? Document change, no unit test. cc mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #12731 from yanboliang/spark-13590.
*	[SPARK-15760][DOCS] Add documentation for package-related configs.	Marcelo Vanzin	2016-06-07	1	-0/+47
\| \| \| \| \| \| \| \| \|	While there, also document spark.files and spark.jars. Text is the same as the spark-submit help text with some minor adjustments. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #13502 from vanzin/SPARK-15760.
*	[MINOR] fix typo in documents	WeichenXu	2016-06-07	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? I use spell check tools checks typo in spark documents and fix them. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13538 from WeichenXu123/fix_doc_typo.
*	[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" ↵	Ruifeng Zheng	2016-06-04	1	-13/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	f1_score ## What changes were proposed in this pull request? 1, del precision,recall in `ml.MulticlassClassificationEvaluator` 2, update user guide for `mlllib.weightedFMeasure` ## How was this patch tested? local build Author: Ruifeng Zheng <ruifengz@foxmail.com> Closes #13390 from zhengruifeng/clarify_f1.
*	[SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with ↵	Liwei Lin	2016-06-02	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AccumulatorV2 ## What changes were proposed in this pull request? The patch updates the codes & docs in the example module as well as the related doc module: - [ ] [docs] `streaming-programming-guide.md` - [x] scala code part - [ ] java code part - [ ] python code part - [x] [examples] `RecoverableNetworkWordCount.scala` - [ ] [examples] `JavaRecoverableNetworkWordCount.java` - [ ] [examples] `recoverable_network_wordcount.py` ## How was this patch tested? Ran the examples and verified results manually. Author: Liwei Lin <lwlin7@gmail.com> Closes #12981 from lw-lin/accumulatorV2-examples.
*	[SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator ↵	WeichenXu	2016-06-01	1	-18/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	section ## What changes were proposed in this pull request? Update document programming-guide accumulator section (scala language) java and python version, because the API haven't done, so I do not modify them. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13441 from WeichenXu123/update_doc_accumulatorV2_clean.
*	[DOCS] fix example code issues in documentation	Matthew Wise	2016-05-30	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fixed broken java code examples in streaming documentation Attn: tdas Author: Matthew Wise <matthew.rs.wise@gmail.com> Closes #13388 from mawise/fix_docs_java_streaming_example.
*	[SPARK-11959][SPARK-15484][DOC][ML] Document WLS and IRLS	Yanbo Liang	2016-05-27	1	-5/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? * Document ```WeightedLeastSquares```(normal equation) and ```IterativelyReweightedLeastSquares```. * Copy ```L-BFGS``` documents from ```spark.mllib``` to ```spark.ml```. Due to the session ```Optimization of linear methods``` is used for developers, I think we should provide the brief introduction of the optimization method, necessary references and how it implements in Spark. It's not necessary to paste all mathematical formula and derivation here. If developers/users want to learn more, they can track reference. ## How was this patch tested? Document update, no tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13262 from yanboliang/spark-15484.
*	[SPARK-15186][ML][DOCS] Add user guide for generalized linear regression	sethah	2016-05-27	1	-0/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754). ## How was this patch tested? Documentation only, no tests required. ## Approach In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions. Author: sethah <seth.hendrickson16@gmail.com> Closes #13139 from sethah/SPARK-15186.
*	[YARN][DOC][MINOR] Remove several obsolete env variables and update the doc	jerryshao	2016-05-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Remove several obsolete env variables not supported for Spark on YARN now, also updates the docs to include several changes with 2.0. ## How was this patch tested? N/A CC vanzin tgravescs Author: jerryshao <sshao@hortonworks.com> Closes #13296 from jerryshao/yarn-doc.
*	[MINOR] Fix Typos 'a -> an'	Zheng RuiFeng	2016-05-26	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `a` -> `an` I use regex to generate potential error lines: `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml//scala` and review them line by line. ## How was this patch tested? local build `lint-java` checking Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13317 from zhengruifeng/a_an.
*	[SPARK-10903] followup - update API doc for SqlContext	felixcheung	2016-05-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Follow up on the earlier PR - in here we are fixing up roxygen2 doc examples. Also add to the programming guide migration section. ## How was this patch tested? SparkR tests Author: felixcheung <felixcheung_m@hotmail.com> Closes #13340 from felixcheung/sqlcontextdoc.
*	[SPARK-13148][YARN] document zero-keytab Oozie application launch; add ↵	Steve Loughran	2016-05-26	1	-0/+96
\| \| \| \| \| \| \| \| \| \| \|	diagnostics This patch provides detail on what to do for keytabless Oozie launches of spark apps, and adds some debug-level diagnostics of what credentials have been submitted Author: Steve Loughran <stevel@hortonworks.com> Author: Steve Loughran <stevel@apache.org> Closes #11033 from steveloughran/stevel/feature/SPARK-13148-oozie.
*	[SPARK-12071][DOC] Document the behaviour of NA in R	Krishna Kalyan	2016-05-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Under Upgrading From SparkR 1.5.x to 1.6.x section added the information, SparkSQL converts `NA` in R to `null`. ## How was this patch tested? Document update, no tests. Author: Krishna Kalyan <krishnakalyan3@gmail.com> Closes #13268 from krishnakalyan3/spark-12071-1.
*	[SPARK-15412][PYSPARK][SPARKR][DOCS] Improve linear isotonic regression ↵	Holden Karau	2016-05-24	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pydoc & doc build insturctions ## What changes were proposed in this pull request? PySpark: Add links to the predictors from the models in regression.py, improve linear and isotonic pydoc in minor ways. User guide / R: Switch the installed package list to be enough to build the R docs on a "fresh" install on ubuntu and add sudo to match the rest of the commands. User Guide: Add a note about using gem2.0 for systems with both 1.9 and 2.0 (e.g. some ubuntu but maybe more). ## How was this patch tested? built pydocs locally, tested new user build instructions Author: Holden Karau <holden@us.ibm.com> Closes #13199 from holdenk/SPARK-15412-improve-linear-isotonic-regression-pydoc.
*	[SPARK-15502][DOC][ML][PYSPARK] add guide note that ALS only supports ↵	Nick Pentreath	2016-05-24	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	integer ids This PR adds a note to clarify that the ML API for ALS only supports integers for user/item ids, and that other types for these columns can be used but the ids must fall within integer range. (Refer [SPARK-14891](https://issues.apache.org/jira/browse/SPARK-14891)). Also cleaned up a reference to `mllib` in the ML doc. ## How was this patch tested? Built and viewed User Guide doc locally. Author: Nick Pentreath <nickp@za.ibm.com> Closes #13278 from MLnick/SPARK-15502-als-int-id-doc-note.
*	[SPARK-15485][SQL][DOCS] Spark SQL Configuration	gatorsmile	2016-05-23	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	#### What changes were proposed in this pull request? So far, the page Configuration in the official documentation does not have a section for Spark SQL. http://spark.apache.org/docs/latest/configuration.html For Spark users, the information and default values of these public configuration parameters are very useful. This PR is to add this missing section to the configuration.html. rxin yhuai marmbrus #### How was this patch tested? Below is the generated webpage. <img width="924" alt="screenshot 2016-05-23 11 35 57" src="https://cloud.githubusercontent.com/assets/11567269/15480492/b08fefc4-20da-11e6-9fa2-7cd5b699ed35.png"> <img width="914" alt="screenshot 2016-05-23 11 37 38" src="https://cloud.githubusercontent.com/assets/11567269/15480499/c5f9482e-20da-11e6-95ff-10821add1af4.png"> <img width="923" alt="screenshot 2016-05-23 11 36 11" src="https://cloud.githubusercontent.com/assets/11567269/15480506/cbd81644-20da-11e6-9d27-effb716b2fac.png"> <img width="920" alt="screenshot 2016-05-23 11 36 18" src="https://cloud.githubusercontent.com/assets/11567269/15480511/d013e332-20da-11e6-854a-cf8813c46f36.png"> Author: gatorsmile <gatorsmile@gmail.com> Closes #13263 from gatorsmile/configurationSQL.
*	[SPARK-15396][SQL][DOC] It can't connect hive metastore database	gatorsmile	2016-05-21	1	-29/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	#### What changes were proposed in this pull request? The `hive.metastore.warehouse.dir` property in hive-site.xml is deprecated since Spark 2.0.0. Users might not be able to connect to the existing metastore if they do not use the new conf parameter `spark.sql.warehouse.dir`. This PR is to update the document and example for explaining the latest changes in the configuration of default location of database. Below is the screenshot of the latest generated docs: <img width="681" alt="screenshot 2016-05-20 08 38 10" src="https://cloud.githubusercontent.com/assets/11567269/15433296/a05c4ace-1e66-11e6-8d2b-73682b32e9c2.png"> <img width="789" alt="screenshot 2016-05-20 08 53 26" src="https://cloud.githubusercontent.com/assets/11567269/15433734/645dc42e-1e68-11e6-9476-effc9f8721bb.png"> <img width="789" alt="screenshot 2016-05-20 08 53 37" src="https://cloud.githubusercontent.com/assets/11567269/15433738/68569f92-1e68-11e6-83d3-ef5bb221a8d8.png"> No change is made in the R's example. <img width="860" alt="screenshot 2016-05-20 08 54 38" src="https://cloud.githubusercontent.com/assets/11567269/15433779/965b8312-1e68-11e6-8bc4-53c88ceacde2.png"> #### How was this patch tested? N/A Author: gatorsmile <gatorsmile@gmail.com> Closes #13225 from gatorsmile/document.
*	[SPARK-15394][ML][DOCS] User guide typos and grammar audit	sethah	2016-05-19	5	-46/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Correct some typos and incorrectly worded sentences. ## How was this patch tested? Doc changes only. Note that many of these changes were identified by whomfire01 Author: sethah <seth.hendrickson16@gmail.com> Closes #13180 from sethah/ml_guide_audit.
*	[SPARK-15171][SQL] Remove the references to deprecated method ↵	Sean Zhong	2016-05-18	2	-30/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dataset.registerTempTable ## What changes were proposed in this pull request? Update the unit test code, examples, and documents to remove calls to deprecated method `dataset.registerTempTable`. ## How was this patch tested? This PR only changes the unit test code, examples, and comments. It should be safe. This is a follow up of PR https://github.com/apache/spark/pull/12945 which was merged. Author: Sean Zhong <seanzhong@databricks.com> Closes #13098 from clockfly/spark-15171-remove-deprecation.
*	[SPARK-15182][ML] Copy MLlib doc to ML: ml.feature.tf, idf	Yuhao Yang	2016-05-17	2	-9/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? We should now begin copying algorithm details from the spark.mllib guide to spark.ml as needed, rather than just linking back to the corresponding algorithms in the spark.mllib user guide. ## How was this patch tested? manual review for doc. Author: Yuhao Yang <hhbyyh@gmail.com> Author: Yuhao Yang <yuhao.yang@intel.com> Closes #12957 from hhbyyh/tfidfdoc.
*	[SPARK-15333][DOCS] Reorganize building-spark.md; rationalize vs wiki	Sean Owen	2016-05-17	1	-139/+156
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? See JIRA for the motivation. The changes are almost entirely movement of text and edits to sections. Minor changes to text include: - Copying in / merging text from the "Useful Developer Tools" wiki, in areas of - Docker - R - Running one test - standardizing on ./build/mvn not mvn, and likewise for ./build/sbt - correcting some typos - standardizing code block formatting No text has been removed from this doc; text has been imported from the https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools wiki ## How was this patch tested? Jekyll doc build and inspection of resulting HTML in browser. Author: Sean Owen <sowen@cloudera.com> Closes #13124 from srowen/SPARK-15333.
*	[SPARK-14434][ML] User guide doc and examples for GaussianMixture in spark.ml	wm624@hotmail.com	2016-05-17	1	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) Add guide doc and examples for GaussianMixture in Spark.ml in Java, Scala and Python. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual compile and test all examples Author: wm624@hotmail.com <wm624@hotmail.com> Closes #12788 from wangmiao1981/example.
*	[SPARK-15305][ML][DOC] spark.ml document Bisectiong k-means has the ↵	wm624@hotmail.com	2016-05-16	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	incorrect format ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) The generated document has the incorrect format for biseckmeans. ![bug](https://cloud.githubusercontent.com/assets/5033592/15233120/d910098a-185a-11e6-901d-44aeafc8a011.jpg) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Fix the formatting. ![fix](https://cloud.githubusercontent.com/assets/5033592/15233136/fce2ccd0-185a-11e6-9ded-14d71da4bdab.jpg) Author: wm624@hotmail.com <wm624@hotmail.com> Closes #13083 from wangmiao1981/doc.
*	[MINOR] Fix Typos	Zheng RuiFeng	2016-05-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? 1,Rename matrix args in BreezeUtil to upper to match the doc 2,Fix several typos in ML and SQL ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13078 from zhengruifeng/fix_ann.
*	[SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact	cody koeninger	2016-05-11	2	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions ## How was this patch tested? Unit tests Author: cody koeninger <cody@koeninger.org> Closes #12946 from koeninger/SPARK-15085.
*	[SPARK-15150][EXAMPLE][DOC] Update LDA examples	Zheng RuiFeng	2016-05-11	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? 1,create a libsvm-type dataset for lda: `data/mllib/sample_lda_libsvm_data.txt` 2,add python example 3,directly read the datafile in examples 4,BTW, change to `SparkSession` in `aft_survival_regression.py` ## How was this patch tested? manual tests `./bin/spark-submit examples/src/main/python/ml/lda_example.py` Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12927 from zhengruifeng/lda_pe.
*	[SPARK-15238] Clarify supported Python versions	Nicholas Chammas	2016-05-11	1	-2/+2
\| \| \| \| \| \| \| \| \|	This PR: * Clarifies that Spark does support Python 3, starting with Python 3.4. Author: Nicholas Chammas <nicholas.chammas@gmail.com> Closes #13017 from nchammas/supported-python-versions.
*	[SPARK-15149][EXAMPLE][DOC] update kmeans example	Zheng RuiFeng	2016-05-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Python example for ml.kmeans already exists, but not included in user guide. 1,small changes like: `example_on` `example_off` 2,add it to user guide 3,update examples to directly read datafile ## How was this patch tested? manual tests `./bin/spark-submit examples/src/main/python/ml/kmeans_example.py Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12925 from zhengruifeng/km_pe.
*	[SPARK-14340][EXAMPLE][DOC] Update Examples and User Guide for ↵	Zheng RuiFeng	2016-05-11	1	-1/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ml.BisectingKMeans ## What changes were proposed in this pull request? 1, add BisectingKMeans to ml-clustering.md 2, add the missing Scala BisectingKMeansExample 3, create a new datafile `data/mllib/sample_kmeans_data.txt` ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11844 from zhengruifeng/doc_bkm.
*	[SPARK-15141][EXAMPLE][DOC] Update OneVsRest Examples	Zheng RuiFeng	2016-05-11	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? 1, Add python example for OneVsRest 2, remove args-parsing ## How was this patch tested? manual tests `./bin/spark-submit examples/src/main/python/ml/one_vs_rest_example.py` Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12920 from zhengruifeng/ovr_pe.
*	[SPARK-13382][DOCS][PYSPARK] Update pyspark testing notes in build docs	Holden Karau	2016-05-10	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The current build documents don't specify that for PySpark tests we need to include Hive in the assembly otherwise the ORC tests fail. ## How was the this patch tested? Manually built the docs locally. Ran the provided build command follow by the PySpark SQL tests. ![pyspark2](https://cloud.githubusercontent.com/assets/59893/13190008/8829cde4-d70f-11e5-8ff5-a88b7894d2ad.png) Author: Holden Karau <holden@us.ibm.com> Closes #11278 from holdenk/SPARK-13382-update-pyspark-testing-notes-r2.
*	[SPARK-15223][DOCS] fix wrongly named config reference	Philipp Hoffmann	2016-05-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The configuration setting `spark.executor.logs.rolling.size.maxBytes` was changed to `spark.executor.logs.rolling.maxSize` in 1.4 or so. This commit fixes a remaining reference to the old name in the documentation. Also the description for `spark.executor.logs.rolling.maxSize` was edited to clearly state that the unit for the size is bytes. ## How was this patch tested? no tests Author: Philipp Hoffmann <mail@philipphoffmann.de> Closes #13001 from philipphoffmann/patch-3.
*	[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader	Yanbo Liang	2016-05-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? * Since Spark has supported native csv reader, it does not necessary to use the third party ```spark-csv``` in ```examples/src/main/r/data-manipulation.R```. Meanwhile, remove all ```spark-csv``` usage in SparkR. * Running R applications through ```sparkR``` is not supported as of Spark 2.0, so we change to use ```./bin/spark-submit``` to run the example. ## How was this patch tested? Offline test. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13005 from yanboliang/r-df-examples.
*	[DOC][MINOR] Fixed minor errors in feature.ml user guide doc	Bryan Cutler	2016-05-07	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fixed some minor errors found when reviewing feature.ml user guide ## How was this patch tested? built docs locally Author: Bryan Cutler <cutlerb@gmail.com> Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
*	[SPARK-14512] [DOC] Add python example for QuantileDiscretizer	Zheng RuiFeng	2016-05-06	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add the missing python example for QuantileDiscretizer ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #12281 from zhengruifeng/discret_pe.
*	[SPARK-14738][BUILD] Separate docker integration tests from main build	Luciano Resende	2016-05-06	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Create a maven profile for executing the docker integration tests using maven Remove docker integration tests from main sbt build Update documentation on how to run docker integration tests from sbt ## How was this patch tested? Manual test of the docker integration tests as in : mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test ## Other comments Note that the the DB2 Docker Tests are still disabled as there is a kernel version issue on the AMPLab Jenkins slaves and we would need to get them on the right level before enabling those tests. They do run ok locally with the updates from PR #12348 Author: Luciano Resende <lresende@apache.org> Closes #12508 from lresende/docker.
*	[SPARK-12299][CORE] Remove history serving functionality from Master	Bryan Cutler	2016-05-04	1	-5/+0
\| \| \| \| \| \| \| \| \| \|	Remove history server functionality from standalone Master. Previously, the Master process rebuilt a SparkUI once the application was completed which sometimes caused problems, such as OOM, when the application event log is large (see SPARK-6270). Keeping this functionality out of the Master will help to simplify the process and increase stability. Testing for this change included running core unit tests and manually running an application on a standalone cluster to verify that it completed successfully and that the Master UI functioned correctly. Also added 2 unit tests to verify killing an application and driver from MasterWebUI makes the correct request to the Master. Author: Bryan Cutler <cutlerb@gmail.com> Closes #10991 from BryanCutler/remove-history-master-SPARK-12299.
*	[SPARK-4224][CORE][YARN] Support group acls	Dhruve Ashar	2016-05-04	3	-8/+57
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Currently only a list of users can be specified for view and modify acls. This change enables a group of admins/devs/users to be provisioned for viewing and modifying Spark jobs. Changes Proposed in the fix Three new corresponding config entries have been added where the user can specify the groups to be given access. ``` spark.admin.acls.groups spark.modify.acls.groups spark.ui.view.acls.groups ``` New config entries were added because specifying the users and groups explicitly is a better and cleaner way compared to specifying them in the existing config entry using a delimiter. A generic trait has been introduced to provide the user to group mapping which makes it pluggable to support a variety of mapping protocols - similar to the one used in hadoop. A default unix shell based implementation has been provided. Custom user to group mapping protocol can be specified and configured by the entry ```spark.user.groups.mapping``` How the patch was Tested We ran different spark jobs setting the config entries in combinations of admin, modify and ui acls. For modify acls we tried killing the job stages from the ui and using yarn commands. For view acls we tried accessing the UI tabs and the logs. Headless accounts were used to launch these jobs and different users tried to modify and view the jobs to ensure that the groups mapping applied correctly. Additional Unit tests have been added without modifying the existing ones. These test for different ways of setting the acls through configuration and/or API and validate the expected behavior. Author: Dhruve Ashar <dhruveashar@gmail.com> Closes #12760 from dhruve/impr/SPARK-4224.
*	[MINOR][DOC] Fixed some python snippets in mllib data types documentation.	Shuai Lin	2016-05-03	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Some python snippets is using scala imports and comments. ## How was this patch tested? Generated the docs locally with `SKIP_API=1 jekyll build` and viewed the changes in the browser. Author: Shuai Lin <linshuai2012@gmail.com> Closes #12869 from lins05/fix-mllib-python-snippets.
*	[MINOR][DOCS] Fix type Information in Quick Start and Programming Guide	Sandeep Singh	2016-05-03	2	-5/+5
\| \| \| \| \| \|	Author: Sandeep Singh <sandeep@techaddict.me> Closes #12841 from techaddict/improve_docs_1.