spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[DOC][SQL] update out-of-date code snippets using SQLContext in all documents.	WeichenXu	2016-07-06	2	-20/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? I search the whole documents directory using SQLContext, and update the following places: - docs/configuration.md, sparkR code snippets. - docs/streaming-programming-guide.md, several example code. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #14025 from WeichenXu123/WIP_SQLContext_update.
*	[MINOR][DOCS] Remove unused images; crush PNGs that could use it for good ↵	Sean Owen	2016-07-04	25	-0/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	measure ## What changes were proposed in this pull request? Coincidentally, I discovered that a couple images were unused in `docs/`, and then searched and found more, and then realized some PNGs were pretty big and could be crushed, and before I knew it, had done the same for the ASF site (not committed yet). No functional change at all, just less superfluous image data. ## How was this patch tested? `jekyll serve` Author: Sean Owen <sowen@cloudera.com> Closes #14029 from srowen/RemoveCompressImages.
*	[SPARK-16345][DOCUMENTATION][EXAMPLES][GRAPHX] Extract graphx programming ↵	WeichenXu	2016-07-02	1	-127/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	guide example snippets from source files instead of hard code them ## What changes were proposed in this pull request? I extract 6 example programs from GraphX programming guide and replace them with `include_example` label. The 6 example programs are: - AggregateMessagesExample.scala - SSSPExample.scala - TriangleCountingExample.scala - ConnectedComponentsExample.scala - ComprehensiveExample.scala - PageRankExample.scala All the example code can run using `bin/run-example graphx.EXAMPLE_NAME` ## How was this patch tested? Manual. Author: WeichenXu <WeichenXu123@outlook.com> Closes #14015 from WeichenXu123/graphx_example_plugin.
*	[GRAPHX][EXAMPLES] move graphx test data directory and update graphx document	WeichenXu	2016-07-02	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? There are two test data files used for graphx examples existing in directory "graphx/data" I move it into "data/" directory because the "graphx" directory is used for code files and other test data files (such as mllib, streaming test data) are all in there. I also update the graphx document where reference the data files which I move place. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #14010 from WeichenXu123/move_graphx_data_dir.
*	[SPARK-15643][DOC][ML] Add breaking changes to ML migration guide	Nick Pentreath	2016-06-30	1	-3/+101
\| \| \| \| \| \| \| \| \| \| \| \|	This PR adds the breaking changes from [SPARK-14810](https://issues.apache.org/jira/browse/SPARK-14810) to the migration guide. ## How was this patch tested? Built docs locally. Author: Nick Pentreath <nickp@za.ibm.com> Closes #13924 from MLnick/SPARK-15643-migration-guide.
*	[SPARK-16256][DOCS] Fix window operation diagram	Tathagata Das	2016-06-30	4	-1/+1
\| \| \| \| \| \|	Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #14001 from tdas/SPARK-16256-2.
*	[SPARK-16256][DOCS] Minor fixes on the Structured Streaming Programming Guide	Tathagata Das	2016-06-29	1	-21/+23
\| \| \| \| \| \|	Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #13978 from tdas/SPARK-16256-1.
*	[SPARK-16294][SQL] Labelling support for the include_example Jekyll plugin	Cheng Lian	2016-06-29	2	-41/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR adds labelling support for the `include_example` Jekyll plugin, so that we may split a single source file into multiple line blocks with different labels, and include them in multiple code snippets in the generated HTML page. ## How was this patch tested? Manually tested. <img width="923" alt="screenshot at jun 29 19-53-21" src="https://cloud.githubusercontent.com/assets/230655/16451099/66a76db2-3e33-11e6-84fb-63104c2f0688.png"> Author: Cheng Lian <lian@databricks.com> Closes #13972 from liancheng/include-example-with-labels.
*	[SPARK-16256][SQL][STREAMING] Added Structured Streaming Programming Guide	Tathagata Das	2016-06-29	8	-0/+1157
\| \| \| \| \| \| \| \|	Title defines all. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #13945 from tdas/SPARK-16256.
*	[SPARK-15990][YARN] Add rolling log aggregation support for Spark on yarn	jerryshao	2016-06-29	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Yarn supports rolling log aggregation since 2.6, previously log will only be aggregated to HDFS after application is finished, it is quite painful for long running applications like Spark Streaming, thriftserver. Also out of disk problem will be occurred when log file is too large. So here propose to add support of rolling log aggregation for Spark on yarn. One limitation for this is that log4j should be set to change to file appender, now in Spark itself uses console appender by default, in which file will not be created again once removed after aggregation. But I think lots of production users should have changed their log4j configuration instead of default on, so this is not a big problem. ## How was this patch tested? Manually verified with Hadoop 2.7.1. Author: jerryshao <sshao@hortonworks.com> Closes #13712 from jerryshao/SPARK-15990.
*	[SPARK-15643][DOC][ML] Update spark.ml and spark.mllib migration guide from ↵	Yanbo Liang	2016-06-28	2	-19/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	1.6 to 2.0 ## What changes were proposed in this pull request? Update ```spark.ml``` and ```spark.mllib``` migration guide from 1.6 to 2.0. ## How was this patch tested? Docs update, no tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13378 from yanboliang/spark-13448.
*	[SPARK-15863][SQL][DOC][FOLLOW-UP] Update SQL programming guide.	Yin Huai	2016-06-27	1	-18/+16
\| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR makes several updates to SQL programming guide. Author: Yin Huai <yhuai@databricks.com> Closes #13938 from yhuai/doc.
*	[SPARK-15997][DOC][ML] Update user guide for HashingTF, QuantileVectorizer ↵	GayathriMurali	2016-06-24	1	-12/+17
\| \| \| \| \| \| \| \| \| \| \| \|	and CountVectorizer ## What changes were proposed in this pull request? Made changes to HashingTF,QuantileVectorizer and CountVectorizer Author: GayathriMurali <gayathri.m@intel.com> Closes #13745 from GayathriMurali/SPARK-15997.
*	[SPARK-13723][YARN] Change behavior of --num-executors with dynamic allocation.	Ryan Blue	2016-06-23	2	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This changes the behavior of --num-executors and spark.executor.instances when using dynamic allocation. Instead of turning dynamic allocation off, it uses the value for the initial number of executors. This changes was discussed on [SPARK-13723](https://issues.apache.org/jira/browse/SPARK-13723). I highly recommend using it while we can change the behavior for 2.0.0. In practice, the 1.x behavior causes unexpected behavior for users (it is not clear that it disables dynamic allocation) and wastes cluster resources because users rarely notice the log message. ## How was this patch tested? This patch updates tests and adds a test for Utils.getDynamicAllocationInitialExecutors. Author: Ryan Blue <blue@apache.org> Closes #13338 from rdblue/SPARK-13723-num-executors-with-dynamic-allocation.
*	[SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup	Felix Cheung	2016-06-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Updated setJobGroup, cancelJobGroup, clearJobGroup to not require sc/SparkContext as parameter. Also updated roxygen2 doc and R programming guide on deprecations. ## How was this patch tested? unit tests Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #13838 from felixcheung/rjobgroup.
*	[SPARK-15672][R][DOC] R programming guide update	Kai Jiang	2016-06-22	1	-0/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Guide for - UDFs with dapply, dapplyCollect - spark.lapply for running parallel R functions ## How was this patch tested? build locally <img width="654" alt="screen shot 2016-06-14 at 03 12 56" src="https://cloud.githubusercontent.com/assets/3419881/16039344/12a3b6a0-31de-11e6-8d77-fe23308075c0.png"> Author: Kai Jiang <jiangkai@gmail.com> Closes #13660 from vectorijk/spark-15672-R-guide-update.
*	[SQL][DOC] SQL programming guide add deprecated methods in 2.0.0	Felix Cheung	2016-06-22	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Doc changes ## How was this patch tested? manual liancheng Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #13827 from felixcheung/sqldocdeprecate.
*	[SPARK-16045][ML][DOC] Spark 2.0 ML.feature: doc update for stopwords and ↵	Yuhao Yang	2016-06-21	1	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	binarizer ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-16045 2.0 Audit: Update document for StopWordsRemover and Binarizer. ## How was this patch tested? manual review for doc Author: Yuhao Yang <hhbyyh@gmail.com> Author: Yuhao Yang <yuhao.yang@intel.com> Closes #13375 from hhbyyh/stopdoc.
*	[SPARK-15894][SQL][DOC] Update docs for controlling #partitions	Takeshi YAMAMURO	2016-06-21	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Update docs for two parameters `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes ` in Other Configuration Options. ## How was this patch tested? N/A Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #13797 from maropu/SPARK-15894-2.
*	[SPARK-15863][SQL][DOC][SPARKR] sql programming guide updates to include ↵	Felix Cheung	2016-06-21	2	-19/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sparkSession in R ## What changes were proposed in this pull request? Update doc as per discussion in PR #13592 ## How was this patch tested? manual shivaram liancheng Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #13799 from felixcheung/rsqlprogrammingguide.
*	[SPARK-16025][CORE] Document OFF_HEAP storage level in 2.0	Eric Liang	2016-06-20	1	-0/+5
\| \| \| \| \| \| \| \|	This has changed from 1.6, and now stores memory off-heap using spark's off-heap support instead of in tachyon. Author: Eric Liang <ekl@databricks.com> Closes #13744 from ericl/spark-16025.
*	[SPARK-15863][SQL][DOC] Initial SQL programming guide update for Spark 2.0	Cheng Lian	2016-06-20	1	-288/+317
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Initial SQL programming guide update for Spark 2.0. Contents like 1.6 to 2.0 migration guide are still incomplete. We may also want to add more examples for Scala/Java Dataset typed transformations. ## How was this patch tested? N/A Author: Cheng Lian <lian@databricks.com> Closes #13592 from liancheng/sql-programming-guide-2.0.
*	[SPARK-15159][SPARKR] SparkSession roxygen2 doc, programming guide, example ↵	Felix Cheung	2016-06-20	1	-51/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	updates ## What changes were proposed in this pull request? roxygen2 doc, programming guide, example updates ## How was this patch tested? manual checks shivaram Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #13751 from felixcheung/rsparksessiondoc.
*	[SPARK-16040][MLLIB][DOC] spark.mllib PIC document extra line of refernece	wm624@hotmail.com	2016-06-19	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? In the 2.0 document, Line "A full example that produces the experiment described in the PIC paper can be found under examples/." is redundant. There is already "Find full example code at "examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala" in the Spark repo.". We should remove the first line, which is consistent with other documents. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual test Author: wm624@hotmail.com <wm624@hotmail.com> Closes #13755 from wangmiao1981/doc.
*	[SPARK-15129][R][DOC] R API changes in ML	GayathriMurali	2016-06-17	1	-58/+19
\| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Make user guide changes to SparkR documentation for all changes that happened in 2.0 to Machine Learning APIs Author: GayathriMurali <gayathri.m@intel.com> Closes #13285 from GayathriMurali/SPARK-15129.
*	[SPARK-15966][DOC] Add closing tag to fix rendering issue for Spark monitoring	Dhruve Ashar	2016-06-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Adds the missing closing tag for spark.ui.view.acls.groups ## How was this patch tested? I built the docs locally and verified the changed in browser. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Before: ![image](https://cloud.githubusercontent.com/assets/7732317/16135005/49fc0724-33e6-11e6-9390-98711593fa5b.png) After: ![image](https://cloud.githubusercontent.com/assets/7732317/16135021/62b5c4a8-33e6-11e6-8118-b22fda5c66eb.png) Author: Dhruve Ashar <dhruveashar@gmail.com> Closes #13719 from dhruve/doc/SPARK-15966.
*	[SPARK-15608][ML][EXAMPLES][DOC] add examples and documents of ml.isotonic ↵	WeichenXu	2016-06-16	1	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	regression ## What changes were proposed in this pull request? add ml doc for ml isotonic regression add scala example for ml isotonic regression add java example for ml isotonic regression add python example for ml isotonic regression modify scala example for mllib isotonic regression modify java example for mllib isotonic regression modify python example for mllib isotonic regression add data/mllib/sample_isotonic_regression_libsvm_data.txt delete data/mllib/sample_isotonic_regression_data.txt ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13381 from WeichenXu123/add_isotonic_regression_doc.
*	[SPARK-15796][CORE] Reduce spark.memory.fraction default to avoid ↵	Sean Owen	2016-06-16	2	-4/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	overrunning old gen in JVM default config ## What changes were proposed in this pull request? Reduce `spark.memory.fraction` default to 0.6 in order to make it fit within default JVM old generation size (2/3 heap). See JIRA discussion. This means a full cache doesn't spill into the new gen. CC andrewor14 ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #13618 from srowen/SPARK-15796.
*	[SPARK-7848][STREAMING][UPDATE SPARKSTREAMING DOCS TO INCORPORATE IMPORTANT ↵	Nirman Narang	2016-06-15	1	-0/+19
\| \| \| \| \| \| \| \| \| \|	POINTS.] Updated the SparkStreaming Doc with some important points. Author: Nirman Narang <narang@us.ibm.com> Closes #11114 from nirmannarang/SPARK-7848.
*	[DOCUMENTATION] fixed typos in python programming guide	Mortada Mehyar	2016-06-14	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? minor typo ## How was this patch tested? minor typo in the doc, should be self explanatory Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #13639 from mortada/typo.
*	[SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API	Sean Owen	2016-06-12	2	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? - Deprecate old Java accumulator API; should use Scala now - Update Java tests and examples - Don't bother testing old accumulator API in Java 8 (too) - (fix a misspelling too) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #13606 from srowen/SPARK-15086.
*	[SPARK-15806][DOCUMENTATION] update doc for SPARK_MASTER_IP	bomeng	2016-06-12	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? SPARK_MASTER_IP is a deprecated environment variable. It is replaced by SPARK_MASTER_HOST according to MasterArguments.scala. ## How was this patch tested? Manually verified. Author: bomeng <bmeng@us.ibm.com> Closes #13543 from bomeng/SPARK-15806.
*	[SPARK-15781][DOCUMENTATION] remove deprecated environment variable doc	bomeng	2016-06-12	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Like `SPARK_JAVA_OPTS` and `SPARK_CLASSPATH`, we will remove the document for `SPARK_WORKER_INSTANCES` to discourage user not to use them. If they are actually used, SparkConf will show a warning message as before. ## How was this patch tested? Manually tested. Author: bomeng <bmeng@us.ibm.com> Closes #13533 from bomeng/SPARK-15781.
*	[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents	Dongjoon Hyun	2016-06-11	8	-29/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This issue fixes all broken links on Spark 2.0 preview MLLib documents. Also, this contains some editorial change. Fix broken links * mllib-data-types.md * mllib-decision-tree.md * mllib-ensembles.md * mllib-feature-extraction.md * mllib-pmml-model-export.md * mllib-statistics.md Fix malformed section header and scala coding style * mllib-linear-methods.md Replace indirect forward links with direct one * ml-classification-regression.md ## How was this patch tested? Manual tests (with `cd docs; jekyll build`.) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #13608 from dongjoon-hyun/SPARK-15883.
*	[SPARK-15879][DOCS][UI] Update logo in UI and docs to add "Apache"	Sean Owen	2016-06-11	5	-0/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Use new Spark logo including "Apache" (now, with crushed PNGs). Remove old unreferenced logo files. ## How was this patch tested? Manual check of generated HTML site and Spark UI. I searched for references to the deleted files to make sure they were not used. Author: Sean Owen <sowen@cloudera.com> Closes #13609 from srowen/SPARK-15879.
*	[DOCUMENTATION] fixed groupby aggregation example for pyspark	Mortada Mehyar	2016-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? fixing documentation for the groupby/agg example in python ## How was this patch tested? the existing example in the documentation dose not contain valid syntax (missing parenthesis) and is not using `Column` in the expression for `agg()` after the fix here's how I tested it: ``` In [1]: from pyspark.sql import Row In [2]: import pyspark.sql.functions as func In [3]: %cpaste Pasting code; enter '--' alone on the line to stop or use Ctrl-D. :records = [{'age': 19, 'department': 1, 'expense': 100}, : {'age': 20, 'department': 1, 'expense': 200}, : {'age': 21, 'department': 2, 'expense': 300}, : {'age': 22, 'department': 2, 'expense': 300}, : {'age': 23, 'department': 3, 'expense': 300}] :-- In [4]: df = sqlContext.createDataFrame([Row(**d) for d in records]) In [5]: df.groupBy("department").agg(df["department"], func.max("age"), func.sum("expense")).show() +----------+----------+--------+------------+ \|department\|department\|max(age)\|sum(expense)\| +----------+----------+--------+------------+ \| 1\| 1\| 20\| 300\| \| 2\| 2\| 22\| 600\| \| 3\| 3\| 23\| 300\| +----------+----------+--------+------------+ Author: Mortada Mehyar <mortada.mehyar@gmail.com> Closes #13587 from mortada/groupby_agg_doc_fix.
*	[DOCUMENTATION] Fixed target JAR path	prabs	2016-06-08	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Mentioned Scala version in the sbt configuration file is 2.11, so the path of the target JAR should be `/target/scala-2.11/simple-project_2.11-1.0.jar` ## How was this patch tested? n/a Author: prabs <prabsmails@gmail.com> Author: Prabeesh K <prabsmails@gmail.com> Closes #13554 from prabeesh/master.
*	[SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression ↵	Yanbo Liang	2016-06-07	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	behavior difference ## What changes were proposed in this pull request? When fitting ```LinearRegressionModel```(by "l-bfgs" solver) and ```LogisticRegressionModel``` w/o intercept on dataset with constant nonzero column, spark.ml produce same model as R glmnet but different from LIBSVM. When fitting ```AFTSurvivalRegressionModel``` w/o intercept on dataset with constant nonzero column, spark.ml produce different model compared with R survival::survreg. We should output a warning message and clarify in document for this condition. ## How was this patch tested? Document change, no unit test. cc mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #12731 from yanboliang/spark-13590.
*	[SPARK-15760][DOCS] Add documentation for package-related configs.	Marcelo Vanzin	2016-06-07	1	-0/+47
\| \| \| \| \| \| \| \| \|	While there, also document spark.files and spark.jars. Text is the same as the spark-submit help text with some minor adjustments. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #13502 from vanzin/SPARK-15760.
*	[MINOR] fix typo in documents	WeichenXu	2016-06-07	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? I use spell check tools checks typo in spark documents and fix them. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13538 from WeichenXu123/fix_doc_typo.
*	[SPARK-15617][ML][DOC] Clarify that fMeasure in MulticlassMetrics is "micro" ↵	Ruifeng Zheng	2016-06-04	1	-13/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	f1_score ## What changes were proposed in this pull request? 1, del precision,recall in `ml.MulticlassClassificationEvaluator` 2, update user guide for `mlllib.weightedFMeasure` ## How was this patch tested? local build Author: Ruifeng Zheng <ruifengz@foxmail.com> Closes #13390 from zhengruifeng/clarify_f1.
*	[SPARK-15208][WIP][CORE][STREAMING][DOCS] Update Spark examples with ↵	Liwei Lin	2016-06-02	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AccumulatorV2 ## What changes were proposed in this pull request? The patch updates the codes & docs in the example module as well as the related doc module: - [ ] [docs] `streaming-programming-guide.md` - [x] scala code part - [ ] java code part - [ ] python code part - [x] [examples] `RecoverableNetworkWordCount.scala` - [ ] [examples] `JavaRecoverableNetworkWordCount.java` - [ ] [examples] `recoverable_network_wordcount.py` ## How was this patch tested? Ran the examples and verified results manually. Author: Liwei Lin <lwlin7@gmail.com> Closes #12981 from lw-lin/accumulatorV2-examples.
*	[SPARK-15702][DOCUMENTATION] Update document programming-guide accumulator ↵	WeichenXu	2016-06-01	1	-18/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	section ## What changes were proposed in this pull request? Update document programming-guide accumulator section (scala language) java and python version, because the API haven't done, so I do not modify them. ## How was this patch tested? N/A Author: WeichenXu <WeichenXu123@outlook.com> Closes #13441 from WeichenXu123/update_doc_accumulatorV2_clean.
*	[DOCS] fix example code issues in documentation	Matthew Wise	2016-05-30	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fixed broken java code examples in streaming documentation Attn: tdas Author: Matthew Wise <matthew.rs.wise@gmail.com> Closes #13388 from mawise/fix_docs_java_streaming_example.
*	[SPARK-11959][SPARK-15484][DOC][ML] Document WLS and IRLS	Yanbo Liang	2016-05-27	1	-5/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? * Document ```WeightedLeastSquares```(normal equation) and ```IterativelyReweightedLeastSquares```. * Copy ```L-BFGS``` documents from ```spark.mllib``` to ```spark.ml```. Due to the session ```Optimization of linear methods``` is used for developers, I think we should provide the brief introduction of the optimization method, necessary references and how it implements in Spark. It's not necessary to paste all mathematical formula and derivation here. If developers/users want to learn more, they can track reference. ## How was this patch tested? Document update, no tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #13262 from yanboliang/spark-15484.
*	[SPARK-15186][ML][DOCS] Add user guide for generalized linear regression	sethah	2016-05-27	1	-0/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This patch adds a user guide section for generalized linear regression and includes the examples from [#12754](https://github.com/apache/spark/pull/12754). ## How was this patch tested? Documentation only, no tests required. ## Approach In general, it is a bit unclear what level of detail ought to be included in the user guide since there is a lot of variability within the current user guide. I tried to give a fairly brief mathematical introduction to GLMs, and cover what types of problems they could be used for. Additionally, I included a brief blurb on the IRLS solver. The input/output columns are given in a table as is found elsewhere in the docs (though, again, these appear rather intermittently in the current docs), as well as a table providing the supported families and their link functions. Author: sethah <seth.hendrickson16@gmail.com> Closes #13139 from sethah/SPARK-15186.
*	[YARN][DOC][MINOR] Remove several obsolete env variables and update the doc	jerryshao	2016-05-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Remove several obsolete env variables not supported for Spark on YARN now, also updates the docs to include several changes with 2.0. ## How was this patch tested? N/A CC vanzin tgravescs Author: jerryshao <sshao@hortonworks.com> Closes #13296 from jerryshao/yarn-doc.
*	[MINOR] Fix Typos 'a -> an'	Zheng RuiFeng	2016-05-26	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `a` -> `an` I use regex to generate potential error lines: `grep -in ' a [aeiou]' mllib/src/main/scala/org/apache/spark/ml//scala` and review them line by line. ## How was this patch tested? local build `lint-java` checking Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13317 from zhengruifeng/a_an.
*	[SPARK-10903] followup - update API doc for SqlContext	felixcheung	2016-05-26	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Follow up on the earlier PR - in here we are fixing up roxygen2 doc examples. Also add to the programming guide migration section. ## How was this patch tested? SparkR tests Author: felixcheung <felixcheung_m@hotmail.com> Closes #13340 from felixcheung/sqlcontextdoc.
*	[SPARK-13148][YARN] document zero-keytab Oozie application launch; add ↵	Steve Loughran	2016-05-26	1	-0/+96
\| \| \| \| \| \| \| \| \| \| \|	diagnostics This patch provides detail on what to do for keytabless Oozie launches of spark apps, and adds some debug-level diagnostics of what credentials have been submitted Author: Steve Loughran <stevel@hortonworks.com> Author: Steve Loughran <stevel@apache.org> Closes #11033 from steveloughran/stevel/feature/SPARK-13148-oozie.