spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-9570] [DOCS] Consistent recommendation for submitting spark apps to ↵	Sean Owen	2015-10-04	3	-26/+33
\| \| \| \| \| \| \| \| \| \| \| \|	YARN, `--master yarn --deploy-mode x` vs `--master yarn-x`. Recommend `--master yarn --deploy-mode {cluster,client}` consistently in docs. Follow-on to https://github.com/apache/spark/pull/8385 CC nssalian Author: Sean Owen <sowen@cloudera.com> Closes #8968 from srowen/SPARK-9570.
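The equivalence between the two spellings can be sketched as a small translation from the older combined master strings to the recommended flag pair. This is only an illustration under stated assumptions: the helper name `to_recommended_flags` is hypothetical and is not part of Spark.

```python
def to_recommended_flags(master):
    """Translate an old-style YARN master string into the recommended flags.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    # Old combined spellings and the deploy mode each one implies.
    legacy = {
        "yarn-client": "client",
        "yarn-cluster": "cluster",
    }
    if master in legacy:
        return ["--master", "yarn", "--deploy-mode", legacy[master]]
    # Any other master string is passed through unchanged.
    return ["--master", master]
```

For example, `to_recommended_flags("yarn-cluster")` yields the argument list `["--master", "yarn", "--deploy-mode", "cluster"]`.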
*	[SPARK-10670] [ML] [Doc] add api reference for ml doc	Yuhao Yang	2015-09-28	1	-64/+195
\| \| \| \| \| \| \| \| \| \|	jira: https://issues.apache.org/jira/browse/SPARK-10670 In the Markdown docs for the spark.ml Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "Word2Vec" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/ml-features.md This JIRA is just for spark.ml, not spark.mllib Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8901 from hhbyyh/docAPI.
*	Fix two mistakes in programming-guide page	David Martin	2015-09-28	1	-2/+2
\| \| \| \| \| \| \| \| \|	seperate -> separate; sees -> see. Author: David Martin <dmartinpro@users.noreply.github.com> Closes #8928 from dmartinpro/patch-1.
*	add doc for spark.streaming.stopGracefullyOnShutdown	Bin Wang	2015-09-27	1	-0/+8
\| \| \| \| \| \|	Author: Bin Wang <wbin00@gmail.com> Closes #8898 from wb14123/doc.
*	[SPARK-10663] Removed unnecessary invocation of DataFrame.toDF method.	Matt Hagen	2015-09-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	The Scala example under the "Example: Pipeline" heading in this document initializes the "test" variable to a DataFrame. Because test is already a DataFrame, there is no need to call test.toDF as the example does in a subsequent line: model.transform(test.toDF). So, I removed the extraneous toDF invocation. Author: Matt Hagen <anonz3000@gmail.com> Closes #8875 from hagenhaus/SPARK-10663.
*	[SPARK-10695] [DOCUMENTATION] [MESOS] Fixing incorrect value informati…	Akash Mishra	2015-09-22	1	-2/+2
\| \| \| \| \| \| \| \|	…on for spark.mesos.constraints parameter. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8816 from SleepyThread/constraint-fix.
*	[SPARK-10676] [DOCS] Add documentation for SASL encryption options.	Marcelo Vanzin	2015-09-21	2	-2/+36
\| \| \| \| \| \|	Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #8803 from vanzin/SPARK-10676.
*	[SPARK-10662] [DOCS] Code snippets are not properly formatted in tables	Jacek Laskowski	2015-09-21	6	-170/+171
\| \| \| \| \| \| \| \| \| \|	* Backticks are processed properly in Spark Properties table * Removed unnecessary spaces * See http://people.apache.org/~pwendell/spark-nightly/spark-master-docs/latest/running-on-yarn.html Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8795 from jaceklaskowski/docs-yarn-formatting.
*	[SPARK-10710] Remove ability to disable spilling in core and SQL	Josh Rosen	2015-09-19	2	-18/+3
\| \| \| \| \| \| \| \| \| \|	It does not make much sense to set `spark.shuffle.spill` or `spark.sql.planner.externalSort` to false: I believe that these configurations were initially added as "escape hatches" to guard against bugs in the external operators, but these operators are now mature and well-tested. In addition, these configurations are not handled in a consistent way anymore: SQL's Tungsten codepath ignores these configurations and will continue to use spilling operators. Similarly, Spark Core's `tungsten-sort` shuffle manager does not respect `spark.shuffle.spill=false`. This pull request removes these configurations, adds warnings at the appropriate places, and deletes a large amount of code which was only used in code paths that did not support spilling. Author: Josh Rosen <joshrosen@databricks.com> Closes #8831 from JoshRosen/remove-ability-to-disable-spilling.
*	Fixed links to the API	Alexis Seigneurin	2015-09-19	1	-4/+4
\| \| \| \| \| \| \| \|	Submitting this change on the master branch as requested in https://github.com/apache/spark/pull/8819#issuecomment-141505941 Author: Alexis Seigneurin <alexis.seigneurin@gmail.com> Closes #8838 from aseigneurin/patch-2.
*	[SPARK-10584] [SQL] [DOC] Documentation about the compatible Hive version is ↵	Kousuke Saruta	2015-09-19	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \|	wrong. In Spark 1.5.0, Spark SQL is compatible with Hive 0.12.0 through 1.2.1 but the documentation is wrong. /CC yhuai Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8776 from sarutak/SPARK-10584-2.
*	[SPARK-9808] Remove hash shuffle file consolidation.	Reynold Xin	2015-09-18	1	-10/+0
\| \| \| \| \| \|	Author: Reynold Xin <rxin@databricks.com> Closes #8812 from rxin/SPARK-9808-1.
*	Added <code> tag to documentation.	Reynold Xin	2015-09-17	1	-1/+1
*	docs/running-on-mesos.md: state default values in default column	Felix Bechstein	2015-09-17	1	-4/+4
\| \| \| \| \| \| \| \|	This PR simply uses the default value column for defaults. Author: Felix Bechstein <felix.bechstein@otto.de> Closes #8810 from felixb/fix_mesos_doc.
*	[SPARK-10650] Clean before building docs	Michael Armbrust	2015-09-17	1	-2/+5
\| \| \| \| \| \| \| \|	The [published docs for 1.5.0](http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/) have a bunch of test classes in them. The only way I can reproduce this is to `test:compile` before running `unidoc`. To prevent this from happening again, I've added a clean before doc generation. Author: Michael Armbrust <michael@databricks.com> Closes #8787 from marmbrus/testsInDocs.
*	[SPARK-10660] Doc describe error in the "Running Spark on YARN" page	yangping.wu	2015-09-17	1	-2/+2
\| \| \| \| \| \| \| \|	In the Configuration section, the default values of spark.yarn.driver.memoryOverhead and spark.yarn.am.memoryOverhead should be "driverMemory * 0.10, with minimum of 384" and "AM memory * 0.10, with minimum of 384" respectively, because since Spark 1.4.0 the MEMORY_OVERHEAD_FACTOR has been set to 0.10, not 0.07. Author: yangping.wu <wyphao.2007@163.com> Closes #8797 from 397090770/SparkOnYarnDocError.
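The documented default ("memory * 0.10, with minimum of 384") can be worked through with a short sketch. The function name, the assumption that values are in megabytes, and the truncation to an integer are assumptions made for illustration; only the factor and the minimum come from the commit message.

```python
def default_memory_overhead_mb(memory_mb, factor=0.10, minimum_mb=384):
    """Return the documented default overhead for a given memory size.

    Illustrative only: computes max(memory * factor, minimum), truncating
    the scaled value to a whole number (an assumption about rounding).
    """
    return max(int(memory_mb * factor), minimum_mb)
```

With these defaults, a 1024 MB driver falls back to the 384 minimum, while an 8192 MB driver gets 819. Under the old 0.07 factor the same 8192 MB driver would get 573.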
*	[SPARK-10595] [ML] [MLLIB] [DOCS] Various ML guide cleanups	Joseph K. Bradley	2015-09-15	5	-35/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Various ML guide cleanups. * ml-guide.md: Make it easier to access the algorithm-specific guides. * LDA user guide: EM often begins with useless topics, but running longer generally improves them dramatically. E.g., 10 iterations on a Wikipedia dataset produces useless topics, but 50 iterations produces very meaningful topics. * mllib-feature-extraction.html#elementwiseproduct: “w” parameter should be “scalingVec” * Clean up Binarizer user guide a little. * Document in Pipeline that users should not put an instance into the Pipeline in more than 1 place. * spark.ml Word2Vec user guide: clean up grammar/writing * Chi Sq Feature Selector docs: Improve text in doc. CC: mengxr feynmanliang Author: Joseph K. Bradley <joseph@databricks.com> Closes #8752 from jkbradley/mlguide-fixes-1.5.
*	[DOCS] Small fixes to Spark on Yarn doc	Jacek Laskowski	2015-09-15	1	-6/+6
\| \| \| \| \| \| \| \| \|	* a follow-up to 16b6d18613e150c7038c613992d80a7828413e66 as `--num-executors` flag is not supported. * links + formatting Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8762 from jaceklaskowski/docs-spark-on-yarn.
*	Update version to 1.6.0-SNAPSHOT.	Reynold Xin	2015-09-15	1	-2/+2
\| \| \| \| \| \|	Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.
*	Small fixes to docs	Jacek Laskowski	2015-09-14	1	-5/+5
\| \| \| \| \| \| \| \|	Links now work properly + consistent use of "Spark standalone cluster" (uppercase Spark, lowercase for the rest, which seems to be the agreed form elsewhere in the docs). Author: Jacek Laskowski <jacek.laskowski@deepsense.io> Closes #8759 from jaceklaskowski/docs-submitting-apps.
*	[SPARK-10584] [DOC] [SQL] Documentation about ↵	Kousuke Saruta	2015-09-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	spark.sql.hive.metastore.version is wrong. The default value of hive metastore version is 1.2.1 but the documentation says the value of `spark.sql.hive.metastore.version` is 0.13.1. Also, we cannot get the default value by `sqlContext.getConf("spark.sql.hive.metastore.version")`. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #8739 from sarutak/SPARK-10584.
*	[SPARK-10222] [GRAPHX] [DOCS] More thoroughly deprecate Bagel in favor of GraphX	Sean Owen	2015-09-13	2	-10/+1
\| \| \| \| \| \| \| \|	Finish deprecating Bagel; remove reference to nonexistent example Author: Sean Owen <sowen@cloudera.com> Closes #8731 from srowen/SPARK-10222.
*	[SPARK-10518] [DOCS] Update code examples in spark.ml user guide to use ↵	y-shimizu	2015-09-11	3	-104/+47
\| \| \| \| \| \| \| \| \| \|	LIBSVM data source instead of MLUtils I changed the example code in spark.ml to use the LIBSVM data source instead of MLUtils. Author: y-shimizu <y.shimizu0429@gmail.com> Closes #8697 from y-shimizu/SPARK-10518.
*	[SPARK-10514] [MESOS] waiting for min no of total cores acquired by Spark by ↵	Akash Mishra	2015-09-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	implementing the sufficientResourcesRegistered method. The spark.scheduler.minRegisteredResourcesRatio configuration parameter works in YARN mode but not in Mesos coarse-grained mode. If the parameter is not specified, the default value of 0 is set for spark.scheduler.minRegisteredResourcesRatio in the base class and this method always returns true. There is no existing test for YARN mode either, so no test was added for this change. Author: Akash Mishra <akash.mishra20@gmail.com> Closes #8672 from SleepyThread/master.
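The behaviour described above, including why a ratio of 0 always passes, can be sketched as a single comparison. This is a minimal sketch and not the actual Mesos backend code: the argument names and the exact form of the comparison are assumptions.

```python
def sufficient_resources_registered(total_cores_acquired, max_cores,
                                    min_registered_ratio=0.0):
    """Return True once enough cores have been acquired to start scheduling.

    Illustrative only: assumes the check compares acquired cores against
    max_cores scaled by the configured minimum ratio.
    """
    return total_cores_acquired >= max_cores * min_registered_ratio
```

With the default ratio of 0 the threshold is 0 cores, so the check passes immediately. With a ratio of 0.5 and 16 maximum cores, it passes only once at least 8 cores are acquired.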
*	[SPARK-10469] [DOC] Try and document the three options	Holden Karau	2015-09-10	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	From JIRA: Add documentation for tungsten-sort. From the mailing list "I saw a new "spark.shuffle.manager=tungsten-sort" implemented in https://issues.apache.org/jira/browse/SPARK-7081, but it can't be found its corresponding description in http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc3-docs/configuration.html(Currenlty there are only 'sort' and 'hash' two options)." Author: Holden Karau <holden@pigscanfly.ca> Closes #8638 from holdenk/SPARK-10469-document-tungsten-sort.
*	[MINOR] [MLLIB] [ML] [DOC] fixed typo: label for negative result should be ↵	Sean Paradiso	2015-09-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	0.0 (original: 1.0) Small typo in the example for `LabelledPoint` in the MLLib docs. Author: Sean Paradiso <seanparadiso@gmail.com> Closes #8680 from sparadiso/docs_mllib_smalltypo.
*	[SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide	Yuhao Yang	2015-09-08	1	-0/+19
\| \| \| \| \| \| \| \| \| \|	jira: https://issues.apache.org/jira/browse/SPARK-10249 update user guide since python support added. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8620 from hhbyyh/swPyDocExample.
*	[SPARK-10492] [STREAMING] [DOCUMENTATION] Update Streaming documentation ↵	Tathagata Das	2015-09-08	2	-1/+25
\| \| \| \| \| \| \| \| \| \|	about rate limiting and backpressure Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8656 from tdas/SPARK-10492 and squashes the following commits: 986cdd6 [Tathagata Das] Added information on backpressure
*	Docs small fixes	Jacek Laskowski	2015-09-08	2	-19/+19
\| \| \| \| \| \|	Author: Jacek Laskowski <jacek@japila.pl> Closes #8629 from jaceklaskowski/docs-fixes.
*	[DOC] Added R to the list of languages with "high-level API" support in the…	Stephen Hopper	2015-09-08	1	-8/+8
\| \| \| \| \| \| \| \|	… main README. Author: Stephen Hopper <shopper@shopper-osx.local> Closes #8646 from enragedginger/master.
*	[SPARK-9767] Remove ConnectionManager.	Reynold Xin	2015-09-07	1	-11/+0
\| \| \| \| \| \| \| \|	We introduced the Netty network module for shuffle in Spark 1.2 and have had it on by default for 3 releases. The old ConnectionManager is difficult to maintain. If we merge the patch now, then by the time it is released ConnectionManager will have been off by default for 1 year. It's time to remove it. Author: Reynold Xin <rxin@databricks.com> Closes #8161 from rxin/SPARK-9767.
*	[SPARK-10440] [STREAMING] [DOCS] Update python API stuff in the programming ↵	Tathagata Das	2015-09-04	2	-12/+4
\| \| \| \| \| \| \| \| \| \| \|	guides and python docs - Fixed information around Python API tags in streaming programming guides - Added missing stuff in python docs Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8595 from tdas/SPARK-10440.
*	[SPARK-9669] [MESOS] Support PySpark on Mesos cluster mode.	Timothy Chen	2015-09-04	1	-0/+2
\| \| \| \| \| \| \| \| \|	Support running pyspark with cluster mode on Mesos! This doesn't upload any scripts, so running against a remote Mesos cluster requires the user to specify the script at an available URI. Author: Timothy Chen <tnachen@gmail.com> Closes #8349 from tnachen/mesos_python.
*	[SPARK-10432] spark.port.maxRetries documentation is unclear	Tom Graves	2015-09-03	1	-1/+5
\| \| \| \| \| \|	Author: Tom Graves <tgraves@yahoo-inc.com> Closes #8585 from tgravescs/SPARK-10432.
*	[SPARK-4223] [CORE] Support * in acls.	zhuol	2015-09-01	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SPARK-4223. Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. Manual tests to verify that: "*" works for any user in: a. Spark ui: view and kill stage. Done. b. Spark history server. Done. c. Yarn application killing. Done. Author: zhuol <zhuol@yahoo-inc.com> Closes #8398 from zhuoliu/4223.
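The wildcard semantics described above can be sketched as a membership check that short-circuits on `*`. This is an illustration only, not Spark's security code: the function name and the representation of an ACL as a collection of user names are assumptions.

```python
WILDCARD = "*"


def is_allowed(user, acl):
    """Return True if `user` is permitted by `acl`.

    Illustrative only: `acl` is assumed to be a collection of user names,
    where the single entry "*" grants access to every user.
    """
    return WILDCARD in acl or user in acl
```

Under this sketch, `is_allowed("alice", ["*"])` is true for any user name, while `is_allowed("alice", ["bob"])` is false.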
*	[SPARK-10398] [DOCS] Migrate Spark download page to use new lua mirroring ↵	Sean Owen	2015-09-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	scripts Migrate Apache download closer.cgi refs to new closer.lua This is the bit of the change that affects the project docs; I'm implementing the changes to the Apache site separately. Author: Sean Owen <sowen@cloudera.com> Closes #8557 from srowen/SPARK-10398.
*	[SPARK-10331] [MLLIB] Update example code in ml-guide	Xiangrui Meng	2015-08-29	1	-215/+147
\| \| \| \| \| \| \| \| \| \| \| \|	* The example code was added in 1.2, before `createDataFrame`. This PR switches to `createDataFrame`. Java code still uses JavaBean. * assume `sqlContext` is available * fix some minor issues from previous code review jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8518 from mengxr/SPARK-10331.
*	[SPARK-10348] [MLLIB] updates ml-guide	Xiangrui Meng	2015-08-29	2	-52/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	* replace `ML Dataset` by `DataFrame` to unify the abstraction * ML algorithms -> pipeline components to describe the main concept * remove Scala API doc links from the main guide * `Section Title` -> `Section title` to be consistent with other section titles in MLlib guide * modified lines break at 100 chars or periods jkbradley feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8517 from mengxr/SPARK-10348.
*	[SPARK-10350] [DOC] [SQL] Removed duplicated option description from SQL guide	GuoQiang Li	2015-08-29	1	-10/+0
\| \| \| \| \| \|	Author: GuoQiang Li <witgo@qq.com> Closes #8520 from witgo/SPARK-10350.
*	[SPARK-9910] [ML] User guide for train validation split	martinzapletal	2015-08-28	1	-0/+117
\| \| \| \| \| \|	Author: martinzapletal <zapletal-martin@email.cz> Closes #8377 from zapletal-martin/SPARK-9910.
*	[SPARK-9671] [MLLIB] re-org user guide and add migration guide	Xiangrui Meng	2015-08-28	3	-106/+95
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR updates the MLlib user guide and adds migration guide for 1.4->1.5. * merge migration guide for `spark.mllib` and `spark.ml` packages * remove dependency section from `spark.ml` guide * move the paragraph about `spark.mllib` and `spark.ml` to the top and recommend `spark.ml` * move Sam's talk to footnote to make the section focus on dependencies Minor changes to code examples and other wording will be in a separate PR. jkbradley srowen feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8498 from mengxr/SPARK-9671.
*	[SPARK-9890] [DOC] [ML] User guide for CountVectorizer	Yuhao Yang	2015-08-28	1	-0/+109
\| \| \| \| \| \| \| \| \| \|	jira: https://issues.apache.org/jira/browse/SPARK-9890 document with Scala and java examples Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8487 from hhbyyh/cvDoc.
*	Fix DynamodDB/DynamoDB typo in Kinesis Integration doc	Keiji Yoshida	2015-08-28	1	-1/+1
\| \| \| \| \| \| \| \|	Fix DynamodDB/DynamoDB typo in Kinesis Integration doc Author: Keiji Yoshida <yoshida.keiji.84@gmail.com> Closes #8501 from yosssi/patch-1.
*	[SPARK-9905] [ML] [DOC] Adds LinearRegressionSummary user guide	Feynman Liang	2015-08-27	1	-13/+127
\| \| \| \| \| \| \| \| \| \| \|	* Adds user guide for `LinearRegressionSummary` * Fixes unresolved issues in #8197 CC jkbradley mengxr Author: Feynman Liang <fliang@databricks.com> Closes #8491 from feynmanliang/SPARK-9905.
*	[SPARK-9911] [DOC] [ML] Update Userguide for Evaluator	MechCoder	2015-08-27	1	-0/+13
\| \| \| \| \| \| \| \|	I added a small note about the different types of evaluator and the metrics used. Author: MechCoder <manojkumarsivaraj334@gmail.com> Closes #8304 from MechCoder/multiclass_evaluator.
*	[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path	Yin Huai	2015-08-27	1	-0/+6
\| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-10287 After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet). Author: Yin Huai <yhuai@databricks.com> Closes #8469 from yhuai/jsonRefresh.
*	[SPARK-9680] [MLLIB] [DOC] StopWordsRemovers user guide and Java ↵	Feynman Liang	2015-08-27	1	-3/+99
\| \| \| \| \| \| \| \| \| \| \| \| \|	compatibility test * Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine * Cleans up scaladocs for public methods * Adds test for Java compatibility * Follow up Python user guide code example is tracked by SPARK-10249 Author: Feynman Liang <fliang@databricks.com> Closes #8436 from feynmanliang/SPARK-10230.
*	[SPARK-9906] [ML] User guide for LogisticRegressionSummary	MechCoder	2015-08-27	1	-16/+133
\| \| \| \| \| \| \| \| \| \|	User guide for LogisticRegression summaries Author: MechCoder <manojkumarsivaraj334@gmail.com> Author: Manoj Kumar <mks542@nyu.edu> Author: Feynman Liang <fliang@databricks.com> Closes #8197 from MechCoder/log_summary_user_guide.
*	[SPARK-9901] User guide for RowMatrix Tall-and-skinny QR	Yuhao Yang	2015-08-27	1	-1/+10
\| \| \| \| \| \| \| \| \| \|	jira: https://issues.apache.org/jira/browse/SPARK-9901 The jira covers only the document update. I can further provide example code for QR (like the ones for SVD and PCA) in a separate PR. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #8462 from hhbyyh/qrDoc.
*	[SPARK-10315] remove document on spark.akka.failure-detector.threshold	CodingCat	2015-08-27	1	-10/+0
\| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-10315 This parameter is no longer used, and the current document also names it incorrectly: it should be 'akka.remote.watch-failure-detector.threshold'. Author: CodingCat <zhunansjtu@gmail.com> Closes #8483 from CodingCat/SPARK_10315.