spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-11961][DOC] Add docs of ChiSqSelector	Xusen Yin	2015-12-01	1	-0/+50
	https://issues.apache.org/jira/browse/SPARK-11961 Author: Xusen Yin <yinxusen@gmail.com> Closes #9965 from yinxusen/SPARK-11961.
*	[SPARK-11821] Propagate Kerberos keytab for all environments	woj-i	2015-12-01	2	-5/+6
	andrewor14 the same PR as in branch 1.5 harishreedharan Author: woj-i <wojciechindyk@gmail.com> Closes #9859 from woj-i/master.
*	[HOTFIX][SPARK-12000] Add missing quotes in Jekyll API docs plugin.	Josh Rosen	2015-11-30	1	-1/+1
	I accidentally omitted these as part of #10049.
*	[SPARK-12035] Add more debug information in include_example tag of Jekyll	Xusen Yin	2015-11-30	1	-4/+6
	https://issues.apache.org/jira/browse/SPARK-12035 When we are debugging lots of example code files, as in https://github.com/apache/spark/pull/10002, it is hard to know which file causes errors because `include_example.rb` reports limited information. With the filenames included, we can locate bugs easily. Author: Xusen Yin <yinxusen@gmail.com> Closes #10026 from yinxusen/SPARK-12035.
*	[SPARK-12000] Fix API doc generation issues	Josh Rosen	2015-11-30	1	-3/+3
	This pull request fixes multiple issues with API doc generation.
	- Modify the Jekyll plugin so that the entire doc build fails if API docs cannot be generated. This will make it easy to detect when the doc build breaks, since this will now trigger Jenkins failures.
	- Change how we handle the `-target` compiler option flag in order to fix `javadoc` generation.
	- Incorporate doc changes from thunterdb (in #10048). Closes #10048.
	Author: Josh Rosen <joshrosen@databricks.com> Author: Timothy Hunter <timhunter@databricks.com> Closes #10049 from JoshRosen/fix-doc-build.
*	[SPARK-11960][MLLIB][DOC] User guide for streaming tests	Feynman Liang	2015-11-30	2	-0/+26
	CC jkbradley mengxr josepablocam Author: Feynman Liang <feynman.liang@gmail.com> Closes #10005 from feynmanliang/streaming-test-user-guide.
*	[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml	Yuhao Yang	2015-11-30	3	-1/+34
	jira: https://issues.apache.org/jira/browse/SPARK-11689 Add a simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. The original PR, https://github.com/apache/spark/pull/9722, was reverted due to a document build error. mengxr feynmanliang yinxusen Sorry for the trouble. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #9974 from hhbyyh/ldaMLExample.
*	[MINOR][DOCS] fixed list display in ml-ensembles	BenFradet	2015-11-30	1	-0/+1
	The list in ml-ensembles.md wasn't properly formatted and, as a result, looked like this: ![old](http://i.imgur.com/2ZhELLR.png) This PR aims to make it look like this: ![new](http://i.imgur.com/0Xriwd2.png) Author: BenFradet <benjamin.fradet@gmail.com> Closes #10025 from BenFradet/ml-ensembles-doc.
*	doc typo: "classificaion" -> "classification"	muxator	2015-11-26	1	-1/+1
	Author: muxator <muxator@users.noreply.github.com> Closes #10008 from muxator/patch-1.
*	[DOCUMENTATION] Fix minor doc error	Jeff Zhang	2015-11-25	1	-1/+1
	Author: Jeff Zhang <zjffdu@apache.org> Closes #9956 from zjffdu/dev_typo.
*	[MINOR] Remove unnecessary spaces in `include_example.rb`	Yu ISHIKAWA	2015-11-25	1	-4/+4
	Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9960 from yu-iskw/minor-remove-spaces.
*	Updated sql programming guide to include jdbc fetch size	Stephen Samuel	2015-11-23	1	-0/+8
	Author: Stephen Samuel <sam@sksamuel.com> Closes #9377 from sksamuel/master.
*	[SPARK-11140][CORE] Transfer files using network lib when using NettyRpcEnv.	Marcelo Vanzin	2015-11-23	2	-2/+5
	This change abstracts the code that serves jars / files to executors so that each RpcEnv can have its own implementation; the akka version uses the existing HTTP-based file serving mechanism, while the netty version uses the new stream support added to the network lib, which lets file transfers benefit from the easier security configuration of the network library and should also reduce overhead overall. The change includes a small fix to TransportChannelHandler so that it propagates user events to downstream handlers. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #9530 from vanzin/SPARK-11140.
*	[SPARK-11910][STREAMING][DOCS] Update twitter4j dependency version	Luciano Resende	2015-11-23	1	-1/+1
	Author: Luciano Resende <lresende@apache.org> Closes #9892 from lresende/SPARK-11910.
*	[SPARK-7173][YARN] Add label expression support for application master	jerryshao	2015-11-23	1	-0/+9
	Add label expression support for the AM to restrict it to run on a specific set of nodes. I tested it locally and it works fine. sryza and vanzin, please help to review, thanks a lot. Author: jerryshao <sshao@hortonworks.com> Closes #9800 from jerryshao/SPARK-7173.
*	[SPARK-11835] Adds a sidebar menu to MLlib's documentation	Timothy Hunter	2015-11-22	6	-8/+163
	This PR adds a sidebar menu when browsing the user guide of MLlib. It uses a YAML file to describe the structure of the documentation. It should be trivial to adapt this to the other projects. ![screen shot 2015-11-18 at 4 46 12 pm](https://cloud.githubusercontent.com/assets/7594753/11259591/a55173f4-8e17-11e5-9340-0aed79d66262.png) Author: Timothy Hunter <timhunter@databricks.com> Closes #9826 from thunterdb/spark-11835.
*	Revert "[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml"	Xiangrui Meng	2015-11-20	3	-33/+1
	This reverts commit e359d5dcf5bd300213054ebeae9fe75c4f7eb9e7.
*	[SPARK-11549][DOCS] Replace example code in mllib-evaluation-metrics.md using include_example	Vikas Nelamangala	2015-11-20	1	-925/+15
	Author: Vikas Nelamangala <vikasnelamangala@Vikass-MacBook-Pro.local> Closes #9689 from vikasnp/master.
*	[SPARK-11689][ML] Add user guide and example code for LDA under spark.ml	Yuhao Yang	2015-11-20	3	-1/+33
	jira: https://issues.apache.org/jira/browse/SPARK-11689 Add simple user guide for LDA under spark.ml and example code under examples/. Use include_example to include example code in the user guide markdown. Check SPARK-11606 for instructions. Author: Yuhao Yang <hhbyyh@gmail.com> Closes #9722 from hhbyyh/ldaMLExample.
*	[SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR	felixcheung	2015-11-18	1	-1/+36
	Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can be called. Those we can't call were added to the SparkR programming guide. It seems to me that `table, sample, subset, filter, cov` not working is not actually expected; I investigated and experimented with them but couldn't get them to work. It looks like, because they are defined in base or stats, they are missing the S3 generic, e.g.
	```
	> methods("transform")
	[1] transform,ANY-method       transform.data.frame
	[3] transform,DataFrame-method transform.default
	see '?methods' for accessing help and source code
	> methods("subset")
	[1] subset.data.frame       subset,DataFrame-method subset.default
	[4] subset.matrix
	see '?methods' for accessing help and source code
	Warning message:
	In .S3methods(generic.function, class, parent.frame()) :
	  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
	```
	Any idea? More information on masking: http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm http://www.sfu.ca/~sweldon/howTo/guide4.pdf This is what the output doc looks like (minus css): ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png) Author: felixcheung <felixcheung_m@hotmail.com> Closes #9785 from felixcheung/rmasked.
*	[SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example codes	Yanbo Liang	2015-11-18	1	-8/+42
	This PR includes:
	* Update SparkR:::glm, SparkR:::summary API docs.
	* Update SparkR machine learning user guide and example codes to show:
	  * supporting feature interaction in R formula.
	  * summary for gaussian GLM model.
	  * coefficients for binomial GLM model.
	mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9727 from yanboliang/spark-11684.
*	[SPARK-11809] Switch the default Mesos mode to coarse-grained mode	Reynold Xin	2015-11-18	2	-11/+18
	Based on my conversations with people, I believe the consensus is that the coarse-grained mode is more stable and easier to reason about. It is best to use that as the default rather than the more flaky fine-grained mode. Author: Reynold Xin <rxin@databricks.com> Closes #9795 from rxin/SPARK-11809.
*	[SPARK-11728] Replace example code in ml-ensembles.md using include_example	Xusen Yin	2015-11-17	1	-740/+14
	JIRA issue https://issues.apache.org/jira/browse/SPARK-11728. The ml-ensembles.md file contains `OneVsRestExample`. Instead of writing new code files for the two `OneVsRestExample`s, I use two existing files in the examples directory: `OneVsRestExample.scala` and `JavaOneVsRestExample.java`. Author: Xusen Yin <yinxusen@gmail.com> Closes #9716 from yinxusen/SPARK-11728.
*	[SPARK-11729] Replace example code in ml-linear-methods.md using include_example	Xusen Yin	2015-11-17	1	-210/+8
	JIRA link: https://issues.apache.org/jira/browse/SPARK-11729 Author: Xusen Yin <yinxusen@gmail.com> Closes #9713 from yinxusen/SPARK-11729.
*	[SPARK-11089][SQL] Adds option for disabling multi-session in Thrift server	Cheng Lian	2015-11-17	1	-0/+14
	This PR adds a new option `spark.sql.hive.thriftServer.singleSession` for disabling multi-session support in the Thrift server. Note that this option is added as a Spark configuration (retrieved from `SparkConf`) rather than Spark SQL configuration (retrieved from `SQLConf`). This is because all SQL configurations are session-ized. Since multi-session support is by default on, no JDBC connection can modify global configurations like the newly added one. Author: Cheng Lian <lian@databricks.com> Closes #9740 from liancheng/spark-11089.single-session-option.
*	[SPARK-11779][DOCS] Fix reference to deprecated MESOS_NATIVE_LIBRARY	Philipp Hoffmann	2015-11-17	1	-1/+1
	MESOS_NATIVE_LIBRARY was renamed in favor of MESOS_NATIVE_JAVA_LIBRARY. This commit fixes the reference in the documentation. Author: Philipp Hoffmann <mail@philipphoffmann.de> Closes #9768 from philipphoffmann/patch-2.
*	[SPARK-11751] Doc describe error in the "Spark Streaming Programming Guide" page	yangping.wu	2015-11-17	1	-2/+1
	The [Task Launching Overheads](http://spark.apache.org/docs/latest/streaming-programming-guide.html#task-launching-overheads) section says: >Task Serialization: Using Kryo serialization for serializing tasks can reduce the task sizes, and therefore reduce the time taken to send them to the slaves. As we know, task serialization is configured by the spark.closure.serializer parameter, but currently only the Java serializer is supported. If we set spark.closure.serializer to org.apache.spark.serializer.KryoSerializer, an exception is thrown. Author: yangping.wu <wyphao.2007@163.com> Closes #9734 from 397090770/397090770-patch-1.
*	[SPARK-11710] Document new memory management model	Andrew Or	2015-11-16	2	-23/+44
	Author: Andrew Or <andrew@databricks.com> Closes #9676 from andrewor14/memory-management-docs.
*	[MINOR][DOCS] typo in docs/configuration.md	Kai Jiang	2015-11-14	1	-5/+5
	`<\code>` end tag missing backslash in docs/configuration.md{L308-L339} ref #8795 Author: Kai Jiang <jiangkai@gmail.com> Closes #9715 from vectorijk/minor-typo-docs.
*	[SPARK-11336] Add links to example codes	Xusen Yin	2015-11-13	1	-2/+7
	https://issues.apache.org/jira/browse/SPARK-11336 mengxr I added a hyperlink to Spark on GitHub, and a hint that the file exists in the Spark code repo, to each code example. I removed the config key for changing the example code dir, since we assume all examples should be in spark/examples. The hyperlink cannot be used yet because Spark v1.6.0 has not been released, but it will work after the release, so this is not a problem. I added some screenshots to give an instant impression. <img width="949" alt="screen shot 2015-10-27 at 10 47 18 pm" src="https://cloud.githubusercontent.com/assets/2637239/10780634/bd20e072-7cfc-11e5-8960-def4fc62a8ea.png"> <img width="1144" alt="screen shot 2015-10-27 at 10 47 31 pm" src="https://cloud.githubusercontent.com/assets/2637239/10780636/c3f6e180-7cfc-11e5-80b2-233589f4a9a3.png"> Author: Xusen Yin <yinxusen@gmail.com> Closes #9320 from yinxusen/SPARK-11336.
*	[SPARK-11723][ML][DOC] Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame	Yanbo Liang	2015-11-13	4	-19/+13
	Use LibSVM data source rather than MLUtils.loadLibSVMFile to load DataFrame, including:
	* Use libSVM data source for all example codes under examples/ml, and remove unused imports.
	* Use libSVM data source for user guides under ml-*** which were omitted by #8697.
	* Fix bug: We should use ```sqlContext.read().format("libsvm").load(path)``` on the Java side, but the API doc and user guides mistakenly use ```sqlContext.read.format("libsvm").load(path)```.
	* Code cleanup.
	mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9690 from yanboliang/spark-11723.
*	[SPARK-11445][DOCS] Replaced example code in mllib-ensembles.md using include_example	Rishabh Bhardwaj	2015-11-13	1	-514/+12
	I have made the required changes and tested. Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9407 from rishabhbhardwaj/SPARK-11445.
*	[SPARK-11629][ML][PYSPARK][DOC] Python example code for Multilayer Perceptron Classification	Yanbo Liang	2015-11-12	1	-66/+5
	Add Python example code for Multilayer Perceptron Classification, and make example code in user guide document testable. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9594 from yanboliang/spark-11629.
*	[SPARK-11667] Update dynamic allocation docs to reflect supported cluster managers	Andrew Or	2015-11-12	1	-28/+27
	Author: Andrew Or <andrew@databricks.com> Closes #9637 from andrewor14/update-da-docs.
*	[SPARK-11670] Fix incorrect kryo buffer default value in docs	Andrew Or	2015-11-12	1	-2/+2
	<img width="931" alt="screen shot 2015-11-11 at 1 53 21 pm" src="https://cloud.githubusercontent.com/assets/2133137/11108261/35d183d4-889a-11e5-9572-85e9d6cebd26.png"> Author: Andrew Or <andrew@databricks.com> Closes #9638 from andrewor14/fix-kryo-docs.
*	[SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD	Nick Evans	2015-11-11	1	-1/+14
	tdas koeninger This updates the Spark Streaming + Kafka Integration Guide doc with a working method to access the offsets of a `KafkaRDD` through Python. Author: Nick Evans <me@nicolasevans.org> Closes #9289 from manygrams/update_kafka_direct_python_docs.
*	[SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled classes	Josh Rosen	2015-11-11	1	-0/+4
	This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8. In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml. http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes. I also added new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code that was compiled targeting Java 8. Author: Josh Rosen <joshrosen@databricks.com> Closes #9512 from JoshRosen/SPARK-6152.
*	[SPARK-11550][DOCS] Replace example code in mllib-optimization.md using include_example	Pravin Gadakh	2015-11-10	1	-143/+2
	Author: Pravin Gadakh <pravingadakh177@gmail.com> Closes #9516 from pravingadakh/SPARK-11550.
*	[SPARK-11382] Replace example code in mllib-decision-tree.md using include_example	Xusen Yin	2015-11-10	1	-247/+6
	https://issues.apache.org/jira/browse/SPARK-11382 By the way, I fixed an error in naive_bayes_example.py. Author: Xusen Yin <yinxusen@gmail.com> Closes #9596 from yinxusen/SPARK-11382.
*	[SPARK-11360][DOC] Loss of nullability when writing parquet files	gatorsmile	2015-11-09	1	-1/+2
	This fix is to add one line to explain the current behavior of Spark SQL when writing Parquet files. All columns are forced to be nullable for compatibility reasons. Author: gatorsmile <gatorsmile@gmail.com> Closes #9314 from gatorsmile/lossNull.
*	[SPARK-11548][DOCS] Replaced example code in mllib-collaborative-filtering.md using include_example	Rishabh Bhardwaj	2015-11-09	1	-135/+3
	Kindly review the changes. Author: Rishabh Bhardwaj <rbnext29@gmail.com> Closes #9519 from rishabhbhardwaj/SPARK-11337.
*	[SPARK-11552][DOCS][Replaced example code in ml-decision-tree.md using include_example]	sachin aggarwal	2015-11-09	1	-330/+8
	I have tested it locally and it is working fine; please review. Author: sachin aggarwal <different.sachin@gmail.com> Closes #9539 from agsachin/SPARK-11552-real.
*	[SPARK-11581][DOCS] Example mllib code in documentation incorrectly computes MSE	Bharat Lal	2015-11-09	1	-1/+1
	Author: Bharat Lal <bharat.iisc@gmail.com> Closes #9560 from bharatl/SPARK-11581.
*	[DOCS] Fix typo for Python section on unifying Kafka streams	chriskang90	2015-11-09	1	-2/+2
	1) kafkaStreams is a list. The list should be unpacked when passing it into the streaming context union method, which accepts a variable number of streams. 2) print() should be pprint() for pyspark. This contribution is my original work, and I license the work to the project under the project's open source license. Author: chriskang90 <jckang@uchicago.edu> Closes #9545 from c-kang/streaming_python_typo.
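	Point 1) of the commit above comes down to Python argument unpacking. The following is a minimal sketch in plain Python, not the PySpark code from the guide: `union` is a hypothetical stand-in for a method that accepts a variable number of streams, and the "streams" are ordinary lists so the snippet runs without Spark.

	```python
	def union(*streams):
	    """Hypothetical stand-in for a variadic union: merge any number of list-like streams."""
	    merged = []
	    for stream in streams:
	        merged.extend(stream)
	    return merged


	# Assumed sample data: three "streams", each an ordinary list of records.
	kafka_streams = [["a", "b"], ["c"], ["d", "e"]]

	# Unpacking with * passes each stream as its own positional argument.
	unified = union(*kafka_streams)

	# Without *, the whole list arrives as a single argument, so the
	# inner lists are never merged element by element.
	not_unpacked = union(kafka_streams)

	print(unified)
	print(not_unpacked)
	```

	With the real API the same pattern would presumably read `streamingContext.union(*kafkaStreams)`, as the commit message describes.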
*	[SPARK-10689][ML][DOC] User guide and example code for AFTSurvivalRegression	Yanbo Liang	2015-11-09	2	-0/+97
	Add user guide and example code for ```AFTSurvivalRegression```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9491 from yanboliang/spark-10689.
*	[DOC][MINOR][SQL] Fix internal link	Rohit Agarwal	2015-11-09	1	-1/+1
	It doesn't show up as a hyperlink currently. It will show up as a hyperlink after this change. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9544 from mindprince/patch-2.
*	[SPARK-10046][SQL] Hive warehouse dir not set in current directory when not …	xin Wu	2015-11-08	1	-2/+4
	Doc change to align with HiveConf default in terms of where to create `warehouse` directory. Author: xin Wu <xinwu@us.ibm.com> Closes #9365 from xwu0226/spark-10046-commit.
*	[DOC][SQL] Remove redundant out-of-place python snippet	Rohit Agarwal	2015-11-08	1	-9/+0
	This snippet seems to be mistakenly introduced at two places in #5348. Author: Rohit Agarwal <mindprince@gmail.com> Closes #9540 from mindprince/patch-1.
*	[SPARK-11476][DOCS] Incorrect function referred to in MLlib Random data generation documentation	Sean Owen	2015-11-08	1	-1/+1
	Fix Python example to use normalRDD as advertised. Author: Sean Owen <sowen@cloudera.com> Closes #9529 from srowen/SPARK-11476.
*	[MINOR][ML][DOC] Rename weights to coefficients in user guide	Yanbo Liang	2015-11-05	1	-12/+12
	We should use ```coefficients``` rather than ```weights``` in the user guide so that newcomers get the conventional name right from the outset. mengxr vectorijk Author: Yanbo Liang <ybliang8@gmail.com> Closes #9493 from yanboliang/docs-coefficients.