Commit message  [Author, Age, Files, Lines (-deleted/+added)]
* Also add graphx commons-math3 dependency in sbt build  [Sean Owen, 2014-01-22, 1 file, -1/+4]
|
* Depend on Commons Math explicitly instead of accidentally getting it from
|   Hadoop (which stops working in 2.2.x), and also use the newer
|   commons-math3  [Sean Owen, 2014-01-22, 2 files, -1/+6]
* Merge pull request #489 from ash211/patch-6  [Reynold Xin, 2014-01-21, 1 file, -1/+1]
|\

      Clarify spark.default.parallelism

      It's the task count across the cluster, not per worker, per machine,
      per core, or anything else.
| * Clarify spark.default.parallelism  [Andrew Ash, 2014-01-21, 1 file, -1/+1]
|/
      It's the task count across the cluster, not per worker, per machine,
      per core, or anything else.
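To make the arithmetic concrete: the cluster size and tasks-per-core figures below are illustrative, not from the commit, and `SPARK_JAVA_OPTS` is just one period-appropriate way to pass the property.

```shell
# spark.default.parallelism is the TOTAL task count across the cluster,
# not a per-worker or per-core figure. Example numbers only.
MACHINES=4
CORES_PER_MACHINE=8
TASKS_PER_CORE=2   # a common rule of thumb
PARALLELISM=$((MACHINES * CORES_PER_MACHINE * TASKS_PER_CORE))
# One way to set it in the 0.9 era: as a Java system property.
export SPARK_JAVA_OPTS="-Dspark.default.parallelism=$PARALLELISM"
echo "$PARALLELISM"
```

So a value of 2 would mean two tasks for the whole cluster, not two per core.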
* Merge pull request #469 from ajtulloch/use-local-spark-context-in-tests-for-mllib  [Reynold Xin, 2014-01-21, 10 files, -113/+43]
|\

      [MLlib] Use a LocalSparkContext trait in test suites

      Replaces the 9 instances of

      ```scala
      class XXXSuite extends FunSuite with BeforeAndAfterAll {
        @transient private var sc: SparkContext = _

        override def beforeAll() {
          sc = new SparkContext("local", "test")
        }

        override def afterAll() {
          sc.stop()
          System.clearProperty("spark.driver.port")
        }
      }
      ```

      with

      ```scala
      class XXXSuite extends FunSuite with LocalSparkContext {
      ```
| * Fixed import order  [Andrew Tulloch, 2014-01-21, 5 files, -7/+4]
| |
| * LocalSparkContext for MLlib  [Andrew Tulloch, 2014-01-19, 10 files, -109/+42]
| |
* | Merge pull request #480 from pwendell/0.9-fixes  [Patrick Wendell, 2014-01-21, 7 files, -24/+65]
|\ \

      Handful of 0.9 fixes

      This patch addresses a few fixes for Spark 0.9.0 based on the last
      release candidate. @mridulm gets credit for reporting most of the
      issues here. Many of the fixes here are based on his work in #477 and
      follow-up discussion with him.
| * | Style clean-up  [Patrick Wendell, 2014-01-21, 2 files, -11/+9]
| | |
| * | Adding small code comment  [Patrick Wendell, 2014-01-20, 1 file, -1/+2]
| | |
| * | Avoid matching attempt files in the checkpoint  [Patrick Wendell, 2014-01-20, 1 file, -2/+2]
| | |
| * | Remove shuffle files if they are still present on a machine.  [Patrick Wendell, 2014-01-20, 1 file, -3/+10]
| | |
| * | Fixing speculation bug  [Patrick Wendell, 2014-01-20, 1 file, -1/+1]
| | |
| * | Force use of LZF when spilling data  [Patrick Wendell, 2014-01-20, 2 files, -7/+39]
| | |
| * | Bug fix for reporting of spill output  [Patrick Wendell, 2014-01-20, 1 file, -1/+3]
| | |
| * | Minor fixes  [Patrick Wendell, 2014-01-20, 1 file, -1/+1]
| | |
| * | Removing docs on akka options  [Patrick Wendell, 2014-01-20, 2 files, -8/+9]
| | |
* | | Merge pull request #484 from tdas/run-example-fix  [Patrick Wendell, 2014-01-20, 1 file, -2/+11]
|\ \ \

      Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.

      The bin/run-example script was not passing Java properties set through
      SPARK_JAVA_OPTS to the example. This is important for examples like
      Twitter**, as the Twitter authentication information must be set
      through Java properties. Hence, the same JAVA_OPTS code was added to
      run-example as in the bin/spark-class script. Also added SPARK_MEM, in
      case someone wants to run the example with different amounts of
      memory. This can be removed if it is not in tune with the intended
      semantics of the run-example script.

      @matei Please check this soon. I want this to go in 0.9-rc4.
| * | | Removed SPARK_MEM from run-examples.  [Tathagata Das, 2014-01-20, 1 file, -5/+0]
| | | |
| * | | Made run-example respect SPARK_JAVA_OPTS and SPARK_MEM.  [Tathagata Das, 2014-01-20, 1 file, -2/+16]
| |/ /
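A minimal sketch of what the fix above amounts to: forward SPARK_JAVA_OPTS into the JVM options used to launch the example, as bin/spark-class already does. The property value is illustrative and the real script handles more cases.

```shell
# Fold SPARK_JAVA_OPTS into the JVM flags for the example, so properties
# such as Twitter credentials reach the example process.
SPARK_JAVA_OPTS="-Dtwitter4j.oauth.consumerKey=demo"   # example value only
JAVA_OPTS="$SPARK_JAVA_OPTS"
echo "$JAVA_OPTS"
```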
* | | Merge pull request #449 from CrazyJvm/master  [Reynold Xin, 2014-01-20, 1 file, -3/+8]
|\ \ \

      SPARK-1028: fix "set MASTER automatically fails" bug.

      spark-shell intends to set MASTER automatically if we do not provide
      the option when we start the shell, but there is a problem. The
      condition is:

        if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];

      We will certainly set SPARK_MASTER_IP explicitly, but we probably will
      not set SPARK_MASTER_PORT, relying instead on Spark's default port
      7077. So if SPARK_MASTER_PORT is not set, the condition is never true.
      We should just use the default port if the user does not set one
      explicitly.
| * | | fix some format problem.  [CrazyJvm, 2014-01-16, 1 file, -2/+2]
| | | |
| * | | fix "set MASTER automatically fails" bug.  [CrazyJvm, 2014-01-16, 1 file, -3/+8]
| | | |
      spark-shell intends to set MASTER automatically if we do not provide
      the option when we start the shell, but there is a problem. The
      condition is:

        if [[ "x" != "x$SPARK_MASTER_IP" && "y" != "y$SPARK_MASTER_PORT" ]];

      We will certainly set SPARK_MASTER_IP explicitly, but we probably will
      not set SPARK_MASTER_PORT, relying instead on Spark's default port
      7077. So if SPARK_MASTER_PORT is not set, the condition is never true.
      We should just use the default port if the user does not set one
      explicitly.
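A sketch of the corrected logic described above. The exact spark-shell text is an assumption based on the commit message, and the host value is illustrative.

```shell
# Fall back to the standalone default port 7077 instead of requiring
# SPARK_MASTER_PORT to be set alongside SPARK_MASTER_IP.
SPARK_MASTER_IP="myhost"   # illustrative value; normally set in spark-env.sh
SPARK_MASTER_PORT=""       # simulate the common case: port left unset
SPARK_MASTER_PORT="${SPARK_MASTER_PORT:-7077}"
if [ -n "$SPARK_MASTER_IP" ]; then
  MASTER="spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}"
fi
echo "$MASTER"
```

With only the IP set, this yields spark://myhost:7077 rather than silently skipping the MASTER assignment.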
* | | | Merge pull request #482 from tdas/streaming-example-fix  [Patrick Wendell, 2014-01-20, 17 files, -0/+17]
|\ \ \ \

      Added StreamingContext.awaitTermination to streaming examples

      StreamingContext.start() currently starts a non-daemon thread which
      prevents termination of a Spark Streaming program even after the main
      function has exited. Since the expected behavior of a streaming
      program is to run until explicitly killed, this was more or less fine
      when Spark Streaming applications were launched from the command line.
      However, when launched in Yarn-standalone mode, this did not work: the
      driver effectively got terminated when the main function exited, so
      the Spark Streaming examples did not work on Yarn. This addition to
      the examples ensures that they work on Yarn, and also teaches everyone
      that StreamingContext.awaitTermination() is necessary for Spark
      Streaming programs to keep running. The true bug fix, making sure all
      threads created by Spark Streaming are daemon threads, is left for
      post-0.9.
| * | | | Added StreamingContext.awaitTermination to streaming examples.  [Tathagata Das, 2014-01-20, 17 files, -0/+17]
| | |/ /
| |/| |
* | | | Merge pull request #483 from pwendell/gitignore  [Reynold Xin, 2014-01-20, 1 file, -1/+1]
|\ \ \ \
| |/ / /
|/| | |

      Restricting /lib to top level directory in .gitignore

      This patch was proposed by Sean Mackrory.
| * | | Restricting /lib to top level directory in .gitignore  [Patrick Wendell, 2014-01-20, 1 file, -1/+1]
| | | |
      This patch was proposed by Sean Mackrory.
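For context on the one-character semantics at play here: in .gitignore, a pattern with a leading slash is anchored to the repository root, while an unanchored name matches at any depth. An illustrative fragment (the python/lib example path is an assumption):

```
# anchored: ignores only the top-level lib directory
/lib

# unanchored (the old behavior): would ignore a directory named lib
# anywhere in the tree, e.g. python/lib
lib
```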
* | | | Merge pull request #470 from tgravescs/fix_spark_examples_yarn  [Patrick Wendell, 2014-01-19, 1 file, -3/+8]
|\ \ \ \

      Only log error on missing jar to allow spark examples to run on YARN.

      Right now, to run the spark examples on YARN you have to use the
      --addJars option and put the jar in HDFS. To make that nicer, so the
      user doesn't have to specify the --addJars option, change it to simply
      log an error instead of throwing.
| * | | | update comment  [Thomas Graves, 2014-01-19, 1 file, -1/+1]
| | | | |
| * | | | Only log error on missing jar to allow spark examples to run on YARN.  [Thomas Graves, 2014-01-19, 1 file, -3/+8]
| | |_|/
| |/| |
* | | | Merge pull request #458 from tdas/docs-update  [Patrick Wendell, 2014-01-19, 9 files, -76/+79]
|\ \ \ \
| |/ / /
|/| | |

      Updated Java API docs for streaming, along with very minor changes in
      the code examples.

      Docs updated for:
        Scala: StreamingContext, DStream, PairDStreamFunctions
        Java: JavaStreamingContext, JavaDStream, JavaPairDStream

      Examples updated:
        JavaQueueStream: do not use the deprecated method
        ActorWordCount: use the public interface the right way
| * | | Updated Java API docs for streaming, along with very minor changes
| | | |   in the code examples.  [Tathagata Das, 2014-01-16, 9 files, -76/+79]
* | | | Merge pull request #459 from srowen/UpdaterL2Regularization  [Patrick Wendell, 2014-01-18, 1 file, -1/+5]
|\ \ \ \

      Correct L2 regularized weight update with canonical form

      Per a thread on the user@ mailing list, and comments from Ameet, I
      believe the weight update for L2 regularization needs to be corrected.
      See
      http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/%3CCAH3_EVMetuQuhj3__NdUniDLc4P-FMmmrmxw9TS14or8nT4BNQ%40mail.gmail.com%3E
| * | | | Correct L2 regularized weight update with canonical form  [Sean Owen, 2014-01-18, 1 file, -1/+5]
| | | | |
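The commit message does not spell out the formula, but the canonical form its title refers to is standard: for an L2-regularized objective, a gradient step shrinks the existing weights multiplicatively before applying the loss gradient.

```latex
% Objective:  f(w) = L(w) + \tfrac{\lambda}{2} \lVert w \rVert^2
% Gradient:   \nabla f(w) = \nabla L(w) + \lambda w
% Gradient-descent step with learning rate \alpha_t:
w_{t+1} = w_t - \alpha_t \bigl( \nabla L(w_t) + \lambda w_t \bigr)
        = (1 - \alpha_t \lambda)\, w_t - \alpha_t \nabla L(w_t)
```

That is, the weights are scaled by (1 - alpha_t * lambda) and then moved against the loss gradient; the regularization term is not a separate additive correction applied afterwards.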
* | | | | Merge pull request #437 from mridulm/master  [Patrick Wendell, 2014-01-18, 3 files, -2/+5]
|\ \ \ \ \

      Minor API usability changes

      - Expose checkpoint directory, since it is autogenerated now
      - Null check for jars
      - Expose SparkHadoopUtil, so that configuration creation is abstracted
        even from user code, to avoid duplication of functionality already
        in Spark
| * | | | | Address review comment  [Mridul Muralidharan, 2014-01-17, 1 file, -1/+1]
| | | | | |
| * | | | | Use method, not variable  [Mridul Muralidharan, 2014-01-16, 1 file, -1/+1]
| | | | | |
| * | | | | Address review comments  [Mridul Muralidharan, 2014-01-16, 2 files, -2/+4]
| | | | | |
| * | | | | Expose method and class, so that we can use it from user code
| | | | | |   (particularly since the checkpoint directory is autogenerated
| | | | | |   now)  [Mridul Muralidharan, 2014-01-15, 2 files, -2/+3]
* | | | | | Merge pull request #426 from mateiz/py-ml-tests  [Patrick Wendell, 2014-01-18, 2 files, -5/+15]
|\ \ \ \ \ \

      Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)

      We disabled these earlier because Jenkins didn't have these versions.
| * | | | | | Complain if Python and NumPy versions are too old for MLlib  [Matei Zaharia, 2014-01-14, 1 file, -0/+10]
| | | | | | |
| * | | | | | Re-enable Python MLlib tests (require Python 2.7 and NumPy 1.7+)  [Matei Zaharia, 2014-01-14, 1 file, -5/+5]
| | | | | | |
* | | | | | | Merge pull request #462 from mateiz/conf-file-fix  [Patrick Wendell, 2014-01-18, 6 files, -71/+41]
| |_|_|_|/ /
|/| | | | |

      Remove Typesafe Config usage and conf files to fix nested property
      names

      With Typesafe Config we had the subtle problem of no longer allowing
      nested property names, which are used for a few of our properties:
      http://apache-spark-developers-list.1001551.n3.nabble.com/Config-properties-broken-in-master-td208.html

      This PR is for branch 0.9 but should be added into master too.
      (cherry picked from commit 34e911ce9a9f91f3259189861779032069257852)
      Signed-off-by: Patrick Wendell <pwendell@gmail.com>
* | | | | | Merge pull request #461 from pwendell/master  [Patrick Wendell, 2014-01-18, 1 file, -1/+1]
|\ \ \ \ \ \
| |_|_|/ / /
|/| | | | |

      Use renamed shuffle spill config in CoGroupedRDD.scala

      This one got missed when it was renamed.
| * | | | | Use renamed shuffle spill config in CoGroupedRDD.scala  [Patrick Wendell, 2014-01-18, 1 file, -1/+1]
|/ / / / /
* | | | | Merge pull request #451 from Qiuzhuang/master  [Patrick Wendell, 2014-01-16, 2 files, -3/+3]
|\ \ \ \ \

      Fixed Windows spark shell launch script error.

      JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
| * | | | | Fixed Windows spark shell launch script error.  [Qiuzhuang Lian, 2014-01-16, 2 files, -3/+3]
| | | | | |
      JIRA SPARK-1029: https://spark-project.atlassian.net/browse/SPARK-1029
* | | | | | Merge pull request #438 from ScrapCodes/clone-records-java-api  [Patrick Wendell, 2014-01-16, 1 file, -2/+114]
|\ \ \ \ \ \
| |_|_|_|/ /
|/| | | | |

      Clone records Java API
| * | | | | adding clone records field to equivalent java apis  [Prashant Sharma, 2014-01-17, 1 file, -2/+114]
| | | | | |
* | | | | | Merge pull request #445 from kayousterhout/exec_lost  [Reynold Xin, 2014-01-15, 2 files, -1/+18]
|\ \ \ \ \ \
| |_|/ / / /
|/| | | | |

      Fail rather than hanging if a task crashes the JVM.

      Prior to this commit, if a task crashes the JVM, the task (and all
      other tasks running on that executor) is marked as KILLED rather than
      FAILED. As a result, the TaskSetManager will retry the task
      indefinitely rather than failing the job after maxFailures.
      Eventually, this makes the job hang, because the Standalone Scheduler
      removes the application after 10 workers have failed, and then the app
      is left in a state where it is disconnected from the master and
      waiting to reconnect.

      This commit fixes that problem by marking tasks as FAILED rather than
      KILLED when an executor is lost. The downside of this commit is that
      if task A fails because another task running on the same executor
      caused the VM to crash, the failure will incorrectly be counted as a
      failure of task A. This should not be an issue because we typically
      set maxFailures to 3, and it is unlikely that a task will be
      co-located with a JVM-crashing task multiple times.