path: root/core
Commit message (Author, Date, Files changed, Lines -/+)
* Adding unit tests and some refactoring to promote testability. (Patrick Wendell, 2014-01-07, 8 files, -35/+251)
|
* Fixes after merge (Patrick Wendell, 2014-01-06, 3 files, -6/+8)
|
* Merge remote-tracking branch 'apache-github/master' into standalone-driver (Patrick Wendell, 2014-01-06, 123 files, -1234/+2220)
|\
| |   Conflicts:
| |     core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala
| |     core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
| |     core/src/main/scala/org/apache/spark/deploy/master/Master.scala
| |     core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
| |     core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
| * Fix test breaking downstream builds (Patrick Wendell, 2014-01-06, 1 file, -1/+1)
| |
| * Merge pull request #330 from tgravescs/fix_addjars_null_handling (Patrick Wendell, 2014-01-06, 1 file, -2/+3)
| |\
| | |   Fix handling of empty SPARK_EXAMPLES_JAR
| | |
| | |   Currently, if SPARK_EXAMPLES_JAR is left unset, you get a null pointer
| | |   exception when running the examples (at least on Spark on YARN). The
| | |   null now gets turned into the string "null" when it is put into the
| | |   SparkConf, so addJar no longer properly ignores it. This fixes that so
| | |   the variable can be left unset.
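For illustration, a minimal sketch of the kind of guard this fix implies, as a hypothetical helper around SparkContext.addJar (assumed names; not the actual patch):

```scala
import org.apache.spark.SparkContext

object AddJarGuard {
  // Ignore unset, literal-"null", or empty jar paths instead of registering
  // them -- the failure mode described in the merge message above.
  def addJarIfValid(sc: SparkContext, path: String): Unit = {
    if (path == null || path == "null" || path.trim.isEmpty) {
      System.err.println(s"Warning: ignoring invalid jar path '$path'")
    } else {
      sc.addJar(path)
    }
  }
}
```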
| | * Add warning to null setJars check (Thomas Graves, 2014-01-06, 1 file, -1/+2)
| | |
| | * Fix handling of empty SPARK_EXAMPLES_JAR (Thomas Graves, 2014-01-04, 1 file, -1/+1)
| | |
| * | Merge pull request #333 from pwendell/logging-silence (Patrick Wendell, 2014-01-05, 2 files, -3/+25)
| |\ \
| | | |   Quiet ERROR-level Akka logs
| | | |
| | | |   This fixes an issue I've seen where Akka logs a bunch of things at
| | | |   ERROR level when connecting to a standalone cluster, even in the
| | | |   normal case. I noticed that even when lifecycle logging was disabled,
| | | |   the Netty code inside of Akka still logged away via Akka's
| | | |   EndpointWriter class. There are also some other log streams, which I
| | | |   think are new in Akka 2.2.1, that I've disabled. Finally, I added
| | | |   better logging to the standalone client, so it is clearer what is
| | | |   going on when a connection failure occurs; previously it never
| | | |   explicitly said whether a connection attempt had failed. The commit
| | | |   messages here have more detail.
| | * | Responding to Aaron's review (Patrick Wendell, 2014-01-05, 1 file, -0/+2)
| | | |
| | * | Provide logging when attempts to connect to the master fail (Patrick Wendell, 2014-01-05, 1 file, -1/+11)
| | | |
| | | |   Without this it is less clear to the user what is going on. One thing
| | | |   I realized while doing this is that Akka itself actually retries the
| | | |   initial association, so the retry we currently have is redundant with
| | | |   Akka's.
| | * | Quiet Akka when remote lifecycle logging is disabled (Patrick Wendell, 2014-01-05, 1 file, -2/+12)
| | |/
| | |   I noticed that when connecting to a standalone cluster, Spark gives a
| | |   bunch of Akka ERROR logs that make it seem like something is failing.
| | |   This patch does two things:
| | |
| | |   1. Akka dead letter logging is turned on/off according to the existing
| | |      lifecycle Spark property.
| | |   2. We explicitly silence Akka's EndpointWriter log in log4j. This is
| | |      necessary because, for some reason, that log does not pick up the
| | |      lifecycle logging settings. After a few hours of debugging, this was
| | |      the only solution I found that worked.
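A hedged sketch of the log4j silencing described in point 2, using the log4j 1.x API; the logger name is an assumption based on the Akka class mentioned above, not a quote of the actual patch:

```scala
import org.apache.log4j.{Level, Logger}

object AkkaLogSilencer {
  def silenceEndpointWriter(): Unit = {
    // Raise the threshold so routine ERROR chatter from Akka's
    // EndpointWriter class is dropped from the logs.
    Logger.getLogger("akka.remote.EndpointWriter").setLevel(Level.FATAL)
  }
}
```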
| * | Merge pull request #334 from pwendell/examples-fix (Reynold Xin, 2014-01-05, 1 file, -0/+6)
| |\ \
| | | |   Removing SPARK_EXAMPLES_JAR in the code
| | | |
| | | |   This rewrites all of the examples to use the `SparkContext.jarOfClass`
| | | |   mechanism for loading the examples jar. This is necessary for
| | | |   environments like YARN and standalone mode, where example programs are
| | | |   submitted from inside the cluster rather than at the client using
| | | |   `./spark-example`. This still leaves SPARK_EXAMPLES_JAR in place in
| | | |   the shell scripts for setting up the classpath if `./spark-example`
| | | |   is run.
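A brief usage sketch of the `SparkContext.jarOfClass` mechanism this merge describes; `foreach` keeps the sketch agnostic to whether the method returns an Option or a Seq of paths, which has varied across Spark versions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JarOfClassExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JarOfClassExample"))
    // Locate the jar containing this class and ship it with the job,
    // instead of relying on the SPARK_EXAMPLES_JAR environment variable.
    SparkContext.jarOfClass(this.getClass).foreach(sc.addJar)
    sc.stop()
  }
}
```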
| | * | Removing SPARK_EXAMPLES_JAR in the code (Patrick Wendell, 2014-01-05, 1 file, -0/+6)
| | |/
| * / Fall back to zero-arg constructor for Serializer initialization if there is no constructor that accepts SparkConf (Reynold Xin, 2014-01-05, 2 files, -2/+16)
| |/
| |   This maintains backward compatibility with older serializers implemented
| |   by users.
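A minimal sketch of the fallback described above, as a hypothetical factory (not Spark's actual code): try the SparkConf constructor first via reflection, then fall back to the zero-arg one:

```scala
import org.apache.spark.SparkConf

object SerializerFactory {
  def instantiate[T](cls: Class[T], conf: SparkConf): T = {
    try {
      // Prefer the constructor that accepts a SparkConf...
      cls.getConstructor(classOf[SparkConf]).newInstance(conf)
    } catch {
      case _: NoSuchMethodException =>
        // ...falling back to the zero-arg constructor for older
        // user-implemented serializers.
        cls.getConstructor().newInstance()
    }
  }
}
```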
| * Merge remote-tracking branch 'apache-github/master' into remove-binaries (Patrick Wendell, 2014-01-03, 4 files, -7/+7)
| |\
| | |   Conflicts:
| | |     core/src/test/scala/org/apache/spark/DriverSuite.scala
| | |     docs/python-programming-guide.md
| | * Merge pull request #317 from ScrapCodes/spark-915-segregate-scripts (Patrick Wendell, 2014-01-03, 4 files, -7/+7)
| | |\
| | | |   SPARK-915: segregate scripts
| | | * sbin/compute-classpath* -> bin/compute-classpath* (Prashant Sharma, 2014-01-03, 1 file, -1/+1)
| | | |
| | | * sbin/spark-class* -> bin/spark-class* (Prashant Sharma, 2014-01-03, 3 files, -5/+5)
| | | |
| | | * Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts (Prashant Sharma, 2014-01-02, 5 files, -7/+7)
| | | |\
| | | | |   Conflicts:
| | | | |     bin/spark-shell
| | | | |     core/pom.xml
| | | | |     core/src/main/scala/org/apache/spark/SparkContext.scala
| | | | |     core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
| | | | |     core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
| | | | |     core/src/test/scala/org/apache/spark/DriverSuite.scala
| | | | |     python/run-tests
| | | | |     sbin/compute-classpath.sh
| | | | |     sbin/spark-class
| | | | |     sbin/stop-slaves.sh
| | | | * deprecate "spark" script and SPAKR_CLASSPATH environment variableAndrew xia2013-10-122-2/+1
| | | | |
| | | | * Merge branch 'reorgscripts' into scripts-reorg (shane-huang, 2013-09-27, 5 files, -7/+7)
| | | | |\
| | | | | * Fix paths and change spark to use APP_MEM as application driver memory instead of SPARK_MEM; users should add application jars to SPARK_CLASSPATH (shane-huang, 2013-09-26, 1 file, -1/+1)
| | | | | |
| | | | | |   Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | * Fix path (shane-huang, 2013-09-26, 1 file, -1/+1)
| | | | | |
| | | | | |   Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | * Added spark-class and spark-executor to sbin (shane-huang, 2013-09-23, 4 files, -6/+6)
| | | | | |
| | | | | |   Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| * | | | | Changes on top of Prashant's patch (Patrick Wendell, 2014-01-03, 4 files, -54/+35)
| | | | | |
| | | | | |   Closes #316
| * | | | | Restored the previously removed test (Prashant Sharma, 2014-01-03, 1 file, -1/+12)
| | | | | |
| * | | | | Fixed review comments (Prashant Sharma, 2014-01-03, 3 files, -5/+19)
| | | | | |
| * | | | | Merge branch 'master' into spark-1002-remove-jars (Prashant Sharma, 2014-01-03, 92 files, -799/+1456)
| |\| | | |
| | * | | | Merge pull request #320 from kayousterhout/erroneous_failed_msg (Reynold Xin, 2014-01-02, 2 files, -12/+15)
| | |\ \ \ \
| | | | | | |   Remove erroneous FAILED state for killed tasks.
| | | | | | |
| | | | | | |   Currently, when tasks are killed, the Executor first sends a
| | | | | | |   status update for the task with a KILLED state, and then sends a
| | | | | | |   second status update with a FAILED state saying the task failed
| | | | | | |   due to an exception. The second FAILED state is misleading and
| | | | | | |   unnecessary; it occurs because of a NonLocalReturnControl
| | | | | | |   exception that gets thrown due to the way we kill tasks. This
| | | | | | |   commit eliminates that problem.
| | | | | | |
| | | | | | |   I'm not at all sure this is the best way to fix the problem, so
| | | | | | |   alternate suggestions are welcome. @rxin, guessing you're the
| | | | | | |   right person to look at this.
| | | * | | | Remove erroneous FAILED state for killed tasks. (Kay Ousterhout, 2014-01-02, 2 files, -12/+15)
| | | |/ / /
| | | | | |   Currently, when tasks are killed, the Executor first sends a
| | | | | |   status update for the task with a KILLED state, and then sends a
| | | | | |   second status update with a FAILED state saying the task failed
| | | | | |   due to an exception. The second FAILED state is misleading and
| | | | | |   unnecessary; it occurs because of a NonLocalReturnControl
| | | | | |   exception that gets thrown due to the way we kill tasks. This
| | | | | |   commit eliminates that problem.
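The NonLocalReturnControl mentioned above is how Scala implements a non-local `return` from inside a closure. A hypothetical sketch of the pitfall (not Spark's code): a blanket catch around task code will mistake that control exception for a failure unless it is rethrown:

```scala
import scala.runtime.NonLocalReturnControl

object TaskRunnerSketch {
  def runTask(body: => Unit): Unit = {
    try {
      body
    } catch {
      case e: NonLocalReturnControl[_] =>
        throw e // control flow, not a task failure: don't report FAILED
      case t: Throwable =>
        System.err.println(s"Task failed: $t") // would send a FAILED update
    }
  }
}
```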
| | * | | | Merge pull request #297 from tdas/window-improvement (Patrick Wendell, 2014-01-02, 4 files, -157/+343)
| | |\ \ \ \
| | | | | | |   Improvements to DStream window ops and refactoring of Spark's
| | | | | | |   CheckpointSuite
| | | | | | |
| | | | | | |   - Added a new RDD, PartitionerAwareUnionRDD. Using this RDD, one
| | | | | | |     can take multiple RDDs partitioned by the same partitioner and
| | | | | | |     unify them into a single RDD while preserving the partitioner:
| | | | | | |     m RDDs with p partitions each are unified into a single RDD
| | | | | | |     with p partitions and the same partitioner. The preferred
| | | | | | |     location for each partition of the unified RDD is the most
| | | | | | |     common preferred location of the corresponding partitions of
| | | | | | |     the parent RDDs. For example, partition 0 of the unified RDD
| | | | | | |     will be located where most of partition 0 of the parent RDDs
| | | | | | |     are located.
| | | | | | |   - Improved the performance of DStream's reduceByKeyAndWindow and
| | | | | | |     groupByKeyAndWindow. Both operations work by doing per-batch
| | | | | | |     reduceByKey/groupByKey and then using PartitionerAwareUnionRDD
| | | | | | |     to union the RDDs across the window. This eliminates a shuffle
| | | | | | |     related to the window operation, which can reduce batch
| | | | | | |     processing time by 30-40% for simple workloads.
| | | | | | |   - Fixed bugs and simplified Spark's CheckpointSuite. Some of the
| | | | | | |     tests were incorrect and unreliable; added missing tests for
| | | | | | |     ZippedRDD. I can go into greater detail if necessary.
| | | | | | |   - Added a mapSideCombine option to combineByKeyAndWindow.
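A hedged user-level sketch of the partitioner-preserving union idea from the first bullet above. PartitionerAwareUnionRDD itself is internal, and whether a plain `union` of co-partitioned RDDs picks it up depends on the Spark version; the comments mark the intended effect:

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object PartitionerUnionSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("union-sketch").setMaster("local[2]"))
    val p = new HashPartitioner(4)
    // Two RDDs co-partitioned by the same partitioner.
    val a = sc.parallelize(Seq(1 -> "a", 2 -> "b")).partitionBy(p)
    val b = sc.parallelize(Seq(1 -> "c", 3 -> "d")).partitionBy(p)
    // With a partitioner-aware union, the result keeps p and its 4
    // partitions, so a following reduceByKey needs no shuffle.
    val u = a.union(b)
    println(u.partitioner)
    sc.stop()
  }
}
```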
| | | * | | | Added Apache boilerplate and class docs to PartitionerAwareUnionRDD. (Tathagata Das, 2013-12-26, 1 file, -3/+33)
| | | | | | |
| | | * | | | Merge branch 'apache-master' into window-improvement (Tathagata Das, 2013-12-26, 28 files, -1522/+975)
| | | |\ \ \ \
| | | * \ \ \ \ Merge branch 'master' into window-improvement (Tathagata Das, 2013-12-26, 30 files, -113/+395)
| | | |\ \ \ \ \
| | | * | | | | | Fixed bug in PartitionerAwareUnionRDD (Tathagata Das, 2013-12-26, 1 file, -6/+9)
| | | | | | | | |
| | | * | | | | | Added tests for PartitionerAwareUnionRDD in the CheckpointSuite; refactored CheckpointSuite to make the tests simpler and more reliable; added missing test for ZippedRDD (Tathagata Das, 2013-12-20, 3 files, -170/+231)
| | | * | | | | | Merge branch 'scheduler-update' into window-improvement (Tathagata Das, 2013-12-19, 149 files, -1263/+2721)
| | | |\ \ \ \ \ \
| | | | | | | | | |   Conflicts:
| | | | | | | | | |     streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
| | | * | | | | | Added partitioner-aware union, modified DStream.window. (Tathagata Das, 2013-11-21, 2 files, -0/+92)
| | | | | | | | | |
| | * | | | | | | | Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/incubator-spark (Matei Zaharia, 2014-01-02, 2 files, -6/+1)
| | |\ \ \ \ \ \ \ \
| | | * | | | | | | | Removed redundant TaskSetManager.error() function. (Kay Ousterhout, 2014-01-02, 2 files, -6/+1)
| | | |_|_|_|/ / /
| | | |/| | | | |
| | | | | | | | |   This function was left over from a while ago and now just
| | | | | | | | |   passes all calls through to the abort() function, so this
| | | | | | | | |   commit deletes it.
| | * | | | | | | | Merge pull request #311 from tmyklebu/master (Matei Zaharia, 2014-01-02, 3 files, -4/+38)
| | |\ \ \ \ \ \ \ \
| | | |/ / / / / / /
| | |/| | | | | | |
| | | | | | | | |   SPARK-991: Report information gleaned from a Python
| | | | | | | | |   stacktrace in the UI
| | | | | | | | |
| | | | | | | | |   Scala:
| | | | | | | | |   - Added setCallSite/clearCallSite to SparkContext and
| | | | | | | | |     JavaSparkContext. These functions mutate a LocalProperty
| | | | | | | | |     called "externalCallSite".
| | | | | | | | |   - Added a wrapper, getCallSite, that checks for an
| | | | | | | | |     externalCallSite and, if none is found, calls the usual
| | | | | | | | |     Utils.formatSparkCallSite.
| | | | | | | | |   - Changed everything that calls Utils.formatSparkCallSite to
| | | | | | | | |     call getCallSite instead, except getCallSite itself.
| | | | | | | | |   - Added setCallSite/clearCallSite wrappers to
| | | | | | | | |     JavaSparkContext.
| | | | | | | | |
| | | | | | | | |   Python:
| | | | | | | | |   - Added a gruesome hack to rdd.py that inspects the traceback
| | | | | | | | |     and guesses what you want to see in the UI.
| | | | | | | | |   - Added a RAII wrapper around said gruesome hack that calls
| | | | | | | | |     setCallSite/clearCallSite as appropriate.
| | | | | | | | |   - Wired said RAII wrapper up around three calls into the
| | | | | | | | |     Scala code.
| | | | | | | | |
| | | | | | | | |   I'm not sure that I hit all the spots with the RAII wrapper.
| | | | | | | | |   I'm also not sure that my gruesome hack does exactly what we
| | | | | | | | |   want. One could also approach this change by refactoring
| | | | | | | | |   runJob/submitJob/runApproximateJob to take a call site, then
| | | | | | | | |   threading that parameter through everything that needs to
| | | | | | | | |   know it.
| | | | | | | | |
| | | | | | | | |   One might object to the pointless-looking wrappers in
| | | | | | | | |   JavaSparkContext. Unfortunately, I can't directly access the
| | | | | | | | |   SparkContext from Python (or, if I can, I don't know how), so
| | | | | | | | |   I need to wrap everything that matters in JavaSparkContext.
| | | | | | | | |
| | | | | | | | |   Conflicts:
| | | | | | | | |     core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
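A hedged sketch of the set/clear call-site pattern the PR describes, as a loan-style helper: setCallSite/clearCallSite are the SparkContext methods named above, while the helper name and exact signatures are assumptions that may differ by Spark version:

```scala
import org.apache.spark.SparkContext

object CallSiteSketch {
  // Record a user-facing call site around an action so the UI can show it,
  // and always clear it afterwards, even if the body throws.
  def withCallSite[T](sc: SparkContext, site: String)(body: => T): T = {
    sc.setCallSite(site)
    try body
    finally sc.clearCallSite()
  }
}
```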
| | | * | | | | | | Factor call site reporting out to SparkContext. (Tor Myklebust, 2013-12-28, 3 files, -4/+38)
| | | | | | | | | |
| | * | | | | | | | Fixed two uses of conf.get with no default value in Mesos (Matei Zaharia, 2014-01-01, 2 files, -2/+2)
| | | | | | | | | |
| | * | | | | | | | Miscellaneous fixes from code review. (Matei Zaharia, 2014-01-01, 44 files, -174/+195)
| | | | | | | | | |
| | | | | | | | | |   Also replaced SparkConf.getOrElse with just a "get" that
| | | | | | | | | |   takes a default value, and added getInt, getLong, etc. to
| | | | | | | | | |   make code that uses this simpler later on.
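A brief usage sketch of the SparkConf accessors described above (the keys are illustrative):

```scala
import org.apache.spark.SparkConf

object ConfGetSketch {
  def main(args: Array[String]): Unit = {
    val conf  = new SparkConf()
    val host  = conf.get("spark.driver.host", "localhost") // get with a default
    val cores = conf.getInt("spark.cores.max", 2)          // typed accessor
    println(s"host=$host cores=$cores")
  }
}
```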
| | * | | | | | | | Merge remote-tracking branch 'apache/master' into conf2 (Matei Zaharia, 2014-01-01, 11 files, -21/+45)
| | |\ \ \ \ \ \ \ \
| | | | | | | | | |   Conflicts:
| | | | | | | | | |     core/src/main/scala/org/apache/spark/SparkContext.scala
| | | | | | | | | |     core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
| | | | | | | | | |     core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
| | * \ \ \ \ \ \ \ \ Merge remote-tracking branch 'apache/master' into conf2 (Matei Zaharia, 2014-01-01, 10 files, -210/+450)
| | |\ \ \ \ \ \ \ \ \
| | | | | | | | | | |   Conflicts:
| | | | | | | | | | |     project/SparkBuild.scala
| | * \ \ \ \ \ \ \ \ \ Merge remote-tracking branch 'apache/master' into conf2 (Matei Zaharia, 2013-12-31, 19 files, -107/+120)
| | |\ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | |   Conflicts:
| | | | | | | | | | | |     core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
| | | | | | | | | | | |     streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | | | | | | | | | |     streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| | * | | | | | | | | | | Updated docs for SparkConf and handled review comments (Matei Zaharia, 2013-12-30, 9 files, -32/+56)
| | | | | | | | | | | | |
| | * | | | | | | | | | | Properly show Spark properties on web UI, and change app name property (Matei Zaharia, 2013-12-29, 4 files, -9/+12)
| | | | | | | | | | | | |
| | * | | | | | | | | | | Added tests for SparkConf and fixed a bug (Matei Zaharia, 2013-12-29, 3 files, -0/+117)
| | | | | | | | | | | | |
| | | | | | | | | | | | |   Typesafe Config caches system properties the first
| | | | | | | | | | | | |   time it is invoked, ignoring later changes unless you
| | | | | | | | | | | | |   do something special.
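A hedged sketch of the Typesafe Config caveat noted above; `ConfigFactory.invalidateCaches()` is the "something special" that makes later system-property changes visible (the property key is illustrative):

```scala
import com.typesafe.config.ConfigFactory

object ConfigCacheSketch {
  def main(args: Array[String]): Unit = {
    ConfigFactory.load()                     // first load caches system properties
    System.setProperty("my.test.flag", "on") // a change made after the first load
    ConfigFactory.invalidateCaches()         // without this, the change is ignored
    println(ConfigFactory.load().getString("my.test.flag")) // prints "on"
  }
}
```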