| Commit message | Author | Age | Files | Lines |
Added SPARK-968 implementation for review
grouped by executor id
Minor cleanup for standalone scheduler
See commit messages
As a lonely child with no one to care for it... we had to put it down.
Track and report task result serialisation time.
- DirectTaskResult now has a ByteBuffer valueBytes instead of a T value.
- DirectTaskResult now has a member function T value() that deserialises valueBytes.
- Executor serialises value into a ByteBuffer and passes it to DTR's ctor.
- Executor tracks the time taken to do so and puts it in a new field in TaskMetrics.
- StagePage now reports serialisation time from TaskMetrics along with the other things it reported.
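The flow described above can be sketched as follows; this is a hedged Python analogue of the Scala change (`run_task` and the metrics key are illustrative names, not Spark's API): the executor serialises the value, records the elapsed time in the task metrics, and the result object deserialises lazily.

```python
import pickle
import time

class DirectTaskResult:
    """Holds the serialized result bytes; deserializes only on demand."""
    def __init__(self, value_bytes, metrics):
        self.value_bytes = value_bytes
        self.metrics = metrics

    def value(self):
        # Deserialize value_bytes only when the result is actually needed.
        return pickle.loads(self.value_bytes)

def run_task(task_fn, metrics):
    result = task_fn()
    start = time.perf_counter()
    value_bytes = pickle.dumps(result)  # executor serialises the value...
    metrics["result_serialization_time"] = time.perf_counter() - start
    return DirectTaskResult(value_bytes, metrics)  # ...and passes bytes to the ctor
```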
Add collectPartition to JavaRDD interface.
This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py.
Thanks @concretevitamin for the original change and tests.
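A hedged sketch of why a per-partition collect is useful for `take`: partitions can be fetched one at a time until enough elements arrive, instead of collecting the whole dataset. Plain Python lists stand in for partitions here; `collect_partition` is illustrative, not the real API.

```python
def collect_partition(rdd_partitions, idx):
    """Hypothetical stand-in: return the elements of one partition."""
    return list(rdd_partitions[idx])

def take(rdd_partitions, n):
    # Pull partitions incrementally until n elements are gathered.
    out = []
    for idx in range(len(rdd_partitions)):
        if len(out) >= n:
            break
        out.extend(collect_partition(rdd_partitions, idx))
    return out[:n]
```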
Change the implementation to use runJob instead of PartitionPruningRDD.
Also update the unit tests and the python take implementation
to use the new interface.
Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
Add toString to Java RDD, and __repr__ to Python RDD
Addresses [SPARK-992](https://spark-project.atlassian.net/browse/SPARK-992)
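The idea can be illustrated with a minimal Python sketch (a hypothetical wrapper, not PySpark's actual class): the `__repr__` simply delegates to the underlying object's string form, just as the Python RDD's `__repr__` delegates to the Java RDD's `toString`.

```python
class RDD:
    """Hypothetical minimal wrapper around a JVM-side RDD description."""
    def __init__(self, jrdd_str):
        self._jrdd_str = jrdd_str

    def __repr__(self):
        # Delegate the textual form to the underlying (JVM-side) object.
        return self._jrdd_str
```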
[SPARK-959] Explicitly depend on org.eclipse.jetty.orbit jar
Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See [bug](https://spark-project.atlassian.net/browse/SPARK-959) for more details.
Note that this may not be the best solution, as I do not understand the root cause of why this only happens for some people. However, it is reported to work.
Without this, in some cases, Ivy attempts to download the wrong file
and fails, stopping the whole build. See bug for more details.
(This is probably also the beginning of the slow death of our
recently prettified dependencies. Form follows function.)
Increase spark.akka.askTimeout default to 30 seconds
In experimental clusters we've observed that a 10 second timeout was insufficient, despite having a low number of nodes and relatively small workload (16 nodes, <1.5 TB data). This would cause an entire job to fail at the beginning of the reduce phase.
There is no particular reason for this value to be small as a timeout should only occur in an exceptional situation.
Also centralized the reading of spark.akka.askTimeout to AkkaUtils (surely this can later be cleaned up to use Typesafe).
Finally, deleted some lurking implicits. If anyone can think of a reason they should still be there, please let me know.
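A hedged sketch of the centralisation, with a plain dict standing in for the Spark configuration and a hypothetical helper name: every caller goes through one function that applies the 30-second default.

```python
DEFAULT_ASK_TIMEOUT_SECONDS = 30

def ask_timeout(conf):
    """Single place to read spark.akka.askTimeout (illustrative helper,
    mirroring how AkkaUtils centralises the lookup with a 30 s default)."""
    return int(conf.get("spark.akka.askTimeout", DEFAULT_ASK_TIMEOUT_SECONDS))
```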
|
|/ / / / / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fix Cygwin support in several scripts.
This allows the spark-shell, spark-class, run-example, make-distribution.sh,
and ./bin/start-* scripts to work under Cygwin. Note that this doesn't
support PySpark under Cygwin, since that requires many additional `cygpath`
calls from within Python and will be non-trivial to implement.
This PR was inspired by, and subsumes, #253 (so close #253 after this is merged).
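For context, this rough Python approximation shows the kind of translation `cygpath -w` performs (only the `/cygdrive` case; the real tool handles far more), which is why sprinkling such conversions through PySpark would be non-trivial.

```python
def cygwin_to_windows(path):
    """Rough approximation of `cygpath -w` for /cygdrive paths only."""
    prefix = "/cygdrive/"
    if path.startswith(prefix):
        rest = path[len(prefix):]
        drive, _, tail = rest.partition("/")
        # /cygdrive/c/foo/bar -> C:\foo\bar
        return drive.upper() + ":\\" + tail.replace("/", "\\")
    return path.replace("/", "\\")
```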
Fixed the example link in the Scala programming guide.
The old link was inaccessible, so I changed it to the new one.
Fixed a performance problem in RDD.top and BoundedPriorityQueue
BoundedPriorityQueue was actually traversing the entire queue to calculate the size, resulting in poor insertion performance.
This should also cherry pick cleanly into branch-0.8.
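The fix's intent can be sketched in Python (a simplified stand-in, not Spark's Scala class): keep the k largest items in a min-heap and answer size queries from the heap's length in O(1), rather than traversing the queue on every insertion.

```python
import heapq

class BoundedPriorityQueue:
    """Keeps the k largest items; size comes from len(), not a traversal."""
    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap of at most k items

    def offer(self, item):
        if len(self._heap) < self.k:       # O(1) size check
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            heapq.heapreplace(self._heap, item)  # evict current minimum

    def __len__(self):
        return len(self._heap)
```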
Add support for Akka 2.2 to master (via shaded jars)
This patch does a few related things. NOTE: This may not compile correctly for ~24 hours until artifacts fully propagate to Maven Central.
1. Uses shaded versions of akka/protobuf. For more information on how these versions were prepared, see [1].
2. Brings the `new-yarn` project up-to-date with the changes for Akka 2.2.3.
3. Some clean-up of the build now that we don't have to switch akka groups for different YARN versions.
[1]
https://github.com/pwendell/spark-utils/tree/933a309ef85c22643e8e4b5e365652101c4e95de/shaded-protobuf
The correct format is to not have a trailing slash.
For me this caused non-deterministic failures due to issues fetching
certain artifacts. The issue was that some of the maven caches would
fail to fetch the artifact (due to the way that the artifact
path was concatenated with the repository) and this short-circuited
the download process in a silent way. Here is what the log output
looked like:
Downloading: http://repo.maven.apache.org/maven2/org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/akka-remote_2.10-2.2.3-shaded-protobuf.pom
[WARNING] The POM for org.spark-project.akka:akka-remote_2.10:jar:2.2.3-shaded-protobuf is missing, no dependency information available
This was pretty brutal to debug since there was no error message
anywhere and the path *looks* correct as reported by the Maven log.
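A hedged illustration of the failure mode: naive string concatenation of repository URL and artifact path (as a resolver might do) yields a double slash when the repository URL ends in `/`, which per the report above some Maven caches mishandled silently.

```python
def artifact_url(repo, path):
    # Naive join a resolver might perform: repo + "/" + path.
    return repo + "/" + path

ARTIFACT = ("org/spark-project/akka/akka-remote_2.10/2.2.3-shaded-protobuf/"
            "akka-remote_2.10-2.2.3-shaded-protobuf.pom")

good = artifact_url("http://repo.maven.apache.org/maven2", ARTIFACT)
bad = artifact_url("http://repo.maven.apache.org/maven2/", ARTIFACT)
# `bad` contains "maven2//org" -- a double slash in the artifact path.
```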
Force pseudo-tty allocation in spark-ec2 script.
ssh commands need the -t argument repeated twice if there is no local
tty, e.g. if the process running spark-ec2 uses nohup and the parent
process exits.
Without this change, if you run the script this way (e.g. using nohup from a cron job), it will fail setting up the nodes because some of the ssh commands complain about missing ttys and then fail.
(This version is for the master branch. I've filed a separate request for the 0.8 since changes to the script caused the patches to be different.)
|
| | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix for spark.task.maxFailures not enforced correctly.
Docs at http://spark.incubator.apache.org/docs/latest/configuration.html say:
```
spark.task.maxFailures
Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
```
Previous implementation worked incorrectly. When for example `spark.task.maxFailures` was set to 1, the job was aborted only after the second task failure, not after the first one.
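The corrected semantics can be sketched as a retry loop (illustrative Python, not Spark's scheduler code): the job aborts once the failure count reaches `maxFailures`, so `maxFailures = 1` means the first failure aborts and no retries happen.

```python
def run_with_retries(task, max_failures):
    """Abort after max_failures failures: retries = max_failures - 1."""
    failures = 0
    while True:
        try:
            return task()
        except Exception:
            failures += 1
            if failures >= max_failures:  # the off-by-one fix: >=, not >
                raise
```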