spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
...
\| * \| \| \| \| \| \|	Merge remote-tracking branch 'apache/master' into driver-test	Tathagata Das	2014-01-09	55	-186/+282
\| \|\ \ \ \ \ \ \ \| \| \|/ / / / / / \| \|/\| \| \| / / / \| \| \| \|_\|/ / / \| \| \|/\| \| \| \|
\| * \| \| \| \| \|	Fixed bugs in reading of checkpoints.	Tathagata Das	2014-01-10	2	-17/+20
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Merge branch 'standalone-driver' into driver-test	Tathagata Das	2014-01-09	371	-4402/+8039
\| \|\ \ \ \ \ \
\| \| \| \| \| \| \| \|	Conflicts:
\| \| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/SparkContext.scala
\| \| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
\| \| \| \| \| \| \| \|	examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java
\| \| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
\| \| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
\| \| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
\| \| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| * \| \| \| \| \| \|	Changed the way StreamingContext finds and reads checkpoint files, and added JavaStreamingContext.getOrCreate.	Tathagata Das	2014-01-09	10	-125/+254
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	More bug fixes.	Tathagata Das	2014-01-08	1	-19/+26
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Modified checkpoint file clearing policy.	Tathagata Das	2014-01-08	7	-52/+104
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Added a hashmap to cache file mod times.	Tathagata Das	2014-01-05	2	-8/+30
\| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \|	Merge branch 'filestream-fix' into driver-test	Tathagata Das	2014-01-06	14	-196/+267
\| \|\ \ \ \ \ \ \
\| \| \| \| \| \| \| \| \|	Conflicts:
\| \| \| \| \| \| \| \| \|	streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
\| * \| \| \| \| \| \| \|	Bug fixes to the DriverRunner and minor changes here and there.	Tathagata Das	2014-01-06	4	-11/+14
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	Removed the exponential backoff for testing.	Tathagata Das	2014-01-04	1	-1/+1
\| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \|	Added StreamingContext.getOrCreate for automatic recovery, and added RecoverableNetworkWordCount example to use it.	Tathagata Das	2014-01-02	4	-5/+85
\| \| \| \| \| \| \| \| \|
* \| \| \| \| \| \| \| \|	Merge pull request #377 from andrewor14/master	Patrick Wendell	2014-01-10	17	-93/+1118
\|\ \ \ \ \ \ \ \ \
\| \| \| \| \| \| \| \| \| \|	External Sorting for Aggregator and CoGroupedRDDs (Revisited)
\| \| \| \| \| \| \| \| \| \|	(This pull request is re-opened from https://github.com/apache/incubator-spark/pull/303, which was closed because Jenkins / github was misbehaving)
\| \| \| \| \| \| \| \| \| \|	The target issue for this patch is the out-of-memory exceptions triggered by aggregate operations such as reduce, groupBy, join, and cogroup. The existing AppendOnlyMap used by these operations resides purely in memory, and grows with the size of the input data until the amount of allocated memory is exceeded. Under large workloads, this problem is aggravated by the fact that OOM frequently occurs only after a very long (> 1 hour) map phase, in which case the entire job must be restarted.
\| \| \| \| \| \| \| \| \| \|	The solution is to spill the contents of this map to disk once a certain memory threshold is exceeded. This functionality is provided by ExternalAppendOnlyMap, which additionally sorts this buffer before writing it out to disk, and later merges these buffers back in sorted order.
\| \| \| \| \| \| \| \| \| \|	Under normal circumstances in which OOM is not triggered, ExternalAppendOnlyMap is simply a wrapper around AppendOnlyMap and incurs little overhead. Only when the memory usage is expected to exceed the given threshold does ExternalAppendOnlyMap spill to disk.
\| * \| \| \| \| \| \| \| \|	Update documentation for externalSorting	Andrew Or	2014-01-10	1	-3/+2
\| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \|	Address Patrick's and Reynold's comments	Andrew Or	2014-01-10	5	-47/+73
\| \| \| \| \| \| \| \| \| \|	Aside from trivial formatting changes, use nulls instead of Options for DiskMapIterator, and add documentation for spark.shuffle.externalSorting and spark.shuffle.memoryFraction. Also, set spark.shuffle.memoryFraction to 0.3, and spark.storage.memoryFraction = 0.6.
\| * \| \| \| \| \| \| \| \|	Fix wonky imports from merge	Andrew Or	2014-01-09	1	-8/+1
\| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \|	Defensively allocate memory from global pool	Andrew Or	2014-01-09	5	-47/+80
\| \| \| \| \| \| \| \| \| \|	This is an alternative to the existing approach, which evenly distributes the collective shuffle memory among all running tasks. In the new approach, each thread requests a chunk of memory whenever its map is about to multiplicatively grow. If there is sufficient memory in the global pool, the thread allocates it and grows its map. Otherwise, it spills.
\| \| \| \| \| \| \| \| \| \|	A danger with the previous approach is that a new task may quickly fill up its map before old tasks finish spilling, potentially causing an OOM. This approach prevents this scenario as it favors existing tasks over new tasks; any thread that may step over the boundary of other threads defensively backs off and starts spilling.
\| \| \| \| \| \| \| \| \| \|	Testing through spark-perf reveals:
\| \| \| \| \| \| \| \| \| \|	(1) When no spills have occurred, the performance of external sorting using this memory management approach is essentially the same as without external sorting.
\| \| \| \| \| \| \| \| \| \|	(2) When one or more spills have occurred, the performance of external sorting is a small multiple (3x) worse
\| * \| \| \| \| \| \| \| \|	Merge github.com:apache/incubator-spark	Andrew Or	2014-01-09	293	-2974/+5557
\| \|\ \ \ \ \ \ \ \ \
\| \| \| \| \| \| \| \| \| \| \|	Conflicts:
\| \| \| \| \| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/SparkEnv.scala
\| \| \| \| \| \| \| \| \| \| \|	streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
\| * \| \| \| \| \| \| \| \| \|	Get SparkConf from SparkEnv, rather than creating new ones	Andrew Or	2014-01-07	3	-6/+6
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Use AtomicInteger for numRunningTasks	Andrew Or	2014-01-04	1	-12/+7
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Address Mark's comments	Andrew Or	2014-01-04	3	-18/+13
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Assign spill threshold as a fraction of maximum memory	Andrew Or	2014-01-04	5	-33/+81
\| \| \| \| \| \| \| \| \| \| \|	Further, divide this threshold by the number of tasks running concurrently. Note that this does not guard against the following scenario: a new task quickly fills up its share of the memory before old tasks finish spilling their contents, in which case the total memory used by such maps may exceed what was specified. Currently, spark.shuffle.safetyFraction mitigates the effect of this.
\| * \| \| \| \| \| \| \| \| \|	Remove unnecessary ClassTag's	Andrew Or	2014-01-03	2	-7/+4
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Refactor using SparkConf	Andrew Or	2014-01-03	4	-19/+21
\| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \|	Merge remote-tracking branch 'spark/master'	Andrew Or	2014-01-02	182	-1764/+3172
\| \|\ \ \ \ \ \ \ \ \ \
\| \| \| \| \| \| \| \| \| \| \| \|	Conflicts:
\| \| \| \| \| \| \| \| \| \| \| \|	core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
\| * \| \| \| \| \| \| \| \| \| \|	TempBlockId takes UUID and is explicitly non-serializable	Aaron Davidson	2014-01-02	2	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Simplify ExternalAppendOnlyMap on the assumption that the mergeCombiners function is specified	Andrew Or	2014-01-01	3	-135/+53
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Merge branch 'master' of github.com:andrewor14/incubator-spark	Andrew Or	2013-12-31	4	-9/+9
\| \|\ \ \ \ \ \ \ \ \ \ \
\| \| * \| \| \| \| \| \| \| \| \| \|	Rename IntermediateBlockId to TempBlockId	Aaron Davidson	2013-12-31	4	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \| \|	Address Patrick's and Reynold's comments	Andrew Or	2013-12-31	1	-49/+71
\| \|/ / / / / / / / / / /
\| * \| \| \| \| \| \| \| \| \| \|	Merge branch 'master' of github.com:andrewor14/incubator-spark	Andrew Or	2013-12-31	3	-97/+71
\| \|\ \ \ \ \ \ \ \ \ \ \
\| \| * \| \| \| \| \| \| \| \| \| \|	Add new line at end of file	Aaron Davidson	2013-12-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|
\| \| * \| \| \| \| \| \| \| \| \| \|	Refactor SamplingSizeTracker into SizeTrackingAppendOnlyMap	Aaron Davidson	2013-12-30	3	-97/+71
\| \| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \| \|	Add support and test for null keys in ExternalAppendOnlyMap	Andrew Or	2013-12-31	4	-32/+139
\| \| \| \| \| \| \| \| \| \| \| \| \|	Also add safeguard against use of destructively sorted AppendOnlyMap
\| * \| \| \| \| \| \| \| \| \| \| \|	Add warning message for spilling	Andrew Or	2013-12-31	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \| \|	Address Aaron's and Jerry's comments	Andrew Or	2013-12-31	2	-5/+6
\| \|/ / / / / / / / / / /
\| * \| \| \| \| \| \| \| \| \| \|	Fix CheckpointSuite test fail	Andrew Or	2013-12-30	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Simplify merge logic based on the invariant that all spills contain unique keys	Andrew Or	2013-12-30	1	-37/+22
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Merge pull request from aarondav: Utilize DiskBlockManager pathway for temp file writing	Andrew Or	2013-12-30	3	-16/+49
\| \| \| \| \| \| \| \| \| \| \| \|	This gives us a couple advantages:
\| \| \| \| \| \| \| \| \| \| \| \|	- Uses spark.local.dir and randomly selects a directory/disk.
\| \| \| \| \| \| \| \| \| \| \| \|	- Ensure files are deleted on normal DiskBlockManager cleanup.
\| \| \| \| \| \| \| \| \| \| \| \|	- Availability of same stats as usual DiskBlockObjectWriter (currently unused).
\| \| \| \| \| \| \| \| \| \| \| \|	Also enable basic cleanup when iterator is fully drained. Still requires cleanup for operations that fail or don't go through all elements.
\| * \| \| \| \| \| \| \| \| \| \|	Merge branch 'master' of github.com:andrewor14/incubator-spark	Andrew Or	2013-12-29	2	-9/+11
\| \|\ \ \ \ \ \ \ \ \ \ \
\| \| * \| \| \| \| \| \| \| \| \| \|	Use Comparator instead of Ordering	Aaron Davidson	2013-12-29	2	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	lower object creation costs
\| * \| \| \| \| \| \| \| \| \| \| \|	Add test suite for ExternalAppendOnlyMap	Andrew Or	2013-12-29	1	-0/+217
\| \| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \| \|	Make serializer a parameter to ExternalAppendOnlyMap	Andrew Or	2013-12-29	2	-4/+4
\| \|/ / / / / / / / / / /
\| * \| \| \| \| \| \| \| \| \| \|	Address Aaron's comments	Andrew Or	2013-12-29	5	-88/+188
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Add Apache headers	Aaron Davidson	2013-12-27	3	-4/+54
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Rename spark.shuffle.buffer variables	Andrew Or	2013-12-27	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Final cleanup	Andrew Or	2013-12-26	4	-25/+28
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Use real serializer & manual ordering	Aaron Davidson	2013-12-26	1	-11/+27
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Return efficient iterator if no spillage happened	Aaron Davidson	2013-12-26	3	-9/+20
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Sort AppendOnlyMap in-place	Andrew Or	2013-12-26	2	-14/+51
\| \| \| \| \| \| \| \| \| \| \| \|
\| * \| \| \| \| \| \| \| \| \| \|	Fix streaming JavaAPISuite again	Andrew Or	2013-12-26	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \|
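
Illustration for the "Defensively allocate memory from global pool" and "External Sorting" messages above. This is a minimal Python sketch of the idea those messages describe, not Spark's ExternalAppendOnlyMap code. Every name here (MemoryPool, SpillableMap, try_acquire, spill_count) is hypothetical. The sketch also assumes that memory is counted in map entries rather than bytes, that each map gets a small free initial allowance, and that "spilled" runs are kept as in-memory sorted lists instead of files on disk.

```python
import heapq
import threading
from functools import reduce
from itertools import groupby
from operator import itemgetter


class MemoryPool:
    """Hypothetical global pool shared by all tasks; units are map entries."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._used = 0
        self._lock = threading.Lock()

    def try_acquire(self, amount):
        # Grant the request only if the whole amount fits; never over-commit.
        with self._lock:
            if self._used + amount > self._capacity:
                return False
            self._used += amount
            return True

    def release(self, amount):
        with self._lock:
            self._used -= amount

    @property
    def used(self):
        with self._lock:
            return self._used


class SpillableMap:
    """Combiner map that spills a sorted run when the pool refuses growth."""

    def __init__(self, pool, merge_value, initial_capacity=2):
        self._pool = pool
        self._merge = merge_value
        self._initial = initial_capacity  # assumed free allowance per map
        self._granted = 0                 # entries currently held from the pool
        self._current = {}
        self._spills = []                 # stand-in for sorted files on disk
        self.spill_count = 0

    def insert(self, key, value):
        if key in self._current:
            self._current[key] = self._merge(self._current[key], value)
            return
        capacity = self._initial + self._granted
        if len(self._current) >= capacity:
            # About to grow multiplicatively: ask the pool for enough to double.
            if self._pool.try_acquire(capacity):
                self._granted += capacity
            else:
                self._spill()  # back off instead of crowding other tasks
        self._current[key] = value

    def _sorted_run(self):
        return sorted(self._current.items(), key=itemgetter(0))

    def _spill(self):
        self._spills.append(self._sorted_run())
        self._current = {}
        self._pool.release(self._granted)
        self._granted = 0
        self.spill_count += 1

    def items(self):
        if not self._spills:
            # Fast path: nothing was spilled, so no merge is needed.
            yield from self._current.items()
            return
        runs = self._spills + [self._sorted_run()]
        merged = heapq.merge(*runs, key=itemgetter(0))
        for key, group in groupby(merged, key=itemgetter(0)):
            # The same key may appear once per run; combine those values.
            yield key, reduce(self._merge, (value for _, value in group))
```

In this sketch a map that cannot obtain memory spills its own contents and returns what it held to the pool, so tasks that already hold memory are not disturbed by a newcomer. When no spill happens, items() simply iterates the in-memory map, mirroring the "little overhead" case described in the pull request text.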