| Commit message | Author | Age | Files | Lines |
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
JavaStreamingContext.getOrCreate.
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
RecoverableNetworkWordCount example to use it.
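The two getOrCreate fragments above refer to recovering a StreamingContext from existing checkpoint data and creating a fresh context only when no checkpoint is found. Below is a minimal Scala sketch of that usage pattern; the checkpoint directory, host, and port are placeholders, and JavaStreamingContext exposes an equivalent getOrCreate for the Java API.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

// Placeholder checkpoint location for this sketch.
val checkpointDir = "/tmp/recoverable-wordcount-checkpoint"

// Builds a brand-new context; only invoked when no checkpoint exists yet.
def createContext(): StreamingContext = {
  val ssc = new StreamingContext("local[2]", "RecoverableExample", Seconds(1))
  ssc.checkpoint(checkpointDir)
  val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
  words.map(w => (w, 1)).reduceByKey(_ + _).print()
  ssc
}

// Recover the context from the checkpoint if present, otherwise create it.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```

On a restart after a driver failure, the same getOrCreate call finds the checkpoint and rebuilds the context instead of calling createContext again.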
External Sorting for Aggregator and CoGroupedRDDs (Revisited)
(This pull request is re-opened from https://github.com/apache/incubator-spark/pull/303, which was closed because Jenkins / github was misbehaving)
The target issue for this patch is the out-of-memory exceptions triggered by aggregate operations such as reduce, groupBy, join, and cogroup. The existing AppendOnlyMap used by these operations resides purely in memory, and grows with the size of the input data until the amount of allocated memory is exceeded. Under large workloads, this problem is aggravated by the fact that OOM frequently occurs only after a very long (> 1 hour) map phase, in which case the entire job must be restarted.
The solution is to spill the contents of this map to disk once a certain memory threshold is exceeded. This functionality is provided by ExternalAppendOnlyMap, which additionally sorts this buffer before writing it out to disk, and later merges these buffers back in sorted order.
Under normal circumstances in which OOM is not triggered, ExternalAppendOnlyMap is simply a wrapper around AppendOnlyMap and incurs little overhead. Only when the memory usage is expected to exceed the given threshold does ExternalAppendOnlyMap spill to disk.
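To make the spill-and-merge idea concrete, below is a minimal illustrative sketch in Scala. It is not Spark's ExternalAppendOnlyMap: the class and method names are invented, and a real implementation would stream each spill file and merge-combine duplicate keys across spills in sorted order rather than simply concatenating entries.

```scala
import java.io._
import scala.collection.mutable

// Illustrative only: an in-memory map that spills sorted batches to disk
// once it exceeds a size threshold, then iterates over memory plus spills.
class SpillableMap[K, V](maxInMemoryEntries: Int = 100000) extends Iterable[(K, V)] {
  private val inMemory = mutable.HashMap.empty[K, V]
  private val spillFiles = mutable.ArrayBuffer.empty[File]

  // Insert a record, combining it with any existing value for the same key.
  def insert(key: K, value: V, combine: (V, V) => V): Unit = {
    inMemory(key) = inMemory.get(key).map(combine(_, value)).getOrElse(value)
    if (inMemory.size >= maxInMemoryEntries) spill()
  }

  // Write the in-memory contents to a temp file, sorted by key hash code,
  // so that spilled files could later be merged in sorted order.
  private def spill(): Unit = {
    val file = File.createTempFile("spillable-map-", ".tmp")
    val out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(file)))
    try {
      out.writeInt(inMemory.size)
      inMemory.toSeq.sortBy(_._1.hashCode).foreach(out.writeObject)
    } finally out.close()
    spillFiles += file
    inMemory.clear()
  }

  // Iterate over everything: entries still in memory plus every spill file.
  override def iterator: Iterator[(K, V)] =
    inMemory.iterator ++ spillFiles.iterator.flatMap(readSpill)

  private def readSpill(file: File): Iterator[(K, V)] = {
    val in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(file)))
    try (1 to in.readInt()).map(_ => in.readObject().asInstanceOf[(K, V)]).iterator
    finally in.close()
  }
}
```

When the threshold is never reached, nothing is spilled and the structure behaves like an ordinary in-memory map, matching the low-overhead behavior described above.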
Conflicts:
core/src/main/scala/org/apache/spark/SparkEnv.scala
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
Conflicts:
core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
Set default logging to WARN for Spark streaming examples.
This programmatically sets the log level to WARN by default for streaming
tests. If the user has already specified a log4j.properties file,
the user's file will take precedence over this default.
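A minimal sketch of how such a default can be applied without clobbering user configuration, assuming log4j 1.x; the helper object name is invented for this example. If the root logger already has appenders, a user-supplied log4j configuration was loaded and is left alone; otherwise the default level is lowered to WARN.

```scala
import org.apache.log4j.{Level, Logger}

// Hypothetical helper; only the log4j calls themselves are real API.
object ExampleLogging {
  def setDefaultLogLevel(): Unit = {
    // Appenders on the root logger mean the user supplied their own
    // log4j configuration, so we should not override it.
    val userConfigured = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!userConfigured) {
      Logger.getRootLogger.setLevel(Level.WARN)
    }
  }
}
```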
Conflicts:
examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
Improvements to DStream window ops and refactoring of Spark's CheckpointSuite
- Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, location of partition 0 of the unified RDD will be where most of partition 0 of the parent RDDs are located.
- Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads (a usage sketch follows this list).
- Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary.
- Added mapSideCombine option to combineByKeyAndWindow.
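For context, here is a small usage sketch of the windowed reduce referenced in the second item. The host, port, and durations are arbitrary placeholders, and the PartitionerAwareUnionRDD optimization is internal, so the user-facing call does not change.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._

// Arbitrary example: word counts over a sliding window.
val ssc = new StreamingContext("local[2]", "WindowedWordCount", Seconds(2))
val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

// Per-batch reduceByKey results are combined across a 30-second window that
// slides every 10 seconds; this is the operation the change above speeds up.
val windowedCounts = words.map(w => (w, 1))
  .reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

windowedCounts.print()
ssc.start()
ssc.awaitTermination()
```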
mapSideCombine to combineByKeyAndWindow.
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
Also replaced SparkConf.getOrElse with just a "get" that takes a default
value, and added getInt, getLong, etc. to make code that uses this
simpler later on.
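For illustration, a sketch of how the resulting accessors read at call sites; the configuration keys and default values below are arbitrary examples, not recommended settings.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()

// "get" with an explicit default replaces the old getOrElse.
val master = conf.get("spark.master", "local[*]")

// Typed helpers avoid manual parsing at call sites.
val cores     = conf.getInt("spark.example.cores", 4)            // arbitrary key
val timeoutMs = conf.getLong("spark.example.timeout.ms", 60000L) // arbitrary key
```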
Conflicts:
core/src/main/scala/org/apache/spark/SparkContext.scala
core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
Conflicts:
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
Conflicts:
core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
0 partitions).
(FileNotFound exception is still caught and ignored).
due to failures due to FileNotFound exceptions.
Reynold's comments on PR 289.
correctly.
FileSystem objects. Refactored JobGenerator to use an actor so that all updating of DStream's metadata is single-threaded.
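A generic sketch of that single-threading pattern, using Akka actors (which Spark used at the time); the actor, message, and field names below are invented and are not Spark's actual JobGenerator internals.

```scala
import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical message type: "a batch finished, record it".
case class BatchCompleted(batchTimeMs: Long)

// All metadata mutations go through this one actor, so they are applied on a
// single message-processing thread and need no extra locking.
class MetadataActor extends Actor {
  private var lastCompletedBatchMs = 0L
  def receive: Receive = {
    case BatchCompleted(time) =>
      lastCompletedBatchMs = math.max(lastCompletedBatchMs, time)
  }
}

val system = ActorSystem("sketch")
val metadata = system.actorOf(Props[MetadataActor](), "metadata")

// Many threads may send messages concurrently; the actor handles them one at a time.
metadata ! BatchCompleted(System.currentTimeMillis())
```

Because an actor processes one message at a time, concurrent senders never mutate the metadata simultaneously, which is the property the refactoring relies on.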
Conflicts:
core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala