spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge remote-tracking branch 'apache/master' into project-refactor	Tathagata Das	2014-01-06	21	-234/+377
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
\| *	Removing SPARK_EXAMPLES_JAR in the code	Patrick Wendell	2014-01-05	2	-10/+21
\| \|
\| *	Merge pull request #297 from tdas/window-improvement	Patrick Wendell	2014-01-02	5	-15/+45
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improvements to DStream window ops and refactoring of Spark's CheckpointSuite - Added a new RDD - PartitionerAwareUnionRDD. Using this RDD, one can take multiple RDDs partitioned by the same partitioner and unify them into a single RDD while preserving the partitioner. So m RDDs with p partitions each will be unified to a single RDD with p partitions and the same partitioner. The preferred location for each partition of the unified RDD will be the most common preferred location of the corresponding partitions of the parent RDDs. For example, location of partition 0 of the unified RDD will be where most of partition 0 of the parent RDDs are located. - Improved the performance of DStream's reduceByKeyAndWindow and groupByKeyAndWindow. Both these operations work by doing per-batch reduceByKey/groupByKey and then using PartitionerAwareUnionRDD to union the RDDs across the window. This eliminates a shuffle related to the window operation, which can reduce batch processing time by 30-40% for simple workloads. - Fixed bugs and simplified Spark's CheckpointSuite. Some of the tests were incorrect and unreliable. Added missing tests for ZippedRDD. I can go into greater detail if necessary. - Added mapSideCombine option to combineByKeyAndWindow.
\| \| *	Removed unncessary options from WindowedDStream.	Tathagata Das	2013-12-26	1	-5/+3
\| \| \|
\| \| *	Updated groupByKeyAndWindow to be computed incrementally, and added ↵	Tathagata Das	2013-12-26	5	-12/+34
\| \| \| \| \| \| \| \| \| \| \| \|	mapSideCombine to combineByKeyAndWindow.
\| \| *	Merge branch 'scheduler-update' into window-improvement	Tathagata Das	2013-12-23	4	-5/+32
\| \| \|\
\| \| * \	Merge branch 'scheduler-update' into window-improvement	Tathagata Das	2013-12-19	57	-632/+1024
\| \| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/WindowedDStream.scala
\| \| * \| \|	Added flag in window operation to use partition awaare union.	Tathagata Das	2013-11-21	1	-1/+3
\| \| \| \| \|
\| \| * \| \|	Added partitioner aware union, modified DStream.window.	Tathagata Das	2013-11-21	1	-39/+2
\| \| \| \| \|
\| \| * \| \|	Added partition aware union to improve reduceByKeyAndWindow	Tathagata Das	2013-11-20	1	-2/+49
\| \| \| \| \|
\| * \| \| \|	Miscellaneous fixes from code review.	Matei Zaharia	2014-01-01	3	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.
\| * \| \| \|	Merge remote-tracking branch 'apache/master' into conf2	Matei Zaharia	2014-01-01	7	-14/+0
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala
\| \| * \ \ \	Merge remote-tracking branch 'apache-github/master' into log4j-fix-2	Patrick Wendell	2014-01-01	7	-142/+220
\| \| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| \| * \| \| \| \|	Removing initLogging entirely	Patrick Wendell	2013-12-30	7	-13/+0
\| \| \| \|_\|_\|/ \| \| \|/\| \| \|
\| * \| \| \| \|	Fix two compile errors introduced in merge	Matei Zaharia	2013-12-31	2	-2/+2
\| \| \| \| \| \|
\| * \| \| \| \|	Merge remote-tracking branch 'apache/master' into conf2	Matei Zaharia	2013-12-31	7	-141/+221
\| \|\ \ \ \ \ \| \| \| \|/ / / \| \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| \| * \| \| \|	Fixed comments and long lines based on comments on PR 289.	Tathagata Das	2013-12-31	3	-9/+17
\| \| \| \| \| \|
\| \| * \| \| \|	Minor changes in comments and strings to address comments in PR 289.	Tathagata Das	2013-12-27	1	-8/+6
\| \| \| \| \| \|
\| \| * \| \| \|	Added warning if filestream adds files with no data in them (file RDDs have ↵	Tathagata Das	2013-12-26	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	0 partitions).
\| \| * \| \| \|	Changed file stream to not catch any exceptions related to finding new files ↵	Tathagata Das	2013-12-26	1	-19/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(FileNotFound exception is still caught and ignored).
\| \| * \| \| \|	Removed slack time in file stream and added better handling of exceptions ↵	Tathagata Das	2013-12-26	3	-50/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	due to failures due FileNotFound exceptions.
\| \| * \| \| \|	Fixed Python API for sc.setCheckpointDir. Also other fixes based on ↵	Tathagata Das	2013-12-24	2	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reynold's comments on PR 289.
\| \| * \| \| \|	Minor formatting fixes.	Tathagata Das	2013-12-23	3	-9/+13
\| \| \| \| \| \|
\| \| * \| \| \|	Updated testsuites to work with the slack time of file stream.	Tathagata Das	2013-12-23	3	-2/+22
\| \| \| \| \| \|
\| \| * \| \| \|	Merge branch 'scheduler-update' into filestream-fix	Tathagata Das	2013-12-23	3	-4/+26
\| \| \|\\| \| \|
\| \| * \| \| \|	Fixed bug in file stream that prevented some files from being read	Tathagata Das	2013-12-23	1	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	correctly.
\| \| * \| \| \|	Updated CheckpointWriter and FileInputDStream to be robust against failed ↵	Tathagata Das	2013-12-22	3	-35/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	FileSystem objects. Refactored JobGenerator to use actor so that all updating of DStream's metadata is single threaded.
\| \| * \| \| \|	Merge branch 'scheduler-update' into filestream-fix	Tathagata Das	2013-12-22	2	-1/+6
\| \| \|\ \ \ \
\| \| * \ \ \ \	Merge branch 'scheduler-update' into filestream-fix	Tathagata Das	2013-12-19	57	-620/+1009
\| \| \|\ \ \ \ \ \| \| \| \| \|_\|_\|/ \| \| \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
\| \| * \| \| \| \|	Fixed multiple file stream and checkpointing bugs.	Tathagata Das	2013-12-11	5	-74/+117
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Made file stream more robust to transient failures. - Changed Spark.setCheckpointDir API to not have the second 'useExisting' parameter. Spark will always create a unique directory for checkpointing underneath the directory provide to the funtion. - Fixed bug wrt local relative paths as checkpoint directory. - Made DStream and RDD checkpointing use SparkContext.hadoopConfiguration, so that more HDFS compatible filesystems are supported for checkpointing.
\| * \| \| \| \| \|	Fix a few settings that were being read as system properties after merge	Matei Zaharia	2013-12-29	1	-2/+2
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Merge remote-tracking branch 'origin/master' into conf2	Matei Zaharia	2013-12-29	23	-203/+583
\| \|\ \ \ \ \ \ \| \| \| \|_\|_\|/ / \| \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
\| * \| \| \| \| \|	Fix other failing tests	Matei Zaharia	2013-12-28	4	-18/+22
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Add a StreamingContext constructor that takes a conf object	Matei Zaharia	2013-12-28	1	-1/+20
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fix CheckpointSuite test failures	Matei Zaharia	2013-12-28	1	-4/+3
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Fix test failures due to setting / clearing clock type in Streaming	Matei Zaharia	2013-12-28	4	-10/+14
\| \| \| \| \| \| \|
\| * \| \| \| \| \|	Various fixes to configuration code	Matei Zaharia	2013-12-28	8	-37/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Got rid of global SparkContext.globalConf - Pass SparkConf to serializers and compression codecs - Made SparkConf public instead of private[spark] - Improved API of SparkContext and SparkConf - Switched executor environment vars to be passed through SparkConf - Fixed some places that were still using system properties - Fixed some tests, though others are still failing This still fails several tests in core, repl and streaming, likely due to properties not being set or cleared correctly (some of the tests run fine in isolation).
\| * \| \| \| \| \|	spark-544, introducing SparkConf and related configuration overhaul.	Prashant Sharma	2013-12-25	7	-25/+27
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Removed extra empty lines.	Tathagata Das	2013-12-31	1	-1/+0
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Removed unnecessary comments.	Tathagata Das	2013-12-31	3	-55/+8
\| \| \| \| \| \| \|
* \| \| \| \| \| \|	Added pom.xml for external projects and removed unnecessary dependencies and ↵	Tathagata Das	2013-12-31	1	-54/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	repositoris from other poms and sbt.
* \| \| \| \| \| \|	Refactored kafka, flume, zeromq, mqtt as separate external projects, with ↵	Tathagata Das	2013-12-30	12	-978/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	their own self-contained scala API, java API, scala unit tests and java unit tests. Updated examples to use the external projects.
* \| \| \| \| \| \|	Refactored streaming project to separate out the twitter functionality.	Tathagata Das	2013-12-26	3	-105/+8
\| \|/ / / / / \|/\| \| \| \| \|
* \| \| \| \| \|	Minor change for PR 277.	Tathagata Das	2013-12-23	1	-1/+1
\| \| \| \| \| \|
* \| \| \| \| \|	Minor formatting fixes.	Tathagata Das	2013-12-23	1	-5/+4
\| \| \| \| \| \|
* \| \| \| \| \|	Added comments to BatchInfo and JobSet, based on Patrick's comment on PR 277.	Tathagata Das	2013-12-23	2	-3/+26
\| \|_\|_\|/ / \|/\| \| \| \|
* \| \| \| \|	Minor updated based on comments on PR 277.	Tathagata Das	2013-12-20	2	-1/+6
\| \|_\|/ / \|/\| \| \|
* \| \| \|	Minor changes.	Tathagata Das	2013-12-18	9	-32/+36
\| \| \| \|
* \| \| \|	Merge branch 'apache-master' into scheduler-update	Tathagata Das	2013-12-18	42	-406/+458
\|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/ForEachDStream.scala
\| * \| \|	Use scala.binary.version in POMs	Mark Hamstra	2013-12-15	1	-7/+7
\| \| \| \|