spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Merge remote-tracking branch 'origin/master' into yarn-2.2	Harvey Feng	2013-11-26	35	-218/+1466
\|\	Conflicts: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
\| *	Merge pull request #209 from pwendell/better-docs	Reynold Xin	2013-11-26	1	-10/+13
\| \|\	Improve docs for shuffle instrumentation
\| \| *	Improve docs for shuffle instrumentation	Patrick Wendell	2013-11-25	1	-10/+13
\| \| \|
\| * \|	Merge pull request #86 from holdenk/master	Matei Zaharia	2013-11-26	4	-0/+451
\| \|\ \	Add histogram functionality to DoubleRDDFunctions
\| \| \| \|	This pull request add histogram functionality to the DoubleRDDFunctions.
\| \| * \|	Fix the test	Holden Karau	2013-11-25	2	-5/+5
\| \| \| \|
\| \| * \|	Add spaces	Holden Karau	2013-11-18	1	-0/+14
\| \| \| \|
\| \| * \|	Remove explicit boxing	Holden Karau	2013-11-18	1	-2/+2
\| \| \| \|
\| \| * \|	Remove extranious type declerations	Holden Karau	2013-10-21	1	-2/+2
\| \| \| \|
\| \| * \|	Remove extranious type definitions from inside of tests	Holden Karau	2013-10-21	1	-86/+86
\| \| \| \|
\| \| * \|	CR feedback	Holden Karau	2013-10-21	3	-101/+125
\| \| \| \|
\| \| * \|	Add tests for the Java implementation.	Holden Karau	2013-10-20	1	-0/+14
\| \| \| \|
\| \| * \|	Initial commit of adding histogram functionality to the DoubleRDDFunctions.	Holden Karau	2013-10-19	3	-0/+399
\| \| \| \|
\| * \| \|	Merge pull request #204 from rxin/hash	Matei Zaharia	2013-11-25	4	-54/+103
\| \|\ \ \	OpenHashSet fixes
\| \| \| \| \|	Incorporated ideas from pull request #200.
\| \| \| \| \|	- Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil.
\| \| \| \| \|	- Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique.
\| \| \| \| \|	- Remember the grow threshold instead of recomputing it on each insert
\| \| \| \| \|	Also added unit tests for size estimation for specialized hash sets and maps.
\| \| * \| \|	Incorporated ideas from pull request #200.	Reynold Xin	2013-11-25	1	-50/+57
\| \| \| \| \|	- Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil.
\| \| \| \| \|	- Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique
\| \| \| \| \|	- Remember the grow threshold instead of recomputing it on each insert
\| \| * \| \|	Added unit tests for size estimation for specialized hash sets and maps.	Reynold Xin	2013-11-25	3	-4/+46
\| \| \| \|/
\| \| \|/\|
\| * \| \|	Merge pull request #206 from ash211/patch-2	Matei Zaharia	2013-11-25	1	-1/+2
\| \|\ \ \	Update tuning.md
\| \| \| \| \|	Clarify when serializer is used based on recent user@ mailing list discussion.
\| \| * \| \|	Update tuning.md	Andrew Ash	2013-11-25	1	-1/+2
\| \| \| \| \|	Clarify when serializer is used based on recent user@ mailing list discussion.
\| * \| \| \|	Merge pull request #201 from rxin/mappartitions	Matei Zaharia	2013-11-25	4	-70/+22
\| \|\ \ \ \	Use the proper partition index in mapPartitionsWIthIndex
\| \| \|/ / /
\| \|/\| \| \|	mapPartitionsWithIndex uses TaskContext.partitionId as the partition index. TaskContext.partitionId used to be identical to the partition index in a RDD. However, pull request #186 introduced a scenario (with partition pruning) that the two can be different. This pull request uses the right partition index in all mapPartitionsWithIndex related calls.
\| \| \| \| \|	Also removed the extra MapPartitionsWIthContextRDD and put all the mapPartitions related functionality in MapPartitionsRDD.
\| \| * \| \|	Consolidated both mapPartitions related RDDs into a single MapPartitionsRDD.	Reynold Xin	2013-11-24	4	-70/+22
\| \| \| \| \|	Also changed the semantics of the index parameter in mapPartitionsWithIndex from the partition index of the output partition to the partition index in the current RDD.
\| * \| \| \|	Merge pull request #101 from colorant/yarn-client-scheduler	Matei Zaharia	2013-11-25	7	-23/+484
\| \|\ \ \ \	For SPARK-527, Support spark-shell when running on YARN
\| \| \|_\|/ /
\| \|/\| \| \|	sync to trunk and resubmit here
\| \| \| \| \|	In current YARN mode approaching, the application is run in the Application Master as a user program thus the whole spark context is on remote. This approaching won't support application that involve local interaction and need to be run on where it is launched.
\| \| \| \| \|	So In this pull request I have a YarnClientClusterScheduler and backend added. With this scheduler, the user application is launched locally,While the executor will be launched by YARN on remote nodes with a thin AM which only launch the executor and monitor the Driver Actor status, so that when client app is done, it can finish the YARN Application as well.
\| \| \| \| \|	This enables spark-shell to run upon YARN. This also enable other Spark applications to have the spark context to run locally with a master-url "yarn-client". Thus e.g. SparkPi could have the result output locally on console instead of output in the log of the remote machine where AM is running on.
\| \| \| \| \|	Docs also updated to show how to use this yarn-client mode.
\| \| * \| \|	Add YarnClientClusterScheduler and Backend.	Raymond Liu	2013-11-22	7	-23/+484
\| \| \| \| \|	With this scheduler, the user application is launched locally, While the executor will be launched by YARN on remote nodes.
\| \| \| \| \|	This enables spark-shell to run upon YARN.
\| * \| \| \|	Merge pull request #203 from witgo/master	Reynold Xin	2013-11-25	1	-0/+5
\| \|\ \ \ \	Fix Maven build for metrics-graphite
\| \| * \| \| \|	Fix Maven build for metrics-graphite	LiGuoqiang	2013-11-25	1	-0/+5
\| \|/ / / /
\| * \| \| \|	Merge pull request #151 from russellcardullo/add-graphite-sink	Matei Zaharia	2013-11-24	5	-0/+96
\| \|\ \ \ \	Add graphite sink for metrics
\| \| \| \| \| \|	This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
\| \| * \| \| \|	Cleanup GraphiteSink.scala based on feedback	Russell Cardullo	2013-11-18	1	-5/+5
\| \| \| \| \| \|	* Reorder imports according to the style guide
\| \| \| \| \| \|	* Consistently use propertyToOption in all places
\| \| * \| \| \|	Add graphite sink for metrics	Russell Cardullo	2013-11-08	5	-0/+96
\| \| \| \| \| \|	This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
\| * \| \| \| \|	Merge pull request #185 from mkolod/random-number-generator	Matei Zaharia	2013-11-24	4	-5/+200
\| \|\ \ \ \ \	XORShift RNG with unit tests and benchmark
\| \| \| \| \| \| \|	This patch was introduced to address SPARK-950 - the discussion below the ticket explains not only the rationale, but also the design and testing decisions: https://spark-project.atlassian.net/browse/SPARK-950
\| \| \| \| \| \| \|	To run unit test, start SBT console and type:
\| \| \| \| \| \| \|	compile
\| \| \| \| \| \| \|	test-only org.apache.spark.util.XORShiftRandomSuite
\| \| \| \| \| \| \|	To run benchmark, type:
\| \| \| \| \| \| \|	project core
\| \| \| \| \| \| \|	console
\| \| \| \| \| \| \|	Once the Scala console starts, type:
\| \| \| \| \| \| \|	org.apache.spark.util.XORShiftRandom.benchmark(100000000)
\| \| \| \| \| \| \|	XORShiftRandom is also an object with a main method taking the number of iterations as an argument, so you can also run it from the command line.
\| \| * \| \| \| \|	Make XORShiftRandom explicit in KMeans and roll it back for RDD	Marek Kolodziej	2013-11-20	2	-5/+7
\| \| \| \| \| \| \|
\| \| * \| \| \| \|	Formatting and scoping (private[spark]) updates	Marek Kolodziej	2013-11-19	2	-3/+3
\| \| \| \| \| \| \|
\| \| * \| \| \| \|	Updates to reflect pull request code review	Marek Kolodziej	2013-11-18	5	-48/+69
\| \| \| \| \| \| \|
\| \| * \| \| \| \|	XORShift RNG with unit tests and benchmark	Marek Kolodziej	2013-11-18	5	-3/+175
\| \| \| \| \| \| \|	To run unit test, start SBT console and type:
\| \| \| \| \| \| \|	compile
\| \| \| \| \| \| \|	test-only org.apache.spark.util.XORShiftRandomSuite
\| \| \| \| \| \| \|	To run benchmark, type:
\| \| \| \| \| \| \|	project core
\| \| \| \| \| \| \|	console
\| \| \| \| \| \| \|	Once the Scala console starts, type:
\| \| \| \| \| \| \|	org.apache.spark.util.XORShiftRandom.benchmark(100000000)
\| * \| \| \| \| \|	Merge pull request #197 from aarondav/patrick-fix	Reynold Xin	2013-11-25	1	-3/+6
\| \|\ \ \ \ \ \	Fix 'timeWriting' stat for shuffle files
\| \| \|_\|_\|_\|/ /
\| \|/\| \| \| \| \|	Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.
\| \| * \| \| \| \|	Fix 'timeWriting' stat for shuffle files	Aaron Davidson	2013-11-21	1	-3/+6
\| \| \|/ / / /	Due to concurrent git branches, changes from shuffle file consolidation patch caused the shuffle write timing patch to no longer actually measure the time, since it requires time be measured after the stream has been closed.
\| * \| \| \| \|	Merge pull request #200 from mateiz/hash-fix	Reynold Xin	2013-11-24	1	-43/+50
\| \|\ \ \ \ \	AppendOnlyMap fixes
\| \| \| \| \| \| \|	- Chose a more random reshuffling step for values returned by Object.hashCode to avoid some long chaining that was happening for consecutive integers (e.g. `sc.makeRDD(1 to 100000000, 100).map(t => (t, t)).reduceByKey(_ + _).count`)
\| \| \| \| \| \| \|	- Some other small optimizations throughout (see commit comments)
\| \| * \| \| \| \|	Some other optimizations to AppendOnlyMap:	Matei Zaharia	2013-11-23	1	-37/+45
\| \| \| \| \| \| \|	- Don't check keys for equality when re-inserting due to growing the table; the keys will already be unique
\| \| \| \| \| \| \|	- Remember the grow threshold instead of recomputing it on each insert
\| \| * \| \| \| \|	Fixes to AppendOnlyMap:	Matei Zaharia	2013-11-23	1	-7/+6
\| \|/ / / / /	- Use Murmur Hash 3 finalization step to scramble the bits of HashCode instead of the simpler version in java.util.HashMap; the latter one had trouble with ranges of consecutive integers. Murmur Hash 3 is used by fastutil.
\| \| \| \| \| \|	- Use Object.equals() instead of Scala's == to compare keys, because the latter does extra casts for numeric types (see the equals method in https://github.com/scala/scala/blob/master/src/library/scala/runtime/BoxesRunTime.java)
\| * \| \| \| \|	Merge pull request #198 from ankurdave/zipPartitions-preservesPartitioning	Reynold Xin	2013-11-23	2	-10/+32
\| \|\ \ \ \ \	Support preservesPartitioning in RDD.zipPartitions
\| \| \| \| \| \| \|	In `RDD.zipPartitions`, add support for a `preservesPartitioning` option (similar to `RDD.mapPartitions`) that reuses the first RDD's partitioner.
\| \| * \| \| \| \|	Support preservesPartitioning in RDD.zipPartitions	Ankur Dave	2013-11-23	2	-10/+32
\| \|/ / / / /
\| * \| \| \| \|	Merge pull request #193 from aoiwelle/patch-1	Reynold Xin	2013-11-22	1	-1/+1
\| \|\ \ \ \ \	Fix Kryo Serializer buffer documentation inconsistency
\| \| \| \| \| \| \|	The documentation here is inconsistent with the coded default and other documentation.
\| \| * \| \| \| \|	Fix Kryo Serializer buffer inconsistency	Neal Wiggins	2013-11-20	1	-1/+1
\| \| \| \|_\|/ /
\| \| \|/\| \| \|	The documentation here is inconsistent with the coded default and other documentation.
\| * \| \| \| \|	Merge pull request #196 from pwendell/master	Reynold Xin	2013-11-22	1	-0/+2
\| \|\ \ \ \ \	TimeTrackingOutputStream should pass on calls to close() and flush().
\| \| \|/ / / /
\| \|/\| \| \| \|	Without this fix you get a huge number of open files when running shuffles.
\| \| * \| \| \|	TimeTrackingOutputStream should pass on calls to close() and flush().	Patrick Wendell	2013-11-21	1	-0/+2
\| \| \| \| \| \|	Without this fix you get a huge number of open shuffles after running shuffles.
* \| \| \| \| \|	Add optional Hadoop 2.2 settings in sbt build.	Harvey Feng	2013-11-26	1	-9/+23
\| \| \| \| \| \|	If the Hadoop used is version 2.2 or derived from it, then Spark will be compiled against protobuf-2.5 and a protobuf-2.5 version of Akka 2.0.5.
* \| \| \| \| \|	Hadoop 2.2 YARN API migration for `SPARK_HOME/new-yarn`	Harvey Feng	2013-11-23	6	-489/+468
\| \| \| \| \| \|
* \| \| \| \| \|	Add a "new-yarn" directory in SPARK_HOME, intended to contain Hadoop-2.2 API changes.	Harvey Feng	2013-11-23	11	-0/+2822
\| \| \| \| \| \|
* \| \| \| \| \|	A few more style fixes in `yarn` package.	Harvey Feng	2013-11-23	3	-45/+71
\| \| \| \| \| \|
* \| \| \| \| \|	Merge branch 'master' into yarn-cleanup	Harvey Feng	2013-11-21	30	-161/+241
\|\\| \| \| \| \|	Conflicts:
\| \| \| \| \| \|	yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
\| \| \| \| \| \|	yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
\| \| \| \| \| \|	yarn/src/main/scala/org/apache/spark/deploy/yarn/WorkerRunnable.scala
\| \| \| \| \| \|	yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
\| * \| \| \| \|	Merge branch 'master' of github.com:tbfenet/incubator-spark	Reynold Xin	2013-11-21	3	-48/+91
\| \|\ \ \ \ \	PartitionPruningRDD is using index from parent
\| \| \| \| \| \| \|	I was getting a ArrayIndexOutOfBoundsException exception after doing union on pruned RDD. The index it was using on the partition was the index in the original RDD not the new pruned RDD.
\| \| * \| \| \| \|	PartitionPruningRDD is using index from parent(review changes)	Matthew Taylor	2013-11-19	2	-13/+6
\| \| \| \| \| \| \|
\| \| * \| \| \| \|	PartitionPruningRDD is using index from parent	Matthew Taylor	2013-11-19	2	-13/+63
\| \| \| \|/ / /
\| \| \|/\| \| \|