ProgrammingGuide
Programming guide.
Getting unpersist right in GraphLab is tricky.
Reverts to 04d83fc37f9eef89c20331c85291a0a169f75e6d:examples/src/main/scala/org/apache/spark/examples/bagel/PageRankUtils.scala.
Reverts to 7210257ba3038d5e22d4b60fe9c3113dc45c3dff:README.md.
The zip{Edge,Vertex}Partitions methods created doubly-nested closures
and passed them to zipPartitions. For some reason this caused an
AbstractMethodError when zipPartitions tried to invoke the closure. This
commit works around the problem by inlining these methods wherever they
are called, eliminating the doubly-nested closure.
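A minimal, self-contained sketch of the two closure shapes involved, using plain RDDs rather than GraphX's internal partition types (the helper name zipWith is an assumption for illustration only):

    import org.apache.spark.{SparkConf, SparkContext}

    object ZipPartitionsInlineSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("sketch"))
        val a = sc.parallelize(1 to 4, 2)
        val b = sc.parallelize(5 to 8, 2)

        // Problematic shape (simplified): a helper returns a closure that itself
        // captures the user function f -- a doubly-nested closure.
        def zipWith[T](f: (Int, Int) => T): (Iterator[Int], Iterator[Int]) => Iterator[T] =
          (x, y) => x.zip(y).map { case (l, r) => f(l, r) }

        // val sums0 = a.zipPartitions(b)(zipWith(_ + _))  // the shape that triggered the error

        // Workaround shape: inline the logic at the call site so zipPartitions
        // receives a single, flat closure.
        val sums = a.zipPartitions(b) { (x, y) =>
          x.zip(y).map { case (l, r) => l + r }
        }
        println(sums.collect().mkString(","))
        sc.stop()
      }
    }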
Conflicts:
README.md
core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala
core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala
core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
pom.xml
project/SparkBuild.scala
repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
Fix make-distribution.sh showing "version: command not found".
Set boolean param name for call to SparkHadoopMapReduceUtil.newTaskAttemptID
Set the boolean param name in the call to SparkHadoopMapReduceUtil.newTaskAttemptID to make it clear which param is being set.
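A short sketch of the readability win from Scala's named arguments; the signature below is a hypothetical stand-in, not the real SparkHadoopMapReduceUtil trait:

    // Hypothetical signature for illustration only.
    def newTaskAttemptID(jtIdentifier: String, jobId: Int, isMap: Boolean,
                         taskId: Int, attemptId: Int): String =
      s"attempt_${jtIdentifier}_${jobId}_${if (isMap) "m" else "r"}_${taskId}_$attemptId"

    // Positional call: the reader cannot tell what `false` controls.
    newTaskAttemptID("jt", 1, false, 0, 0)

    // Named call: the intent is explicit at the call site.
    newTaskAttemptID("jt", 1, isMap = false, taskId = 0, attemptId = 0)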
Add CDH Repository to Maven Build
At some point this was removed from the Maven build, so I'm adding it back. It's needed for the Hadoop 2 tests we run on Jenkins, and it's also included in the SBT build (see the resolver sketch below).
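For reference, an sbt-style resolver of the kind the SBT build already carries; the repository id and URL here are assumptions, not copied from SparkBuild.scala:

    // Hypothetical sbt resolver entry; verify the URL against the actual build file.
    resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"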
Remove calls to the deprecated mapred OutputCommitter.cleanupJob
Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob should do the cleanup itself via a call to OutputCommitter.cleanupJob.
Remove SparkHadoopWriter.cleanup, since it is used only by PairRDDFunctions.
In fact, the implementation of the mapred OutputCommitter.commitJob looks like this:

    public void commitJob(JobContext jobContext) throws IOException {
      cleanupJob(jobContext);
    }
Since Hadoop 1.0.4, the mapred OutputCommitter.commitJob should do the cleanup job itself; as shown above, its implementation simply calls cleanupJob(jobContext). (The jobContext input argument is of type org.apache.hadoop.mapred.JobContext.)
Support distributing extra files to workers for YARN client mode
This way, users don't need to package every dependency into one assembly jar to serve as the Spark app jar.
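A hedged sketch of what using such a feature could look like from application code; the property names below come from later Spark-on-YARN documentation and are an assumption with respect to this particular commit:

    import org.apache.spark.SparkConf

    // Hypothetical usage: ship side files and archives to the YARN workers
    // instead of baking them into the assembly jar.
    val conf = new SparkConf()
      .setAppName("dist-files-sketch")
      .set("spark.yarn.dist.files", "hdfs:///user/alice/lookup.dat")
      .set("spark.yarn.dist.archives", "hdfs:///user/alice/deps.zip")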
SPARK-1009 Updated MLlib docs to show how to use it in Python
In addition, added detailed examples for regression, clustering, and recommendation algorithms in a separate Scala section, and fixed a few minor issues with the existing documentation.
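A minimal Scala clustering example in the spirit of those docs; it assumes the later RDD-based MLlib API (Vectors and KMeans.train), which may postdate this commit:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object KMeansSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("kmeans-sketch"))
        // Two well-separated point clouds.
        val data = sc.parallelize(Seq(
          Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
          Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
        val model = KMeans.train(data, k = 2, maxIterations = 10)
        model.clusterCenters.foreach(println)
        sc.stop()
      }
    }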
Update README.md
The link does not work otherwise.
Refactored the streaming project to separate out external libraries like Twitter, Kafka, Flume, etc.
At a high level, these are the changes:
1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and the pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of the root and streaming projects/modules.
2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer calls `TwitterUtils.createStream(streamingContext, ...)`; for the Java API, `TwitterUtils.createStream(javaStreamingContext, ...)`. See the sketch after this list.
3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes from the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).
4. Jars of the external projects have been added to the examples project but not to the assembly project.
5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
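A minimal sketch of the new import and call pattern for the Scala API; the stream configuration shown (None, so twitter4j reads OAuth credentials from system properties) is an assumption for illustration:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.twitter.TwitterUtils

    object TwitterStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("twitter-sketch")
        val ssc = new StreamingContext(conf, Seconds(10))
        // None => let twitter4j pick up OAuth credentials from system properties.
        val tweets = TwitterUtils.createStream(ssc, None)
        tweets.map(_.getText).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }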