spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Fixed some whitespace	Matei Zaharia	2010-10-16	3	-14/+14
\|
*	Added support for generic Hadoop InputFormats and refactored textFile to	Matei Zaharia	2010-10-16	2	-28/+111
\| \| \| \|	use this. Closes #12.
*	Renamed HdfsFile to HadoopFile	Matei Zaharia	2010-10-16	2	-8/+9
\|
*	Simplified UnionRDD slightly and added a SparkContext.union method for	Matei Zaharia	2010-10-16	2	-28/+22
\| \| \| \|	efficiently union-ing a large number of RDDs
*	Removed setSparkHome method on SparkContext in favor of having an	Matei Zaharia	2010-10-16	2	-16/+7
\| \| \| \| \|	optional constructor parameter, so that the scheduler is guaranteed that a Spark home has been set when it first builds its executor arg.
*	Added the ability to specify a list of JAR files when creating a	Matei Zaharia	2010-10-16	6	-116/+244
\| \| \| \|	SparkContext and have the master node serve those to workers.
*	Keep track of tasks in each job so that they can be removed when the job exits	Matei Zaharia	2010-10-16	1	-6/+12
\|
*	Further clarified some code	Matei Zaharia	2010-10-16	2	-10/+22
\|
*	Fixed some log messages	Matei Zaharia	2010-10-16	1	-2/+2
\|
*	Bug fixes and improvements for MesosScheduler and SimpleJob	Matei Zaharia	2010-10-16	3	-25/+46
\|
*	Moved Spark home detection to SparkContext and added a setSparkHome	Matei Zaharia	2010-10-16	2	-51/+81
\| \| \| \|	method for setting it programmatically.
*	Bug fix in passing env vars to executors	Matei Zaharia	2010-10-16	1	-1/+1
\|
*	Added code so that Spark jobs can be launched from outside the Spark	Matei Zaharia	2010-10-15	1	-2/+29
\| \| \| \| \| \|	directory by setting SPARK_HOME and locating the executor relative to that. Entries on SPARK_CLASSPATH and SPARK_LIBRARY_PATH are also passed along to worker nodes.
*	Moved ClassServer out of repl package and renamed it to HttpServer.	Matei Zaharia	2010-10-15	2	-12/+12
\|
*	Abort jobs if a task fails more than a limited number of times	Matei Zaharia	2010-10-15	3	-23/+44
\|
*	A couple of improvements to ReplSuite:	Matei Zaharia	2010-10-15	1	-26/+30
\| \| \| \| \|	- Use collect instead of toArray - Disable the "running on Mesos" test when MESOS_HOME is not set
*	Made locality scheduling constant-time and added support for changing	Matei Zaharia	2010-10-15	1	-24/+79
\| \| \| \|	CPU and memory requested per task.
*	Moved Job and SimpleJob to new files	Matei Zaharia	2010-10-07	3	-183/+206
\|
*	Merge branch 'master' into matei-scheduling	Matei Zaharia	2010-10-07	4	-11/+23
\|\
\| *	Added a getId method to split to force classes to specify a unique ID	Matei Zaharia	2010-10-07	4	-11/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	for each split. This replaces the previous method of calling split.toString, which would produce different results for the same split each time it is deserialized (because the default implementation returns the Java object's address).
* \|	Merge branch 'master' into matei-scheduling	Matei Zaharia	2010-10-07	4	-10/+21
\|\\|
\| *	got rid of unnecessary line	Justin Ma	2010-10-07	1	-1/+0
\| \|
\| *	Merge branch 'master' into jtma-accumulator	Justin Ma	2010-10-07	13	-124/+372
\| \|\
\| \| *	Added toString() methods to UnionSplit, SeededSplit and CartesianSplit to	Justin Ma	2010-10-07	1	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \|	ensure that the proper keys will be generated when they are cached.
\| * \|	changes to accumulator to add objects in-place.	Justin Ma	2010-09-25	4	-8/+11
\| \| \|
* \| \|	Merge branch 'master' into matei-scheduling	Matei Zaharia	2010-10-05	3	-3/+64
\|\ \ \ \| \| \|/ \| \|/\|
\| * \|	Added splitWords function in Utils	Matei Zaharia	2010-10-04	1	-1/+26
\| \| \|
\| * \|	Added reduceByKey operation for RDDs containing pairs [tag: alpha-0.1]	Matei Zaharia	2010-10-03	2	-2/+38
\| \| \|
* \| \|	Merge branch 'master' into matei-scheduling	Matei Zaharia	2010-10-03	2	-0/+2
\|\\| \|
\| * \|	Fixed a rather bad bug in HDFS files that has been in for a while:	root	2010-10-03	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	caching was not working because Split objects did not have a consistent toString value
* \| \|	Renamed ParallelOperation to Job	Matei Zaharia	2010-10-03	1	-42/+42
\|/ /
* \|	Merge branch 'matei-logging'	Matei Zaharia	2010-09-29	11	-100/+169
\|\ \
\| * \|	Made task-finished log messages slightly nicer	Matei Zaharia	2010-09-29	1	-6/+8
\| \| \|
\| * \|	A couple of minor fixes:	Matei Zaharia	2010-09-29	2	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Don't include trailing $'s in class names of Scala objects - Report errors using logError instead of printStackTrace
\| * \|	Changed printlns to log statements and fixed a bug in run that was causing	Matei Zaharia	2010-09-28	10	-93/+109
\| \| \| \| \| \| \| \| \| \| \| \|	it to fail on a Mesos cluster
\| * \|	Added Logging trait	Matei Zaharia	2010-09-28	1	-0/+44
\| \| \|
* \| \|	Increase default locality wait to 3s. Fixes #20.	Matei Zaharia	2010-09-29	1	-1/+1
\|/ /
* \|	Merge branch 'http-repl-class-serving'	Matei Zaharia	2010-09-28	4	-24/+131
\|\ \
\| * \|	More work on HTTP class loading	Matei Zaharia	2010-09-28	3	-24/+57
\| \| \|
\| * \|	Modified the interpreter to serve classes to the executors using a Jetty	Matei Zaharia	2010-09-28	1	-0/+74
\| \| \| \| \| \| \| \| \| \| \| \|	HTTP server instead of a shared (NFS) file system.
* \| \|	fixed typo in printing which task is already finished	Justin Ma	2010-09-28	1	-1/+1
\| \|/ \|/\|
* \|	Let's use future instead of actors	Justin Ma	2010-09-13	2	-38/+24
\| \|
* \|	Added fork()/join() operations for SparkContext, as well as corresponding	Justin Ma	2010-09-12	2	-49/+91
\| \| \| \| \| \| \| \|	changes to MesosScheduler to support multiple ParallelOperations.
* \|	round robin scheduling of tasks has been added	Justin Ma	2010-09-07	3	-13/+25
\| \|
* \|	now adding the Split object.	Justin Ma	2010-09-01	1	-0/+3
\| \|
* \|	- Got rid of 'Split' type parameter in RDD	Justin Ma	2010-08-31	7	-59/+104
\| \| \| \| \| \| \| \| \| \| \| \|	- Added SampledRDD, SplitRDD and CartesianRDD - Made Split a class rather than a type parameter - Added numCores() to Scheduler to help set default level of parallelism
* \|	now we have sampling with replacement (at least on a per-split basis)	Justin Ma	2010-08-18	1	-3/+17
\| \|
* \|	HdfsFile.scala: added a try/catch block to exit gracefully for corrupted	Justin Ma	2010-08-18	3	-2/+35
\|/ \| \| \| \| \| \| \|	gzip files; MesosScheduler.scala: formatted the slaveOffer() output to include the serialized task size; RDD.scala: added support for aggregating RDDs on a per-split basis (aggregateSplit()) as well as for sampling without replacement (sample())
*	Modified Scala interpreter to have it avoid computing string versions of	Matei Zaharia	2010-08-15	1	-1/+3
\| \| \| \| \| \|	all results when :silent is enabled, so that it is easier to work with large arrays in Spark. (The string version of an array of numbers might not fit in memory even though the array itself does.)
*	Bug fix from Justin	Matei Zaharia	2010-08-13	1	-1/+1
\|