path: root/python
Commit message  (Author, Age, Files, Lines)
* Initial weights in Scala are ones; do that too. Also fix some errors.  (Tor Myklebust, 2013-12-25, 1 file, -6/+6)
|
* Split the mllib bindings into a whole bunch of modules and rename some things.  (Tor Myklebust, 2013-12-25, 7 files, -183/+409)
|
* Remove useless line from test stub.  (Tor Myklebust, 2013-12-24, 1 file, -1/+0)
|
* Python change for move of PythonMLLibAPI.  (Tor Myklebust, 2013-12-24, 1 file, -1/+1)
|
* Release JVM reference to the ALSModel when done.  (Tor Myklebust, 2013-12-22, 1 file, -2/+2)
|
* Python stubs for ALSModel.  (Tor Myklebust, 2013-12-21, 2 files, -8/+56)
|
* Un-semicolon mllib.py.  (Tor Myklebust, 2013-12-20, 1 file, -11/+11)
|
* Change some docstrings and add some others.  (Tor Myklebust, 2013-12-20, 1 file, -1/+3)
|
* Licence notice.  (Tor Myklebust, 2013-12-20, 1 file, -0/+17)
|
* Whitespace.  (Tor Myklebust, 2013-12-20, 1 file, -1/+1)
|
* Remove gigantic endian-specific test and exception tests.  (Tor Myklebust, 2013-12-20, 1 file, -38/+3)
|
* Tests for the Python side of the mllib bindings.  (Tor Myklebust, 2013-12-20, 1 file, -52/+172)
|
* Python stubs for classification and clustering.  (Tor Myklebust, 2013-12-20, 2 files, -16/+96)
|
* Python side of python bindings for linear, Lasso, and ridge regression  (Tor Myklebust, 2013-12-19, 2 files, -15/+72)
|
* Incorporate most of Josh's style suggestions.  (Tor Myklebust, 2013-12-19, 2 files, -98/+91)
      I don't want to deal with the type and length checking errors until we've got at
      least one working stub that we're all happy with.
* The rest of the Python side of those bindings.  (Tor Myklebust, 2013-12-19, 3 files, -2/+4)
|
* First cut at python mllib bindings. Only LinearRegression is supported.  (Tor Myklebust, 2013-12-19, 1 file, -0/+114)
|
* Merge branch 'master' into akka-bug-fix  (Prashant Sharma, 2013-12-11, 3 files, -1/+36)
|\      Conflicts:
          core/pom.xml
          core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
          pom.xml
          project/SparkBuild.scala
          streaming/pom.xml
          yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
| * License headers  (Patrick Wendell, 2013-12-09, 1 file, -0/+17)
| |
| * Fix UnicodeEncodeError in PySpark saveAsTextFile().  (Josh Rosen, 2013-11-28, 2 files, -1/+19)
      Fixes SPARK-970.
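For context, a minimal sketch of the kind of job that could trigger this error before the fix: saving an RDD whose elements are unicode strings containing non-ASCII characters (Python 2; the app name and output path are made up).

    # -*- coding: utf-8 -*-
    # Sketch of the failure scenario: unicode elements written out as text.
    from pyspark import SparkContext

    sc = SparkContext("local", "unicode-save-sketch")
    rdd = sc.parallelize([u"plain ascii", u"caf\u00e9", u"\u00fcml\u00e4ut"])
    rdd.saveAsTextFile("/tmp/unicode-output")   # hypothetical output directory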
* | Merge branch 'master' into wip-scala-2.10  (Prashant Sharma, 2013-11-27, 8 files, -142/+383)
|\|     Conflicts:
          core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
          core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
          core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
          core/src/main/scala/org/apache/spark/rdd/RDD.scala
          python/pyspark/rdd.py
| * Removed unused basestring case from dump_stream.  (Josh Rosen, 2013-11-26, 1 file, -2/+0)
| |
| * FramedSerializer: _dumps => dumps, _loads => loads.  (Josh Rosen, 2013-11-10, 4 files, -18/+18)
| |
| * Send PySpark commands as bytes instead of strings.  (Josh Rosen, 2013-11-10, 3 files, -16/+13)
| |
| * Add custom serializer support to PySpark.  (Josh Rosen, 2013-11-10, 8 files, -148/+362)
      For now, this only adds MarshalSerializer, but it lays the groundwork for supporting
      other custom serializers. Many of these mechanisms can also be used to support
      deserialization of different data formats sent by Java, such as data encoded by
      MsgPack. This also fixes a bug in SparkContext.union().
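A minimal usage sketch of the serializer hook described above, assuming MarshalSerializer is exposed in pyspark.serializers and that SparkContext accepts a serializer argument, as it does in later PySpark releases:

    # Use the marshal-based serializer instead of the default pickle-based one;
    # marshal is faster but supports fewer Python types.
    from pyspark import SparkContext
    from pyspark.serializers import MarshalSerializer

    sc = SparkContext("local", "marshal-sketch", serializer=MarshalSerializer())
    print sc.parallelize(range(10)).map(lambda x: x * 2).collect()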
| * Remove Pickle-wrapping of Java objects in PySpark.  (Josh Rosen, 2013-11-03, 4 files, -14/+39)
      If we support custom serializers, the Python worker will know what type of input to
      expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
| * Replace magic lengths with constants in PySpark.  (Josh Rosen, 2013-11-03, 2 files, -6/+13)
      Write the length of the accumulators section up-front rather than terminating it with
      a negative length. I find this easier to read.
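A generic sketch of the framing idea described above: write a count up-front instead of ending the stream with a negative sentinel length. The function names are illustrative, not PySpark internals.

    import struct

    def write_section(stream, items):
        stream.write(struct.pack("!i", len(items)))      # item count, written up-front
        for item in items:
            stream.write(struct.pack("!i", len(item)))   # length prefix for each item
            stream.write(item)

    def read_section(stream):
        (count,) = struct.unpack("!i", stream.read(4))
        items = []
        for _ in range(count):
            (length,) = struct.unpack("!i", stream.read(4))
            items.append(stream.read(length))
        return items

    if __name__ == "__main__":
        import io
        buf = io.BytesIO()
        write_section(buf, [b"alpha", b"beta"])
        buf.seek(0)
        print read_section(buf)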
* | Merge branch 'master' into scala-2.10  (Raymond Liu, 2013-11-13, 2 files, -13/+50)
|\|
| * Pass self to SparkContext._ensure_initialized.  (Ewen Cheslack-Postava, 2013-10-22, 1 file, -1/+10)
      The constructor for SparkContext should pass in self so that we track the current
      context and produce errors if another one is created. Add a doctest to make sure
      creating multiple contexts triggers the exception.
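A sketch of the behaviour that doctest guards against; the exact exception type and message are assumptions based on later PySpark behaviour.

    from pyspark import SparkContext

    sc = SparkContext("local", "first-context")
    try:
        sc2 = SparkContext("local", "second-context")   # expected to be rejected
    except ValueError as e:
        print "second context rejected:", e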
| * Add classmethod to SparkContext to set system properties.  (Ewen Cheslack-Postava, 2013-10-22, 1 file, -12/+29)
      Add a new classmethod to SparkContext to set system properties, as is possible in
      Scala/Java. Unlike the Java/Scala implementations, there's no access to System until
      the JVM bridge is created. Since SparkContext handles that, move the initialization
      of the JVM connection to a separate classmethod that can safely be called repeatedly
      as long as the same instance (or no instance) is provided.
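A usage sketch of the classmethod described above, assuming it is named SparkContext.setSystemProperty as in later PySpark releases and is called before the context is created:

    # Set a JVM system property through the gateway before creating the context.
    from pyspark import SparkContext

    SparkContext.setSystemProperty("spark.executor.memory", "2g")
    sc = SparkContext("local", "sysprop-sketch")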
| * Add an add() method to pyspark accumulators.  (Ewen Cheslack-Postava, 2013-10-19, 1 file, -1/+12)
      Add a regular method for adding a term to accumulators in pyspark. Currently, if you
      have a non-global accumulator, adding to it is awkward: the += operator can't be used
      for non-global accumulators captured via closure because it involves an assignment,
      so the only way to do it is to call __iadd__ directly. Adding this method lets you
      write code like this:

          def main():
              sc = SparkContext()
              accum = sc.accumulator(0)
              rdd = sc.parallelize([1, 2, 3])

              def f(x):
                  accum.add(x)

              rdd.foreach(f)
              print accum.value

      where using accum += x instead would have caused UnboundLocalError exceptions in
      workers. Currently it would have to be written as accum.__iadd__(x).
* | Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10  (Prashant Sharma, 2013-10-10, 1 file, -7/+53)
|\|
| * Fix PySpark docs and an overly long line of code after fdbae41e  (Matei Zaharia, 2013-10-09, 1 file, -8/+8)
| |
| * SPARK-705: implement sortByKey() in PySpark  (Andre Schumacher, 2013-10-07, 1 file, -1/+47)
| |
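A quick usage sketch of the sortByKey() added above (ascending order by default, mirroring the Scala API):

    from pyspark import SparkContext

    sc = SparkContext("local", "sortbykey-sketch")
    pairs = sc.parallelize([(3, "c"), (1, "a"), (2, "b")])
    print pairs.sortByKey().collect()                  # [(1, 'a'), (2, 'b'), (3, 'c')]
    print pairs.sortByKey(ascending=False).collect()   # [(3, 'c'), (2, 'b'), (1, 'a')]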
* | Merge branch 'master' into wip-merge-master  (Prashant Sharma, 2013-10-08, 2 files, -4/+10)
|\|     Conflicts:
          bagel/pom.xml
          core/pom.xml
          core/src/test/scala/org/apache/spark/ui/UISuite.scala
          examples/pom.xml
          mllib/pom.xml
          pom.xml
          project/SparkBuild.scala
          repl/pom.xml
          streaming/pom.xml
          tools/pom.xml
        In Scala 2.10 a shorter representation is used for naming artifacts, so the artifact
        names were changed to the shorter Scala version and made a property in the pom.
| * Fixing SPARK-602: PythonPartitioner  (Andre Schumacher, 2013-10-04, 2 files, -4/+10)
      Currently PythonPartitioner determines partition ID by hashing a byte-array
      representation of PySpark's key. This PR lets PythonPartitioner use the actual
      partition ID, which is required e.g. for sorting via PySpark.
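An illustration (plain Python, not PySpark internals) of the distinction the commit message draws: hashing a serialized key scatters keys without regard to order, while a sort needs each partition to hold a contiguous key range.

    import pickle

    NUM_PARTITIONS = 2

    def hash_partition(key):
        # order-oblivious: fine for hash-based shuffles, useless for sorting
        return hash(pickle.dumps(key)) % NUM_PARTITIONS

    def range_partition(key, bounds=(50,)):
        # order-preserving: keys below 50 go to partition 0, the rest to partition 1
        return sum(1 for b in bounds if key >= b)

    keys = [5, 93, 41, 77, 12]
    print [(k, hash_partition(k)) for k in keys]
    print [(k, range_partition(k)) for k in keys]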
* | Merge branch 'master' into scala-2.10  (Prashant Sharma, 2013-10-01, 1 file, -1/+1)
|\|     Conflicts:
          core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
          docs/_config.yml
          project/SparkBuild.scala
          repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
| * Update build version in master  (Patrick Wendell, 2013-09-24, 1 file, -1/+1)
| |
* | Merge branch 'master' of git://github.com/mesos/spark into scala-2.10  (Prashant Sharma, 2013-09-15, 5 files, -1/+78)
|\|     Conflicts:
          core/src/main/scala/org/apache/spark/SparkContext.scala
          project/SparkBuild.scala
| * Whoopsy daisy  (Aaron Davidson, 2013-09-08, 1 file, -1/+0)
| |
| * Export StorageLevel and refactor  (Aaron Davidson, 2013-09-07, 5 files, -26/+62)
| |
| * Remove reflection, hard-code StorageLevels  (Aaron Davidson, 2013-09-07, 2 files, -24/+26)
      The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise the shell
      would have to call a private method of SparkContext. Having StorageLevel available
      in sc also doesn't seem like the end of the world. There may be a better solution,
      though. As for creating the StorageLevel object itself, this seems to be the best way
      in Python 2 for creating singleton, enum-like objects:
      http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
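A sketch of that enum-like-singleton pattern: a plain class whose class attributes are pre-built instances. The class name and field names here are illustrative, modelled loosely on Spark's StorageLevel rather than copied from PySpark.

    class StorageLevelSketch(object):
        def __init__(self, use_disk, use_memory, deserialized, replication=1):
            self.use_disk = use_disk
            self.use_memory = use_memory
            self.deserialized = deserialized
            self.replication = replication

    # Each level is a single shared instance, so identity checks behave like enum members.
    StorageLevelSketch.DISK_ONLY = StorageLevelSketch(True, False, False)
    StorageLevelSketch.MEMORY_ONLY = StorageLevelSketch(False, True, True)
    StorageLevelSketch.MEMORY_AND_DISK = StorageLevelSketch(True, True, True)

    print StorageLevelSketch.MEMORY_ONLY is StorageLevelSketch.MEMORY_ONLY   # True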
| * Memoize StorageLevels read from JVM  (Aaron Davidson, 2013-09-06, 1 file, -2/+9)
| |
| * SPARK-660: Add StorageLevel support in Python  (Aaron Davidson, 2013-09-05, 3 files, -1/+34)
      It uses reflection... I am not proud of that fact, but it at least ensures
      compatibility (sans refactoring of the StorageLevel stuff).
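What the feature looks like from user code, assuming StorageLevel ends up importable from the pyspark package (as the "Export StorageLevel and refactor" commit listed above suggests):

    from pyspark import SparkContext, StorageLevel

    sc = SparkContext("local", "persist-sketch")
    rdd = sc.parallelize(range(1000)).persist(StorageLevel.MEMORY_AND_DISK)
    print rdd.count()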
* | Merged with master  (Prashant Sharma, 2013-09-06, 25 files, -98/+948)
|\|
| * Add missing license headers found with RAT  (Matei Zaharia, 2013-09-02, 1 file, -1/+18)
| |
| * Exclude some private modules in epydoc  (Matei Zaharia, 2013-09-02, 1 file, -0/+1)
| |
| * Further fixes to get PySpark to work on Windows  (Matei Zaharia, 2013-09-02, 1 file, -5/+12)
| |
| * Allow PySpark to launch worker.py directly on Windows  (Matei Zaharia, 2013-09-01, 1 file, -4/+7)
| |
| * Move some classes to more appropriate packages:  (Matei Zaharia, 2013-09-01, 1 file, -2/+2)
      * RDD, *RDDFunctions -> org.apache.spark.rdd
      * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
      * JavaSerializer, KryoSerializer -> org.apache.spark.serializer