path: root/python/pyspark/serializers.py
Commit message · Author · Date · Files · Lines changed
* Removed unused basestring case from dump_stream. (Josh Rosen, 2013-11-26; 1 file, -2/+0)
* FramedSerializer: _dumps => dumps, _loads => loads. (Josh Rosen, 2013-11-10; 1 file, -13/+13)
* Send PySpark commands as bytes instead of strings. (Josh Rosen, 2013-11-10; 1 file, -0/+5)
* Add custom serializer support to PySpark. (Josh Rosen, 2013-11-10; 1 file, -67/+243)
  For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
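The commit above introduces a framed-serializer design: each object is written with a length prefix, and concrete serializers only define how a single object becomes bytes. The following is a minimal sketch of that pattern, not the actual PySpark code; class and method names mirror the commit message but the bodies are illustrative.

```python
import io
import marshal
import struct


class FramedSerializer:
    """Writes each serialized object preceded by its 4-byte length."""

    def dump_stream(self, iterator, stream):
        for obj in iterator:
            serialized = self.dumps(obj)
            # Big-endian signed int length header, then the payload.
            stream.write(struct.pack("!i", len(serialized)))
            stream.write(serialized)

    def load_stream(self, stream):
        while True:
            header = stream.read(4)
            if len(header) < 4:
                return  # end of stream
            length = struct.unpack("!i", header)[0]
            yield self.loads(stream.read(length))

    def dumps(self, obj):
        raise NotImplementedError

    def loads(self, data):
        raise NotImplementedError


class MarshalSerializer(FramedSerializer):
    """Serializes objects with Python's marshal module (fast, limited types)."""

    def dumps(self, obj):
        return marshal.dumps(obj)

    def loads(self, data):
        return marshal.loads(data)


# Round-trip demonstration over an in-memory stream.
serializer = MarshalSerializer()
buffer = io.BytesIO()
serializer.dump_stream([1, "a", (2, 3)], buffer)
buffer.seek(0)
assert list(serializer.load_stream(buffer)) == [1, "a", (2, 3)]
```

Framing is what lets the reader recover object boundaries from a raw byte stream without parsing the payload, which is why new serializers only need `dumps`/`loads`.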
* Remove Pickle-wrapping of Java objects in PySpark. (Josh Rosen, 2013-11-03; 1 file, -0/+18)
  If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
* Replace magic lengths with constants in PySpark. (Josh Rosen, 2013-11-03; 1 file, -0/+6)
  Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
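The commit above replaces bare magic numbers in the worker protocol with named constants, and switches the accumulator section from a negative-length terminator to an up-front count. A minimal sketch of both ideas, assuming hypothetical helper names and illustrative constant values (not necessarily the exact values PySpark uses):

```python
import io
import struct


class SpecialLengths:
    # Named sentinels instead of bare magic numbers in the stream protocol.
    # Values here are illustrative.
    END_OF_DATA_SECTION = -1
    PYTHON_EXCEPTION_THROWN = -2
    TIMING_DATA = -3


def write_int(value, stream):
    stream.write(struct.pack("!i", value))


def write_accumulator_updates(updates, stream):
    """Write the count up-front instead of ending with a negative length."""
    write_int(len(updates), stream)
    for payload in updates:
        write_int(len(payload), stream)
        stream.write(payload)


# The reader then knows exactly how many entries follow the header.
buffer = io.BytesIO()
write_accumulator_updates([b"ab", b"c"], buffer)
```

Writing the count first means the reader never has to peek at a length to decide whether the section has ended, which is the readability win the commit describes.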
* Fixing SPARK-602: PythonPartitioner (Andre Schumacher, 2013-10-04; 1 file, -0/+4)
  Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.
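The fix above moves partition-ID computation to the Python side, where the real key is available, instead of having the JVM hash an opaque pickled byte array. An illustrative sketch of that idea (the function name and shape are hypothetical, not the actual PySpark code):

```python
def partition_by(iterator, num_partitions, partition_func=hash):
    """Tag each (key, value) pair with a partition ID computed from the key.

    Emitting (partition_id, (key, value)) lets the JVM partitioner route
    each record by the precomputed ID rather than hashing opaque bytes,
    which also makes range-based partitioning (needed for sorting) possible.
    """
    for key, value in iterator:
        yield partition_func(key) % num_partitions, (key, value)
```

Because `partition_func` is pluggable, the same mechanism supports hash partitioning and custom schemes such as range partitioning for sorted output.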
* Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 1 file, -0/+17)
* Add Python timing instrumentation (Jey Kottalam, 2013-06-21; 1 file, -0/+4)
* Added accumulators to PySpark (Matei Zaharia, 2013-01-20; 1 file, -1/+6)
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01; 1 file, -0/+78)