path: root/python/pyspark/serializers.py
Commit message · Author · Date · Files · Lines changed
* Removed unused basestring case from dump_stream. (Josh Rosen, 2013-11-26; 1 file, -2/+0)
* FramedSerializer: _dumps => dumps, _loads => loads. (Josh Rosen, 2013-11-10; 1 file, -13/+13)
* Send PySpark commands as bytes instead of strings. (Josh Rosen, 2013-11-10; 1 file, -0/+5)
* Add custom serializer support to PySpark. (Josh Rosen, 2013-11-10; 1 file, -67/+243)
  For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
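The commit above introduces a framed-serializer design: each object is written with a length prefix, and concrete serializers only define how a single object becomes bytes. The following is a minimal sketch of that pattern, not the actual PySpark code; class and method names mirror the commit message but the bodies are illustrative.

```python
import io
import marshal
import struct


class FramedSerializer:
    """Writes each serialized object preceded by its 4-byte length."""

    def dump_stream(self, iterator, stream):
        for obj in iterator:
            serialized = self.dumps(obj)
            # Big-endian signed int length header, then the payload.
            stream.write(struct.pack("!i", len(serialized)))
            stream.write(serialized)

    def load_stream(self, stream):
        while True:
            header = stream.read(4)
            if len(header) < 4:
                return  # end of stream
            length = struct.unpack("!i", header)[0]
            yield self.loads(stream.read(length))

    def dumps(self, obj):
        raise NotImplementedError

    def loads(self, data):
        raise NotImplementedError


class MarshalSerializer(FramedSerializer):
    """Serializes objects with Python's marshal module (fast, limited types)."""

    def dumps(self, obj):
        return marshal.dumps(obj)

    def loads(self, data):
        return marshal.loads(data)


# Round-trip demonstration over an in-memory stream.
serializer = MarshalSerializer()
buffer = io.BytesIO()
serializer.dump_stream([1, "a", (2, 3)], buffer)
buffer.seek(0)
assert list(serializer.load_stream(buffer)) == [1, "a", (2, 3)]
```

Framing is what lets the reader recover object boundaries from a raw byte stream without parsing the payload, which is why new serializers only need `dumps`/`loads`.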
* Remove Pickle-wrapping of Java objects in PySpark. (Josh Rosen, 2013-11-03; 1 file, -0/+18)
  If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
* Replace magic lengths with constants in PySpark. (Josh Rosen, 2013-11-03; 1 file, -0/+6)
  Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
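The commit above replaces bare magic numbers in the worker protocol with named constants, and switches the accumulator section from a negative-length terminator to an up-front count. A minimal sketch of both ideas, assuming hypothetical helper names and illustrative constant values (not necessarily the exact values PySpark uses):

```python
import io
import struct


class SpecialLengths:
    # Named sentinels instead of bare magic numbers in the stream protocol.
    # Values here are illustrative.
    END_OF_DATA_SECTION = -1
    PYTHON_EXCEPTION_THROWN = -2
    TIMING_DATA = -3


def write_int(value, stream):
    stream.write(struct.pack("!i", value))


def write_accumulator_updates(updates, stream):
    """Write the count up-front instead of ending with a negative length."""
    write_int(len(updates), stream)
    for payload in updates:
        write_int(len(payload), stream)
        stream.write(payload)


# The reader then knows exactly how many entries follow the header.
buffer = io.BytesIO()
write_accumulator_updates([b"ab", b"c"], buffer)
```

Writing the count first means the reader never has to peek at a length to decide whether the section has ended, which is the readability win the commit describes.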
* Fixing SPARK-602: PythonPartitioner (Andre Schumacher, 2013-10-04; 1 file, -0/+4)
  Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark.
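The fix above moves partition-ID computation to the Python side, where the real key is available, instead of having the JVM hash an opaque pickled byte array. An illustrative sketch of that idea (the function name and shape are hypothetical, not the actual PySpark code):

```python
def partition_by(iterator, num_partitions, partition_func=hash):
    """Tag each (key, value) pair with a partition ID computed from the key.

    Emitting (partition_id, (key, value)) lets the JVM partitioner route
    each record by the precomputed ID rather than hashing opaque bytes,
    which also makes range-based partitioning (needed for sorting) possible.
    """
    for key, value in iterator:
        yield partition_func(key) % num_partitions, (key, value)
```

Because `partition_func` is pluggable, the same mechanism supports hash partitioning and custom schemes such as range partitioning for sorted output.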
* Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 1 file, -0/+17)
* Add Python timing instrumentation (Jey Kottalam, 2013-06-21; 1 file, -0/+4)
* Added accumulators to PySpark (Matei Zaharia, 2013-01-20; 1 file, -1/+6)
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01; 1 file, -0/+78)