path: root/python/pyspark/worker.py
Commit message | Author | Date | Files | Lines
* SPARK-1115: Catch depickling errors | Bouke van der Bijl | 2014-02-26 | 1 | -24/+24
  This surrounds the complete worker code in a try/except block so we catch any error that arrives. An example would be the depickling failing for some reason. @JoshRosen
  Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
  Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
  f0f67cc [Bouke van der Bijl] Lol indentation
  0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
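A minimal sketch of the pattern this commit describes, not the actual worker.py code (the pickle-over-stream framing here is simplified):

```python
import pickle
import sys
import traceback


def run_worker(infile, outfile):
    # Wrap the ENTIRE worker body, including unpickling ("depickling"),
    # in one try/except so any failure is reported over the output
    # stream instead of crashing the worker mid-write.
    try:
        func = pickle.load(infile)            # may raise on a bad payload
        while True:
            try:
                record = pickle.load(infile)
            except EOFError:
                break                         # normal end of input
            pickle.dump(func(record), outfile)
    except Exception:
        # Report the traceback rather than leaving the stream half-written.
        pickle.dump(traceback.format_exc(), outfile)
        sys.exit(-1)
```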
* Fixed minor typo in worker.py | jyotiska | 2014-02-22 | 1 | -1/+1
  Fixed minor typo in worker.py
  Author: jyotiska <jyotiska123@gmail.com>
  Closes #630 from jyotiska/pyspark_code and squashes the following commits:
  ee44201 [jyotiska] typo fixed in worker.py
* Switch from MUTF8 to UTF8 in PySpark serializers. | Josh Rosen | 2014-01-28 | 1 | -4/+4
  This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB. This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
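A sketch of the framing difference behind this fix (simplified, not the exact serializer code): a 4-byte length prefix plus standard UTF-8 avoids the 64 kB ceiling of Java's modified UTF-8.

```python
import struct


def write_utf8(s: str, stream) -> None:
    # Plain UTF-8 with a 4-byte big-endian length prefix. Java's
    # DataOutputStream.writeUTF uses "modified UTF-8" with a 2-byte
    # length field, which caps strings at 64 kB; this framing does not.
    data = s.encode("utf-8")
    stream.write(struct.pack(">i", len(data)))
    stream.write(data)


def read_utf8(stream) -> str:
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length).decode("utf-8")
```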
* Log Python exceptions to stderr as well | Matei Zaharia | 2014-01-12 | 1 | -0/+4
  This helps in case the exception happened while serializing a record to be sent to Java, leaving the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
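A minimal sketch of the idea (run_safely and task are illustrative names, not the worker's actual structure):

```python
import sys
import traceback


def run_safely(task):
    # Mirror the traceback to stderr in addition to the normal error
    # path: if the failure happened mid-record, the data stream back to
    # Java may already be inconsistent and PythonRDD may never read the
    # error, but stderr still reaches the executor logs.
    try:
        task()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        raise  # still propagate so the protocol-level error path runs
```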
* FramedSerializer: _dumps => dumps, _loads => loads. | Josh Rosen | 2013-11-10 | 1 | -2/+2
* Send PySpark commands as bytes instead of strings. | Josh Rosen | 2013-11-10 | 1 | -10/+2
* Add custom serializer support to PySpark. | Josh Rosen | 2013-11-10 | 1 | -22/+19
  For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
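A short usage example of the serializer hook this adds, assuming the serializer keyword on SparkContext and pyspark.serializers.MarshalSerializer as shipped with PySpark:

```python
from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

# marshal is faster than pickle but supports fewer Python types; the
# serializer is chosen once per SparkContext.
sc = SparkContext("local", "marshal-demo", serializer=MarshalSerializer())
print(sc.parallelize(range(10)).map(lambda x: x * 2).collect())
sc.stop()
```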
* Remove Pickle-wrapping of Java objects in PySpark. | Josh Rosen | 2013-11-03 | 1 | -5/+9
  If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
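A hedged sketch of what reading unwrapped input could look like (helper names are hypothetical):

```python
import struct


def read_frame(stream) -> bytes:
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length)


def read_pair(stream) -> tuple:
    # With a known input type, a Java Tuple2 of byte strings can be read
    # as two raw length-prefixed frames; no pickled wrapper tuple needed.
    return read_frame(stream), read_frame(stream)
```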
* Replace magic lengths with constants in PySpark. | Josh Rosen | 2013-11-03 | 1 | -6/+7
  Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
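An illustrative sketch of the naming idea (the constant names here are assumptions; PySpark later grouped similar sentinels in a SpecialLengths class):

```python
# Named sentinels for the framed worker protocol instead of raw negative
# numbers scattered through the code.
END_OF_DATA_SECTION = -1
PYTHON_EXCEPTION_THROWN = -2
TIMING_DATA = -3


def classify_frame(length: int) -> str:
    if length == END_OF_DATA_SECTION:
        return "end"      # clean end of a section; no payload follows
    if length == PYTHON_EXCEPTION_THROWN:
        return "error"    # an error/traceback frame follows
    if length == TIMING_DATA:
        return "timing"   # timing-instrumentation frame follows
    return "data"         # ordinary payload of `length` bytes
```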
* Allow PySpark to launch worker.py directly on Windows | Matei Zaharia | 2013-09-01 | 1 | -4/+7
* Implementing SPARK-878 for PySpark: adding zip and egg files to the context and passing them down to workers, which add these to their sys.path | Andre Schumacher | 2013-08-16 | 1 | -1/+12
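A minimal sketch of the mechanism, assuming an illustrative SPARK_FILES_DIR environment variable (the real worker learns the files directory over its protocol):

```python
import os
import sys

# Archives shipped with the job land in the worker's Spark files
# directory; appending .zip and .egg files to sys.path makes the modules
# inside them importable (Python imports directly from both formats).
spark_files_dir = os.environ.get("SPARK_FILES_DIR", ".")  # illustrative
for name in os.listdir(spark_files_dir):
    if name.endswith((".zip", ".egg")):
        sys.path.append(os.path.join(spark_files_dir, name))
```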
* Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 1 | -0/+17
* Fix reporting of PySpark exceptions | Jey Kottalam | 2013-06-21 | 1 | -1/+1
* Add tests and fixes for Python daemon shutdown | Jey Kottalam | 2013-06-21 | 1 | -0/+2
* Prefork Python worker processes | Jey Kottalam | 2013-06-21 | 1 | -32/+29
* Add Python timing instrumentation | Jey Kottalam | 2013-06-21 | 1 | -1/+15
* Fix stdout redirection in PySpark. | Josh Rosen | 2013-02-01 | 1 | -2/+3
* SPARK-673: Capture and re-throw Python exceptions | Patrick Wendell | 2013-01-31 | 1 | -2/+8
  This patch alters the Python <-> executor protocol to pass on exception data when an exception occurs in user Python code.
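A sketch of the protocol change under stated assumptions (the sentinel value and framing are illustrative, not the exact wire format):

```python
import struct
import traceback

PYTHON_EXCEPTION_THROWN = -2  # illustrative sentinel value


def report_failure(outfile) -> None:
    # Instead of dying silently, write a sentinel in place of a frame
    # length, then the UTF-8 traceback, so the executor can read the
    # message and re-throw the Python error as a JVM-side exception.
    msg = traceback.format_exc().encode("utf-8")
    outfile.write(struct.pack(">i", PYTHON_EXCEPTION_THROWN))
    outfile.write(struct.pack(">i", len(msg)))
    outfile.write(msg)
    outfile.flush()
```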
* Allow PySpark's SparkFiles to be used from driver | Josh Rosen | 2013-01-23 | 1 | -0/+1
  Fix minor documentation formatting issues.
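A short driver-side usage example with the SparkFiles API (the file path is illustrative):

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext("local", "files-demo")
sc.addFile("/tmp/data.txt")  # path is illustrative

# After this change SparkFiles.get() also resolves paths in the driver
# process, not only inside worker tasks.
print(SparkFiles.get("data.txt"))
sc.stop()
```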
* Fix sys.path bug in PySpark SparkContext.addPyFile | Josh Rosen | 2013-01-22 | 1 | -0/+1
* Don't download files to master's working directory. | Josh Rosen | 2013-01-21 | 1 | -0/+3
  This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
* Added accumulators to PySpark | Matei Zaharia | 2013-01-20 | 1 | -1/+6
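A brief usage example of the accumulator API this adds:

```python
from pyspark import SparkContext

sc = SparkContext("local", "accumulator-demo")
counter = sc.accumulator(0)  # add-only from tasks, readable on the driver


def count(_):
    counter.add(1)  # per-task updates are shipped back with the results


sc.parallelize(range(100)).foreach(count)
print(counter.value)  # 100
sc.stop()
```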
* Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 1 | -1/+3
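A brief usage example (partition count and data are arbitrary):

```python
from pyspark import SparkContext

sc = SparkContext("local", "split-demo")
rdd = sc.parallelize(range(8), 4)


def tag(split, iterator):
    # Receives the partition ("split") index plus an iterator over it.
    return ((split, x) for x in iterator)


# mapPartitionsWithSplit was later renamed mapPartitionsWithIndex; the
# original name is used here as introduced by this commit.
print(rdd.mapPartitionsWithSplit(tag).collect())
sc.stop()
```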
* Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 1 | -0/+40