path: root/python/pyspark/worker.py
Commit message | Author | Date | Files | Lines
* SPARK-1115: Catch depickling errors | Bouke van der Bijl | 2014-02-26 | 1 | -24/+24
  This surrounds the complete worker code in a try/except block so we catch any error that arrives. An example would be the depickling failing for some reason. @JoshRosen
  Author: Bouke van der Bijl <boukevanderbijl@gmail.com>
  Closes #644 from bouk/catch-depickling-errors and squashes the following commits:
  f0f67cc [Bouke van der Bijl] Lol indentation
  0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block
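A minimal sketch of the pattern this commit describes, not the actual worker.py code (the pickle-over-stream framing here is simplified):

```python
import pickle
import sys
import traceback


def run_worker(infile, outfile):
    # Wrap the ENTIRE worker body, including unpickling ("depickling"),
    # in one try/except so any failure is reported over the output
    # stream instead of crashing the worker mid-write.
    try:
        func = pickle.load(infile)            # may raise on a bad payload
        while True:
            try:
                record = pickle.load(infile)
            except EOFError:
                break                         # normal end of input
            pickle.dump(func(record), outfile)
    except Exception:
        # Report the traceback rather than leaving the stream half-written.
        pickle.dump(traceback.format_exc(), outfile)
        sys.exit(-1)
```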
* Fixed minor typo in worker.py | jyotiska | 2014-02-22 | 1 | -1/+1
  Fixed minor typo in worker.py
  Author: jyotiska <jyotiska123@gmail.com>
  Closes #630 from jyotiska/pyspark_code and squashes the following commits:
  ee44201 [jyotiska] typo fixed in worker.py
* Switch from MUTF8 to UTF8 in PySpark serializers. | Josh Rosen | 2014-01-28 | 1 | -4/+4
  This fixes SPARK-1043, a bug introduced in 0.9.0 where PySpark couldn't serialize strings > 64kB. This fix was written by @tyro89 and @bouk in #512. This commit squashes and rebases their pull request in order to fix some merge conflicts.
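A sketch of the framing difference behind this fix (simplified, not the exact serializer code): a 4-byte length prefix plus standard UTF-8 avoids the 64 kB ceiling of Java's modified UTF-8.

```python
import struct


def write_utf8(s: str, stream) -> None:
    # Plain UTF-8 with a 4-byte big-endian length prefix. Java's
    # DataOutputStream.writeUTF uses "modified UTF-8" with a 2-byte
    # length field, which caps strings at 64 kB; this framing does not.
    data = s.encode("utf-8")
    stream.write(struct.pack(">i", len(data)))
    stream.write(data)


def read_utf8(stream) -> str:
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length).decode("utf-8")
```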
* Log Python exceptions to stderr as well | Matei Zaharia | 2014-01-12 | 1 | -0/+4
  This helps in case the exception happened while serializing a record to be sent to Java, leaving the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
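A minimal sketch of the idea (run_safely and task are illustrative names, not the worker's actual structure):

```python
import sys
import traceback


def run_safely(task):
    # Mirror the traceback to stderr in addition to the normal error
    # path: if the failure happened mid-record, the data stream back to
    # Java may already be inconsistent and PythonRDD may never read the
    # error, but stderr still reaches the executor logs.
    try:
        task()
    except Exception:
        traceback.print_exc(file=sys.stderr)
        raise  # still propagate so the protocol-level error path runs
```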
* FramedSerializer: _dumps => dumps, _loads => loads. | Josh Rosen | 2013-11-10 | 1 | -2/+2
* Send PySpark commands as bytes instead of strings. | Josh Rosen | 2013-11-10 | 1 | -10/+2
* Add custom serializer support to PySpark. | Josh Rosen | 2013-11-10 | 1 | -22/+19
  For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union().
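A short usage example of the serializer hook this adds, assuming the serializer keyword on SparkContext and pyspark.serializers.MarshalSerializer as shipped with PySpark:

```python
from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

# marshal is faster than pickle but supports fewer Python types; the
# serializer is chosen once per SparkContext.
sc = SparkContext("local", "marshal-demo", serializer=MarshalSerializer())
print(sc.parallelize(range(10)).map(lambda x: x * 2).collect())
sc.stop()
```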
* Remove Pickle-wrapping of Java objects in PySpark. | Josh Rosen | 2013-11-03 | 1 | -5/+9
  If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings.
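A hedged sketch of what reading unwrapped input could look like (helper names are hypothetical):

```python
import struct


def read_frame(stream) -> bytes:
    (length,) = struct.unpack(">i", stream.read(4))
    return stream.read(length)


def read_pair(stream) -> tuple:
    # With a known input type, a Java Tuple2 of byte strings can be read
    # as two raw length-prefixed frames; no pickled wrapper tuple needed.
    return read_frame(stream), read_frame(stream)
```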
* Replace magic lengths with constants in PySpark. | Josh Rosen | 2013-11-03 | 1 | -6/+7
  Write the length of the accumulators section up-front rather than terminating it with a negative length. I find this easier to read.
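An illustrative sketch of the naming idea (the constant names here are assumptions; PySpark later grouped similar sentinels in a SpecialLengths class):

```python
# Named sentinels for the framed worker protocol instead of raw negative
# numbers scattered through the code.
END_OF_DATA_SECTION = -1
PYTHON_EXCEPTION_THROWN = -2
TIMING_DATA = -3


def classify_frame(length: int) -> str:
    if length == END_OF_DATA_SECTION:
        return "end"      # clean end of a section; no payload follows
    if length == PYTHON_EXCEPTION_THROWN:
        return "error"    # an error/traceback frame follows
    if length == TIMING_DATA:
        return "timing"   # timing-instrumentation frame follows
    return "data"         # ordinary payload of `length` bytes
```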
* Allow PySpark to launch worker.py directly on Windows | Matei Zaharia | 2013-09-01 | 1 | -4/+7
* Implementing SPARK-878 for PySpark: adding zip and egg files to the context and passing them down to workers, which add these to their sys.path | Andre Schumacher | 2013-08-16 | 1 | -1/+12
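A minimal sketch of the mechanism, assuming an illustrative SPARK_FILES_DIR environment variable (the real worker learns the files directory over its protocol):

```python
import os
import sys

# Archives shipped with the job land in the worker's Spark files
# directory; appending .zip and .egg files to sys.path makes the modules
# inside them importable (Python imports directly from both formats).
spark_files_dir = os.environ.get("SPARK_FILES_DIR", ".")  # illustrative
for name in os.listdir(spark_files_dir):
    if name.endswith((".zip", ".egg")):
        sys.path.append(os.path.join(spark_files_dir, name))
```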
* Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 1 | -0/+17
* Fix reporting of PySpark exceptions | Jey Kottalam | 2013-06-21 | 1 | -1/+1
* Add tests and fixes for Python daemon shutdown | Jey Kottalam | 2013-06-21 | 1 | -0/+2
* Prefork Python worker processes | Jey Kottalam | 2013-06-21 | 1 | -32/+29
* Add Python timing instrumentation | Jey Kottalam | 2013-06-21 | 1 | -1/+15
* Fix stdout redirection in PySpark. | Josh Rosen | 2013-02-01 | 1 | -2/+3
* SPARK-673: Capture and re-throw Python exceptions | Patrick Wendell | 2013-01-31 | 1 | -2/+8
  This patch alters the Python <-> executor protocol to pass on exception data when an exception occurs in user Python code.
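A sketch of the protocol change under stated assumptions (the sentinel value and framing are illustrative, not the exact wire format):

```python
import struct
import traceback

PYTHON_EXCEPTION_THROWN = -2  # illustrative sentinel value


def report_failure(outfile) -> None:
    # Instead of dying silently, write a sentinel in place of a frame
    # length, then the UTF-8 traceback, so the executor can read the
    # message and re-throw the Python error as a JVM-side exception.
    msg = traceback.format_exc().encode("utf-8")
    outfile.write(struct.pack(">i", PYTHON_EXCEPTION_THROWN))
    outfile.write(struct.pack(">i", len(msg)))
    outfile.write(msg)
    outfile.flush()
```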
* Allow PySpark's SparkFiles to be used from driver | Josh Rosen | 2013-01-23 | 1 | -0/+1
  Fix minor documentation formatting issues.
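A short driver-side usage example with the SparkFiles API (the file path is illustrative):

```python
from pyspark import SparkContext, SparkFiles

sc = SparkContext("local", "files-demo")
sc.addFile("/tmp/data.txt")  # path is illustrative

# After this change SparkFiles.get() also resolves paths in the driver
# process, not only inside worker tasks.
print(SparkFiles.get("data.txt"))
sc.stop()
```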
* Fix sys.path bug in PySpark SparkContext.addPyFile | Josh Rosen | 2013-01-22 | 1 | -0/+1
* Don't download files to master's working directory. | Josh Rosen | 2013-01-21 | 1 | -0/+3
  This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
* Added accumulators to PySpark | Matei Zaharia | 2013-01-20 | 1 | -1/+6
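A brief usage example of the accumulator API this adds:

```python
from pyspark import SparkContext

sc = SparkContext("local", "accumulator-demo")
counter = sc.accumulator(0)  # add-only from tasks, readable on the driver


def count(_):
    counter.add(1)  # per-task updates are shipped back with the results


sc.parallelize(range(100)).foreach(count)
print(counter.value)  # 100
sc.stop()
```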
* Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 1 | -1/+3
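A brief usage example (partition count and data are arbitrary):

```python
from pyspark import SparkContext

sc = SparkContext("local", "split-demo")
rdd = sc.parallelize(range(8), 4)


def tag(split, iterator):
    # Receives the partition ("split") index plus an iterator over it.
    return ((split, x) for x in iterator)


# mapPartitionsWithSplit was later renamed mapPartitionsWithIndex; the
# original name is used here as introduced by this commit.
print(rdd.mapPartitionsWithSplit(tag).collect())
sc.stop()
```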
* Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 1 | -0/+40