Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Do not inherit master's PYTHONPATH on workers. | Josh Rosen | 2013-07-29 | 1 | -3/+2 |
| | This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may potentially break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell. | ||||
* | SPARK-815. Python parallelize() should split lists before batching | Matei Zaharia | 2013-07-29 | 1 | -2/+9 |
| | One unfortunate consequence of this fix is that we materialize any collections that are given to us as generators, but this seems necessary to get reasonable behavior on small collections. We could add a batchSize parameter later to bypass auto-computation of batch size if this becomes a problem (e.g. if users really want to parallelize big generators nicely) | ||||
* | Use None instead of empty string as it's slightly smaller/faster | Matei Zaharia | 2013-07-29 | 1 | -1/+1 |
| | |||||
* | Optimize Python foreach() to not return as many objects | Matei Zaharia | 2013-07-29 | 1 | -1/+5 |
| | |||||
* | Optimize Python take() to not compute entire first partition | Matei Zaharia | 2013-07-29 | 1 | -6/+9 |
| | |||||
* | Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 11 | -0/+187 |
| | |||||
* | Fixed PySpark perf regression by not using socket.makefile(), and improved | root | 2013-07-01 | 1 | -18/+24 |
| | debuggability by letting "print" statements show up in the executor's stderr. Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala | ||||
* | Fix reporting of PySpark exceptions | Jey Kottalam | 2013-06-21 | 2 | -5/+19 |
| | |||||
* | PySpark daemon: fix deadlock, improve error handling | Jey Kottalam | 2013-06-21 | 1 | -17/+50 |
| | |||||
* | Add tests and fixes for Python daemon shutdown | Jey Kottalam | 2013-06-21 | 3 | -22/+69 |
| | |||||
* | Prefork Python worker processes | Jey Kottalam | 2013-06-21 | 2 | -32/+138 |
| | |||||
* | Add Python timing instrumentation | Jey Kottalam | 2013-06-21 | 2 | -1/+19 |
| | |||||
* | Fix Python saveAsTextFile doctest to not expect order to be preserved | Jey Kottalam | 2013-04-02 | 1 | -1/+1 |
| | |||||
* | Change numSplits to numPartitions in PySpark. | Josh Rosen | 2013-02-24 | 2 | -38/+38 |
| | |||||
* | Add commutative requirement for 'reduce' to Python docstring. | Mark Hamstra | 2013-02-09 | 1 | -2/+2 |
| | |||||
* | Remove unnecessary doctest __main__ methods. | Josh Rosen | 2013-02-03 | 2 | -18/+0 |
| | |||||
* | Fetch fewer objects in PySpark's take() method. | Josh Rosen | 2013-02-03 | 1 | -0/+4 |
| | |||||
* | Fix reporting of PySpark doctest failures. | Josh Rosen | 2013-02-03 | 2 | -2/+6 |
| | |||||
* | Use spark.local.dir for PySpark temp files (SPARK-580). | Josh Rosen | 2013-02-01 | 2 | -10/+9 |
| | |||||
* | Do not launch JavaGateways on workers (SPARK-674). | Josh Rosen | 2013-02-01 | 4 | -18/+25 |
| | The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in ~3-4x performance improvement when running the PySpark unit tests. | ||||
* | Fix stdout redirection in PySpark. | Josh Rosen | 2013-02-01 | 2 | -2/+12 |
| | |||||
* | SPARK-673: Capture and re-throw Python exceptions | Patrick Wendell | 2013-01-31 | 1 | -2/+8 |
| | This patch alters the Python <-> executor protocol to pass on exception data when they occur in user Python code. | ||||
* | Merge pull request #430 from pwendell/pyspark-guide | Matei Zaharia | 2013-01-30 | 1 | -0/+1 |
|\ | | | | | Minor improvements to PySpark docs | ||||
| * | Make module help available in python shell. | Patrick Wendell | 2013-01-30 | 1 | -0/+1 |
| | | Also, adds a line in doc explaining how to use. | ||||
* | | Replace old 'master' term with 'driver'. | Stephen Haberman | 2013-01-25 | 1 | -1/+1 |
| | | |||||
* | | Merge pull request #396 from JoshRosen/spark-653 | Matei Zaharia | 2013-01-24 | 2 | -14/+29 |
|\ \ | | | | | | | Make PySpark AccumulatorParam an abstract base class | ||||
| * | | Remove use of abc.ABCMeta due to cloudpickle issue. | Josh Rosen | 2013-01-23 | 1 | -7/+4 |
| | | | cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689 This seems hard to fix and the ABCMeta wasn't necessary, so I removed it. | ||||
| * | | Make AccumulatorParam an abstract base class. | Josh Rosen | 2013-01-21 | 2 | -13/+31 |
| | | | |||||
* | | | Allow PySpark's SparkFiles to be used from driver | Josh Rosen | 2013-01-23 | 4 | -9/+62 |
| | | | Fix minor documentation formatting issues. | ||||
* | | | Fix sys.path bug in PySpark SparkContext.addPyFile | Josh Rosen | 2013-01-22 | 3 | -7/+34 |
| | | | |||||
* | | | Don't download files to master's working directory. | Josh Rosen | 2013-01-21 | 4 | -5/+67 |
|/ / | | | | | | | | | | | | | This should avoid exceptions caused by existing files with different contents. I also removed some unused code. | ||||
* | | Merge pull request #389 from JoshRosen/python_rdd_checkpointing | Matei Zaharia | 2013-01-20 | 3 | -2/+112 |
|\ \ | | | | | | | Add checkpointing to the Python API | ||||
| * | | Clean up setup code in PySpark checkpointing tests | Josh Rosen | 2013-01-20 | 2 | -16/+6 |
| | | | |||||
| * | | Update checkpointing API docs in Python/Java. | Josh Rosen | 2013-01-20 | 2 | -16/+12 |
| | | | |||||
| * | | Add checkpointFile() and more tests to PySpark. | Josh Rosen | 2013-01-20 | 3 | -2/+37 |
| | | | |||||
| * | | Add RDD checkpointing to Python API. | Josh Rosen | 2013-01-20 | 3 | -0/+89 |
| | | | |||||
* | | | Fix PythonPartitioner equality; see SPARK-654. | Josh Rosen | 2013-01-20 | 1 | -6/+11 |
|/ / | | | | | | | | | | | PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future. | ||||
* / | Add __repr__ to Accumulator; fix bug in sc.accumulator | Josh Rosen | 2013-01-20 | 1 | -1/+10 |
|/ | |||||
* | Add a class comment to Accumulator | Matei Zaharia | 2013-01-20 | 1 | -0/+12 |
| | |||||
* | Added accumulators to PySpark | Matei Zaharia | 2013-01-20 | 7 | -5/+223 |
| | |||||
* | Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON. | Josh Rosen | 2013-01-10 | 1 | -1/+1 |
| | |||||
* | Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 2 | -12/+25 |
| | |||||
* | Change PySpark RDD.take() to not call iterator(). | Josh Rosen | 2013-01-03 | 2 | -6/+6 |
| | |||||
* | Add `pyspark` script to replace the other scripts. | Josh Rosen | 2013-01-01 | 1 | -26/+10 |
| | Expand the PySpark programming guide. | ||||
* | Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 10 | -0/+2194 |
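
Note on the SPARK-815 entry above ("Python parallelize() should split lists before batching"): the following is a minimal sketch of the idea that commit describes, not the code from the commit. The function name `split_then_batch` and its `num_slices` / `batch_size` parameters are hypothetical, and the batch size is passed in explicitly here, whereas the commit message mentions it being auto-computed. The input is first materialized and cut into slices, and only then is each slice grouped into batches, so a small collection still spreads across every slice instead of landing in a single batch.

```python
def split_then_batch(data, num_slices, batch_size):
    """Split `data` into `num_slices` slices, then batch within each slice.

    Hypothetical helper for illustration only; not PySpark's actual code.
    """
    # Materialize the input so its length is known. As the commit message
    # notes, this means generators are fully consumed up front.
    items = list(data)
    n = len(items)
    slices = []
    for i in range(num_slices):
        # Integer arithmetic gives contiguous, near-equal slice boundaries.
        start = i * n // num_slices
        end = (i + 1) * n // num_slices
        part = items[start:end]
        # Group the elements of this slice into batches of at most batch_size.
        batches = [part[j:j + batch_size] for j in range(0, len(part), batch_size)]
        slices.append(batches)
    return slices


if __name__ == "__main__":
    # Four elements and two slices: each slice gets its own elements even
    # though the batch size is far larger than the collection.
    print(split_then_batch(range(4), 2, 1024))
```

Batching first and splitting afterwards would put all four elements of this example into one batch, leaving the second slice empty; splitting first avoids that for small inputs.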