path: root/python/pyspark
Commit message [Author, Age, Files, Lines]
* Optimize Python foreach() to not return as many objects [Matei Zaharia, 2013-07-29, 1, -1/+5]
|
* Optimize Python take() to not compute entire first partition [Matei Zaharia, 2013-07-29, 1, -6/+9]
|
* Add Apache license headers and LICENSE and NOTICE files [Matei Zaharia, 2013-07-16, 11, -0/+187]
|
* Fixed PySpark perf regression by not using socket.makefile(), and improved [root, 2013-07-01, 1, -18/+24]
| debuggability by letting "print" statements show up in the executor's stderr
|
| Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala
* Fix reporting of PySpark exceptions [Jey Kottalam, 2013-06-21, 2, -5/+19]
|
* PySpark daemon: fix deadlock, improve error handling [Jey Kottalam, 2013-06-21, 1, -17/+50]
|
* Add tests and fixes for Python daemon shutdown [Jey Kottalam, 2013-06-21, 3, -22/+69]
|
* Prefork Python worker processes [Jey Kottalam, 2013-06-21, 2, -32/+138]
|
* Add Python timing instrumentation [Jey Kottalam, 2013-06-21, 2, -1/+19]
|
* Fix Python saveAsTextFile doctest to not expect order to be preserved [Jey Kottalam, 2013-04-02, 1, -1/+1]
|
* Change numSplits to numPartitions in PySpark. [Josh Rosen, 2013-02-24, 2, -38/+38]
|
* Add commutative requirement for 'reduce' to Python docstring. [Mark Hamstra, 2013-02-09, 1, -2/+2]
|
* Remove unnecessary doctest __main__ methods. [Josh Rosen, 2013-02-03, 2, -18/+0]
|
* Fetch fewer objects in PySpark's take() method. [Josh Rosen, 2013-02-03, 1, -0/+4]
|
* Fix reporting of PySpark doctest failures. [Josh Rosen, 2013-02-03, 2, -2/+6]
|
* Use spark.local.dir for PySpark temp files (SPARK-580). [Josh Rosen, 2013-02-01, 2, -10/+9]
|
* Do not launch JavaGateways on workers (SPARK-674). [Josh Rosen, 2013-02-01, 4, -18/+25]
| The problem was that the gateway was being initialized whenever the
| pyspark.context module was loaded. The fix uses lazy initialization
| that occurs only when SparkContext instances are actually constructed.
|
| I also made the gateway and jvm variables private.
|
| This change results in ~3-4x performance improvement when running the
| PySpark unit tests.
* Fix stdout redirection in PySpark. [Josh Rosen, 2013-02-01, 2, -2/+12]
|
* SPARK-673: Capture and re-throw Python exceptions [Patrick Wendell, 2013-01-31, 1, -2/+8]
| This patch alters the Python <-> executor protocol to pass on
| exception data when they occur in user Python code.
* Merge pull request #430 from pwendell/pyspark-guide [Matei Zaharia, 2013-01-30, 1, -0/+1]
|\
| | Minor improvements to PySpark docs
| * Make module help available in python shell. [Patrick Wendell, 2013-01-30, 1, -0/+1]
| | Also, adds a line in doc explaining how to use.
* | Replace old 'master' term with 'driver'. [Stephen Haberman, 2013-01-25, 1, -1/+1]
| |
* | Merge pull request #396 from JoshRosen/spark-653 [Matei Zaharia, 2013-01-24, 2, -14/+29]
|\ \
| | | Make PySpark AccumulatorParam an abstract base class
| * | Remove use of abc.ABCMeta due to cloudpickle issue. [Josh Rosen, 2013-01-23, 1, -7/+4]
| | | cloudpickle runs into issues while pickling subclasses of AccumulatorParam,
| | | which may be related to this Python issue:
| | |
| | | http://bugs.python.org/issue7689
| | |
| | | This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.
| * | Make AccumulatorParam an abstract base class. [Josh Rosen, 2013-01-21, 2, -13/+31]
| | |
* | | Allow PySpark's SparkFiles to be used from driver [Josh Rosen, 2013-01-23, 4, -9/+62]
| | | Fix minor documentation formatting issues.
* | | Fix sys.path bug in PySpark SparkContext.addPyFile [Josh Rosen, 2013-01-22, 3, -7/+34]
| | |
* | | Don't download files to master's working directory. [Josh Rosen, 2013-01-21, 4, -5/+67]
|/ /
| | This should avoid exceptions caused by existing files with different contents.
| |
| | I also removed some unused code.
* | Merge pull request #389 from JoshRosen/python_rdd_checkpointing [Matei Zaharia, 2013-01-20, 3, -2/+112]
|\ \
| | | Add checkpointing to the Python API
| * | Clean up setup code in PySpark checkpointing tests [Josh Rosen, 2013-01-20, 2, -16/+6]
| | |
| * | Update checkpointing API docs in Python/Java. [Josh Rosen, 2013-01-20, 2, -16/+12]
| | |
| * | Add checkpointFile() and more tests to PySpark. [Josh Rosen, 2013-01-20, 3, -2/+37]
| | |
| * | Add RDD checkpointing to Python API. [Josh Rosen, 2013-01-20, 3, -0/+89]
| | |
* | | Fix PythonPartitioner equality; see SPARK-654. [Josh Rosen, 2013-01-20, 1, -6/+11]
|/ /
| | PythonPartitioner did not take the Python-side partitioning function into
| | account when checking for equality, which might cause problems in the future.
* / Add __repr__ to Accumulator; fix bug in sc.accumulator [Josh Rosen, 2013-01-20, 1, -1/+10]
|/
* Add a class comment to Accumulator [Matei Zaharia, 2013-01-20, 1, -0/+12]
|
* Added accumulators to PySpark [Matei Zaharia, 2013-01-20, 7, -5/+223]
|
* Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON. [Josh Rosen, 2013-01-10, 1, -1/+1]
|
* Add mapPartitionsWithSplit() to PySpark. [Josh Rosen, 2013-01-08, 2, -12/+25]
|
* Change PySpark RDD.take() to not call iterator(). [Josh Rosen, 2013-01-03, 2, -6/+6]
|
* Add `pyspark` script to replace the other scripts. [Josh Rosen, 2013-01-01, 1, -26/+10]
| Expand the PySpark programming guide.
* Rename top-level 'pyspark' directory to 'python' [Josh Rosen, 2013-01-01, 10, -0/+2194]