path: root/python/pyspark/rdd.py
Commit message | Author | Age | Files | Lines
* Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 (Prashant Sharma, 2013-09-15; 1 file, -0/+19)
|\
| |     Conflicts:
| |         core/src/main/scala/org/apache/spark/SparkContext.scala
| |         project/SparkBuild.scala
| * Export StorageLevel and refactor (Aaron Davidson, 2013-09-07; 1 file, -1/+2)
| |
| * SPARK-660: Add StorageLevel support in Python (Aaron Davidson, 2013-09-05; 1 file, -0/+18)
| |
| |     It uses reflection... I am not proud of that fact, but it at least
| |     ensures compatibility (sans refactoring of the StorageLevel stuff).
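The two commits above export Spark's StorageLevel constants to Python. As a rough illustration of what a storage level carries, here is a pure-Python sketch; the field names mirror Spark's StorageLevel flags but this is an assumption for illustration, not the PySpark code:

```python
from collections import namedtuple

# Illustrative sketch: a Spark storage level is essentially a set of flags
# (spill to disk?, keep in memory?, store deserialized?) plus a replication count.
StorageLevel = namedtuple(
    "StorageLevel", ["use_disk", "use_memory", "deserialized", "replication"]
)

MEMORY_ONLY = StorageLevel(False, True, True, 1)
MEMORY_AND_DISK = StorageLevel(True, True, True, 1)
DISK_ONLY = StorageLevel(True, False, False, 1)
```

In PySpark these constants are what you pass to `rdd.persist(...)` to control caching behavior.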
* | Merged with master (Prashant Sharma, 2013-09-06; 1 file, -20/+188)
|\|
| * Merge pull request #861 from AndreSchumacher/pyspark_sampling_function (Matei Zaharia, 2013-08-31; 1 file, -7/+55)
| |\
| | |     Pyspark sampling function
| | * RDD sample() and takeSample() prototypes for PySpark (Andre Schumacher, 2013-08-28; 1 file, -7/+55)
| | |
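The sample()/takeSample() commit above adds sampling to PySpark RDDs. A minimal pure-Python sketch of the two semantics (function names here are illustrative, not the RDD methods themselves): sample() keeps each element independently with the given probability, while takeSample() returns a fixed-size random subset.

```python
import random

def sample_without_replacement(items, fraction, seed=None):
    # Sketch of rdd.sample(False, fraction): each element is kept
    # independently with probability `fraction`.
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]

def take_sample(items, num, seed=None):
    # Sketch of rdd.takeSample(): return exactly `num` randomly chosen
    # elements (capped at the input size).
    rng = random.Random(seed)
    return rng.sample(items, min(num, len(items)))
```

Note that sample() returns a *fraction* of the data (with variance in the result size), whereas takeSample() guarantees an exact count.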
| * | PySpark: implementing subtractByKey(), subtract() and keyBy() (Andre Schumacher, 2013-08-28; 1 file, -0/+37)
| |/
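The commit above adds three set-style operations to PySpark RDDs. Their semantics can be sketched in plain Python over lists (illustrative helpers, not the RDD implementation):

```python
def key_by(items, f):
    # keyBy(f): pair each element with f(element) as its key.
    return [(f(x), x) for x in items]

def subtract(a, b):
    # subtract: elements of `a` with no matching element in `b`.
    bset = set(b)
    return [x for x in a if x not in bset]

def subtract_by_key(a, b):
    # subtractByKey: (key, value) pairs from `a` whose key does not
    # appear as a key in `b`.
    bkeys = {k for k, _ in b}
    return [(k, v) for k, v in a if k not in bkeys]
```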
| * Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark (Andre Schumacher, 2013-08-21; 1 file, -1/+59)
| |
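SPARK-838 brings numeric helpers (mean, variance, stdev and friends) to PySpark. A plain-Python sketch of the statistics they compute over a collection of doubles (population variance, as an illustration; this is not the RDD code):

```python
import math

def stats(nums):
    # Sketch of the numeric summary: count, mean, population variance,
    # and standard deviation over a list of floats.
    n = len(nums)
    mean = sum(nums) / n
    variance = sum((x - mean) ** 2 for x in nums) / n
    return {"count": n, "mean": mean,
            "variance": variance, "stdev": math.sqrt(variance)}
```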
| * Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path (Andre Schumacher, 2013-08-16; 1 file, -1/+3)
| |
| * Do not inherit master's PYTHONPATH on workers. (Josh Rosen, 2013-07-29; 1 file, -3/+2)
| |
| |     This fixes SPARK-832, an issue where PySpark would not work when the
| |     master and workers used different SPARK_HOME paths. This change may
| |     potentially break code that relied on the master's PYTHONPATH being
| |     used on workers. To have custom PYTHONPATH additions used on the
| |     workers, users should set a custom PYTHONPATH in spark-env.sh rather
| |     than setting it in the shell.
| * Use None instead of empty string as it's slightly smaller/faster (Matei Zaharia, 2013-07-29; 1 file, -1/+1)
| |
| * Optimize Python foreach() to not return as many objects (Matei Zaharia, 2013-07-29; 1 file, -1/+5)
| |
| * Optimize Python take() to not compute entire first partition (Matei Zaharia, 2013-07-29; 1 file, -6/+9)
| |
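The take() optimization above stops computing as soon as enough elements have been collected, instead of materializing whole partitions up front. The idea can be sketched in plain Python (illustrative only; `partitions` stands in for an RDD's list of partition iterators):

```python
def take(partitions, n):
    # Sketch of the optimized take(): stream elements one partition at a
    # time and return the moment n items have been collected, so later
    # partitions (and the tail of the current one) are never computed.
    out = []
    for part in partitions:
        for item in part:
            out.append(item)
            if len(out) == n:
                return out
    return out
```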
| * Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 1 file, -0/+17)
| |
* | PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py (Andre Schumacher, 2013-08-30; 1 file, -2/+2)
|/
* Fix Python saveAsTextFile doctest to not expect order to be preserved (Jey Kottalam, 2013-04-02; 1 file, -1/+1)
|
* Change numSplits to numPartitions in PySpark. (Josh Rosen, 2013-02-24; 1 file, -28/+28)
|
* Add commutative requirement for 'reduce' to Python docstring. (Mark Hamstra, 2013-02-09; 1 file, -2/+2)
|
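The docstring fix above documents why the operator passed to reduce() must be commutative as well as associative: each partition is reduced locally, and the partial results are then combined on the driver in no guaranteed order. A small sketch of that execution model (illustrative, not PySpark's scheduler):

```python
from functools import reduce

def distributed_reduce(partitions, op):
    # Reduce each partition locally, then combine the partial results.
    partials = [reduce(op, part) for part in partitions]
    # Simulate the nondeterministic order in which partial results
    # can arrive at the driver:
    partials.reverse()
    return reduce(op, partials)
```

With a commutative, associative op like addition the arrival order does not matter; with a non-commutative op like subtraction, the result diverges from a plain left-to-right reduce.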
* Fetch fewer objects in PySpark's take() method. (Josh Rosen, 2013-02-03; 1 file, -0/+4)
|
* Fix reporting of PySpark doctest failures. (Josh Rosen, 2013-02-03; 1 file, -1/+3)
|
* Use spark.local.dir for PySpark temp files (SPARK-580). (Josh Rosen, 2013-02-01; 1 file, -6/+1)
|
* Do not launch JavaGateways on workers (SPARK-674). (Josh Rosen, 2013-02-01; 1 file, -6/+6)
|
|       The problem was that the gateway was being initialized whenever the
|       pyspark.context module was loaded. The fix uses lazy initialization
|       that occurs only when SparkContext instances are actually constructed.
|       I also made the gateway and jvm variables private. This change results
|       in ~3-4x performance improvement when running the PySpark unit tests.
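The lazy-initialization pattern described in the commit body above can be sketched as follows; all names here are hypothetical stand-ins (the real code launches a Py4J gateway to the JVM):

```python
class SparkContextSketch:
    # Illustrative sketch of the SPARK-674 fix: the expensive gateway is
    # created on first construction of a context, not at module import
    # time, so workers that merely import the module never pay for it.
    _gateway = None  # shared, private, created lazily

    @classmethod
    def _ensure_gateway(cls):
        if cls._gateway is None:
            cls._gateway = object()  # stands in for launching the JVM gateway
        return cls._gateway

    def __init__(self):
        self._jvm = self._ensure_gateway()
```

Importing the module costs nothing; only constructing a context triggers the launch, and later contexts reuse the same gateway.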
* Merge pull request #389 from JoshRosen/python_rdd_checkpointing (Matei Zaharia, 2013-01-20; 1 file, -1/+34)
|\
| |     Add checkpointing to the Python API
| * Clean up setup code in PySpark checkpointing tests (Josh Rosen, 2013-01-20; 1 file, -2/+1)
| |
| * Update checkpointing API docs in Python/Java. (Josh Rosen, 2013-01-20; 1 file, -12/+5)
| |
| * Add checkpointFile() and more tests to PySpark. (Josh Rosen, 2013-01-20; 1 file, -1/+8)
| |
| * Add RDD checkpointing to Python API. (Josh Rosen, 2013-01-20; 1 file, -0/+34)
| |
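The checkpointing commits above add the ability to cut an RDD's lineage: a checkpointed RDD saves its data to reliable storage and drops its parent chain, so recovery reads the saved copy instead of recomputing. A toy sketch of that idea (purely illustrative; not PySpark's RDD class):

```python
class RDDSketch:
    # Toy lineage model: an RDD is either source data or a function
    # applied to a parent RDD.
    def __init__(self, data=None, parent=None, fn=None):
        self._data, self.parent, self.fn = data, parent, fn
        self.is_checkpointed = False

    def map(self, fn):
        return RDDSketch(parent=self, fn=fn)

    def collect(self):
        if self._data is not None:
            return self._data
        return [self.fn(x) for x in self.parent.collect()]

    def checkpoint(self):
        self._data = self.collect()   # materialize (to reliable storage in real Spark)
        self.parent = self.fn = None  # truncate the lineage chain
        self.is_checkpointed = True
```

After checkpoint(), collect() no longer walks the parent chain, which is the point: a long or fragile lineage is replaced by stored data.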
* | Fix PythonPartitioner equality; see SPARK-654. (Josh Rosen, 2013-01-20; 1 file, -6/+11)
|/
|       PythonPartitioner did not take the Python-side partitioning function
|       into account when checking for equality, which might cause problems
|       in the future.
* Added accumulators to PySpark (Matei Zaharia, 2013-01-20; 1 file, -1/+1)
|
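Accumulators, added above, are write-only from the tasks' point of view: workers may only add to them with `+=`, and only the driver reads the accumulated value. A minimal sketch of that contract (illustrative, not PySpark's Accumulator class):

```python
class AccumulatorSketch:
    # Sketch: tasks add with +=; the driver reads .value afterwards.
    def __init__(self, value):
        self.value = value

    def __iadd__(self, term):
        self.value += term
        return self

acc = AccumulatorSketch(0)
for x in [1, 2, 3, 4]:  # pretend each iteration runs as a separate task
    acc += x
```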
* Add mapPartitionsWithSplit() to PySpark. (Josh Rosen, 2013-01-08; 1 file, -11/+22)
|
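mapPartitionsWithSplit(), added above, hands the user function the partition (split) index alongside an iterator over that partition's elements. Its semantics can be sketched over a list of partitions (an illustrative helper, not the RDD method):

```python
def map_partitions_with_split(partitions, f):
    # Sketch: call f(split_index, iterator) once per partition and
    # concatenate the results.
    out = []
    for split, part in enumerate(partitions):
        out.extend(f(split, iter(part)))
    return out
```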
* Change PySpark RDD.take() to not call iterator(). (Josh Rosen, 2013-01-03; 1 file, -6/+5)
|
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01; 1 file, -0/+713)