| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| Merge branch 'master' of git://github.com/mesos/spark into scala-2.10<br>Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala, project/SparkBuild.scala | Prashant Sharma | 2013-09-15 | 1 | -0/+19 |
| Export StorageLevel and refactor | Aaron Davidson | 2013-09-07 | 1 | -1/+2 |
| SPARK-660: Add StorageLevel support in Python<br>It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff). | Aaron Davidson | 2013-09-05 | 1 | -0/+18 |
| Merged with master | Prashant Sharma | 2013-09-06 | 1 | -20/+188 |
| Merge pull request #861 from AndreSchumacher/pyspark_sampling_function | Matei Zaharia | 2013-08-31 | 1 | -7/+55 |
| RDD sample() and takeSample() prototypes for PySpark | Andre Schumacher | 2013-08-28 | 1 | -7/+55 |
| PySpark: implementing subtractByKey(), subtract() and keyBy() | Andre Schumacher | 2013-08-28 | 1 | -0/+37 |
| Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark | Andre Schumacher | 2013-08-21 | 1 | -1/+59 |
| Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path | Andre Schumacher | 2013-08-16 | 1 | -1/+3 |
| Do not inherit master's PYTHONPATH on workers.<br>This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may potentially break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell. | Josh Rosen | 2013-07-29 | 1 | -3/+2 |
| Use None instead of empty string as it's slightly smaller/faster | Matei Zaharia | 2013-07-29 | 1 | -1/+1 |
| Optimize Python foreach() to not return as many objects | Matei Zaharia | 2013-07-29 | 1 | -1/+5 |
| Optimize Python take() to not compute entire first partition | Matei Zaharia | 2013-07-29 | 1 | -6/+9 |
| Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 1 | -0/+17 |
| PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py | Andre Schumacher | 2013-08-30 | 1 | -2/+2 |
| Fix Python saveAsTextFile doctest to not expect order to be preserved | Jey Kottalam | 2013-04-02 | 1 | -1/+1 |
| Change numSplits to numPartitions in PySpark. | Josh Rosen | 2013-02-24 | 1 | -28/+28 |
| Add commutative requirement for 'reduce' to Python docstring. | Mark Hamstra | 2013-02-09 | 1 | -2/+2 |
| Fetch fewer objects in PySpark's take() method. | Josh Rosen | 2013-02-03 | 1 | -0/+4 |
| Fix reporting of PySpark doctest failures. | Josh Rosen | 2013-02-03 | 1 | -1/+3 |
| Use spark.local.dir for PySpark temp files (SPARK-580). | Josh Rosen | 2013-02-01 | 1 | -6/+1 |
| Do not launch JavaGateways on workers (SPARK-674).<br>The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in a ~3-4x performance improvement when running the PySpark unit tests. | Josh Rosen | 2013-02-01 | 1 | -6/+6 |
| Merge pull request #389 from JoshRosen/python_rdd_checkpointing<br>Add checkpointing to the Python API | Matei Zaharia | 2013-01-20 | 1 | -1/+34 |
| Clean up setup code in PySpark checkpointing tests | Josh Rosen | 2013-01-20 | 1 | -2/+1 |
| Update checkpointing API docs in Python/Java. | Josh Rosen | 2013-01-20 | 1 | -12/+5 |
| Add checkpointFile() and more tests to PySpark. | Josh Rosen | 2013-01-20 | 1 | -1/+8 |
| Add RDD checkpointing to Python API. | Josh Rosen | 2013-01-20 | 1 | -0/+34 |
| Fix PythonPartitioner equality; see SPARK-654.<br>PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future. | Josh Rosen | 2013-01-20 | 1 | -6/+11 |
| Added accumulators to PySpark | Matei Zaharia | 2013-01-20 | 1 | -1/+1 |
| Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 1 | -11/+22 |
| Change PySpark RDD.take() to not call iterator(). | Josh Rosen | 2013-01-03 | 1 | -6/+5 |
| Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 1 | -0/+713 |