Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Make Python function/line appear in the UI. | Tor Myklebust | 2013-12-28 | 1 | -11/+55 |
| | |||||
* | Merge pull request #276 from shivaram/collectPartition | Reynold Xin | 2013-12-19 | 1 | -1/+6 |
|\ | | | | | | | | | | | | | | | Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests. | ||||
| * | Make collectPartitions take an array of partitions | Shivaram Venkataraman | 2013-12-19 | 1 | -1/+6 |
| | | | | | | | | | | | | Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface. | ||||
| * | Add collectPartition to JavaRDD interface. | Shivaram Venkataraman | 2013-12-18 | 1 | -1/+1 |
| | | | | | | | | Also remove takePartition from PythonRDD and use collectPartition in rdd.py. | ||||
* | | Add toString to Java RDD, and __repr__ to Python RDD | Nick Pentreath | 2013-12-19 | 1 | -0/+3 |
|/ | |||||
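The `collectPartition`-based `take` described above can be sketched without Spark: fetch partitions one at a time and stop as soon as enough elements are gathered, instead of computing the whole dataset. Function and variable names here are illustrative, not the actual PySpark API.

```python
# Hedged sketch of take() built on per-partition collection: `partitions`
# stands in for a source that can hand back one collected partition at a time.

def take_via_partitions(partitions, n):
    """Collect elements partition by partition until n items are gathered."""
    items = []
    for part in partitions:          # each `part` is one collected partition
        if len(items) >= n:
            break                    # stop early; remaining partitions untouched
        items.extend(part[: n - len(items)])
    return items
```

This is why a per-partition collect is enough for `take` in other language frontends: the driver never needs more than the first few partitions.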
* | Merge branch 'master' into akka-bug-fix | Prashant Sharma | 2013-12-11 | 1 | -1/+4 |
|\ | | | | | | | | | | | | | | | | | | | Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala | ||||
| * | Fix UnicodeEncodeError in PySpark saveAsTextFile(). | Josh Rosen | 2013-11-28 | 1 | -1/+4 |
| | | | | | | Fixes SPARK-970. | ||||
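The SPARK-970 class of bug can be illustrated in a few lines: writing unicode text through a byte-oriented sink fails unless it is encoded explicitly. This is a hedged illustration of the fix pattern, not the actual `saveAsTextFile` code.

```python
# Encode each line explicitly instead of relying on the default codec,
# which is the general shape of a UnicodeEncodeError fix.

def to_text_line(x):
    """Render any element as a UTF-8 encoded line."""
    s = x if isinstance(x, str) else str(x)
    return s.encode("utf-8")
```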
* | | Merge branch 'master' into wip-scala-2.10 | Prashant Sharma | 2013-11-27 | 1 | -43/+54 |
|\| | | | | | | | | | | | | | | | | | Conflicts: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala core/src/main/scala/org/apache/spark/rdd/RDD.scala python/pyspark/rdd.py | ||||
| * | FramedSerializer: _dumps => dumps, _loads => loads. | Josh Rosen | 2013-11-10 | 1 | -2/+2 |
| | | |||||
| * | Send PySpark commands as bytes instead of strings. | Josh Rosen | 2013-11-10 | 1 | -6/+6 |
| | | |||||
| * | Add custom serializer support to PySpark. | Josh Rosen | 2013-11-10 | 1 | -39/+47 |
| | | | | | | | | | | | | | | | | | For now, this only adds MarshalSerializer, but it lays the groundwork for supporting other custom serializers. Many of these mechanisms can also be used to support deserialization of different data formats sent by Java, such as data encoded by MsgPack. This also fixes a bug in SparkContext.union(). | ||||
| * | Remove Pickle-wrapping of Java objects in PySpark. | Josh Rosen | 2013-11-03 | 1 | -4/+7 |
| | | | | | | | | | | | | If we support custom serializers, the Python worker will know what type of input to expect, so we won't need to wrap Tuple2 and Strings into pickled tuples and strings. | ||||
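The serializer commits above (framed serialization, `dumps`/`loads` naming, `MarshalSerializer`) can be sketched as a small class: each object is written as a 4-byte length prefix followed by its payload. This mirrors the idea, not the actual PySpark implementation.

```python
import marshal
import struct

# Hedged sketch of a framed serializer: a length-prefixed payload lets the
# reader know exactly how many bytes belong to each object in a stream.

class MarshalFrameSerializer:
    def dumps(self, obj):
        payload = marshal.dumps(obj)
        return struct.pack(">i", len(payload)) + payload   # 4-byte big-endian frame

    def loads(self, data):
        (length,) = struct.unpack(">i", data[:4])
        return marshal.loads(data[4 : 4 + length])
```

Framing is what lets Java and Python exchange raw bytes without either side having to understand the other's object format.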
* | | Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10 | Prashant Sharma | 2013-10-10 | 1 | -7/+53 |
|\| | |||||
| * | Fix PySpark docs and an overly long line of code after fdbae41e | Matei Zaharia | 2013-10-09 | 1 | -8/+8 |
| | | |||||
| * | SPARK-705: implement sortByKey() in PySpark | Andre Schumacher | 2013-10-07 | 1 | -1/+47 |
| | | |||||
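The SPARK-705 `sortByKey()` above can be sketched as sample-based range partitioning: sample the keys, derive range boundaries, route each pair to an ordered bucket, then sort each bucket locally. All names here are illustrative.

```python
import bisect
import random

# Hedged sketch of a sample-based sortByKey: buckets are ordered by key
# range, so concatenating locally sorted buckets yields a global sort.

def sort_by_key(pairs, num_partitions=2, seed=42):
    rng = random.Random(seed)
    sample = sorted(rng.sample([k for k, _ in pairs], min(len(pairs), 20)))
    step = max(len(sample) // num_partitions, 1)
    bounds = sample[step::step][: num_partitions - 1]   # range boundaries
    buckets = [[] for _ in range(num_partitions)]
    for k, v in pairs:
        buckets[bisect.bisect_left(bounds, k)].append((k, v))
    return [kv for b in buckets for kv in sorted(b)]
```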
* | | Merge branch 'master' into wip-merge-master | Prashant Sharma | 2013-10-08 | 1 | -4/+6 |
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: bagel/pom.xml core/pom.xml core/src/test/scala/org/apache/spark/ui/UISuite.scala examples/pom.xml mllib/pom.xml pom.xml project/SparkBuild.scala repl/pom.xml streaming/pom.xml tools/pom.xml In scala 2.10, a shorter representation is used for naming artifacts so changed to shorter scala version for artifacts and made it a property in pom. | ||||
| * | Fixing SPARK-602: PythonPartitioner | Andre Schumacher | 2013-10-04 | 1 | -4/+6 |
| | | | | | | | | | | | | | | Currently PythonPartitioner determines partition ID by hashing a byte-array representation of PySpark's key. This PR lets PythonPartitioner use the actual partition ID, which is required e.g. for sorting via PySpark. | ||||
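The SPARK-602 fix above can be pictured from the Python side: rather than the JVM hashing an opaque pickled key, Python computes the partition id itself and ships `(partition_id, element)` pairs. A hedged sketch, with an illustrative function name:

```python
# Tag each (key, value) pair with the partition its key belongs to, so the
# JVM partitioner can use the actual id instead of hashing serialized bytes.

def add_partition_id(pairs, num_partitions):
    return [(hash(k) % num_partitions, (k, v)) for k, v in pairs]
```

Using the real partition id is what makes range-partitioned operations like sorting possible, since equal Python keys are guaranteed to land together.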
* | | Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 | Prashant Sharma | 2013-09-15 | 1 | -0/+19 |
|\| | | | | | | | | | | | Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala project/SparkBuild.scala | ||||
| * | Export StorageLevel and refactor | Aaron Davidson | 2013-09-07 | 1 | -1/+2 |
| | | |||||
| * | SPARK-660: Add StorageLevel support in Python | Aaron Davidson | 2013-09-05 | 1 | -0/+18 |
| | | | | | | | | | | It uses reflection... I am not proud of that fact, but it at least ensures compatibility (sans refactoring of the StorageLevel stuff). | ||||
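What exporting StorageLevel to Python amounts to can be sketched as a small value object naming the flags, with common levels as class attributes. Field names follow the Spark concept; this is not the real class or its reflection plumbing.

```python
# Hedged model of a StorageLevel: a bundle of flags describing where and
# how partitions are cached.

class StorageLevel:
    def __init__(self, use_disk, use_memory, deserialized, replication=1):
        self.use_disk = use_disk
        self.use_memory = use_memory
        self.deserialized = deserialized
        self.replication = replication

StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)
StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
```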
* | | Merged with master | Prashant Sharma | 2013-09-06 | 1 | -20/+188 |
|\| | |||||
| * | Merge pull request #861 from AndreSchumacher/pyspark_sampling_function | Matei Zaharia | 2013-08-31 | 1 | -7/+55 |
| |\ | | | | | | | Pyspark sampling function | ||||
| | * | RDD sample() and takeSample() prototypes for PySpark | Andre Schumacher | 2013-08-28 | 1 | -7/+55 |
| | | | |||||
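A Bernoulli-style `sample()` of the kind prototyped above can be sketched in a few lines: keep each element independently with probability `fraction`, seeding for reproducibility. Not the actual PySpark sampler.

```python
import random

# Hedged sketch of RDD.sample() without replacement: an independent
# coin flip per element.

def sample(items, fraction, seed=0):
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]
```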
| * | | PySpark: implementing subtractByKey(), subtract() and keyBy() | Andre Schumacher | 2013-08-28 | 1 | -0/+37 |
| |/ | |||||
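The three operations above have compact sketches on plain lists of pairs rather than RDDs (illustrative names, not the PySpark signatures):

```python
# Hedged sketches of keyBy, subtractByKey, and subtract.

def key_by(items, f):
    """Pair each element with the key computed by f."""
    return [(f(x), x) for x in items]

def subtract_by_key(pairs, other):
    """Drop pairs whose key appears in `other`."""
    drop = {k for k, _ in other}
    return [(k, v) for k, v in pairs if k not in drop]

def subtract(items, other):
    """Drop elements that appear in `other`."""
    drop = set(other)
    return [x for x in items if x not in drop]
```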
| * | Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark | Andre Schumacher | 2013-08-21 | 1 | -1/+59 |
| | | |||||
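The DoubleRDDFunctions-style statistics above (mean, stdev, and friends) reduce to a one-pass accumulator. A hedged sketch using Welford's algorithm; the class name echoes Spark's StatCounter concept but is not its code.

```python
import math

# One-pass mean and standard deviation: track count, running mean, and the
# sum of squared deviations (Welford's update).

class StatCounter:
    def __init__(self):
        self.n, self.mu, self.m2 = 0, 0.0, 0.0

    def merge(self, x):
        self.n += 1
        delta = x - self.mu
        self.mu += delta / self.n
        self.m2 += delta * (x - self.mu)
        return self

    def mean(self):
        return self.mu

    def stdev(self):
        return math.sqrt(self.m2 / self.n) if self.n else float("nan")
```

A single-pass accumulator matters here because it can be merged per partition without a second scan of the data.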
| * | Implementing SPARK-878 for PySpark: adding zip and egg files to context and | Andre Schumacher | 2013-08-16 | 1 | -1/+3 |
| | | | | | | | | passing it down to workers which add these to their sys.path | ||||
| * | Do not inherit master's PYTHONPATH on workers. | Josh Rosen | 2013-07-29 | 1 | -3/+2 |
| | | | | | | | | | | | | | | | | | | | | | | | | This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may potentially break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell. | ||||
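The SPARK-832 fix idea can be sketched as building the worker environment from a controlled base rather than inheriting the master's `PYTHONPATH` verbatim. Paths and the helper name below are illustrative assumptions, not the real launcher code.

```python
import os

# Hedged sketch: start the worker env from the current environment, drop the
# inherited PYTHONPATH, and rebuild it from the worker's own SPARK_HOME.

def worker_env(spark_home, extra=None):
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)                  # do not inherit master's path
    env["PYTHONPATH"] = os.path.join(spark_home, "python")
    if extra:                                    # e.g. additions from spark-env.sh
        env["PYTHONPATH"] += os.pathsep + extra
    return env
```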
| * | Use None instead of empty string as it's slightly smaller/faster | Matei Zaharia | 2013-07-29 | 1 | -1/+1 |
| | | |||||
| * | Optimize Python foreach() to not return as many objects | Matei Zaharia | 2013-07-29 | 1 | -1/+5 |
| | | |||||
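The `foreach()` optimization above can be sketched as a partition-level pass that yields nothing, so no per-element results are shipped back to the driver. Names are illustrative.

```python
# Hedged sketch: apply f for its side effects only, returning an empty
# iterator per partition instead of one object per element.

def foreach_partitions(partitions, f):
    def process(iterator):
        for x in iterator:
            f(x)
        return iter([])              # no objects returned, only side effects
    for part in partitions:
        list(process(iter(part)))
```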
| * | Optimize Python take() to not compute entire first partition | Matei Zaharia | 2013-07-29 | 1 | -6/+9 |
| | | |||||
| * | Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 1 | -0/+17 |
| | | |||||
* | | PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py | Andre Schumacher | 2013-08-30 | 1 | -2/+2 |
|/ | |||||
* | Fix Python saveAsTextFile doctest to not expect order to be preserved | Jey Kottalam | 2013-04-02 | 1 | -1/+1 |
| | |||||
* | Change numSplits to numPartitions in PySpark. | Josh Rosen | 2013-02-24 | 1 | -28/+28 |
| | |||||
* | Add commutative requirement for 'reduce' to Python docstring. | Mark Hamstra | 2013-02-09 | 1 | -2/+2 |
| | |||||
* | Fetch fewer objects in PySpark's take() method. | Josh Rosen | 2013-02-03 | 1 | -0/+4 |
| | |||||
* | Fix reporting of PySpark doctest failures. | Josh Rosen | 2013-02-03 | 1 | -1/+3 |
| | |||||
* | Use spark.local.dir for PySpark temp files (SPARK-580). | Josh Rosen | 2013-02-01 | 1 | -6/+1 |
| | |||||
* | Do not launch JavaGateways on workers (SPARK-674). | Josh Rosen | 2013-02-01 | 1 | -6/+6 |
| | | | | | | | | | | | The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in ~3-4x performance improvement when running the PySpark unit tests. | ||||
* | Merge pull request #389 from JoshRosen/python_rdd_checkpointing | Matei Zaharia | 2013-01-20 | 1 | -1/+34 |
|\ | | | | | Add checkpointing to the Python API | ||||
| * | Clean up setup code in PySpark checkpointing tests | Josh Rosen | 2013-01-20 | 1 | -2/+1 |
| | | |||||
| * | Update checkpointing API docs in Python/Java. | Josh Rosen | 2013-01-20 | 1 | -12/+5 |
| | | |||||
| * | Add checkpointFile() and more tests to PySpark. | Josh Rosen | 2013-01-20 | 1 | -1/+8 |
| | | |||||
| * | Add RDD checkpointing to Python API. | Josh Rosen | 2013-01-20 | 1 | -0/+34 |
| | | |||||
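The checkpointing semantics added above can be modeled in miniature: once an RDD is checkpointed, its data is materialized and its lineage (parent chain) is truncated. This is a toy model of the semantics, not Spark's implementation.

```python
# Hedged model of RDD checkpointing: materialize the data and forget the
# chain of parent dependencies.

class ModelRDD:
    def __init__(self, data, parent=None):
        self.data = list(data)
        self.parent = parent

    def checkpoint(self):
        self.data = list(self.data)   # stand-in for writing to stable storage
        self.parent = None            # truncate the lineage
        return self

    def depth(self):
        """Length of the lineage chain up to the root."""
        return 1 + (self.parent.depth() if self.parent else 0)
```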
* | | Fix PythonPartitioner equality; see SPARK-654. | Josh Rosen | 2013-01-20 | 1 | -6/+11 |
|/ | | | | | | PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future. | ||||
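The SPARK-654 equality bug above has a simple Python analogue: partitioner equality must compare the partitioning function too, not just the partition count, or two RDDs partitioned differently would wrongly be treated as co-partitioned. Illustrative class, not the Scala `PythonPartitioner`.

```python
# Hedged analogue of the fix: include the partition function in equality.

class Partitioner:
    def __init__(self, num_partitions, partition_func):
        self.num_partitions = num_partitions
        self.partition_func = partition_func

    def __eq__(self, other):
        return (isinstance(other, Partitioner)
                and self.num_partitions == other.num_partitions
                and self.partition_func == other.partition_func)
```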
* | Added accumulators to PySpark | Matei Zaharia | 2013-01-20 | 1 | -1/+1 |
| | |||||
* | Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 1 | -11/+22 |
| | |||||
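`mapPartitionsWithSplit()` above hands the user function the partition index alongside that partition's iterator. A hedged local sketch (illustrative names, plain lists instead of RDD partitions):

```python
# Hedged sketch: f receives (partition_index, iterator) and yields results.

def map_partitions_with_split(partitions, f):
    out = []
    for index, part in enumerate(partitions):
        out.extend(f(index, iter(part)))
    return out
```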
* | Change PySpark RDD.take() to not call iterator(). | Josh Rosen | 2013-01-03 | 1 | -6/+5 |
| | |||||
* | Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 1 | -0/+713 |