Commit message | Author | Age | Files | Lines
---|---|---|---|---
* | Additional tests for MapOutputTracker. | Charles Reiss | 2013-01-14 | 1 | -2/+80 |
* | Throw FetchFailedException for cached missing locs | Charles Reiss | 2013-01-14 | 1 | -10/+26 |
* | Merge branch 'master' of github.com:mesos/spark | Matei Zaharia | 2013-01-13 | 1 | -4/+12 |
|\
| * | Merge pull request #360 from rxin/cogroup-java | Matei Zaharia | 2013-01-13 | 1 | -4/+12 |
| |\
    Changed CoGroupRDD's hash map from Scala to Java.
| | * | Removed the use of getOrElse to avoid Scala wrapper for every call. | Reynold Xin | 2013-01-13 | 1 | -3/+10 |
| | * | Changed CoGroupRDD's hash map from Scala to Java. | Reynold Xin | 2013-01-10 | 1 | -2/+3 |
* | | | Make filter preserve partitioner info, since it can | Matei Zaharia | 2013-01-13 | 2 | -1/+7 |
|/ /
* | | Merge pull request #368 from mbautin/add_spray_json_dependency | Matei Zaharia | 2013-01-13 | 2 | -0/+10 |
|\ \
    Add missing dependency spray-json to Maven build
| * | | Add missing dependency spray-json to Maven build | Mikhail Bautin | 2013-01-13 | 2 | -0/+10 |
|/ /
* | | Merge pull request #346 from JoshRosen/python-api | Matei Zaharia | 2013-01-12 | 35 | -12/+2985 |
|\ \
    Python API (PySpark)
| * | | Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON. | Josh Rosen | 2013-01-10 | 1 | -1/+1 |
| * | | Use take() instead of takeSample() in PySpark kmeans example. | Josh Rosen | 2013-01-09 | 1 | -1/+3 |
    This is a temporary change until we port takeSample().
| * | | Indicate success/failure in PySpark test script. | Josh Rosen | 2013-01-09 | 1 | -0/+17 |
| * | | Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 4 | -13/+30 |
| * | | Change PySpark RDD.take() to not call iterator(). | Josh Rosen | 2013-01-03 | 3 | -6/+10 |
| * | | Add `pyspark` script to replace the other scripts. | Josh Rosen | 2013-01-01 | 6 | -36/+69 |
    Expand the PySpark programming guide.
| * | | Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 28 | -13/+13 |
| * | | Minor documentation and style fixes for PySpark. | Josh Rosen | 2013-01-01 | 10 | -32/+70 |
| * | | Launch with `scala` by default in run-pyspark | Josh Rosen | 2012-12-31 | 1 | -0/+5 |
| * | | Port LR example to PySpark using numpy. | Josh Rosen | 2012-12-29 | 1 | -0/+57 |
    This version of the example crashes after the first iteration with
    "OverflowError: math range error" because Python's math.exp() behaves
    differently than Scala's; see SPARK-646.
| * | | Add test for pyspark.RDD.saveAsTextFile(). | Josh Rosen | 2012-12-29 | 1 | -1/+8 |
| * | | Update PySpark for compatibility with TaskContext. | Josh Rosen | 2012-12-29 | 2 | -9/+7 |
| * | | Merge remote-tracking branch 'origin/master' into python-api | Josh Rosen | 2012-12-29 | 124 | -1677/+4394 |
| |\ \
    Conflicts: docs/quick-start.md
| * | | | Use batching in pyspark parallelize(); fix cartesian() | Josh Rosen | 2012-12-29 | 3 | -27/+31 |
| * | | | Fix bug in pyspark.serializers.batch; add .gitignore. | Josh Rosen | 2012-12-29 | 3 | -2/+6 |
| * | | | Add documentation for Python API. | Josh Rosen | 2012-12-28 | 12 | -48/+127 |
| * | | | Fix bug (introduced by batching) in PySpark take() | Josh Rosen | 2012-12-28 | 4 | -15/+22 |
| * | | | Mark api.python classes as private; echo Java output to stderr. | Josh Rosen | 2012-12-28 | 3 | -31/+24 |
| * | | | Simplify PySpark installation. | Josh Rosen | 2012-12-27 | 13 | -47/+78 |
    - Bundle Py4J binaries, since it's hard to install
    - Uses Spark's `run` script to launch the Py4J gateway, inheriting the
      settings in spark-env.sh
    With these changes, (hopefully) nothing more than running `sbt/sbt package`
    will be necessary to run PySpark.
| * | | | Use addFile() to ship code to cluster in PySpark. | Josh Rosen | 2012-12-27 | 2 | -10/+74 |
    Add options to pyspark.SparkContext constructor.
| * | | | Add epydoc API documentation for PySpark. | Josh Rosen | 2012-12-27 | 6 | -19/+254 |
| * | | | Add IPython support to pyspark-shell. | Josh Rosen | 2012-12-27 | 3 | -8/+21 |
    Suggested by / based on code from @MLnick
| * | | | Remove debug output from PythonPartitioner. | Josh Rosen | 2012-12-26 | 1 | -2/+0 |
| * | | | Add support for batched serialization of Python objects in PySpark. | Josh Rosen | 2012-12-26 | 3 | -20/+74 |
| * | | | Use filesystem to collect RDDs in PySpark. | Josh Rosen | 2012-12-24 | 5 | -63/+66 |
    Passing large volumes of data through Py4J seems to be slow. It appears
    to be faster to write the data to the local filesystem and read it back
    from Python.
| * | | | Reduce object overhead in Pyspark shuffle and collect | Josh Rosen | 2012-12-24 | 1 | -5/+14 |
| * | | | Fix PySpark hash partitioning bug. | Josh Rosen | 2012-10-28 | 3 | -9/+54 |
    A Java array's hashCode is based on its object identity, not its elements,
    so this was causing serialized keys to be hashed incorrectly. This commit
    adds a PySpark-specific workaround and adds more tests.
| * | | | Bump required Py4J version and add test for large broadcast variables. | Josh Rosen | 2012-10-28 | 3 | -2/+4 |
| * | | | Remove PYTHONPATH from SparkContext's executorEnvs. | Josh Rosen | 2012-10-22 | 3 | -11/+14 |
    It makes more sense to pass it in the dictionary of environment variables
    that is used to construct PythonRDD.
| * | | | Add PySpark README and run scripts. | Josh Rosen | 2012-10-20 | 7 | -4/+125 |
| * | | | Update Python API for v0.6.0 compatibility. | Josh Rosen | 2012-10-19 | 7 | -27/+42 |
| * | | | Merge tag 'v0.6.0' into python-api | Josh Rosen | 2012-10-19 | 264 | -3914/+17506 |
| |\ \ \
| * | | | | Fix Python 2.6 compatibility in Python API. | Josh Rosen | 2012-09-17 | 2 | -28/+11 |
| * | | | | Fix minor bugs in Python API examples. | Josh Rosen | 2012-08-27 | 2 | -5/+5 |
| * | | | | Add pipe(), saveAsTextFile(), sc.union() to Python API. | Josh Rosen | 2012-08-27 | 3 | -10/+37 |
| * | | | | Simplify Python worker; pipeline the map step of partitionBy(). | Josh Rosen | 2012-08-27 | 5 | -127/+59 |
| * | | | | Use local combiners in Python API combineByKey(). | Josh Rosen | 2012-08-27 | 2 | -25/+24 |
| * | | | | Add countByKey(), reduceByKeyLocally() to Python API | Josh Rosen | 2012-08-27 | 1 | -13/+39 |
| * | | | | Add mapPartitions(), glom(), countByValue() to Python API. | Josh Rosen | 2012-08-27 | 1 | -4/+28 |
| * | | | | Add broadcast variables to Python API. | Josh Rosen | 2012-08-27 | 5 | -29/+110 |
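
Note on the "Port LR example to PySpark using numpy" entry (2012-12-29): the crash it describes comes from CPython's math.exp() raising an exception on overflow where the JVM's math.exp() silently saturates to infinity. Below is a minimal Scala sketch of the JVM side of that difference; it is illustrative only and not code from the commit, and the constant 100000.0 is just an arbitrarily large margin.

```scala
// On the JVM, math.exp never throws on overflow; it saturates to +Infinity,
// so a logistic-regression update keeps running where CPython would raise
// "OverflowError: math range error" (see SPARK-646).
object ExpOverflowSketch {
  def main(args: Array[String]): Unit = {
    val margin = 100000.0                      // an arbitrarily large dot product
    println(math.exp(margin))                  // prints Infinity
    println(1.0 / (1.0 + math.exp(-margin)))   // prints 1.0: the logistic value still comes out
  }
}
```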
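Note on the "Fix PySpark hash partitioning bug" entry (2012-10-28): the commit message's point is that a JVM array's default hashCode reflects object identity rather than contents, so hash-partitioning serialized (byte-array) keys can scatter equal keys across partitions. The following short Scala sketch shows that behavior; it is illustrative only, not the commit's actual workaround.

```scala
import java.util.Arrays

// Two byte arrays with identical contents are distinct objects, so their
// default hashCode values (almost certainly) differ, while a content-based
// hash such as Arrays.hashCode is stable across equal contents.
object ArrayHashSketch {
  def main(args: Array[String]): Unit = {
    val a: Array[Byte] = Array(1, 2, 3)
    val b: Array[Byte] = Array(1, 2, 3)
    println(a.hashCode == b.hashCode)                  // usually false: identity-based
    println(Arrays.hashCode(a) == Arrays.hashCode(b))  // true: content-based
  }
}
```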