Commit message  Author  Age  Files  Lines
* Additional tests for MapOutputTracker.  Charles Reiss  2013-01-14  1  -2/+80
|
* Throw FetchFailedException for cached missing locs  Charles Reiss  2013-01-14  1  -10/+26
|
* Merge branch 'master' of github.com:mesos/spark  Matei Zaharia  2013-01-13  1  -4/+12
|\
| * Merge pull request #360 from rxin/cogroup-java  Matei Zaharia  2013-01-13  1  -4/+12
| |\
| | |     Changed CoGroupRDD's hash map from Scala to Java.
| | * Removed the use of getOrElse to avoid Scala wrapper for every call.  Reynold Xin  2013-01-13  1  -3/+10
| | |
| | * Changed CoGroupRDD's hash map from Scala to Java.  Reynold Xin  2013-01-10  1  -2/+3
| | |
* | | Make filter preserve partitioner info, since it can  Matei Zaharia  2013-01-13  2  -1/+7
|/ /
* | Merge pull request #368 from mbautin/add_spray_json_dependency  Matei Zaharia  2013-01-13  2  -0/+10
|\ \
| | |     Add missing dependency spray-json to Maven build
| * | Add missing dependency spray-json to Maven build  Mikhail Bautin  2013-01-13  2  -0/+10
|/ /
* | Merge pull request #346 from JoshRosen/python-api  Matei Zaharia  2013-01-12  35  -12/+2985
|\ \
| | |     Python API (PySpark)
| * | Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON.  Josh Rosen  2013-01-10  1  -1/+1
| | |
| * | Use take() instead of takeSample() in PySpark kmeans example.  Josh Rosen  2013-01-09  1  -1/+3
| | |     This is a temporary change until we port takeSample().
| * | Indicate success/failure in PySpark test script.  Josh Rosen  2013-01-09  1  -0/+17
| | |
| * | Add mapPartitionsWithSplit() to PySpark.  Josh Rosen  2013-01-08  4  -13/+30
| | |
| * | Change PySpark RDD.take() to not call iterator().  Josh Rosen  2013-01-03  3  -6/+10
| | |
| * | Add `pyspark` script to replace the other scripts.  Josh Rosen  2013-01-01  6  -36/+69
| | |     Expand the PySpark programming guide.
| * | Rename top-level 'pyspark' directory to 'python'  Josh Rosen  2013-01-01  28  -13/+13
| | |
| * | Minor documentation and style fixes for PySpark.  Josh Rosen  2013-01-01  10  -32/+70
| | |
| * | Launch with `scala` by default in run-pyspark  Josh Rosen  2012-12-31  1  -0/+5
| | |
| * | Port LR example to PySpark using numpy.  Josh Rosen  2012-12-29  1  -0/+57
| | |     This version of the example crashes after the first iteration with
| | |     "OverflowError: math range error" because Python's math.exp() behaves
| | |     differently than Scala's; see SPARK-646.
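The difference behind that crash can be reproduced in isolation: Python's `math.exp()` raises `OverflowError` for large arguments, whereas `Math.exp` on the JVM returns `Infinity`. The following is a minimal sketch only; `safe_exp` and `logistic_weight` are hypothetical names, and this is not claimed to be the fix tracked in SPARK-646.

```python
import math


def safe_exp(x):
    """Hypothetical helper: behave like the JVM's Math.exp on overflow.

    Python raises OverflowError ("math range error") for large x;
    the JVM returns positive infinity instead.
    """
    try:
        return math.exp(x)
    except OverflowError:
        return float("inf")


def logistic_weight(margin):
    # 1 / (1 + exp(-margin)), the term that appears in a logistic
    # regression gradient. With safe_exp, a very negative margin
    # yields 0.0 instead of raising.
    return 1.0 / (1.0 + safe_exp(-margin))
```

With this helper, `logistic_weight(-1000)` evaluates to `0.0`, which is what the equivalent Scala expression would produce, while plain `math.exp(1000)` raises.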
| * | Add test for pyspark.RDD.saveAsTextFile().  Josh Rosen  2012-12-29  1  -1/+8
| | |
| * | Update PySpark for compatibility with TaskContext.  Josh Rosen  2012-12-29  2  -9/+7
| | |
| * | Merge remote-tracking branch 'origin/master' into python-api  Josh Rosen  2012-12-29  124  -1677/+4394
| |\ \
| | | |     Conflicts: docs/quick-start.md
| * | | Use batching in pyspark parallelize(); fix cartesian()  Josh Rosen  2012-12-29  3  -27/+31
| | | |
| * | | Fix bug in pyspark.serializers.batch; add .gitignore.  Josh Rosen  2012-12-29  3  -2/+6
| | | |
| * | | Add documentation for Python API.  Josh Rosen  2012-12-28  12  -48/+127
| | | |
| * | | Fix bug (introduced by batching) in PySpark take()  Josh Rosen  2012-12-28  4  -15/+22
| | | |
| * | | Mark api.python classes as private; echo Java output to stderr.  Josh Rosen  2012-12-28  3  -31/+24
| | | |
| * | | Simplify PySpark installation.  Josh Rosen  2012-12-27  13  -47/+78
| | | |     - Bundle Py4J binaries, since it's hard to install
| | | |     - Uses Spark's `run` script to launch the Py4J gateway,
| | | |       inheriting the settings in spark-env.sh
| | | |
| | | |     With these changes, (hopefully) nothing more than running
| | | |     `sbt/sbt package` will be necessary to run PySpark.
| * | | Use addFile() to ship code to cluster in PySpark.  Josh Rosen  2012-12-27  2  -10/+74
| | | |     Add options to pyspark.SparkContext constructor.
| * | | Add epydoc API documentation for PySpark.  Josh Rosen  2012-12-27  6  -19/+254
| | | |
| * | | Add IPython support to pyspark-shell.  Josh Rosen  2012-12-27  3  -8/+21
| | | |     Suggested by / based on code from @MLnick
| * | | Remove debug output from PythonPartitioner.  Josh Rosen  2012-12-26  1  -2/+0
| | | |
| * | | Add support for batched serialization of Python objects in PySpark.  Josh Rosen  2012-12-26  3  -20/+74
| | | |
| * | | Use filesystem to collect RDDs in PySpark.  Josh Rosen  2012-12-24  5  -63/+66
| | | |     Passing large volumes of data through Py4J seems to be slow. It
| | | |     appears to be faster to write the data to the local filesystem and
| | | |     read it back from Python.
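The write-then-read pattern described in that commit can be sketched as a round trip through a temporary file. This is an illustration under assumptions: in PySpark the writer would be the JVM side, and the function names and the use of consecutive pickles as the on-disk format are hypothetical choices made here, not details taken from the commit.

```python
import os
import pickle
import tempfile


def dump_to_tempfile(items):
    """Hypothetical producer: write items as consecutive pickles
    to a fresh temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pickle")
    with os.fdopen(fd, "wb") as f:
        for item in items:
            pickle.dump(item, f, protocol=2)
    return path


def load_from_tempfile(path):
    """Hypothetical consumer: lazily read the pickles back, then
    delete the file once the stream has been consumed or closed."""
    try:
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    break  # end of file reached, no more records
    finally:
        os.remove(path)
```

Only the file path has to cross the gateway in such a design; the bulk data moves through the filesystem.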
| * | | Reduce object overhead in Pyspark shuffle and collect  Josh Rosen  2012-12-24  1  -5/+14
| | | |
| * | | Fix PySpark hash partitioning bug.  Josh Rosen  2012-10-28  3  -9/+54
| | | |     A Java array's hashCode is based on its object identity, not its
| | | |     elements, so this was causing serialized keys to be hashed
| | | |     incorrectly. This commit adds a PySpark-specific workaround and adds
| | | |     more tests.
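To make the identity-versus-contents distinction concrete, the sketch below hashes a serialized key by its bytes, following the algorithm of `java.util.Arrays.hashCode(byte[])`. It is an illustration only: `content_hash` and `partition_for` are hypothetical names, and the actual workaround in the commit may differ.

```python
def content_hash(data):
    """Hash a byte string by its contents.

    Follows java.util.Arrays.hashCode(byte[]): start at 1, then
    result = 31 * result + element for each signed byte, with
    32-bit integer wraparound.
    """
    result = 1
    for b in bytearray(data):
        signed = b - 256 if b > 127 else b  # Java bytes are signed
        result = (31 * result + signed) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return result - (1 << 32) if result >= (1 << 31) else result


def partition_for(data, num_partitions):
    # Python's % yields a non-negative result for a positive modulus,
    # so negative hashes still map to a valid partition index.
    return content_hash(data) % num_partitions
```

Two separately allocated byte strings with equal contents hash to the same value and therefore land in the same partition, which an identity-based hash does not guarantee.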
| * | | Bump required Py4J version and add test for large broadcast variables.  Josh Rosen  2012-10-28  3  -2/+4
| | | |
| * | | Remove PYTHONPATH from SparkContext's executorEnvs.  Josh Rosen  2012-10-22  3  -11/+14
| | | |     It makes more sense to pass it in the dictionary of environment
| | | |     variables that is used to construct PythonRDD.
| * | | Add PySpark README and run scripts.  Josh Rosen  2012-10-20  7  -4/+125
| | | |
| * | | Update Python API for v0.6.0 compatibility.  Josh Rosen  2012-10-19  7  -27/+42
| | | |
| * | | Merge tag 'v0.6.0' into python-api  Josh Rosen  2012-10-19  264  -3914/+17506
| |\ \ \
| * | | | Fix Python 2.6 compatibility in Python API.  Josh Rosen  2012-09-17  2  -28/+11
| | | | |
| * | | | Fix minor bugs in Python API examples.  Josh Rosen  2012-08-27  2  -5/+5
| | | | |
| * | | | Add pipe(), saveAsTextFile(), sc.union() to Python API.  Josh Rosen  2012-08-27  3  -10/+37
| | | | |
| * | | | Simplify Python worker; pipeline the map step of partitionBy().  Josh Rosen  2012-08-27  5  -127/+59
| | | | |
| * | | | Use local combiners in Python API combineByKey().  Josh Rosen  2012-08-27  2  -25/+24
| | | | |
| * | | | Add countByKey(), reduceByKeyLocally() to Python API  Josh Rosen  2012-08-27  1  -13/+39
| | | | |
| * | | | Add mapPartitions(), glom(), countByValue() to Python API.  Josh Rosen  2012-08-27  1  -4/+28
| | | | |
| * | | | Add broadcast variables to Python API.  Josh Rosen  2012-08-27  5  -29/+110
| | | | |