path: root/pyspark
| Commit message | Author | Date | Files | Lines (-/+) |
|---|---|---|---|---|
| Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 1 | -0/+17 |
| Adding IPYTHON environment variable support for launching pyspark using ipyth... | Nick Pentreath | 2013-02-07 | 1 | -1/+6 |
| Warn users if they run pyspark or spark-shell without compiling Spark | Matei Zaharia | 2013-01-17 | 1 | -0/+7 |
| Add `pyspark` script to replace the other scripts. | Josh Rosen | 2013-01-01 | 1 | -0/+32 |
| Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 23 | -2473/+0 |
| Minor documentation and style fixes for PySpark. | Josh Rosen | 2013-01-01 | 6 | -13/+31 |
| Launch with `scala` by default in run-pyspark | Josh Rosen | 2012-12-31 | 1 | -0/+5 |
| Port LR example to PySpark using numpy. | Josh Rosen | 2012-12-29 | 1 | -0/+57 |
| Add test for pyspark.RDD.saveAsTextFile(). | Josh Rosen | 2012-12-29 | 1 | -1/+8 |
| Update PySpark for compatibility with TaskContext. | Josh Rosen | 2012-12-29 | 1 | -1/+2 |
| Use batching in pyspark parallelize(); fix cartesian() | Josh Rosen | 2012-12-29 | 3 | -27/+31 |
| Fix bug in pyspark.serializers.batch; add .gitignore. | Josh Rosen | 2012-12-29 | 3 | -2/+6 |
| Add documentation for Python API. | Josh Rosen | 2012-12-28 | 7 | -42/+6 |
| Fix bug (introduced by batching) in PySpark take() | Josh Rosen | 2012-12-28 | 3 | -14/+21 |
| Mark api.python classes as private; echo Java output to stderr. | Josh Rosen | 2012-12-28 | 1 | -1/+2 |
| Simplify PySpark installation. | Josh Rosen | 2012-12-27 | 11 | -47/+72 |
| Use addFile() to ship code to cluster in PySpark. | Josh Rosen | 2012-12-27 | 2 | -10/+74 |
| Add epydoc API documentation for PySpark. | Josh Rosen | 2012-12-27 | 3 | -14/+224 |
| Add IPython support to pyspark-shell. | Josh Rosen | 2012-12-27 | 3 | -8/+21 |
| Add support for batched serialization of Python objects in PySpark. | Josh Rosen | 2012-12-26 | 3 | -20/+74 |
| Use filesystem to collect RDDs in PySpark. | Josh Rosen | 2012-12-24 | 4 | -21/+42 |
| Reduce object overhead in Pyspark shuffle and collect | Josh Rosen | 2012-12-24 | 1 | -5/+14 |
| Fix PySpark hash partitioning bug. | Josh Rosen | 2012-10-28 | 1 | -3/+9 |
| Bump required Py4J version and add test for large broadcast variables. | Josh Rosen | 2012-10-28 | 3 | -2/+4 |
| Remove PYTHONPATH from SparkContext's executorEnvs. | Josh Rosen | 2012-10-22 | 1 | -2/+6 |
| Add PySpark README and run scripts. | Josh Rosen | 2012-10-20 | 6 | -3/+124 |
| Update Python API for v0.6.0 compatibility. | Josh Rosen | 2012-10-19 | 5 | -19/+30 |
| Fix Python 2.6 compatibility in Python API. | Josh Rosen | 2012-09-17 | 1 | -6/+11 |
| Fix minor bugs in Python API examples. | Josh Rosen | 2012-08-27 | 2 | -5/+5 |
| Add pipe(), saveAsTextFile(), sc.union() to Python API. | Josh Rosen | 2012-08-27 | 2 | -8/+31 |
| Simplify Python worker; pipeline the map step of partitionBy(). | Josh Rosen | 2012-08-27 | 4 | -100/+52 |
| Use local combiners in Python API combineByKey(). | Josh Rosen | 2012-08-27 | 2 | -25/+24 |
| Add countByKey(), reduceByKeyLocally() to Python API | Josh Rosen | 2012-08-27 | 1 | -13/+39 |
| Add mapPartitions(), glom(), countByValue() to Python API. | Josh Rosen | 2012-08-27 | 1 | -4/+28 |
| Add broadcast variables to Python API. | Josh Rosen | 2012-08-27 | 4 | -12/+84 |
| Implement fold() in Python API. | Josh Rosen | 2012-08-27 | 1 | -1/+19 |
| Refactor Python MappedRDD to use iterator pipelines. | Josh Rosen | 2012-08-24 | 2 | -97/+41 |
| Fix options parsing in Python pi example. | Josh Rosen | 2012-08-24 | 1 | -1/+1 |
| Use numpy in Python k-means example. | Josh Rosen | 2012-08-22 | 3 | -26/+14 |
| Use only cPickle for serialization in Python API. | Josh Rosen | 2012-08-21 | 6 | -560/+233 |
| Bundle cloudpickle with pyspark. | Josh Rosen | 2012-08-19 | 4 | -5/+976 |
| Add Python API. | Josh Rosen | 2012-08-18 | 12 | -0/+1170 |