Commit log (each entry: message, then author, date; files changed, lines -/+):
* More doc improvements + better warnings when you haven't built Spark (Matei Zaharia, 2013-08-30; 1 file, -1/+1)
* Don't use SPARK_LAUNCH_WITH_SCALA in pyspark (Matei Zaharia, 2013-08-29; 1 file, -5/+0)
* Find assembly correctly in pyspark (Matei Zaharia, 2013-08-29; 1 file, -1/+3)
* Fix PySpark for assembly run and include it in dist (Matei Zaharia, 2013-08-29; 1 file, -4/+8)
* Two fixes to IPython support (Matei Zaharia, 2013-07-28; 1 file, -3/+7):
  - Don't attempt to run worker processes with ipython (that can cause some crashes as ipython prints things to standard out)
  - Allow passing some IPYTHON_OPTS to launch things like the notebook
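The dispatch described by these IPython commits can be sketched as a small helper. This is an illustrative Python sketch (the `driver_command` name is made up here, and the real logic lives in the pyspark shell script), showing the idea: only the interactive driver may use ipython, with any IPYTHON_OPTS passed through, while workers always run plain python.

```python
def driver_command(env):
    """Hypothetical sketch of the launcher's dispatch: use ipython for the
    interactive driver when IPYTHON=1, passing through IPYTHON_OPTS;
    workers always get plain python, since ipython's extra output on
    standard out can break the worker protocol."""
    if env.get("IPYTHON") == "1":
        opts = env.get("IPYTHON_OPTS", "").split()
        return ["ipython"] + opts
    return ["python"]

def worker_command(env):
    # Workers are always plain python, regardless of IPYTHON.
    return ["python"]
```

For example, with `IPYTHON=1 IPYTHON_OPTS=notebook` the driver command becomes `ipython notebook`, while workers still launch as `python`.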
* Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 1 file, -0/+17)
* Adding IPYTHON environment variable support for launching pyspark using ipython shell (Nick Pentreath, 2013-02-07; 1 file, -1/+6)
* Warn users if they run pyspark or spark-shell without compiling Spark (Matei Zaharia, 2013-01-17; 1 file, -0/+7)
* Add `pyspark` script to replace the other scripts (Josh Rosen, 2013-01-01; 1 file, -0/+32):
  - Expand the PySpark programming guide
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01; 23 files, -2473/+0)
* Minor documentation and style fixes for PySpark (Josh Rosen, 2013-01-01; 6 files, -13/+31)
* Launch with `scala` by default in run-pyspark (Josh Rosen, 2012-12-31; 1 file, -0/+5)
* Port LR example to PySpark using numpy (Josh Rosen, 2012-12-29; 1 file, -0/+57):
  - This version of the example crashes after the first iteration with "OverflowError: math range error" because Python's math.exp() behaves differently than Scala's; see SPARK-646
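The crash noted in that commit comes from a real platform difference: for large arguments, Python's math.exp() raises OverflowError where Scala's math.exp returns positive infinity. A minimal sketch of a guard that mimics the Scala behaviour (the `safe_exp` name is hypothetical, not from the commit):

```python
import math

def safe_exp(x):
    """Exponential that saturates to +infinity instead of raising
    OverflowError for large x, matching what Scala's math.exp returns."""
    try:
        return math.exp(x)
    except OverflowError:
        return float("inf")
```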
* Add test for pyspark.RDD.saveAsTextFile() (Josh Rosen, 2012-12-29; 1 file, -1/+8)
* Update PySpark for compatibility with TaskContext (Josh Rosen, 2012-12-29; 1 file, -1/+2)
* Use batching in pyspark parallelize(); fix cartesian() (Josh Rosen, 2012-12-29; 3 files, -27/+31)
* Fix bug in pyspark.serializers.batch; add .gitignore (Josh Rosen, 2012-12-29; 3 files, -2/+6)
* Add documentation for Python API (Josh Rosen, 2012-12-28; 7 files, -42/+6)
* Fix bug (introduced by batching) in PySpark take() (Josh Rosen, 2012-12-28; 3 files, -14/+21)
* Mark api.python classes as private; echo Java output to stderr (Josh Rosen, 2012-12-28; 1 file, -1/+2)
* Simplify PySpark installation (Josh Rosen, 2012-12-27; 11 files, -47/+72):
  - Bundle Py4J binaries, since it's hard to install
  - Use Spark's `run` script to launch the Py4J gateway, inheriting the settings in spark-env.sh
  - With these changes, (hopefully) nothing more than running `sbt/sbt package` will be necessary to run PySpark
* Use addFile() to ship code to cluster in PySpark (Josh Rosen, 2012-12-27; 2 files, -10/+74):
  - Add options to pyspark.SparkContext constructor
* Add epydoc API documentation for PySpark (Josh Rosen, 2012-12-27; 3 files, -14/+224)
* Add IPython support to pyspark-shell (Josh Rosen, 2012-12-27; 3 files, -8/+21):
  - Suggested by / based on code from @MLnick
* Add support for batched serialization of Python objects in PySpark (Josh Rosen, 2012-12-26; 3 files, -20/+74)
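The batching idea behind that commit can be sketched as a small generator; this is an illustrative helper, not the actual pyspark.serializers code:

```python
def batched(iterator, batch_size):
    """Group an iterator's items into lists of up to batch_size, so that
    many small objects can be serialized as one larger unit, cutting the
    per-object serialization overhead."""
    batch = []
    for item in iterator:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly short, batch
        yield batch
```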
* Use filesystem to collect RDDs in PySpark (Josh Rosen, 2012-12-24; 4 files, -21/+42):
  - Passing large volumes of data through Py4J seems to be slow; it appears to be faster to write the data to the local filesystem and read it back from Python
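The technique can be sketched with the standard library; `collect_via_file` is a hypothetical name, but it mirrors the idea of staging pickled records on the local filesystem instead of streaming each object through the Py4J bridge:

```python
import os
import pickle
import tempfile

def collect_via_file(records):
    """Write pickled records to a temp file, then read them back.
    Illustrative sketch of filesystem-based collection."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            for record in records:
                pickle.dump(record, f)
        out = []
        with open(path, "rb") as f:
            while True:
                try:
                    out.append(pickle.load(f))
                except EOFError:  # reached end of the staged data
                    break
        return out
    finally:
        os.remove(path)
```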
* Reduce object overhead in PySpark shuffle and collect (Josh Rosen, 2012-12-24; 1 file, -5/+14)
* Fix PySpark hash partitioning bug (Josh Rosen, 2012-10-28; 1 file, -3/+9):
  - A Java array's hashCode is based on its object identity, not its elements, so this was causing serialized keys to be hashed incorrectly; this commit adds a PySpark-specific workaround and adds more tests
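Because a JVM byte[]'s hashCode is identity-based, two equal serialized keys need not hash alike on the Java side. A Python-side sketch of a content-based workaround (the `partition_for` name is hypothetical; this is not the actual PySpark fix):

```python
import pickle

def partition_for(key, num_partitions):
    """Choose a partition by hashing the *contents* of the serialized
    key, so equal keys always land in the same partition within a run."""
    serialized = pickle.dumps(key)
    return hash(serialized) % num_partitions
```

Hashing the serialized bytes by value, rather than relying on the array object's identity, is what makes equal keys collide deterministically.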
* Bump required Py4J version and add test for large broadcast variables (Josh Rosen, 2012-10-28; 3 files, -2/+4)
* Remove PYTHONPATH from SparkContext's executorEnvs (Josh Rosen, 2012-10-22; 1 file, -2/+6):
  - It makes more sense to pass it in the dictionary of environment variables that is used to construct PythonRDD
* Add PySpark README and run scripts (Josh Rosen, 2012-10-20; 6 files, -3/+124)
* Update Python API for v0.6.0 compatibility (Josh Rosen, 2012-10-19; 5 files, -19/+30)
* Fix Python 2.6 compatibility in Python API (Josh Rosen, 2012-09-17; 1 file, -6/+11)
* Fix minor bugs in Python API examples (Josh Rosen, 2012-08-27; 2 files, -5/+5)
* Add pipe(), saveAsTextFile(), sc.union() to Python API (Josh Rosen, 2012-08-27; 2 files, -8/+31)
* Simplify Python worker; pipeline the map step of partitionBy() (Josh Rosen, 2012-08-27; 4 files, -100/+52)
* Use local combiners in Python API combineByKey() (Josh Rosen, 2012-08-27; 2 files, -25/+24)
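The local-combiner idea in that combineByKey() change can be sketched as a two-phase aggregation. This is an illustrative sketch of the technique, not the PySpark implementation:

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Phase 1: combine within each partition, so only one combiner per
    distinct key per partition needs to cross the shuffle.
    Phase 2: merge the per-partition combiners into the final result."""
    per_partition = []
    for partition in partitions:
        local = {}
        for key, value in partition:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        per_partition.append(local)
    result = {}
    for local in per_partition:
        for key, combiner in local.items():
            if key in result:
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result
```

For summing, `create_combiner` is the identity and both merge functions are addition; the saving is that repeated keys within a partition are collapsed before anything is shipped.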
* Add countByKey(), reduceByKeyLocally() to Python API (Josh Rosen, 2012-08-27; 1 file, -13/+39)
* Add mapPartitions(), glom(), countByValue() to Python API (Josh Rosen, 2012-08-27; 1 file, -4/+28)
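The relationship between these operations can be sketched in a few lines; this is a loose illustration of the semantics, not the API's code:

```python
def map_partitions(partitions, f):
    """Apply f once per partition, to that partition's iterator, and
    materialize each partition's output. glom() is the special case
    f = list, which turns each partition into a single list element."""
    return [list(f(iter(partition))) for partition in partitions]
```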
* Add broadcast variables to Python API (Josh Rosen, 2012-08-27; 4 files, -12/+84)
* Implement fold() in Python API (Josh Rosen, 2012-08-27; 1 file, -1/+19)
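The shape an RDD-style fold() needs can be sketched as a two-level fold, since partitions are reduced independently before their results are combined. A minimal sketch, assuming the zero value is neutral for the operator:

```python
def fold(partitions, zero_value, op):
    """Fold each partition starting from its own zero value, then fold
    the per-partition results into a final answer. Sketch of the
    structure only, not the Python API's implementation."""
    partial = []
    for partition in partitions:
        acc = zero_value
        for item in partition:
            acc = op(acc, item)
        partial.append(acc)
    total = zero_value
    for p in partial:
        total = op(total, p)
    return total
```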
* Refactor Python MappedRDD to use iterator pipelines (Josh Rosen, 2012-08-24; 2 files, -97/+41)
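The iterator-pipeline idea can be sketched as function composition over iterators, so that chained transformations run in one lazy pass per partition instead of materializing an intermediate list per stage. An illustrative sketch, not the MappedRDD code:

```python
def pipeline(*stages):
    """Compose iterator-to-iterator stages into a single pass.
    Each stage takes an iterator and returns a new (lazy) iterator."""
    def run(iterator):
        for stage in stages:
            iterator = stage(iterator)
        return iterator
    return run
```

For example, `pipeline(double, increment)` threads each element through both stages without ever building the full doubled list.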
* Fix options parsing in Python pi example (Josh Rosen, 2012-08-24; 1 file, -1/+1)
* Use numpy in Python k-means example (Josh Rosen, 2012-08-22; 3 files, -26/+14)
* Use only cPickle for serialization in Python API (Josh Rosen, 2012-08-21; 6 files, -560/+233):
  - Objects serialized with JSON can be compared for equality, but JSON can be slow to serialize and only supports a limited range of data types
* Bundle cloudpickle with pyspark (Josh Rosen, 2012-08-19; 4 files, -5/+976)
* Add Python API (Josh Rosen, 2012-08-18; 12 files, -0/+1170)