Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Move some classes to more appropriate packages: | Matei Zaharia | 2013-09-01 | 1 | -2/+2 |
| | | | | | | RDD, *RDDFunctions -> org.apache.spark.rdd; Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util; JavaSerializer, KryoSerializer -> org.apache.spark.serializer | ||||
* | Add banner to PySpark and make wordcount output nicer | Matei Zaharia | 2013-09-01 | 2 | -1/+14 |
| | |||||
* | Initial work to rename package to org.apache.spark | Matei Zaharia | 2013-09-01 | 3 | -5/+5 |
| | |||||
* | Merge pull request #861 from AndreSchumacher/pyspark_sampling_function | Matei Zaharia | 2013-08-31 | 2 | -7/+167 |
|\ | | | | | Pyspark sampling function | ||||
| * | RDD sample() and takeSample() prototypes for PySpark | Andre Schumacher | 2013-08-28 | 2 | -7/+167 |
| | | |||||
* | | Merge pull request #870 from JoshRosen/spark-885 | Matei Zaharia | 2013-08-31 | 1 | -1/+5 |
|\ \ | | | | | | | Don't send SIGINT / ctrl-c to Py4J gateway subprocess | ||||
| * | | Don't send SIGINT to Py4J gateway subprocess. | Josh Rosen | 2013-08-28 | 1 | -1/+5 |
| |/ | | | | | | | | | | | | | | | | This addresses SPARK-885, a usability issue where PySpark's Java gateway process would be killed if the user hit ctrl-c. Note that SIGINT still won't cancel the running s[...] This fix is based on http://stackoverflow.com/questions/5045771 | ||||
* | | Merge pull request #869 from AndreSchumacher/subtract | Matei Zaharia | 2013-08-30 | 1 | -0/+37 |
|\ \ | | | | | | | PySpark: implementing subtractByKey(), subtract() and keyBy() | ||||
| * | | PySpark: implementing subtractByKey(), subtract() and keyBy() | Andre Schumacher | 2013-08-28 | 1 | -0/+37 |
| |/ | |||||
* | | Fix PySpark for assembly run and include it in dist | Matei Zaharia | 2013-08-29 | 1 | -0/+0 |
| | | |||||
* | | Change build and run instructions to use assemblies | Matei Zaharia | 2013-08-29 | 1 | -1/+1 |
|/ | | | | | | | | | | | | | | | This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel. | ||||
* | Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark | Andre Schumacher | 2013-08-21 | 2 | -1/+168 |
| | |||||
* | Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing them down to workers, which add these to their sys.path | Andre Schumacher | 2013-08-16 | 5 | -5/+37 |
| | |||||
* | Fix PySpark unit tests on Python 2.6. | Josh Rosen | 2013-08-14 | 2 | -19/+20 |
| | |||||
* | Merge pull request #802 from stayhf/SPARK-760-Python | Matei Zaharia | 2013-08-12 | 1 | -0/+70 |
|\ | | | | | Simple PageRank algorithm implementation in Python for SPARK-760 | ||||
| * | Code update for Matei's suggestions | stayhf | 2013-08-11 | 1 | -7/+9 |
| | | |||||
| * | Simple PageRank algorithm implementation in Python for SPARK-760 | stayhf | 2013-08-10 | 1 | -0/+68 |
| | | |||||
* | | Merge pull request #813 from AndreSchumacher/add_files_pyspark | Matei Zaharia | 2013-08-12 | 1 | -1/+6 |
|\ \ | | | | | | | Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark | ||||
| * | | Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark | Andre Schumacher | 2013-08-12 | 1 | -1/+6 |
| | | | | | | | | | | | | Now ADD_FILES uses a comma as the file name separator. | ||||
* | | | Merge pull request #747 from mateiz/improved-lr | Matei Zaharia | 2013-08-06 | 1 | -27/+26 |
|\ \ \ | | | | | | | | | Update the Python logistic regression example | ||||
| * | | | Fix string parsing and style in LR | Matei Zaharia | 2013-07-31 | 1 | -1/+1 |
| | | | | |||||
| * | | | Update the Python logistic regression example to read from a file and batch input records for more efficient NumPy computations | Matei Zaharia | 2013-07-29 | 1 | -27/+26 |
| | | | | |||||
* | | | | Do not inherit master's PYTHONPATH on workers. | Josh Rosen | 2013-07-29 | 1 | -3/+2 |
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes SPARK-832, an issue where PySpark would not work when the master and workers used different SPARK_HOME paths. This change may break code that relied on the master's PYTHONPATH being used on workers. To have custom PYTHONPATH additions used on the workers, users should set a custom PYTHONPATH in spark-env.sh rather than setting it in the shell. | ||||
* | | | Merge branch 'master' of github.com:mesos/spark | Matei Zaharia | 2013-07-29 | 6 | -15/+9 |
|\ \ \ | |||||
| * | | | Some fixes to Python examples (style and package name for LR) | Matei Zaharia | 2013-07-27 | 6 | -15/+9 |
| | |/ | |/| | |||||
* | | | SPARK-815. Python parallelize() should split lists before batching | Matei Zaharia | 2013-07-29 | 1 | -2/+9 |
| | | | | | | | | | | | | | | | | | | | | | | | | | | | One unfortunate consequence of this fix is that we materialize any collections that are given to us as generators, but this seems necessary to get reasonable behavior on small collections. We could add a batchSize parameter later to bypass auto-computation of batch size if this becomes a problem (e.g. if users really want to parallelize big generators nicely). | ||||
* | | | Use None instead of empty string as it's slightly smaller/faster | Matei Zaharia | 2013-07-29 | 1 | -1/+1 |
| | | | |||||
* | | | Allow python/run-tests to run from any directory | Matei Zaharia | 2013-07-29 | 1 | -0/+3 |
| | | | |||||
* | | | Optimize Python foreach() to not return as many objects | Matei Zaharia | 2013-07-29 | 1 | -1/+5 |
| | | | |||||
* | | | Optimize Python take() to not compute entire first partition | Matei Zaharia | 2013-07-29 | 1 | -6/+9 |
|/ / | |||||
* | | Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 19 | -1/+325 |
| | | |||||
* | | Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr | root | 2013-07-01 | 1 | -18/+24 |
| | | | | | | | | | | | | | | Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala | ||||
* | | Fix reporting of PySpark exceptions | Jey Kottalam | 2013-06-21 | 2 | -5/+19 |
| | | |||||
* | | PySpark daemon: fix deadlock, improve error handling | Jey Kottalam | 2013-06-21 | 1 | -17/+50 |
| | | |||||
* | | Add tests and fixes for Python daemon shutdown | Jey Kottalam | 2013-06-21 | 3 | -22/+69 |
| | | |||||
* | | Prefork Python worker processes | Jey Kottalam | 2013-06-21 | 2 | -32/+138 |
| | | |||||
* | | Add Python timing instrumentation | Jey Kottalam | 2013-06-21 | 2 | -1/+19 |
| | | |||||
* | | Fix Python saveAsTextFile doctest to not expect order to be preserved | Jey Kottalam | 2013-04-02 | 1 | -1/+1 |
| | | |||||
* | | Fix argv handling in Python transitive closure example | Jey Kottalam | 2013-04-02 | 1 | -1/+1 |
| | | |||||
* | | Change numSplits to numPartitions in PySpark. | Josh Rosen | 2013-02-24 | 2 | -38/+38 |
| | | |||||
* | | Add commutative requirement for 'reduce' to Python docstring. | Mark Hamstra | 2013-02-09 | 1 | -2/+2 |
|/ | |||||
* | Remove unnecessary doctest __main__ methods. | Josh Rosen | 2013-02-03 | 2 | -18/+0 |
| | |||||
* | Fetch fewer objects in PySpark's take() method. | Josh Rosen | 2013-02-03 | 1 | -0/+4 |
| | |||||
* | Fix reporting of PySpark doctest failures. | Josh Rosen | 2013-02-03 | 2 | -2/+6 |
| | |||||
* | Use spark.local.dir for PySpark temp files (SPARK-580). | Josh Rosen | 2013-02-01 | 2 | -10/+9 |
| | |||||
* | Do not launch JavaGateways on workers (SPARK-674). | Josh Rosen | 2013-02-01 | 4 | -18/+25 |
| | | | | | | | | | | | The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in a ~3-4x performance improvement when running the PySpark unit tests. | ||||
* | Fix stdout redirection in PySpark. | Josh Rosen | 2013-02-01 | 2 | -2/+12 |
| | |||||
* | SPARK-673: Capture and re-throw Python exceptions | Patrick Wendell | 2013-01-31 | 1 | -2/+8 |
| | | | | | This patch alters the Python <-> executor protocol to pass on exception data when exceptions occur in user Python code. | ||||
* | Merge pull request #430 from pwendell/pyspark-guide | Matei Zaharia | 2013-01-30 | 1 | -0/+1 |
|\ | | | | | Minor improvements to PySpark docs | ||||
| * | Make module help available in python shell. | Patrick Wendell | 2013-01-30 | 1 | -0/+1 |
| | | | | | | | | Also adds a line to the docs explaining how to use it. |
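
The SPARK-674 entry above describes moving gateway startup from module load time to lazy initialization that runs only when a SparkContext is constructed. A minimal sketch of that pattern in plain Python may make the idea concrete. `FakeGateway`, `Context`, and `_ensure_initialized` are hypothetical names chosen for illustration; this is not PySpark's actual code, and the lock is an added assumption rather than something the commit message mentions.

```python
import threading


class FakeGateway(object):
    """Hypothetical stand-in for an expensive resource (e.g. a JVM gateway)."""

    launches = 0  # counts how many times the resource was actually started

    def __init__(self):
        FakeGateway.launches += 1


class Context(object):
    """Hypothetical context class that starts the gateway lazily."""

    # Private class-level state: nothing is launched when the module is imported.
    _gateway = None
    _lock = threading.Lock()

    @classmethod
    def _ensure_initialized(cls):
        # Start the shared gateway on first use only; later calls reuse it.
        with cls._lock:
            if cls._gateway is None:
                cls._gateway = FakeGateway()
        return cls._gateway

    def __init__(self):
        self._gw = Context._ensure_initialized()
```

With this layout, importing the module on a worker starts nothing; only code that constructs a `Context` pays the startup cost, and every `Context` in the process shares one gateway.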