spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Optimize Python foreach() to not return as many objects	Matei Zaharia	2013-07-29	1	-1/+5
\|
*	Optimize Python take() to not compute entire first partition	Matei Zaharia	2013-07-29	1	-6/+9
\|
*	Add Apache license headers and LICENSE and NOTICE files	Matei Zaharia	2013-07-16	11	-0/+187
\|
*	Fixed PySpark perf regression by not using socket.makefile(), and improved	root	2013-07-01	1	-18/+24
\|	debuggability by letting "print" statements show up in the executor's stderr. Conflicts: core/src/main/scala/spark/api/python/PythonRDD.scala
*	Fix reporting of PySpark exceptions	Jey Kottalam	2013-06-21	2	-5/+19
\|
*	PySpark daemon: fix deadlock, improve error handling	Jey Kottalam	2013-06-21	1	-17/+50
\|
*	Add tests and fixes for Python daemon shutdown	Jey Kottalam	2013-06-21	3	-22/+69
\|
*	Prefork Python worker processes	Jey Kottalam	2013-06-21	2	-32/+138
\|
*	Add Python timing instrumentation	Jey Kottalam	2013-06-21	2	-1/+19
\|
*	Fix Python saveAsTextFile doctest to not expect order to be preserved	Jey Kottalam	2013-04-02	1	-1/+1
\|
*	Change numSplits to numPartitions in PySpark.	Josh Rosen	2013-02-24	2	-38/+38
\|
*	Add commutative requirement for 'reduce' to Python docstring.	Mark Hamstra	2013-02-09	1	-2/+2
\|
*	Remove unnecessary doctest __main__ methods.	Josh Rosen	2013-02-03	2	-18/+0
\|
*	Fetch fewer objects in PySpark's take() method.	Josh Rosen	2013-02-03	1	-0/+4
\|
*	Fix reporting of PySpark doctest failures.	Josh Rosen	2013-02-03	2	-2/+6
\|
*	Use spark.local.dir for PySpark temp files (SPARK-580).	Josh Rosen	2013-02-01	2	-10/+9
\|
*	Do not launch JavaGateways on workers (SPARK-674).	Josh Rosen	2013-02-01	4	-18/+25
\|	The problem was that the gateway was being initialized whenever the pyspark.context module was loaded. The fix uses lazy initialization that occurs only when SparkContext instances are actually constructed. I also made the gateway and jvm variables private. This change results in ~3-4x performance improvement when running the PySpark unit tests.
*	Fix stdout redirection in PySpark.	Josh Rosen	2013-02-01	2	-2/+12
\|
*	SPARK-673: Capture and re-throw Python exceptions	Patrick Wendell	2013-01-31	1	-2/+8
\|	This patch alters the Python <-> executor protocol to pass on exception data when they occur in user Python code.
*	Merge pull request #430 from pwendell/pyspark-guide	Matei Zaharia	2013-01-30	1	-0/+1
\|\	Minor improvements to PySpark docs
\| *	Make module help available in python shell.	Patrick Wendell	2013-01-30	1	-0/+1
\| \|	Also, adds a line in doc explaining how to use.
* \|	Replace old 'master' term with 'driver'.	Stephen Haberman	2013-01-25	1	-1/+1
\| \|
* \|	Merge pull request #396 from JoshRosen/spark-653	Matei Zaharia	2013-01-24	2	-14/+29
\|\ \	Make PySpark AccumulatorParam an abstract base class
\| * \|	Remove use of abc.ABCMeta due to cloudpickle issue.	Josh Rosen	2013-01-23	1	-7/+4
\| \| \|	cloudpickle runs into issues while pickling subclasses of AccumulatorParam, which may be related to this Python issue: http://bugs.python.org/issue7689 This seems hard to fix and the ABCMeta wasn't necessary, so I removed it.
\| * \|	Make AccumulatorParam an abstract base class.	Josh Rosen	2013-01-21	2	-13/+31
\| \| \|
* \| \|	Allow PySpark's SparkFiles to be used from driver	Josh Rosen	2013-01-23	4	-9/+62
\| \| \|	Fix minor documentation formatting issues.
* \| \|	Fix sys.path bug in PySpark SparkContext.addPyFile	Josh Rosen	2013-01-22	3	-7/+34
\| \| \|
* \| \|	Don't download files to master's working directory.	Josh Rosen	2013-01-21	4	-5/+67
\|/ /	This should avoid exceptions caused by existing files with different contents. I also removed some unused code.
* \|	Merge pull request #389 from JoshRosen/python_rdd_checkpointing	Matei Zaharia	2013-01-20	3	-2/+112
\|\ \	Add checkpointing to the Python API
\| * \|	Clean up setup code in PySpark checkpointing tests	Josh Rosen	2013-01-20	2	-16/+6
\| \| \|
\| * \|	Update checkpointing API docs in Python/Java.	Josh Rosen	2013-01-20	2	-16/+12
\| \| \|
\| * \|	Add checkpointFile() and more tests to PySpark.	Josh Rosen	2013-01-20	3	-2/+37
\| \| \|
\| * \|	Add RDD checkpointing to Python API.	Josh Rosen	2013-01-20	3	-0/+89
\| \| \|
* \| \|	Fix PythonPartitioner equality; see SPARK-654.	Josh Rosen	2013-01-20	1	-6/+11
\|/ /	PythonPartitioner did not take the Python-side partitioning function into account when checking for equality, which might cause problems in the future.
* /	Add __repr__ to Accumulator; fix bug in sc.accumulator	Josh Rosen	2013-01-20	1	-1/+10
\|/
*	Add a class comment to Accumulator	Matei Zaharia	2013-01-20	1	-0/+12
\|
*	Added accumulators to PySpark	Matei Zaharia	2013-01-20	7	-5/+223
\|
*	Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON.	Josh Rosen	2013-01-10	1	-1/+1
\|
*	Add mapPartitionsWithSplit() to PySpark.	Josh Rosen	2013-01-08	2	-12/+25
\|
*	Change PySpark RDD.take() to not call iterator().	Josh Rosen	2013-01-03	2	-6/+6
\|
*	Add `pyspark` script to replace the other scripts.	Josh Rosen	2013-01-01	1	-26/+10
\|	Expand the PySpark programming guide.
*	Rename top-level 'pyspark' directory to 'python'	Josh Rosen	2013-01-01	10	-0/+2194