path: root/python/pyspark/rdd.py
Commit message | Author | Age | Files | Lines
* Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 (Prashant Sharma, 2013-09-15; 1 file, -0/+19)
|\
| |     Conflicts:
| |         core/src/main/scala/org/apache/spark/SparkContext.scala
| |         project/SparkBuild.scala
| * Export StorageLevel and refactor (Aaron Davidson, 2013-09-07; 1 file, -1/+2)
| |
| * SPARK-660: Add StorageLevel support in Python (Aaron Davidson, 2013-09-05; 1 file, -0/+18)
| |
| |     It uses reflection... I am not proud of that fact, but it at least
| |     ensures compatibility (sans refactoring of the StorageLevel stuff).
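The two commits above export Spark's StorageLevel constants to Python. As a rough illustration of what a storage level carries, here is a pure-Python sketch; the field names mirror Spark's StorageLevel flags but this is an assumption for illustration, not the PySpark code:

```python
from collections import namedtuple

# Illustrative sketch: a Spark storage level is essentially a set of flags
# (spill to disk?, keep in memory?, store deserialized?) plus a replication count.
StorageLevel = namedtuple(
    "StorageLevel", ["use_disk", "use_memory", "deserialized", "replication"]
)

MEMORY_ONLY = StorageLevel(False, True, True, 1)
MEMORY_AND_DISK = StorageLevel(True, True, True, 1)
DISK_ONLY = StorageLevel(True, False, False, 1)
```

In PySpark these constants are what you pass to `rdd.persist(...)` to control caching behavior.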
* | Merged with master (Prashant Sharma, 2013-09-06; 1 file, -20/+188)
|\|
| * Merge pull request #861 from AndreSchumacher/pyspark_sampling_function (Matei Zaharia, 2013-08-31; 1 file, -7/+55)
| |\
| | |     Pyspark sampling function
| | * RDD sample() and takeSample() prototypes for PySpark (Andre Schumacher, 2013-08-28; 1 file, -7/+55)
| | |
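The sample()/takeSample() commit above adds sampling to PySpark RDDs. A minimal pure-Python sketch of the two semantics (function names here are illustrative, not the RDD methods themselves): sample() keeps each element independently with the given probability, while takeSample() returns a fixed-size random subset.

```python
import random

def sample_without_replacement(items, fraction, seed=None):
    # Sketch of rdd.sample(False, fraction): each element is kept
    # independently with probability `fraction`.
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]

def take_sample(items, num, seed=None):
    # Sketch of rdd.takeSample(): return exactly `num` randomly chosen
    # elements (capped at the input size).
    rng = random.Random(seed)
    return rng.sample(items, min(num, len(items)))
```

Note that sample() returns a *fraction* of the data (with variance in the result size), whereas takeSample() guarantees an exact count.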
| * | PySpark: implementing subtractByKey(), subtract() and keyBy() (Andre Schumacher, 2013-08-28; 1 file, -0/+37)
| |/
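The commit above adds three set-style operations to PySpark RDDs. Their semantics can be sketched in plain Python over lists (illustrative helpers, not the RDD implementation):

```python
def key_by(items, f):
    # keyBy(f): pair each element with f(element) as its key.
    return [(f(x), x) for x in items]

def subtract(a, b):
    # subtract: elements of `a` with no matching element in `b`.
    bset = set(b)
    return [x for x in a if x not in bset]

def subtract_by_key(a, b):
    # subtractByKey: (key, value) pairs from `a` whose key does not
    # appear as a key in `b`.
    bkeys = {k for k, _ in b}
    return [(k, v) for k, v in a if k not in bkeys]
```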
| * Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark (Andre Schumacher, 2013-08-21; 1 file, -1/+59)
| |
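SPARK-838 brings numeric helpers (mean, variance, stdev and friends) to PySpark. A plain-Python sketch of the statistics they compute over a collection of doubles (population variance, as an illustration; this is not the RDD code):

```python
import math

def stats(nums):
    # Sketch of the numeric summary: count, mean, population variance,
    # and standard deviation over a list of floats.
    n = len(nums)
    mean = sum(nums) / n
    variance = sum((x - mean) ** 2 for x in nums) / n
    return {"count": n, "mean": mean,
            "variance": variance, "stdev": math.sqrt(variance)}
```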
| * Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path (Andre Schumacher, 2013-08-16; 1 file, -1/+3)
| |
| * Do not inherit master's PYTHONPATH on workers. (Josh Rosen, 2013-07-29; 1 file, -3/+2)
| |
| |     This fixes SPARK-832, an issue where PySpark would not work when the
| |     master and workers used different SPARK_HOME paths. This change may
| |     potentially break code that relied on the master's PYTHONPATH being
| |     used on workers. To have custom PYTHONPATH additions used on the
| |     workers, users should set a custom PYTHONPATH in spark-env.sh rather
| |     than setting it in the shell.
| * Use None instead of empty string as it's slightly smaller/faster (Matei Zaharia, 2013-07-29; 1 file, -1/+1)
| |
| * Optimize Python foreach() to not return as many objects (Matei Zaharia, 2013-07-29; 1 file, -1/+5)
| |
| * Optimize Python take() to not compute entire first partition (Matei Zaharia, 2013-07-29; 1 file, -6/+9)
| |
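The take() optimization above stops computing as soon as enough elements have been collected, instead of materializing whole partitions up front. The idea can be sketched in plain Python (illustrative only; `partitions` stands in for an RDD's list of partition iterators):

```python
def take(partitions, n):
    # Sketch of the optimized take(): stream elements one partition at a
    # time and return the moment n items have been collected, so later
    # partitions (and the tail of the current one) are never computed.
    out = []
    for part in partitions:
        for item in part:
            out.append(item)
            if len(out) == n:
                return out
    return out
```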
| * Add Apache license headers and LICENSE and NOTICE files (Matei Zaharia, 2013-07-16; 1 file, -0/+17)
| |
* | PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py (Andre Schumacher, 2013-08-30; 1 file, -2/+2)
|/
* Fix Python saveAsTextFile doctest to not expect order to be preserved (Jey Kottalam, 2013-04-02; 1 file, -1/+1)
|
* Change numSplits to numPartitions in PySpark. (Josh Rosen, 2013-02-24; 1 file, -28/+28)
|
* Add commutative requirement for 'reduce' to Python docstring. (Mark Hamstra, 2013-02-09; 1 file, -2/+2)
|
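The docstring fix above documents why the operator passed to reduce() must be commutative as well as associative: each partition is reduced locally, and the partial results are then combined on the driver in no guaranteed order. A small sketch of that execution model (illustrative, not PySpark's scheduler):

```python
from functools import reduce

def distributed_reduce(partitions, op):
    # Reduce each partition locally, then combine the partial results.
    partials = [reduce(op, part) for part in partitions]
    # Simulate the nondeterministic order in which partial results
    # can arrive at the driver:
    partials.reverse()
    return reduce(op, partials)
```

With a commutative, associative op like addition the arrival order does not matter; with a non-commutative op like subtraction, the result diverges from a plain left-to-right reduce.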
* Fetch fewer objects in PySpark's take() method. (Josh Rosen, 2013-02-03; 1 file, -0/+4)
|
* Fix reporting of PySpark doctest failures. (Josh Rosen, 2013-02-03; 1 file, -1/+3)
|
* Use spark.local.dir for PySpark temp files (SPARK-580). (Josh Rosen, 2013-02-01; 1 file, -6/+1)
|
* Do not launch JavaGateways on workers (SPARK-674). (Josh Rosen, 2013-02-01; 1 file, -6/+6)
|
|       The problem was that the gateway was being initialized whenever the
|       pyspark.context module was loaded. The fix uses lazy initialization
|       that occurs only when SparkContext instances are actually constructed.
|       I also made the gateway and jvm variables private. This change results
|       in ~3-4x performance improvement when running the PySpark unit tests.
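The lazy-initialization pattern described in the commit body above can be sketched as follows; all names here are hypothetical stand-ins (the real code launches a Py4J gateway to the JVM):

```python
class SparkContextSketch:
    # Illustrative sketch of the SPARK-674 fix: the expensive gateway is
    # created on first construction of a context, not at module import
    # time, so workers that merely import the module never pay for it.
    _gateway = None  # shared, private, created lazily

    @classmethod
    def _ensure_gateway(cls):
        if cls._gateway is None:
            cls._gateway = object()  # stands in for launching the JVM gateway
        return cls._gateway

    def __init__(self):
        self._jvm = self._ensure_gateway()
```

Importing the module costs nothing; only constructing a context triggers the launch, and later contexts reuse the same gateway.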
* Merge pull request #389 from JoshRosen/python_rdd_checkpointing (Matei Zaharia, 2013-01-20; 1 file, -1/+34)
|\
| |     Add checkpointing to the Python API
| * Clean up setup code in PySpark checkpointing tests (Josh Rosen, 2013-01-20; 1 file, -2/+1)
| |
| * Update checkpointing API docs in Python/Java. (Josh Rosen, 2013-01-20; 1 file, -12/+5)
| |
| * Add checkpointFile() and more tests to PySpark. (Josh Rosen, 2013-01-20; 1 file, -1/+8)
| |
| * Add RDD checkpointing to Python API. (Josh Rosen, 2013-01-20; 1 file, -0/+34)
| |
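The checkpointing commits above add the ability to cut an RDD's lineage: a checkpointed RDD saves its data to reliable storage and drops its parent chain, so recovery reads the saved copy instead of recomputing. A toy sketch of that idea (purely illustrative; not PySpark's RDD class):

```python
class RDDSketch:
    # Toy lineage model: an RDD is either source data or a function
    # applied to a parent RDD.
    def __init__(self, data=None, parent=None, fn=None):
        self._data, self.parent, self.fn = data, parent, fn
        self.is_checkpointed = False

    def map(self, fn):
        return RDDSketch(parent=self, fn=fn)

    def collect(self):
        if self._data is not None:
            return self._data
        return [self.fn(x) for x in self.parent.collect()]

    def checkpoint(self):
        self._data = self.collect()   # materialize (to reliable storage in real Spark)
        self.parent = self.fn = None  # truncate the lineage chain
        self.is_checkpointed = True
```

After checkpoint(), collect() no longer walks the parent chain, which is the point: a long or fragile lineage is replaced by stored data.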
* | Fix PythonPartitioner equality; see SPARK-654. (Josh Rosen, 2013-01-20; 1 file, -6/+11)
|/
|       PythonPartitioner did not take the Python-side partitioning function
|       into account when checking for equality, which might cause problems
|       in the future.
* Added accumulators to PySpark (Matei Zaharia, 2013-01-20; 1 file, -1/+1)
|
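Accumulators, added above, are write-only from the tasks' point of view: workers may only add to them with `+=`, and only the driver reads the accumulated value. A minimal sketch of that contract (illustrative, not PySpark's Accumulator class):

```python
class AccumulatorSketch:
    # Sketch: tasks add with +=; the driver reads .value afterwards.
    def __init__(self, value):
        self.value = value

    def __iadd__(self, term):
        self.value += term
        return self

acc = AccumulatorSketch(0)
for x in [1, 2, 3, 4]:  # pretend each iteration runs as a separate task
    acc += x
```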
* Add mapPartitionsWithSplit() to PySpark. (Josh Rosen, 2013-01-08; 1 file, -11/+22)
|
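mapPartitionsWithSplit(), added above, hands the user function the partition (split) index alongside an iterator over that partition's elements. Its semantics can be sketched over a list of partitions (an illustrative helper, not the RDD method):

```python
def map_partitions_with_split(partitions, f):
    # Sketch: call f(split_index, iterator) once per partition and
    # concatenate the results.
    out = []
    for split, part in enumerate(partitions):
        out.extend(f(split, iter(part)))
    return out
```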
* Change PySpark RDD.take() to not call iterator(). (Josh Rosen, 2013-01-03; 1 file, -6/+5)
|
* Rename top-level 'pyspark' directory to 'python' (Josh Rosen, 2013-01-01; 1 file, -0/+713)