path: root/python/pyspark/rdd.py
Commit message  (Author, Date; Files changed, Lines -/+)
* Make Python function/line appear in the UI.  (Tor Myklebust, 2013-12-28; 1 file, -11/+55)
|
*   Merge pull request #276 from shivaram/collectPartition  (Reynold Xin, 2013-12-19; 1 file, -1/+6)
|\
| |   Add collectPartition to JavaRDD interface. This interface is useful for
| |   implementing `take` from other language frontends where the data is
| |   serialized. Also remove `takePartition` from PythonRDD and use
| |   `collectPartition` in rdd.py. Thanks @concretevitamin for the original
| |   change and tests.
| |
| * Make collectPartitions take an array of partitions  (Shivaram Venkataraman, 2013-12-19; 1 file, -1/+6)
| |   Change the implementation to use runJob instead of PartitionPruningRDD.
| |   Also update the unit tests and the Python take implementation to use the
| |   new interface.
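      A rough Python sketch of the pattern these commits enable: take() can pull
      one partition at a time instead of collecting the whole RDD. Illustrative
      only, using present-day method names (getNumPartitions,
      mapPartitionsWithIndex); the real code in rdd.py calls collectPartition
      through the Java gateway instead:

          def take_sketch(rdd, n):
              # Gather partitions one by one until n elements are collected,
              # rather than shipping the entire RDD to the driver.
              items = []
              for pid in range(rdd.getNumPartitions()):
                  if len(items) >= n:
                      break
                  # Filtering down to a single partition stands in for the
                  # JavaRDD.collectPartitions() call the real code uses.
                  part = rdd.mapPartitionsWithIndex(
                      lambda i, it, pid=pid: it if i == pid else iter([])).collect()
                  items.extend(part[:n - len(items)])
              return items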
| * Add collectPartition to JavaRDD interface.  (Shivaram Venkataraman, 2013-12-18; 1 file, -1/+1)
| |   Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
| |
* | Add toString to Java RDD, and __repr__ to Python RDD  (Nick Pentreath, 2013-12-19; 1 file, -0/+3)
|/
*   Merge branch 'master' into akka-bug-fix  (Prashant Sharma, 2013-12-11; 1 file, -1/+4)
|\
| |   Conflicts:
| |     core/pom.xml
| |     core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
| |     pom.xml
| |     project/SparkBuild.scala
| |     streaming/pom.xml
| |     yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
| |
| * Fix UnicodeEncodeError in PySpark saveAsTextFile().  (Josh Rosen, 2013-11-28; 1 file, -1/+4)
| |   Fixes SPARK-970.
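      A sketch of the failure mode from the user's side (data and output path
      are placeholders): under Python 2, writing unicode objects through the
      default ASCII codec raises UnicodeEncodeError, so values had to be
      UTF-8 encoded before reaching the text writer; per the commit title,
      saveAsTextFile() now handles this itself:

          rdd = sc.parallelize([u"caf\u00e9", u"na\u00efve"])
          # Pre-fix workaround: encode to UTF-8 explicitly before saving.
          rdd.map(lambda s: s.encode("utf-8")).saveAsTextFile("/tmp/unicode-out")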
* | Merge branch 'master' into wip-scala-2.10  (Prashant Sharma, 2013-11-27; 1 file, -43/+54)
|\|   Conflicts:
| |     core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
| |     core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
| |     core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
| |     core/src/main/scala/org/apache/spark/rdd/RDD.scala
| |     python/pyspark/rdd.py
| |
| * FramedSerializer: _dumps => dumps, _loads => loads.  (Josh Rosen, 2013-11-10; 1 file, -2/+2)
| |
| * Send PySpark commands as bytes instead of strings.  (Josh Rosen, 2013-11-10; 1 file, -6/+6)
| |
| * Add custom serializer support to PySpark.  (Josh Rosen, 2013-11-10; 1 file, -39/+47)
| |   For now this only adds MarshalSerializer, but it lays the groundwork for
| |   supporting other custom serializers. Many of these mechanisms can also be
| |   used to support deserialization of different data formats sent by Java,
| |   such as data encoded by MsgPack. This also fixes a bug in
| |   SparkContext.union().
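      A minimal usage sketch of the serializer hook this adds (assuming a local
      master; marshal is faster than pickle but supports fewer Python types):

          from pyspark import SparkContext
          from pyspark.serializers import MarshalSerializer

          # Pass the serializer when constructing the context; RDDs created
          # from it use marshal instead of pickle for worker traffic.
          sc = SparkContext("local", "serializer-demo",
                            serializer=MarshalSerializer())
          print(sc.parallelize(range(10)).map(lambda x: x * 2).sum())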
| * Remove Pickle-wrapping of Java objects in PySpark.  (Josh Rosen, 2013-11-03; 1 file, -4/+7)
| |   If we support custom serializers, the Python worker will know what type
| |   of input to expect, so we won't need to wrap Tuple2 and Strings into
| |   pickled tuples and strings.
| |
* | Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10  (Prashant Sharma, 2013-10-10; 1 file, -7/+53)
|\|
| * Fix PySpark docs and an overly long line of code after fdbae41e  (Matei Zaharia, 2013-10-09; 1 file, -8/+8)
| |
| * SPARK-705: implement sortByKey() in PySpark  (Andre Schumacher, 2013-10-07; 1 file, -1/+47)
| |
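      Usage sketch for the new method (assuming a live SparkContext `sc`):

          pairs = sc.parallelize([("b", 2), ("c", 3), ("a", 1)])
          print(pairs.sortByKey().collect())
          # [('a', 1), ('b', 2), ('c', 3)]
          print(pairs.sortByKey(ascending=False).collect())
          # [('c', 3), ('b', 2), ('a', 1)]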
* | Merge branch 'master' into wip-merge-master  (Prashant Sharma, 2013-10-08; 1 file, -4/+6)
|\|   Conflicts:
| |     bagel/pom.xml
| |     core/pom.xml
| |     core/src/test/scala/org/apache/spark/ui/UISuite.scala
| |     examples/pom.xml
| |     mllib/pom.xml
| |     pom.xml
| |     project/SparkBuild.scala
| |     repl/pom.xml
| |     streaming/pom.xml
| |     tools/pom.xml
| |   In Scala 2.10 a shorter representation is used for naming artifacts, so
| |   the artifacts were switched to the shorter Scala version, made a property
| |   in the pom.
| |
| * Fixing SPARK-602: PythonPartitioner  (Andre Schumacher, 2013-10-04; 1 file, -4/+6)
| |   Currently PythonPartitioner determines the partition ID by hashing a
| |   byte-array representation of PySpark's key. This PR lets PythonPartitioner
| |   use the actual partition ID, which is required e.g. for sorting via
| |   PySpark.
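      A usage sketch of the Python-side partitioning this relies on
      (illustrative data; `partitionFunc` maps a key to a partition ID):

          pairs = sc.parallelize([(i, str(i)) for i in range(100)])
          parted = pairs.partitionBy(4, partitionFunc=lambda key: key % 4)
          # glom() exposes partition boundaries so placement can be inspected.
          print(parted.glom().map(len).collect())   # sizes of the 4 partitions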
* | Merge branch 'master' of git://github.com/mesos/spark into scala-2.10  (Prashant Sharma, 2013-09-15; 1 file, -0/+19)
|\|   Conflicts:
| |     core/src/main/scala/org/apache/spark/SparkContext.scala
| |     project/SparkBuild.scala
| |
| * Export StorageLevel and refactor  (Aaron Davidson, 2013-09-07; 1 file, -1/+2)
| |
| * SPARK-660: Add StorageLevel support in Python  (Aaron Davidson, 2013-09-05; 1 file, -0/+18)
| |   It uses reflection... I am not proud of that fact, but it at least ensures
| |   compatibility (sans refactoring of the StorageLevel stuff).
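      Usage sketch of the exported storage levels (data is illustrative):

          from pyspark import StorageLevel

          squares = sc.parallelize(range(1000)).map(lambda x: x * x)
          squares.persist(StorageLevel.MEMORY_AND_DISK)  # spill to disk if needed
          print(squares.count())   # first action materializes the cached RDD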
* | Merged with master  (Prashant Sharma, 2013-09-06; 1 file, -20/+188)
|\|
| *   Merge pull request #861 from AndreSchumacher/pyspark_sampling_function  (Matei Zaharia, 2013-08-31; 1 file, -7/+55)
| |\    PySpark sampling function
| | |
| | * RDD sample() and takeSample() prototypes for PySpark  (Andre Schumacher, 2013-08-28; 1 file, -7/+55)
| | |
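      Sketch of the two prototypes (sample() is a lazy transformation,
      takeSample() an action; the seed keeps output reproducible):

          rdd = sc.parallelize(range(100))
          print(rdd.sample(False, 0.1, 42).collect())  # ~10%, no replacement
          print(rdd.takeSample(False, 5, 42))          # exactly 5, as a list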
| * | PySpark: implementing subtractByKey(), subtract() and keyBy()  (Andre Schumacher, 2013-08-28; 1 file, -0/+37)
| |/
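      Usage sketch for the three new methods (assuming a live `sc`):

          a = sc.parallelize([("a", 1), ("b", 4), ("b", 5), ("a", 2)])
          b = sc.parallelize([("a", 3)])
          print(a.subtractByKey(b).collect())  # pairs whose key is absent in b
          print(sc.parallelize([1, 2, 3]).subtract(sc.parallelize([2])).collect())
          print(sc.parallelize(["apple", "fig"]).keyBy(len).collect())
          # [(5, 'apple'), (3, 'fig')]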
| * Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark  (Andre Schumacher, 2013-08-21; 1 file, -1/+59)
| |
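      Sketch of the numeric helpers this ports over:

          nums = sc.parallelize([1.0, 2.0, 3.0, 4.0])
          print(nums.sum())     # 10.0
          print(nums.mean())    # 2.5
          print(nums.stdev())   # population standard deviation
          print(nums.stats())   # count, mean, stdev, max, min in one pass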
| * Implementing SPARK-878 for PySpark: adding zip and egg files to the context and passing them down to workers, which add them to their sys.path  (Andre Schumacher, 2013-08-16; 1 file, -1/+3)
| |
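      Usage sketch (the .egg path is a placeholder):

          # Ship a dependency archive to every worker; each worker appends it
          # to its sys.path before running tasks.
          sc.addPyFile("deps.egg")
          # Equivalent at construction time:
          # sc = SparkContext("local", "app", pyFiles=["deps.egg"])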
| * Do not inherit master's PYTHONPATH on workers.  (Josh Rosen, 2013-07-29; 1 file, -3/+2)
| |   This fixes SPARK-832, an issue where PySpark would not work when the
| |   master and workers used different SPARK_HOME paths. This change may
| |   potentially break code that relied on the master's PYTHONPATH being used
| |   on workers. To have custom PYTHONPATH additions used on the workers,
| |   users should set a custom PYTHONPATH in spark-env.sh rather than setting
| |   it in the shell.
| |
| * Use None instead of empty string as it's slightly smaller/faster  (Matei Zaharia, 2013-07-29; 1 file, -1/+1)
| |
| * Optimize Python foreach() to not return as many objects  (Matei Zaharia, 2013-07-29; 1 file, -1/+5)
| |
| * Optimize Python take() to not compute entire first partition  (Matei Zaharia, 2013-07-29; 1 file, -6/+9)
| |
| * Add Apache license headers and LICENSE and NOTICE files  (Matei Zaharia, 2013-07-16; 1 file, -0/+17)
| |
* | PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py  (Andre Schumacher, 2013-08-30; 1 file, -2/+2)
|/
* Fix Python saveAsTextFile doctest to not expect order to be preserved  (Jey Kottalam, 2013-04-02; 1 file, -1/+1)
|
* Change numSplits to numPartitions in PySpark.  (Josh Rosen, 2013-02-24; 1 file, -28/+28)
|
* Add commutative requirement for 'reduce' to Python docstring.  (Mark Hamstra, 2013-02-09; 1 file, -2/+2)
|
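      An example of why the docstring needs the warning: reduce() combines
      per-partition results in an unspecified order, so the operator must be
      commutative as well as associative:

          from operator import add

          rdd = sc.parallelize([1, 2, 3, 4, 5], 2)
          print(rdd.reduce(add))  # 15; addition is commutative and associative
          # A non-commutative op such as subtraction would give
          # partition-order-dependent results and is NOT safe here.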
* Fetch fewer objects in PySpark's take() method.  (Josh Rosen, 2013-02-03; 1 file, -0/+4)
|
* Fix reporting of PySpark doctest failures.  (Josh Rosen, 2013-02-03; 1 file, -1/+3)
|
* Use spark.local.dir for PySpark temp files (SPARK-580).  (Josh Rosen, 2013-02-01; 1 file, -6/+1)
|
* Do not launch JavaGateways on workers (SPARK-674).  (Josh Rosen, 2013-02-01; 1 file, -6/+6)
|   The problem was that the gateway was being initialized whenever the
|   pyspark.context module was loaded. The fix uses lazy initialization that
|   occurs only when SparkContext instances are actually constructed. I also
|   made the gateway and jvm variables private. This change results in a
|   ~3-4x performance improvement when running the PySpark unit tests.
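      A generic sketch of the lazy-initialization pattern described above
      (names are illustrative, not the actual PySpark internals;
      `launch_gateway` is a hypothetical stand-in for the expensive startup):

          _gateway = None

          def _ensure_gateway():
              # Pay the JVM startup cost on first use, not at import time, so
              # a worker that merely imports the module never launches one.
              global _gateway
              if _gateway is None:
                  _gateway = launch_gateway()  # hypothetical launcher
              return _gateway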
*   Merge pull request #389 from JoshRosen/python_rdd_checkpointing  (Matei Zaharia, 2013-01-20; 1 file, -1/+34)
|\    Add checkpointing to the Python API
| |
| * Clean up setup code in PySpark checkpointing tests  (Josh Rosen, 2013-01-20; 1 file, -2/+1)
| |
| * Update checkpointing API docs in Python/Java.  (Josh Rosen, 2013-01-20; 1 file, -12/+5)
| |
| * Add checkpointFile() and more tests to PySpark.  (Josh Rosen, 2013-01-20; 1 file, -1/+8)
| |
| * Add RDD checkpointing to Python API.  (Josh Rosen, 2013-01-20; 1 file, -0/+34)
| |
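      Usage sketch of the checkpointing API these commits add (the directory
      is a placeholder):

          sc.setCheckpointDir("/tmp/spark-checkpoints")
          rdd = sc.parallelize(range(10)).map(lambda x: x + 1)
          rdd.checkpoint()              # lineage is truncated at the next action
          rdd.count()
          print(rdd.isCheckpointed())   # True
          print(rdd.getCheckpointFile())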
* | Fix PythonPartitioner equality; see SPARK-654.  (Josh Rosen, 2013-01-20; 1 file, -6/+11)
|/    PythonPartitioner did not take the Python-side partitioning function into
|     account when checking for equality, which might cause problems in the
|     future.
|
* Added accumulators to PySpark  (Matei Zaharia, 2013-01-20; 1 file, -1/+1)
|
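      Usage sketch: workers may only add to an accumulator, while the driver
      reads .value:

          acc = sc.accumulator(0)
          sc.parallelize([1, 2, 3, 4]).foreach(lambda x: acc.add(x))
          print(acc.value)   # 10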
* Add mapPartitionsWithSplit() to PySpark.  (Josh Rosen, 2013-01-08; 1 file, -11/+22)
|
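      Sketch of the new method: the function receives the partition index
      (then called a "split") plus an iterator over that partition. Later
      Spark versions rename this mapPartitionsWithIndex:

          rdd = sc.parallelize(range(8), 4)
          tagged = rdd.mapPartitionsWithSplit(
              lambda split, it: ((split, x) for x in it))
          print(tagged.collect())  # each element tagged with its partition index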
* Change PySpark RDD.take() to not call iterator().  (Josh Rosen, 2013-01-03; 1 file, -6/+5)
|
* Rename top-level 'pyspark' directory to 'python'  (Josh Rosen, 2013-01-01; 1 file, -0/+713)