spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by defa...	Davies Liu	2014-11-03	1	-54/+37
*	[SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample	Xiangrui Meng	2014-11-03	1	-3/+0
*	[SPARK-4150][PySpark] return self in rdd.setName	Xiangrui Meng	2014-10-31	1	-2/+2
*	[Spark] RDD take() method: overestimate too much	yingjieMiao	2014-10-13	1	-1/+4
*	[SPARK-3909][PySpark][Doc] A corrupted format in Sphinx documents and buildin...	cocoatomo	2014-10-11	1	-1/+1
*	[SPARK-3412] [PySpark] Replace Epydoc with Sphinx to generate Python API docs	Davies Liu	2014-10-07	1	-26/+26
*	[SPARK-3773][PySpark][Doc] Sphinx build warning	cocoatomo	2014-10-06	1	-0/+1
*	[SPARK-3749] [PySpark] fix bugs in broadcast large closure of RDD	Davies Liu	2014-10-01	1	-3/+9
*	[SPARK-3478] [PySpark] Profile the Python tasks	Davies Liu	2014-09-30	1	-2/+8
*	Revert "[SPARK-3478] [PySpark] Profile the Python tasks"	Josh Rosen	2014-09-26	1	-8/+2
*	[SPARK-3478] [PySpark] Profile the Python tasks	Davies Liu	2014-09-26	1	-2/+8
*	[SPARK-546] Add full outer join to RDD and DStream.	Aaron Staple	2014-09-24	1	-2/+23
*	[SPARK-3491] [MLlib] [PySpark] use pickle to serialize data in MLlib	Davies Liu	2014-09-19	1	-5/+5
*	[SPARK-3554] [PySpark] use broadcast automatically for large closure	Davies Liu	2014-09-18	1	-0/+4
*	[SPARK-3519] add distinct(n) to PySpark	Matthew Farrellee	2014-09-16	1	-2/+2
*	[SPARK-1087] Move python traceback utilities into new traceback_utils.py file.	Aaron Staple	2014-09-15	1	-55/+3
*	[PySpark] Add blank line so that Python RDD.top() docstring renders correctly	RJ Nowling	2014-09-12	1	-0/+1
*	SPARK-2978. Transformation with MR shuffle semantics	Sandy Ryza	2014-09-08	1	-0/+24
*	[SPARK-2334] fix AttributeError when call PipelineRDD.id()	Davies Liu	2014-09-06	1	-0/+6
*	Spark-3406 add a default storage level to python RDD persist API	Holden Karau	2014-09-06	1	-1/+6
*	SPARK-3211 .take() is OOM-prone with empty partitions	Andrew Ash	2014-09-05	1	-4/+4
*	[SPARK-3309] [PySpark] Put all public API in __all__	Davies Liu	2014-09-03	1	-0/+1
*	[SPARK-2871] [PySpark] add countApproxDistinct() API	Davies Liu	2014-09-02	1	-5/+34
*	[SPARK-2871] [PySpark] add RDD.lookup(key)	Davies Liu	2014-08-27	1	-132/+79
*	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	Davies Liu	2014-08-26	1	-2/+7
*	[SPARK-2871] [PySpark] add histgram() API	Davies Liu	2014-08-26	1	-1/+128
*	[SPARK-2871] [PySpark] add zipWithIndex() and zipWithUniqueId()	Davies Liu	2014-08-24	1	-0/+47
*	[SPARK-2871] [PySpark] add approx API for RDD	Davies Liu	2014-08-23	1	-0/+81
*	[SPARK-2871] [PySpark] add `key` argument for max(), min() and top(n)	Davies Liu	2014-08-23	1	-17/+27
*	[SPARK-3141] [PySpark] fix sortByKey() with take()	Davies Liu	2014-08-19	1	-10/+8
*	[SPARK-2790] [PySpark] fix zip with serializers which have different batch si...	Davies Liu	2014-08-19	1	-0/+25
*	[SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL.	Josh Rosen	2014-08-18	1	-1/+1
*	[SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8	Davies Liu	2014-08-18	1	-1/+3
*	[SPARK-1065] [PySpark] improve supporting for large broadcast	Davies Liu	2014-08-16	1	-2/+3
*	[SPARK-2983] [PySpark] improve performance of sortByKey()	Davies Liu	2014-08-13	1	-23/+24
*	[PySpark] Add blanklines to Python docstrings so example code renders correctly	RJ Nowling	2014-08-06	1	-0/+9
*	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	Nicholas Chammas	2014-08-06	1	-9/+13
*	[SPARK-2010] [PySpark] [SQL] support nested structure in SchemaRDD	Davies Liu	2014-08-01	1	-4/+4
*	[SPARK-2024] Add saveAsSequenceFile to PySpark	Kan Zhang	2014-07-30	1	-0/+114
*	[SPARK-2601] [PySpark] Fix Py4J error when transforming pickleFiles	Josh Rosen	2014-07-26	1	-3/+1
*	[SPARK-2656] Python version of stratified sampling	Doris Xin	2014-07-24	1	-2/+23
*	[SPARK-2538] [PySpark] Hash based disk spilling aggregation	Davies Liu	2014-07-24	1	-21/+71
*	[SPARK-2014] Make PySpark store RDDs in MEMORY_ONLY_SER with compression by d...	Prashant Sharma	2014-07-24	1	-2/+2
*	[SPARK-2494] [PySpark] make hash of None consistant cross machines	Davies Liu	2014-07-21	1	-3/+32
*	Made rdd.py pep8 complaint by using Autopep8 and a little manual editing.	Prashant Sharma	2014-07-14	1	-58/+92
*	[SPARK-2061] Made splits deprecated in JavaRDDLike	Anant	2014-06-20	1	-2/+2
*	SPARK-1868: Users should be allowed to cogroup at least 4 RDDs	Allan Douglas R. de Oliveira	2014-06-20	1	-7/+15
*	SPARK-2203: PySpark defaults to use same num reduce partitions as map side	Aaron Davidson	2014-06-20	1	-3/+18
*	SPARK-2146. Fix takeOrdered doc	Sandy Ryza	2014-06-17	1	-1/+1
*	SPARK-1063 Add .sortBy(f) method on RDD	Andrew Ash	2014-06-17	1	-0/+12