spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
...
*	[SPARK-2334] fix AttributeError when call PipelineRDD.id()	Davies Liu	2014-09-06	1	-0/+6
*	Spark-3406 add a default storage level to python RDD persist API	Holden Karau	2014-09-06	1	-1/+6
*	SPARK-3211 .take() is OOM-prone with empty partitions	Andrew Ash	2014-09-05	1	-4/+4
*	[SPARK-3309] [PySpark] Put all public API in __all__	Davies Liu	2014-09-03	1	-0/+1
*	[SPARK-2871] [PySpark] add countApproxDistinct() API	Davies Liu	2014-09-02	1	-5/+34
*	[SPARK-2871] [PySpark] add RDD.lookup(key)	Davies Liu	2014-08-27	1	-132/+79
*	[SPARK-3073] [PySpark] use external sort in sortBy() and sortByKey()	Davies Liu	2014-08-26	1	-2/+7
*	[SPARK-2871] [PySpark] add histgram() API	Davies Liu	2014-08-26	1	-1/+128
*	[SPARK-2871] [PySpark] add zipWithIndex() and zipWithUniqueId()	Davies Liu	2014-08-24	1	-0/+47
*	[SPARK-2871] [PySpark] add approx API for RDD	Davies Liu	2014-08-23	1	-0/+81
*	[SPARK-2871] [PySpark] add `key` argument for max(), min() and top(n)	Davies Liu	2014-08-23	1	-17/+27
*	[SPARK-3141] [PySpark] fix sortByKey() with take()	Davies Liu	2014-08-19	1	-10/+8
*	[SPARK-2790] [PySpark] fix zip with serializers which have different batch si...	Davies Liu	2014-08-19	1	-0/+25
*	[SPARK-3114] [PySpark] Fix Python UDFs in Spark SQL.	Josh Rosen	2014-08-18	1	-1/+1
*	[SPARK-3103] [PySpark] fix saveAsTextFile() with utf-8	Davies Liu	2014-08-18	1	-1/+3
*	[SPARK-1065] [PySpark] improve supporting for large broadcast	Davies Liu	2014-08-16	1	-2/+3
*	[SPARK-2983] [PySpark] improve performance of sortByKey()	Davies Liu	2014-08-13	1	-23/+24
*	[PySpark] Add blanklines to Python docstrings so example code renders correctly	RJ Nowling	2014-08-06	1	-0/+9
*	[SPARK-2627] [PySpark] have the build enforce PEP 8 automatically	Nicholas Chammas	2014-08-06	1	-9/+13
*	[SPARK-2010] [PySpark] [SQL] support nested structure in SchemaRDD	Davies Liu	2014-08-01	1	-4/+4
*	[SPARK-2024] Add saveAsSequenceFile to PySpark	Kan Zhang	2014-07-30	1	-0/+114
*	[SPARK-2601] [PySpark] Fix Py4J error when transforming pickleFiles	Josh Rosen	2014-07-26	1	-3/+1
*	[SPARK-2656] Python version of stratified sampling	Doris Xin	2014-07-24	1	-2/+23
*	[SPARK-2538] [PySpark] Hash based disk spilling aggregation	Davies Liu	2014-07-24	1	-21/+71
*	[SPARK-2014] Make PySpark store RDDs in MEMORY_ONLY_SER with compression by d...	Prashant Sharma	2014-07-24	1	-2/+2
*	[SPARK-2494] [PySpark] make hash of None consistant cross machines	Davies Liu	2014-07-21	1	-3/+32
*	Made rdd.py pep8 complaint by using Autopep8 and a little manual editing.	Prashant Sharma	2014-07-14	1	-58/+92
*	[SPARK-2061] Made splits deprecated in JavaRDDLike	Anant	2014-06-20	1	-2/+2
*	SPARK-1868: Users should be allowed to cogroup at least 4 RDDs	Allan Douglas R. de Oliveira	2014-06-20	1	-7/+15
*	SPARK-2203: PySpark defaults to use same num reduce partitions as map side	Aaron Davidson	2014-06-20	1	-3/+18
*	SPARK-2146. Fix takeOrdered doc	Sandy Ryza	2014-06-17	1	-1/+1
*	SPARK-1063 Add .sortBy(f) method on RDD	Andrew Ash	2014-06-17	1	-0/+12
*	[SPARK-2130] End-user friendly String repr for StorageLevel in Python	Kan Zhang	2014-06-16	1	-0/+3
*	SPARK-1939 Refactor takeSample method in RDD to use ScaSRS	Doris Xin	2014-06-12	1	-61/+106
*	SPARK-554. Add aggregateByKey.	Sandy Ryza	2014-06-12	1	-1/+18
*	fixed typo in docstring for min()	Jeff Thompson	2014-06-12	1	-1/+1
*	[SPARK-1308] Add getNumPartitions to pyspark RDD	Syed Hashmi	2014-06-09	1	-18/+27
*	[SPARK-1161] Add saveAsPickleFile and SparkContext.pickleFile in Python	Kan Zhang	2014-06-03	1	-8/+25
*	[SPARK-1468] Modify the partition function used by partitionBy.	Erik Selin	2014-06-03	1	-1/+4
*	SPARK-1839: PySpark RDD#take() shouldn't always read from driver	Aaron Davidson	2014-05-31	1	-21/+38
*	Documentation: Encourage use of reduceByKey instead of groupByKey.	Patrick Wendell	2014-05-14	1	-0/+4
*	[SPARK-1690] Tolerating empty elements when saving Python RDD to text files	Kan Zhang	2014-05-10	1	-0/+8
*	[SPARK-1674] fix interrupted system call error in pyspark's RDD.pipe	Xiangrui Meng	2014-04-29	1	-3/+3
*	SPARK-1242 Add aggregate to python rdd	Holden Karau	2014-04-24	1	-2/+29
*	SPARK-1438 RDD.sample() make seed param optional	Arun Ramakrishnan	2014-04-24	1	-7/+6
*	Spark 1271: Co-Group and Group-By should pass Iterable[X]	Holden Karau	2014-04-08	1	-5/+5
*	SPARK-1305: Support persisting RDD's directly to Tachyon	Haoyuan Li	2014-04-04	1	-1/+2
*	Spark 1162 Implemented takeOrdered in pyspark.	Prashant Sharma	2014-04-03	1	-5/+102
*	SPARK-1322, top in pyspark should sort result in descending order.	Prashant Sharma	2014-03-26	1	-3/+3
*	Added doctest for map function in rdd.py	Jyotiska NK	2014-03-19	1	-0/+4