spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	[SPARK-2079] Support batching when serializing SchemaRDD to Python	Kan Zhang	2014-06-14	1	-1/+3
*	SPARK-1939 Refactor takeSample method in RDD to use ScaSRS	Doris Xin	2014-06-12	1	-61/+106
*	SPARK-554. Add aggregateByKey.	Sandy Ryza	2014-06-12	2	-1/+33
*	fixed typo in docstring for min()	Jeff Thompson	2014-06-12	1	-1/+1
*	HOTFIX: PySpark tests should be order insensitive.	Patrick Wendell	2014-06-11	1	-4/+4
*	HOTFIX: A few PySpark tests were not actually run	Andrew Or	2014-06-11	1	-1/+4
*	[SPARK-2091][MLLIB] use numpy.dot instead of ndarray.dot	Xiangrui Meng	2014-06-11	1	-3/+5
*	HOTFIX: Fix Python tests on Jenkins.	Patrick Wendell	2014-06-10	1	-4/+7
*	SPARK-1416: PySpark support for SequenceFile and Hadoop InputFormats	Nick Pentreath	2014-06-09	2	-0/+282
*	[SPARK-1308] Add getNumPartitions to pyspark RDD	Syed Hashmi	2014-06-09	1	-18/+27
*	[SPARK-1752][MLLIB] Standardize text format for vectors and labeled points	Xiangrui Meng	2014-06-04	4	-51/+129
*	[SPARK-1161] Add saveAsPickleFile and SparkContext.pickleFile in Python	Kan Zhang	2014-06-03	2	-8/+39
*	[SPARK-1468] Modify the partition function used by partitionBy.	Erik Selin	2014-06-03	1	-1/+4
*	[SPARK-1942] Stop clearing spark.driver.port in unit tests	Syed Hashmi	2014-06-03	1	-4/+0
*	SPARK-1917: fix PySpark import of scipy.special functions	Uri Laserson	2014-05-31	2	-1/+25
*	SPARK-1839: PySpark RDD#take() shouldn't always read from driver	Aaron Davidson	2014-05-31	2	-21/+64
*	Added doctest and method description in context.py	Jyotiska NK	2014-05-28	1	-1/+14
*	Fix PEP8 violations in Python mllib.	Reynold Xin	2014-05-25	8	-88/+78
*	Python docstring update for sql.py.	Reynold Xin	2014-05-25	1	-61/+63
*	SPARK-1822: Some minor cleanup work on SchemaRDD.count()	Reynold Xin	2014-05-25	1	-1/+4
*	[SPARK-1822] SchemaRDD.count() should use query optimizer	Kan Zhang	2014-05-25	1	-1/+13
*	[SPARK-1900 / 1918] PySpark on YARN is broken	Andrew Or	2014-05-24	1	-2/+6
*	[SPARK-1519] Support minPartitions param of wholeTextFiles() in PySpark	Kan Zhang	2014-05-21	1	-2/+10
*	[SPARK-1808] Route bin/pyspark through Spark submit	Andrew Or	2014-05-16	2	-5/+7
*	Documentation: Encourage use of reduceByKey instead of groupByKey.	Patrick Wendell	2014-05-14	1	-0/+4
*	[FIX] do not load defaults when testing SparkConf in pyspark	Xiangrui Meng	2014-05-14	1	-1/+1
*	[SQL] Make it possible to create Java/Python SQLContexts from an existing Sca...	Michael Armbrust	2014-05-13	1	-2/+5
*	[SPARK-1690] Tolerating empty elements when saving Python RDD to text files	Kan Zhang	2014-05-10	1	-0/+8
*	Add Python includes to path before depickling broadcast values	Bouke van der Bijl	2014-05-10	1	-7/+7
*	[SPARK-1743][MLLIB] add loadLibSVMFile and saveAsLibSVMFile to pyspark	Xiangrui Meng	2014-05-07	2	-2/+178
*	SPARK-1579: Clean up PythonRDD and avoid swallowing IOExceptions	Aaron Davidson	2014-05-07	2	-2/+14
*	[SPARK-1460] Returning SchemaRDD instead of normal RDD on Set operations...	Kan Zhang	2014-05-07	1	-0/+29
*	SPARK-1637: Clean up examples for 1.0	Sandeep	2014-05-06	10	-574/+0
*	[SPARK-1549] Add Python support to spark-submit	Matei Zaharia	2014-05-06	3	-58/+168
*	[SPARK-1594][MLLIB] Cleaning up MLlib APIs and guide	Xiangrui Meng	2014-05-05	1	-2/+2
*	SPARK-1004. PySpark on YARN	Sandy Ryza	2014-04-29	5	-11/+33
*	[SPARK-1674] fix interrupted system call error in pyspark's RDD.pipe	Xiangrui Meng	2014-04-29	1	-3/+3
*	Minor fix to python table caching API.	Michael Armbrust	2014-04-29	1	-2/+2
*	SPARK-1242 Add aggregate to python rdd	Holden Karau	2014-04-24	1	-2/+29
*	[SPARK-986]: Job cancelation for PySpark	Ahir Reddy	2014-04-24	1	-3/+49
*	SPARK-1438 RDD.sample() make seed param optional	Arun Ramakrishnan	2014-04-24	2	-24/+20
*	fix bugs of dot in python	Xusen Yin	2014-04-22	2	-5/+5
*	[SPARK-1439, SPARK-1440] Generate unified Scaladoc across projects and Javadocs	Matei Zaharia	2014-04-21	1	-2/+2
*	Add insertInto and saveAsTable to Python API.	Michael Armbrust	2014-04-19	1	-0/+13
*	Fixed broken pyspark shell.	Reynold Xin	2014-04-18	1	-2/+2
*	SPARK-1483: Rename minSplits to minPartitions in public APIs	CodingCat	2014-04-18	1	-3/+3
*	FIX: Don't build Hive in assembly unless running Hive tests.	Patrick Wendell	2014-04-17	1	-1/+3
*	[python alternative] pyspark require Python2, failing if system default is Py...	AbhishekKr	2014-04-16	1	-6/+14
*	[SQL] SPARK-1424 Generalize insertIntoTable functions on SchemaRDDs	Michael Armbrust	2014-04-15	1	-4/+10
*	[WIP] SPARK-1430: Support sparse data in Python MLlib	Matei Zaharia	2014-04-15	12	-139/+1178