Commit message | Author | Age | Files | Lines
---|---|---|---|---
* | Additional tests for MapOutputTracker. | Charles Reiss | 2013-01-14 | 1 | -2/+80 |
* | Throw FetchFailedException for cached missing locs | Charles Reiss | 2013-01-14 | 1 | -10/+26 |
* | Merge branch 'master' of github.com:mesos/spark | Matei Zaharia | 2013-01-13 | 1 | -4/+12 |
|\
| * | Merge pull request #360 from rxin/cogroup-java | Matei Zaharia | 2013-01-13 | 1 | -4/+12 |
| |\
    Changed CoGroupRDD's hash map from Scala to Java.
| | * | Removed the use of getOrElse to avoid Scala wrapper for every call. | Reynold Xin | 2013-01-13 | 1 | -3/+10 |
| | * | Changed CoGroupRDD's hash map from Scala to Java. | Reynold Xin | 2013-01-10 | 1 | -2/+3 |
* | | | Make filter preserve partitioner info, since it can | Matei Zaharia | 2013-01-13 | 2 | -1/+7 |
|/ /
* | | Merge pull request #368 from mbautin/add_spray_json_dependency | Matei Zaharia | 2013-01-13 | 2 | -0/+10 |
|\ \
    Add missing dependency spray-json to Maven build
| * | | Add missing dependency spray-json to Maven build | Mikhail Bautin | 2013-01-13 | 2 | -0/+10 |
|/ /
* | | Merge pull request #346 from JoshRosen/python-api | Matei Zaharia | 2013-01-12 | 35 | -12/+2985 |
|\ \
    Python API (PySpark)
| * | | Change PYSPARK_PYTHON_EXEC to PYSPARK_PYTHON. | Josh Rosen | 2013-01-10 | 1 | -1/+1 |
| * | | Use take() instead of takeSample() in PySpark kmeans example. | Josh Rosen | 2013-01-09 | 1 | -1/+3 |
    This is a temporary change until we port takeSample().
| * | | Indicate success/failure in PySpark test script. | Josh Rosen | 2013-01-09 | 1 | -0/+17 |
| * | | Add mapPartitionsWithSplit() to PySpark. | Josh Rosen | 2013-01-08 | 4 | -13/+30 |
| * | | Change PySpark RDD.take() to not call iterator(). | Josh Rosen | 2013-01-03 | 3 | -6/+10 |
| * | | Add `pyspark` script to replace the other scripts. | Josh Rosen | 2013-01-01 | 6 | -36/+69 |
    Expand the PySpark programming guide.
| * | | Rename top-level 'pyspark' directory to 'python' | Josh Rosen | 2013-01-01 | 28 | -13/+13 |
| * | | Minor documentation and style fixes for PySpark. | Josh Rosen | 2013-01-01 | 10 | -32/+70 |
| * | | Launch with `scala` by default in run-pyspark | Josh Rosen | 2012-12-31 | 1 | -0/+5 |
| * | | Port LR example to PySpark using numpy. | Josh Rosen | 2012-12-29 | 1 | -0/+57 |
    This version of the example crashes after the first iteration with
    "OverflowError: math range error" because Python's math.exp() behaves
    differently than Scala's; see SPARK-646.
| * | | Add test for pyspark.RDD.saveAsTextFile(). | Josh Rosen | 2012-12-29 | 1 | -1/+8 |
| * | | Update PySpark for compatibility with TaskContext. | Josh Rosen | 2012-12-29 | 2 | -9/+7 |
| * | | Merge remote-tracking branch 'origin/master' into python-api | Josh Rosen | 2012-12-29 | 124 | -1677/+4394 |
| |\ \
    Conflicts: docs/quick-start.md
| * | | | Use batching in pyspark parallelize(); fix cartesian() | Josh Rosen | 2012-12-29 | 3 | -27/+31 |
| * | | | Fix bug in pyspark.serializers.batch; add .gitignore. | Josh Rosen | 2012-12-29 | 3 | -2/+6 |
| * | | | Add documentation for Python API. | Josh Rosen | 2012-12-28 | 12 | -48/+127 |
| * | | | Fix bug (introduced by batching) in PySpark take() | Josh Rosen | 2012-12-28 | 4 | -15/+22 |
| * | | | Mark api.python classes as private; echo Java output to stderr. | Josh Rosen | 2012-12-28 | 3 | -31/+24 |
| * | | | Simplify PySpark installation. | Josh Rosen | 2012-12-27 | 13 | -47/+78 |
    - Bundle Py4J binaries, since it's hard to install
    - Uses Spark's `run` script to launch the Py4J gateway, inheriting the
      settings in spark-env.sh
    With these changes, (hopefully) nothing more than running `sbt/sbt package`
    will be necessary to run PySpark.
| * | | | Use addFile() to ship code to cluster in PySpark. | Josh Rosen | 2012-12-27 | 2 | -10/+74 |
    Add options to pyspark.SparkContext constructor.
| * | | | Add epydoc API documentation for PySpark. | Josh Rosen | 2012-12-27 | 6 | -19/+254 |
| * | | | Add IPython support to pyspark-shell. | Josh Rosen | 2012-12-27 | 3 | -8/+21 |
    Suggested by / based on code from @MLnick
| * | | | Remove debug output from PythonPartitioner. | Josh Rosen | 2012-12-26 | 1 | -2/+0 |
| * | | | Add support for batched serialization of Python objects in PySpark. | Josh Rosen | 2012-12-26 | 3 | -20/+74 |
| * | | | Use filesystem to collect RDDs in PySpark. | Josh Rosen | 2012-12-24 | 5 | -63/+66 |
    Passing large volumes of data through Py4J seems to be slow. It appears
    to be faster to write the data to the local filesystem and read it back
    from Python.
| * | | | Reduce object overhead in Pyspark shuffle and collect | Josh Rosen | 2012-12-24 | 1 | -5/+14 |
| * | | | Fix PySpark hash partitioning bug. | Josh Rosen | 2012-10-28 | 3 | -9/+54 |
    A Java array's hashCode is based on its object identity, not its elements,
    so this was causing serialized keys to be hashed incorrectly. This commit
    adds a PySpark-specific workaround and adds more tests.
| * | | | Bump required Py4J version and add test for large broadcast variables. | Josh Rosen | 2012-10-28 | 3 | -2/+4 |
| * | | | Remove PYTHONPATH from SparkContext's executorEnvs. | Josh Rosen | 2012-10-22 | 3 | -11/+14 |
    It makes more sense to pass it in the dictionary of environment variables
    that is used to construct PythonRDD.
| * | | | Add PySpark README and run scripts. | Josh Rosen | 2012-10-20 | 7 | -4/+125 |
| * | | | Update Python API for v0.6.0 compatibility. | Josh Rosen | 2012-10-19 | 7 | -27/+42 |
| * | | | Merge tag 'v0.6.0' into python-api | Josh Rosen | 2012-10-19 | 264 | -3914/+17506 |
| |\ \ \
| * | | | | Fix Python 2.6 compatibility in Python API. | Josh Rosen | 2012-09-17 | 2 | -28/+11 |
| * | | | | Fix minor bugs in Python API examples. | Josh Rosen | 2012-08-27 | 2 | -5/+5 |
| * | | | | Add pipe(), saveAsTextFile(), sc.union() to Python API. | Josh Rosen | 2012-08-27 | 3 | -10/+37 |
| * | | | | Simplify Python worker; pipeline the map step of partitionBy(). | Josh Rosen | 2012-08-27 | 5 | -127/+59 |
| * | | | | Use local combiners in Python API combineByKey(). | Josh Rosen | 2012-08-27 | 2 | -25/+24 |
| * | | | | Add countByKey(), reduceByKeyLocally() to Python API | Josh Rosen | 2012-08-27 | 1 | -13/+39 |
| * | | | | Add mapPartitions(), glom(), countByValue() to Python API. | Josh Rosen | 2012-08-27 | 1 | -4/+28 |
| * | | | | Add broadcast variables to Python API. | Josh Rosen | 2012-08-27 | 5 | -29/+110 |
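
Note on the "Port LR example to PySpark using numpy" entry (2012-12-29): the crash it describes comes from CPython's math.exp() raising an exception on overflow where the JVM's math.exp() silently saturates to infinity. Below is a minimal Scala sketch of the JVM side of that difference; it is illustrative only and not code from the commit, and the constant 100000.0 is just an arbitrarily large margin.

```scala
// On the JVM, math.exp never throws on overflow; it saturates to +Infinity,
// so a logistic-regression update keeps running where CPython would raise
// "OverflowError: math range error" (see SPARK-646).
object ExpOverflowSketch {
  def main(args: Array[String]): Unit = {
    val margin = 100000.0                      // an arbitrarily large dot product
    println(math.exp(margin))                  // prints Infinity
    println(1.0 / (1.0 + math.exp(-margin)))   // prints 1.0: the logistic value still comes out
  }
}
```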
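Note on the "Fix PySpark hash partitioning bug" entry (2012-10-28): the commit message's point is that a JVM array's default hashCode reflects object identity rather than contents, so hash-partitioning serialized (byte-array) keys can scatter equal keys across partitions. The following short Scala sketch shows that behavior; it is illustrative only, not the commit's actual workaround.

```scala
import java.util.Arrays

// Two byte arrays with identical contents are distinct objects, so their
// default hashCode values (almost certainly) differ, while a content-based
// hash such as Arrays.hashCode is stable across equal contents.
object ArrayHashSketch {
  def main(args: Array[String]): Unit = {
    val a: Array[Byte] = Array(1, 2, 3)
    val b: Array[Byte] = Array(1, 2, 3)
    println(a.hashCode == b.hashCode)                  // usually false: identity-based
    println(Arrays.hashCode(a) == Arrays.hashCode(b))  // true: content-based
  }
}
```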