path: root/python
Commit message | Author | Age | Files | Lines
* Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 | Prashant Sharma | 2013-09-15 | 5 | -1/+78
|\
| | Conflicts:
| |   core/src/main/scala/org/apache/spark/SparkContext.scala
| |   project/SparkBuild.scala
| * Whoopsy daisy | Aaron Davidson | 2013-09-08 | 1 | -1/+0
| |
| * Export StorageLevel and refactor | Aaron Davidson | 2013-09-07 | 5 | -26/+62
| |
| * Remove reflection, hard-code StorageLevels | Aaron Davidson | 2013-09-07 | 2 | -24/+26
| |
| | The sc.StorageLevel -> StorageLevel pathway is a bit janky, but otherwise
| | the shell would have to call a private method of SparkContext. Having
| | StorageLevel available in sc also doesn't seem like the end of the world.
| | There may be a better solution, though.
| |
| | As for creating the StorageLevel object itself, this seems to be the best
| | way in Python 2 for creating singleton, enum-like objects:
| | http://stackoverflow.com/questions/36932/how-can-i-represent-an-enum-in-python
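The enum-like singleton pattern cited in that Stack Overflow link can be sketched roughly as below. The field names and level definitions here are illustrative stand-ins, not PySpark's actual StorageLevel.

```python
# Sketch of the Python 2 enum-like singleton pattern referenced above.
# Field names and level definitions are illustrative, not PySpark's actual
# StorageLevel.
class StorageLevel(object):
    """One shared instance per storage level, attached as class attributes."""

    def __init__(self, use_disk, use_memory, deserialized, replication=1):
        self.use_disk = use_disk
        self.use_memory = use_memory
        self.deserialized = deserialized
        self.replication = replication

# The singletons are instances of the class itself, so they must be
# attached after the class body has been executed.
StorageLevel.DISK_ONLY = StorageLevel(True, False, False)
StorageLevel.MEMORY_ONLY = StorageLevel(False, True, True)
StorageLevel.MEMORY_AND_DISK = StorageLevel(True, True, True)
StorageLevel.MEMORY_AND_DISK_2 = StorageLevel(True, True, True, replication=2)
```

Because each level is created exactly once at class-definition time, identity comparison (`is`) works like an enum member check.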
| * Memoize StorageLevels read from JVM | Aaron Davidson | 2013-09-06 | 1 | -2/+9
| |
| * SPARK-660: Add StorageLevel support in Python | Aaron Davidson | 2013-09-05 | 3 | -1/+34
| |
| | It uses reflection... I am not proud of that fact, but it at least ensures
| | compatibility (sans refactoring of the StorageLevel stuff).
* | Merged with master | Prashant Sharma | 2013-09-06 | 25 | -98/+948
|\|
| * Add missing license headers found with RAT | Matei Zaharia | 2013-09-02 | 1 | -1/+18
| |
| * Exclude some private modules in epydoc | Matei Zaharia | 2013-09-02 | 1 | -0/+1
| |
| * Further fixes to get PySpark to work on Windows | Matei Zaharia | 2013-09-02 | 1 | -5/+12
| |
| * Allow PySpark to launch worker.py directly on Windows | Matei Zaharia | 2013-09-01 | 1 | -4/+7
| |
| * Move some classes to more appropriate packages: | Matei Zaharia | 2013-09-01 | 1 | -2/+2
| |
| | * RDD, *RDDFunctions -> org.apache.spark.rdd
| | * Utils, ClosureCleaner, SizeEstimator -> org.apache.spark.util
| | * JavaSerializer, KryoSerializer -> org.apache.spark.serializer
| * Add banner to PySpark and make wordcount output nicer | Matei Zaharia | 2013-09-01 | 2 | -1/+14
| |
| * Initial work to rename package to org.apache.spark | Matei Zaharia | 2013-09-01 | 3 | -5/+5
| |
| * Merge pull request #861 from AndreSchumacher/pyspark_sampling_function | Matei Zaharia | 2013-08-31 | 2 | -7/+167
| |\
| | | Pyspark sampling function
| | * RDD sample() and takeSample() prototypes for PySpark | Andre Schumacher | 2013-08-28 | 2 | -7/+167
| | |
| * | Merge pull request #870 from JoshRosen/spark-885 | Matei Zaharia | 2013-08-31 | 1 | -1/+5
| |\ \
| | | | Don't send SIGINT / ctrl-c to Py4J gateway subprocess
| | * | Don't send SIGINT to Py4J gateway subprocess. | Josh Rosen | 2013-08-28 | 1 | -1/+5
| | |/
| | | This addresses SPARK-885, a usability issue where PySpark's Java gateway
| | | process would be killed if the user hit ctrl-c. Note that SIGINT still
| | | won't cancel the running s
| | |
| | | This fix is based on http://stackoverflow.com/questions/5045771
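The linked Stack Overflow question is about keeping a subprocess alive across Ctrl-C. The usual Unix technique is to start the child in its own session, so that a terminal-generated SIGINT is delivered to the parent's process group but never reaches the child. A rough, Unix-only sketch of that idea (not the actual PySpark fix):

```python
import os
import signal
import subprocess
import sys

# Unix-only sketch: run a long-lived child (a stand-in for the Py4J gateway)
# in its own session, so Ctrl-C in the parent's terminal does not deliver
# SIGINT to it.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    preexec_fn=os.setsid,  # new session => new process group, no controlling tty
)

# The child no longer shares our process group, so a terminal-generated
# SIGINT would hit only the parent.
assert child.poll() is None  # still running

# Clean up: signal the child's own group directly.
os.killpg(os.getpgid(child.pid), signal.SIGTERM)
child.wait()
```

On Python 3.2+ the same effect is available portably-within-Unix via `start_new_session=True` instead of `preexec_fn`.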
| * | Merge pull request #869 from AndreSchumacher/subtract | Matei Zaharia | 2013-08-30 | 1 | -0/+37
| |\ \
| | | | PySpark: implementing subtractByKey(), subtract() and keyBy()
| | * | PySpark: implementing subtractByKey(), subtract() and keyBy() | Andre Schumacher | 2013-08-28 | 1 | -0/+37
| | |/
| * | Fix PySpark for assembly run and include it in dist | Matei Zaharia | 2013-08-29 | 1 | -0/+0
| | |
| * | Change build and run instructions to use assemblies | Matei Zaharia | 2013-08-29 | 1 | -1/+1
| |/
| | This commit makes Spark invocation saner by using an assembly JAR to find
| | all of Spark's dependencies instead of adding all the JARs in lib_managed.
| | It also packages the examples into an assembly and uses that as
| | SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two
| | better-named scripts: "run-examples" for examples, and "spark-class" for
| | Spark internal classes (e.g. REPL, master, etc). This is also designed to
| | minimize the confusion people have in trying to use "run" to run their own
| | classes; it's not meant to do that, but now at least if they look at it,
| | they can modify run-examples to do a decent job for them.
| |
| | As part of this, Bagel's examples are also now properly moved to the
| | examples package instead of bagel.
| * Implementing SPARK-838: Add DoubleRDDFunctions methods to PySpark | Andre Schumacher | 2013-08-21 | 2 | -1/+168
| |
| * Implementing SPARK-878 for PySpark: adding zip and egg files to context and passing it down to workers which add these to their sys.path | Andre Schumacher | 2013-08-16 | 5 | -5/+37
| |
| * Fix PySpark unit tests on Python 2.6. | Josh Rosen | 2013-08-14 | 2 | -19/+20
| |
| * Merge pull request #802 from stayhf/SPARK-760-Python | Matei Zaharia | 2013-08-12 | 1 | -0/+70
| |\
| | | Simple PageRank algorithm implementation in Python for SPARK-760
| | * Code update for Matei's suggestions | stayhf | 2013-08-11 | 1 | -7/+9
| | |
| | * Simple PageRank algorithm implementation in Python for SPARK-760 | stayhf | 2013-08-10 | 1 | -0/+68
| | |
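Stripped of Spark, the core iteration that a PageRank example like this performs can be sketched in plain Python. The function below is an illustration using the conventional 0.85 damping factor, not the example's actual code:

```python
# Plain-Python sketch of the PageRank iteration (no Spark); illustrative
# only, not the code from the SPARK-760 example.
def pagerank(links, iterations=10, damping=0.85):
    """links maps each page to the list of pages it links to."""
    ranks = {page: 1.0 for page in links}
    for _ in range(iterations):
        # Each page divides its current rank evenly among its outlinks.
        contribs = {page: 0.0 for page in links}
        for page, outlinks in links.items():
            for target in outlinks:
                contribs[target] += ranks[page] / len(outlinks)
        # Update rule: rank = 0.15 + 0.85 * (sum of incoming contributions)
        ranks = {page: (1 - damping) + damping * c
                 for page, c in contribs.items()}
    return ranks

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In the tiny graph above, "c" is linked from both "a" and "b", so it ends up ranked above "b", which only "a" links to.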
| * | Merge pull request #813 from AndreSchumacher/add_files_pyspark | Matei Zaharia | 2013-08-12 | 1 | -1/+6
| |\ \
| | | | Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark
| | * | Implementing SPARK-865: Add the equivalent of ADD_JARS to PySpark | Andre Schumacher | 2013-08-12 | 1 | -1/+6
| | | |
| | | | Now ADD_FILES uses a comma as file name separator.
| * | | Merge pull request #747 from mateiz/improved-lr | Matei Zaharia | 2013-08-06 | 1 | -27/+26
| |\ \ \
| | | | | Update the Python logistic regression example
| | * | | Fix string parsing and style in LR | Matei Zaharia | 2013-07-31 | 1 | -1/+1
| | | | |
| | * | | Update the Python logistic regression example to read from a file and batch input records for more efficient NumPy computations | Matei Zaharia | 2013-07-29 | 1 | -27/+26
| | | | |
| * | | Do not inherit master's PYTHONPATH on workers. | Josh Rosen | 2013-07-29 | 1 | -3/+2
| |/ / /
| | | | This fixes SPARK-832, an issue where PySpark would not work when the
| | | | master and workers used different SPARK_HOME paths.
| | | |
| | | | This change may potentially break code that relied on the master's
| | | | PYTHONPATH being used on workers. To have custom PYTHONPATH additions
| | | | used on the workers, users should set a custom PYTHONPATH in
| | | | spark-env.sh rather than setting it in the shell.
| * | | Merge branch 'master' of github.com:mesos/spark | Matei Zaharia | 2013-07-29 | 6 | -15/+9
| |\ \ \
| | * | | Some fixes to Python examples (style and package name for LR) | Matei Zaharia | 2013-07-27 | 6 | -15/+9
| | | |/
| | |/|
| * | | SPARK-815. Python parallelize() should split lists before batching | Matei Zaharia | 2013-07-29 | 1 | -2/+9
| | | |
| | | | One unfortunate consequence of this fix is that we materialize any
| | | | collections that are given to us as generators, but this seems necessary
| | | | to get reasonable behavior on small collections. We could add a batchSize
| | | | parameter later to bypass auto-computation of batch size if this becomes
| | | | a problem (e.g. if users really want to parallelize big generators nicely)
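The splitting described here (materialize the input, then cut it into roughly equal contiguous slices before any batching) can be illustrated with a small helper. The helper name and `num_slices` parameter are made up for the sketch; this is not PySpark's actual implementation.

```python
# Hypothetical helper illustrating the SPARK-815 idea: materialize the
# collection, then split it into num_slices roughly equal contiguous slices
# so that even a small input spreads across partitions before batching.
def split_into_slices(data, num_slices):
    data = list(data)  # materializes generators, as the message above notes
    slice_len = len(data) / float(num_slices)
    return [
        data[int(i * slice_len): int((i + 1) * slice_len)]
        for i in range(num_slices)
    ]

slices = split_into_slices(range(5), 3)  # [[0], [1, 2], [3, 4]]
```

Using fractional slice boundaries and truncating keeps the slice sizes within one element of each other regardless of how the division rounds.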
| * | | Use None instead of empty string as it's slightly smaller/faster | Matei Zaharia | 2013-07-29 | 1 | -1/+1
| | | |
| * | | Allow python/run-tests to run from any directory | Matei Zaharia | 2013-07-29 | 1 | -0/+3
| | | |
| * | | Optimize Python foreach() to not return as many objects | Matei Zaharia | 2013-07-29 | 1 | -1/+5
| | | |
| * | | Optimize Python take() to not compute entire first partition | Matei Zaharia | 2013-07-29 | 1 | -6/+9
| |/ /
| * | Add Apache license headers and LICENSE and NOTICE files | Matei Zaharia | 2013-07-16 | 19 | -1/+325
| | |
* | | PySpark: replacing class manifest by class tag for Scala 2.10.2 inside rdd.py | Andre Schumacher | 2013-08-30 | 1 | -2/+2
|/ /
* | Fixed PySpark perf regression by not using socket.makefile(), and improved debuggability by letting "print" statements show up in the executor's stderr | root | 2013-07-01 | 1 | -18/+24
| |
| | Conflicts:
| |   core/src/main/scala/spark/api/python/PythonRDD.scala
* | Fix reporting of PySpark exceptions | Jey Kottalam | 2013-06-21 | 2 | -5/+19
| |
* | PySpark daemon: fix deadlock, improve error handling | Jey Kottalam | 2013-06-21 | 1 | -17/+50
| |
* | Add tests and fixes for Python daemon shutdown | Jey Kottalam | 2013-06-21 | 3 | -22/+69
| |
* | Prefork Python worker processes | Jey Kottalam | 2013-06-21 | 2 | -32/+138
| |
* | Add Python timing instrumentation | Jey Kottalam | 2013-06-21 | 2 | -1/+19
| |
* | Fix Python saveAsTextFile doctest to not expect order to be preserved | Jey Kottalam | 2013-04-02 | 1 | -1/+1
| |