spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into ↵	Prashant Sharma	2014-01-02	2	-2/+2
\|\	spark-915-segregate-scripts Conflicts: bin/spark-shell core/pom.xml core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala core/src/test/scala/org/apache/spark/DriverSuite.scala python/run-tests sbin/compute-classpath.sh sbin/spark-class sbin/stop-slaves.sh
\| *	Merge branch 'reorgscripts' into scripts-reorg	shane-huang	2013-09-27	2	-2/+2
\| \|\
\| \| *	fix paths and change spark to use APP_MEM as application driver memory ↵	shane-huang	2013-09-26	1	-1/+1
\| \| \|	instead of SPARK_MEM, user should add application jars to SPARK_CLASSPATH Signed-off-by: shane-huang <shengsheng.huang@intel.com>
\| \| *	add scripts in bin	shane-huang	2013-09-23	1	-1/+1
\| \| \|	Signed-off-by: shane-huang <shengsheng.huang@intel.com>
\| \| *	added spark-class and spark-executor to sbin	shane-huang	2013-09-23	1	-1/+1
\| \| \|	Signed-off-by: shane-huang <shengsheng.huang@intel.com>
* \| \|	Fix Python code after change of getOrElse	Matei Zaharia	2014-01-01	2	-7/+14
\| \| \|
* \| \|	Miscellaneous fixes from code review.	Matei Zaharia	2014-01-01	1	-8/+4
\| \| \|	Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc to make code that uses this simpler later on.
* \| \|	Merge remote-tracking branch 'apache/master' into conf2	Matei Zaharia	2013-12-31	2	-9/+4
\|\ \ \	Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| * \ \	Merge pull request #289 from tdas/filestream-fix	Patrick Wendell	2013-12-31	2	-9/+4
\| \|\ \ \	Bug fixes for file input stream and checkpointing - Fixed bugs in the file input stream that led the stream to fail due to transient HDFS errors (listing files while a background thread is deleting them caused errors, etc.) - Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, to allow checkpoints to be written to any HDFS compatible store requiring special configuration. - Changed the API of SparkContext.setCheckpointDir() - eliminated the unnecessary 'useExisting' parameter. Now SparkContext will always create a unique subdirectory within the user specified checkpoint directory. This is to ensure that previous checkpoint files are not accidentally overwritten. - Fixed bug where setting checkpoint directory as a relative local path caused the checkpointing to fail.
\| \| * \| \|	Fixed Python API for sc.setCheckpointDir. Also other fixes based on ↵	Tathagata Das	2013-12-24	2	-9/+4
\| \| \| \| \|	Reynold's comments on PR 289.
* \| \| \| \|	Updated docs for SparkConf and handled review comments	Matei Zaharia	2013-12-30	2	-17/+31
\| \| \| \| \|
* \| \| \| \|	Properly show Spark properties on web UI, and change app name property	Matei Zaharia	2013-12-29	2	-3/+3
\| \| \| \| \|
* \| \| \| \|	Fix some Python docs and make sure to unset SPARK_TESTING in Python	Matei Zaharia	2013-12-29	6	-22/+37
\| \| \| \| \|	tests so we don't get the test spark.conf on the classpath.
* \| \| \| \|	Merge remote-tracking branch 'origin/master' into conf2	Matei Zaharia	2013-12-29	9	-2/+599
\|\\| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
\| * \| \| \|	Merge pull request #283 from tmyklebu/master	Matei Zaharia	2013-12-26	8	-1/+598
\| \|\ \ \ \	Python bindings for mllib This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib. For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model. ALSModel is slightly different since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method. I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endian differs from java.nio.ByteBuffer's idea of the native byte order.
\| \| * \| \| \|	Remove commented code in __init__.py.	Tor Myklebust	2013-12-25	1	-8/+0
\| \| \| \| \| \|
\| \| * \| \| \|	Fix copypasta in __init__.py. Don't import anything directly into ↵	Tor Myklebust	2013-12-25	1	-26/+8
\| \| \| \| \| \|	pyspark.mllib.
\| \| * \| \| \|	Initial weights in Scala are ones; do that too. Also fix some errors.	Tor Myklebust	2013-12-25	1	-6/+6
\| \| \| \| \| \|
\| \| * \| \| \|	Split the mllib bindings into a whole bunch of modules and rename some things.	Tor Myklebust	2013-12-25	7	-183/+409
\| \| \| \| \| \|
\| \| * \| \| \|	Remove useless line from test stub.	Tor Myklebust	2013-12-24	1	-1/+0
\| \| \| \| \| \|
\| \| * \| \| \|	Python change for move of PythonMLLibAPI.	Tor Myklebust	2013-12-24	1	-1/+1
\| \| \| \| \| \|
\| \| * \| \| \|	Release JVM reference to the ALSModel when done.	Tor Myklebust	2013-12-22	1	-2/+2
\| \| \| \| \| \|
\| \| * \| \| \|	Python stubs for ALSModel.	Tor Myklebust	2013-12-21	2	-8/+56
\| \| \| \| \| \|
\| \| * \| \| \|	Un-semicolon mllib.py.	Tor Myklebust	2013-12-20	1	-11/+11
\| \| \| \| \| \|
\| \| * \| \| \|	Change some docstrings and add some others.	Tor Myklebust	2013-12-20	1	-1/+3
\| \| \| \| \| \|
\| \| * \| \| \|	Licence notice.	Tor Myklebust	2013-12-20	1	-0/+17
\| \| \| \| \| \|
\| \| * \| \| \|	Whitespace.	Tor Myklebust	2013-12-20	1	-1/+1
\| \| \| \| \| \|
\| \| * \| \| \|	Remove gigantic endian-specific test and exception tests.	Tor Myklebust	2013-12-20	1	-38/+3
\| \| \| \| \| \|
\| \| * \| \| \|	Tests for the Python side of the mllib bindings.	Tor Myklebust	2013-12-20	1	-52/+172
\| \| \| \| \| \|
\| \| * \| \| \|	Python stubs for classification and clustering.	Tor Myklebust	2013-12-20	2	-16/+96
\| \| \| \| \| \|
\| \| * \| \| \|	Python side of python bindings for linear, Lasso, and ridge regression	Tor Myklebust	2013-12-19	2	-15/+72
\| \| \| \| \| \|
\| \| * \| \| \|	Incorporate most of Josh's style suggestions. I don't want to deal with the ↵	Tor Myklebust	2013-12-19	2	-98/+91
\| \| \| \| \| \|	type and length checking errors until we've got at least one working stub that we're all happy with.
\| \| * \| \| \|	The rest of the Python side of those bindings.	Tor Myklebust	2013-12-19	3	-2/+4
\| \| \| \| \| \|
\| \| * \| \| \|	First cut at python mllib bindings. Only LinearRegression is supported.	Tor Myklebust	2013-12-19	1	-0/+114
\| \| \| \| \| \|
\| * \| \| \| \|	Typo: avaiable -> available	Andrew Ash	2013-12-24	1	-1/+1
\| \| \|/ / /
\| \|/\| \| \|
* \| \| \| \|	Add Python docs about SparkConf	Matei Zaharia	2013-12-29	2	-1/+44
\| \| \| \| \|
* \| \| \| \|	Fix some other Python tests due to initializing JVM in a different way	Matei Zaharia	2013-12-29	3	-10/+19
\| \| \| \| \|	The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
* \| \| \| \|	Add SparkConf support in Python	Matei Zaharia	2013-12-29	4	-13/+146
\| \| \| \| \|
* \| \| \| \|	Fix Python use of getLocalDir	Matei Zaharia	2013-12-29	1	-1/+1
\|/ / / /
* \| \| \|	Merge pull request #276 from shivaram/collectPartition	Reynold Xin	2013-12-19	2	-4/+6
\|\ \ \ \	Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
\| * \| \| \|	Make collectPartitions take an array of partitions	Shivaram Venkataraman	2013-12-19	1	-1/+6
\| \| \| \| \|	Change the implementation to use runJob instead of PartitionPruningRDD. Also update the unit tests and the python take implementation to use the new interface.
\| * \| \| \|	Add collectPartition to JavaRDD interface.	Shivaram Venkataraman	2013-12-18	2	-4/+1
\| \|/ / /	Also remove takePartition from PythonRDD and use collectPartition in rdd.py.
* / / /	Add toString to Java RDD, and __repr__ to Python RDD	Nick Pentreath	2013-12-19	1	-0/+3
\|/ / /
* \| \|	Merge branch 'master' into akka-bug-fix	Prashant Sharma	2013-12-11	3	-1/+36
\|\ \ \	Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
\| * \| \|	License headers	Patrick Wendell	2013-12-09	1	-0/+17
\| \| \| \|
\| * \| \|	Fix UnicodeEncodeError in PySpark saveAsTextFile().	Josh Rosen	2013-11-28	2	-1/+19
\| \| \| \|	Fixes SPARK-970.
* \| \| \|	Merge branch 'master' into wip-scala-2.10	Prashant Sharma	2013-11-27	8	-142/+383
\|\\| \| \|	Conflicts: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala core/src/main/scala/org/apache/spark/rdd/RDD.scala python/pyspark/rdd.py
\| * \| \|	Removed unused basestring case from dump_stream.	Josh Rosen	2013-11-26	1	-2/+0
\| \| \| \|
\| * \| \|	FramedSerializer: _dumps => dumps, _loads => loads.	Josh Rosen	2013-11-10	4	-18/+18
\| \| \| \|
\| * \| \|	Send PySpark commands as bytes instead of strings.	Josh Rosen	2013-11-10	3	-16/+13
\| \| \| \|