path: root/python/pyspark
Commit message — Author — Date — Files — Lines
* Merge branch 'master' into MatrixFactorizationModel-fix — Hossein Falaki, 2014-01-07 (3 files, -3/+3)
|\
| * Merge remote-tracking branch 'apache-github/master' into remove-binaries — Patrick Wendell, 2014-01-03 (2 files, -2/+2)
| |\
| |     Conflicts:
| |       core/src/test/scala/org/apache/spark/DriverSuite.scala
| |       docs/python-programming-guide.md
| | * Merge pull request #317 from ScrapCodes/spark-915-segregate-scripts — Patrick Wendell, 2014-01-03 (2 files, -2/+2)
| | |\
| | |     Spark-915: segregate scripts
| | | * sbin/spark-class* -> bin/spark-class* — Prashant Sharma, 2014-01-03 (1 file, -1/+1)
| | | |
| | | * pyspark -> bin/pyspark — Prashant Sharma, 2014-01-02 (1 file, -1/+1)
| | | |
| | | * Merge branch 'scripts-reorg' of github.com:shane-huang/incubator-spark into spark-915-segregate-scripts — Prashant Sharma, 2014-01-02 (1 file, -1/+1)
| | | |\
| | | |     Conflicts:
| | | |       bin/spark-shell
| | | |       core/pom.xml
| | | |       core/src/main/scala/org/apache/spark/SparkContext.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
| | | |       core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
| | | |       core/src/test/scala/org/apache/spark/DriverSuite.scala
| | | |       python/run-tests
| | | |       sbin/compute-classpath.sh
| | | |       sbin/spark-class
| | | |       sbin/stop-slaves.sh
| | | | * Merge branch 'reorgscripts' into scripts-reorg — shane-huang, 2013-09-27 (1 file, -1/+1)
| | | | |\
| | | | | * fix paths and change spark to use APP_MEM as application driver memory instead of SPARK_MEM; user should add application jars to SPARK_CLASSPATH — shane-huang, 2013-09-26 (1 file, -1/+1)
| | | | | |     Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| | | | | * added spark-class and spark-executor to sbin — shane-huang, 2013-09-23 (1 file, -1/+1)
| | | | | |     Signed-off-by: shane-huang <shengsheng.huang@intel.com>
| * | | | | Changes on top of Prashant's patch. — Patrick Wendell, 2014-01-03 (1 file, -1/+1)
| |/ / / /
| |     Closes #316
* | | | | Added predictAll python function to MatrixFactorizationModel — Hossein Falaki, 2014-01-06 (1 file, -4/+6)
| | | | |
* | | | | Added Rating deserializer — Hossein Falaki, 2014-01-06 (1 file, -3/+18)
| | | | |
* | | | | Added python binding for bulk recommendation — Hossein Falaki, 2014-01-04 (2 files, -1/+19)
|/ / / /
* | | | Merge pull request #311 from tmyklebu/master — Matei Zaharia, 2014-01-02 (1 file, -11/+55)
|\ \ \ \
| |/ / /
|/| | |
| | | |     SPARK-991: Report information gleaned from a Python stacktrace in the UI
| | | |
| | | |     Scala:
| | | |     - Added setCallSite/clearCallSite to SparkContext and JavaSparkContext. These functions mutate a LocalProperty called "externalCallSite".
| | | |     - Added a wrapper, getCallSite, that checks for an externalCallSite and, if none is found, falls back to the usual Utils.formatSparkCallSite.
| | | |     - Changed every caller of Utils.formatSparkCallSite (other than getCallSite itself) to call getCallSite instead.
| | | |     - Added setCallSite/clearCallSite wrappers to JavaSparkContext.
| | | |
| | | |     Python:
| | | |     - Added a gruesome hack to rdd.py that inspects the traceback and guesses what should be shown in the UI.
| | | |     - Added a RAII wrapper around said gruesome hack that calls setCallSite/clearCallSite as appropriate.
| | | |     - Wired that RAII wrapper up around three calls into the Scala code.
| | | |
| | | |     I'm not sure that I hit all the spots with the RAII wrapper, or that my gruesome hack does exactly what we want. One could also approach this change by refactoring runJob/submitJob/runApproximateJob to take a call site, then threading that parameter through everything that needs to know it. The pointless-looking wrappers in JavaSparkContext exist because the SparkContext cannot be accessed directly from Python, so everything that matters has to be wrapped in JavaSparkContext.
| | | |
| | | |     Conflicts:
| | | |       core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala
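The "RAII wrapper" this commit describes can be sketched as a Python context manager. The names `first_user_frame` and `CallSiteContext` are illustrative, not PySpark's actual internals; only the `setCallSite`/`clearCallSite` method names mirror the Scala API mentioned above:

```python
import traceback


def first_user_frame():
    """Walk the current stack (outermost first) and report the first frame
    that does not come from the pyspark package itself."""
    for fname, lineno, func, _ in traceback.extract_stack():
        if "pyspark" not in fname:
            return "%s at %s:%d" % (func, fname, lineno)
    return "<unknown>"


class CallSiteContext(object):
    """RAII-style guard: set the external call site on entry, clear it on exit."""

    def __init__(self, ctx):
        self.ctx = ctx  # anything exposing setCallSite/clearCallSite

    def __enter__(self):
        self.ctx.setCallSite(first_user_frame())
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.ctx.clearCallSite()
        return False  # never swallow exceptions
```

Wrapping each call into the Scala code in `with CallSiteContext(sc): ...` guarantees the call site is cleared even if the job raises.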
| * | | Make Python function/line appear in the UI. — Tor Myklebust, 2013-12-28 (1 file, -11/+55)
| | | |
* | | | Fix Python code after change of getOrElse — Matei Zaharia, 2014-01-01 (2 files, -7/+14)
| | | |
* | | | Miscellaneous fixes from code review. — Matei Zaharia, 2014-01-01 (1 file, -8/+4)
| | | |     Also replaced SparkConf.getOrElse with just a "get" that takes a default value, and added getInt, getLong, etc. to make code that uses this simpler later on.
* | | | Merge remote-tracking branch 'apache/master' into conf2 — Matei Zaharia, 2013-12-31 (2 files, -9/+4)
|\ \ \ \
| | | |     Conflicts:
| | | |       core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| * \ \ \ Merge pull request #289 from tdas/filestream-fix — Patrick Wendell, 2013-12-31 (2 files, -9/+4)
| |\ \ \ \
| | |/ / /
| |/| | |
| | | |     Bug fixes for file input stream and checkpointing:
| | | |     - Fixed bugs in the file input stream that caused the stream to fail on transient HDFS errors (e.g. listing files while a background thread is deleting them).
| | | |     - Updated Spark's CheckpointRDD and Streaming's CheckpointWriter to use SparkContext.hadoopConfiguration, so checkpoints can be written to any HDFS-compatible store that requires special configuration.
| | | |     - Changed the API of SparkContext.setCheckpointDir(): eliminated the unnecessary 'useExisting' parameter. SparkContext now always creates a unique subdirectory within the user-specified checkpoint directory, ensuring that previous checkpoint files are not accidentally overwritten.
| | | |     - Fixed a bug where setting the checkpoint directory to a relative local path caused checkpointing to fail.
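The new setCheckpointDir behaviour described in that commit (always create a unique subdirectory, no `useExisting` flag) can be sketched roughly as follows; `set_checkpoint_dir` is a hypothetical stand-in for what SparkContext does internally, not Spark's actual code:

```python
import os
import uuid


def set_checkpoint_dir(base_dir):
    """Create and return a fresh, uniquely named subdirectory of base_dir,
    so checkpoint files from earlier runs are never overwritten."""
    subdir = os.path.join(base_dir, str(uuid.uuid4()))
    os.makedirs(subdir)  # fails loudly if the unique name already exists
    return subdir
```

Because every call picks a new name, two applications (or two runs of one application) pointed at the same base directory can never clobber each other's checkpoints.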
| | * | | Fixed Python API for sc.setCheckpointDir; also other fixes based on Reynold's comments on PR 289. — Tathagata Das, 2013-12-24 (2 files, -9/+4)
| | | | |
* | | | | Updated docs for SparkConf and handled review comments — Matei Zaharia, 2013-12-30 (2 files, -17/+31)
| | | | |
* | | | | Properly show Spark properties on web UI, and change app name property — Matei Zaharia, 2013-12-29 (2 files, -3/+3)
| | | | |
* | | | | Fix some Python docs and make sure to unset SPARK_TESTING in Python tests so we don't get the test spark.conf on the classpath. — Matei Zaharia, 2013-12-29 (4 files, -20/+35)
| | | | |
* | | | | Merge remote-tracking branch 'origin/master' into conf2 — Matei Zaharia, 2013-12-29 (9 files, -2/+599)
|\| | | |
| | | |     Conflicts:
| | | |       core/src/main/scala/org/apache/spark/SparkContext.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
| | | |       core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala
| | | |       core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala
| | | |       core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala
| | | |       core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala
| | | |       new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
| | | |       streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
| | | |       streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
| * | | | Merge pull request #283 from tmyklebu/master — Matei Zaharia, 2013-12-26 (8 files, -1/+598)
| |\ \ \ \
| | | | |     Python bindings for mllib
| | | | |
| | | | |     This pull request contains Python bindings for the regression, clustering, classification, and recommendation tools in mllib. For each 'train' frontend exposed, there is a Scala stub in PythonMLLibAPI.scala and a Python stub in mllib.py. The Python stub serialises the input RDD and any vector/matrix arguments into a mutually-understood format and calls the Scala stub. The Scala stub deserialises the RDD and the vector/matrix arguments, calls the appropriate 'train' function, serialises the resulting model, and returns the serialised model.
| | | | |
| | | | |     ALSModel is slightly different, since a MatrixFactorizationModel has RDDs inside. The Scala stub returns a handle to a Scala MatrixFactorizationModel; prediction is done by calling the Scala predict method.
| | | | |
| | | | |     I have tested these bindings on an x86_64 machine running Linux. There is a risk that these bindings may fail on some choose-your-own-endian platform if Python's endianness differs from java.nio.ByteBuffer's idea of the native byte order.
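The "mutually-understood format" the Python and Scala stubs exchange can be illustrated with a simplified sketch: a length-prefixed array of big-endian float64 values. This is only an illustration of the round-trip idea; the real PythonMLLibAPI wire format differs in its details:

```python
import struct


def serialize_double_vector(vec):
    """Pack a list of floats as a 4-byte big-endian count followed by
    that many big-endian float64 values."""
    out = struct.pack(">i", len(vec))
    for x in vec:
        out += struct.pack(">d", x)
    return out


def deserialize_double_vector(data):
    """Inverse of serialize_double_vector."""
    (n,) = struct.unpack_from(">i", data, 0)
    return list(struct.unpack_from(">%dd" % n, data, 4))
```

Fixing the byte order explicitly (here big-endian, matching `java.nio.ByteBuffer`'s default) is exactly what avoids the native-endianness hazard the PR description warns about.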
| | * | | | Remove commented code in __init__.py. — Tor Myklebust, 2013-12-25 (1 file, -8/+0)
| | | | | |
| | * | | | Fix copypasta in __init__.py; don't import anything directly into pyspark.mllib. — Tor Myklebust, 2013-12-25 (1 file, -26/+8)
| | | | | |
| | * | | | Initial weights in Scala are ones; do that too. Also fix some errors. — Tor Myklebust, 2013-12-25 (1 file, -6/+6)
| | | | | |
| | * | | | Split the mllib bindings into a whole bunch of modules and rename some things. — Tor Myklebust, 2013-12-25 (7 files, -183/+409)
| | | | | |
| | * | | | Remove useless line from test stub. — Tor Myklebust, 2013-12-24 (1 file, -1/+0)
| | | | | |
| | * | | | Python change for move of PythonMLLibAPI. — Tor Myklebust, 2013-12-24 (1 file, -1/+1)
| | | | | |
| | * | | | Release JVM reference to the ALSModel when done. — Tor Myklebust, 2013-12-22 (1 file, -2/+2)
| | | | | |
| | * | | | Python stubs for ALSModel. — Tor Myklebust, 2013-12-21 (2 files, -8/+56)
| | | | | |
| | * | | | Un-semicolon mllib.py. — Tor Myklebust, 2013-12-20 (1 file, -11/+11)
| | | | | |
| | * | | | Change some docstrings and add some others. — Tor Myklebust, 2013-12-20 (1 file, -1/+3)
| | | | | |
| | * | | | Licence notice. — Tor Myklebust, 2013-12-20 (1 file, -0/+17)
| | | | | |
| | * | | | Whitespace. — Tor Myklebust, 2013-12-20 (1 file, -1/+1)
| | | | | |
| | * | | | Remove gigantic endian-specific test and exception tests. — Tor Myklebust, 2013-12-20 (1 file, -38/+3)
| | | | | |
| | * | | | Tests for the Python side of the mllib bindings. — Tor Myklebust, 2013-12-20 (1 file, -52/+172)
| | | | | |
| | * | | | Python stubs for classification and clustering. — Tor Myklebust, 2013-12-20 (2 files, -16/+96)
| | | | | |
| | * | | | Python side of python bindings for linear, Lasso, and ridge regression — Tor Myklebust, 2013-12-19 (2 files, -15/+72)
| | | | | |
| | * | | | Incorporate most of Josh's style suggestions. I don't want to deal with the type and length checking errors until we've got at least one working stub that we're all happy with. — Tor Myklebust, 2013-12-19 (2 files, -98/+91)
| | | | | |
| | * | | | The rest of the Python side of those bindings. — Tor Myklebust, 2013-12-19 (3 files, -2/+4)
| | | | | |
| | * | | | First cut at python mllib bindings. Only LinearRegression is supported. — Tor Myklebust, 2013-12-19 (1 file, -0/+114)
| | | | | |
| * | | | | Typo: avaiable -> available — Andrew Ash, 2013-12-24 (1 file, -1/+1)
| | |/ / / | |/| | |
* | | | | Add Python docs about SparkConf — Matei Zaharia, 2013-12-29 (2 files, -1/+44)
| | | | |
* | | | | Fix some other Python tests due to initializing JVM in a different way — Matei Zaharia, 2013-12-29 (2 files, -10/+18)
| | | | |     The test in context.py created two different instances of the SparkContext class by copying "globals", so that some tests can have a global "sc" object and others can try initializing their own contexts. This led to two JVM gateways being created, since SparkConf also looked at pyspark.context.SparkContext to get the JVM.
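The two-gateway problem described in that commit comes down to having one process-wide JVM gateway instead of one per module. A minimal sketch of that pattern, with `launch_gateway` as a hypothetical stand-in for the real (expensive) gateway launcher:

```python
_gateway = None  # one shared gateway for the whole process


def launch_gateway():
    """Hypothetical stand-in for the real JVM gateway launcher."""
    return object()


def get_or_create_gateway():
    """Return a single process-wide gateway, creating it on first use.
    Every caller (SparkContext, SparkConf, ...) then shares one JVM
    instead of each accidentally launching its own."""
    global _gateway
    if _gateway is None:
        _gateway = launch_gateway()
    return _gateway
```

With this shape, copying module globals in tests no longer matters: whichever copy runs first creates the gateway, and every later caller reuses it.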
* | | | | Add SparkConf support in Python — Matei Zaharia, 2013-12-29 (4 files, -13/+146)
| | | | |
* | | | | Fix Python use of getLocalDir — Matei Zaharia, 2013-12-29 (1 file, -1/+1)
|/ / / /
* | | | Merge pull request #276 from shivaram/collectPartition — Reynold Xin, 2013-12-19 (2 files, -4/+6)
|\ \ \ \
| | | |     Add collectPartition to JavaRDD interface. This interface is useful for implementing `take` from other language frontends where the data is serialized. Also remove `takePartition` from PythonRDD and use `collectPartition` in rdd.py. Thanks @concretevitamin for the original change and tests.
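A `take` built on top of a per-partition collect, as that commit describes, can be sketched like this; `collect_partition` is a hypothetical callable standing in for the `collectPartition` call across the language boundary:

```python
def take(num_partitions, collect_partition, n):
    """Fetch partitions one at a time until n elements are gathered,
    so only as much data as needed is pulled to the driver.

    collect_partition(i) returns the full contents of partition i."""
    items = []
    for p in range(num_partitions):
        if len(items) >= n:
            break
        items.extend(collect_partition(p))
    return items[:n]
```

The point of doing it this way is that a small `take(5)` on a huge RDD only deserializes the first partition or two, rather than collecting the whole dataset.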