| Commit message | Author | Age | Files | Lines |
Conflicts:
	docs/mllib-guide.md
Simplify and fix pyspark script.

This patch removes compatibility for IPython < 1.0 but fixes the launch
script and makes it much simpler.

I tested this using the three commands on the PySpark documentation page:

1. IPYTHON=1 ./pyspark
2. IPYTHON_OPTS="notebook" ./pyspark
3. IPYTHON_OPTS="notebook --pylab inline" ./pyspark

There are two changes (see the sketch below):

- We now rely on the PYTHONSTARTUP env var to start PySpark.
- Removed the quotes around $IPYTHON_OPTS: with quotes, the options are glommed
  together into a single argument passed to `exec`, which seemed to cause
  IPython to fail (it expects them as multiple arguments).
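
A minimal sketch of how the relevant part of the launch script might look after
this change; the SPARK_HOME/PYSPARK_PYTHON variables and the shell.py path are
assumptions for illustration, not the actual script:

```
# Sketch only -- not the actual launch script. Assumes SPARK_HOME points at the
# Spark checkout and that python/pyspark/shell.py is the interactive startup file.
export PYTHONSTARTUP="$SPARK_HOME/python/pyspark/shell.py"

if [ "$IPYTHON" = "1" ]; then
  # $IPYTHON_OPTS is intentionally left unquoted so that a value such as
  # "notebook --pylab inline" is split into three separate arguments.
  exec ipython $IPYTHON_OPTS
else
  exec "${PYSPARK_PYTHON:-python}"
fi
```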
SPARK-998: Support Launching Driver Inside of Standalone Mode

[NOTE: I need to bring the tests up to date with new changes, so for now they will fail]

This patch provides support for launching driver programs inside of a standalone
cluster manager. It also supports monitoring and re-launching of driver programs,
which is useful for long-running, recoverable applications such as Spark Streaming
jobs. For those jobs, this patch allows a deployment mode which is resilient to the
failure of any worker node, failure of a master node (provided a multi-master setup),
and even failures of the application itself, provided they are recoverable on a
restart. Driver information, such as the status and logs from a driver, is displayed
in the UI.

There are a few small TODOs here, but the code is generally feature-complete. They are:

- Bring tests up to date and add test coverage
- Restarting on failure should be optional and maybe off by default.
- See if we can re-use Akka connections to facilitate clients behind a firewall

A sensible place to start for review would be the `DriverClient` class, which gives
users the ability to launch their driver program. I've also added an example program
(`DriverSubmissionTest`) that allows you to test this locally and play around with
killing workers, etc. Most of the code is devoted to persisting driver state in the
cluster manager, exposing it in the UI, and dealing correctly with various types of
failures.

Instructions to test locally:

- `sbt/sbt assembly/assembly examples/assembly`
- Start a local version of the standalone cluster manager, then launch a driver
  with a command like:
```
./spark-class org.apache.spark.deploy.client.DriverClient \
-j -Dspark.test.property=something \
-e SPARK_TEST_KEY=SOMEVALUE \
launch spark://10.99.1.14:7077 \
../path-to-examples-assembly-jar \
org.apache.spark.examples.DriverSubmissionTest 1000 some extra options --some-option-here -X 13
```
- Go to the UI and make sure the driver started correctly; look at its output, etc.
- Kill workers, the driver program, masters, etc.
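
For that last step, the same client can presumably also take a running driver back
down. A hedged sketch, reusing the master URL from the example above; the `kill`
action and the driver-ID format are assumptions, not details confirmed by this log:

```
# Assumption: DriverClient also accepts a "kill" action that takes the master
# URL plus the driver ID shown in the standalone UI (placeholder ID below).
./spark-class org.apache.spark.deploy.client.DriverClient \
  kill spark://10.99.1.14:7077 driver-20140101120000-0000
```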
Conflicts:
	core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala
	pom.xml
Conflicts:
	core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala
	core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala
	core/src/main/scala/org/apache/spark/deploy/master/Master.scala
	core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala
This is a very esoteric option and it's out of sync with the style we use.
So it seems fitting to fix it for 0.9.0.
Support distributing extra files to workers in YARN client mode

So that users do not need to package every dependency into a single assembly jar
as the Spark application jar.
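
A rough illustration of how this might be used from the shell; the
SPARK_YARN_DIST_FILES variable name and the file paths are assumptions for
illustration, not confirmed by this log:

```
# Assumption: extra files for yarn-client mode are listed in an environment
# variable instead of being baked into the application assembly jar.
export SPARK_YARN_DIST_FILES=/path/to/lookup-table.dat,/path/to/app.properties
export MASTER=yarn-client
./bin/spark-shell
```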
on yarn cluster
SPARK-1009: Updated MLlib docs to show how to use MLlib in Python

In addition, added detailed examples for regression, clustering, and recommendation
algorithms in a separate Scala section, and fixed a few minor issues with the
existing documentation.
Also documents the spark.deploy.spreadOut option.
Conf improvements

There are two new features:

1. Allow users to set arbitrary Akka configuration via the Spark conf.
2. Allow the configuration to be printed in the logs for diagnosis.
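
A hedged sketch of what the logging knob might look like in practice. The property
name is an assumption (spark.logConf matches the later-documented setting for
logging the effective configuration), and the Akka passthrough keys are not shown:

```
# Assumption: setting spark.logConf=true causes the effective configuration to
# be printed in the driver log when the SparkContext starts up.
export SPARK_JAVA_OPTS="-Dspark.logConf=true"
./bin/spark-shell
```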
Add a script to download sbt if not present on the system

As per the discussion on the dev mailing list, this script will use the system sbt
if present, or otherwise attempt to download the sbt launcher. The fallback error
message, shown if that download fails, instructs the user to install sbt. While the
URLs it fetches from aren't controlled by the Spark project directly, they are
stable and the current authoritative sources.
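
A minimal sketch of that fallback logic; the launcher jar location and the download
URL are placeholders, not the actual ones the script uses:

```
#!/usr/bin/env bash
# Prefer a system-wide sbt if one is on the PATH; otherwise download the sbt
# launcher jar once and run it. Placeholder URL/path -- not the real ones.
LAUNCH_JAR=sbt/sbt-launch.jar
LAUNCH_URL=https://example.org/sbt-launch.jar

if command -v sbt >/dev/null 2>&1; then
  exec sbt "$@"
fi

if [ ! -f "$LAUNCH_JAR" ]; then
  curl -fL -o "$LAUNCH_JAR" "$LAUNCH_URL" || wget -O "$LAUNCH_JAR" "$LAUNCH_URL" || {
    echo "Could not download sbt. Please install sbt manually and re-run." >&2
    exit 1
  }
fi
exec java -jar "$LAUNCH_JAR" "$@"
```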
It controls the count of cores across the cluster, not on a per-machine basis.
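
For example (assuming the option being clarified is the cluster-wide core cap,
e.g. spark.cores.max; the subject line naming the option is not shown here, so
this is an assumption):

```
# Assumption: this refers to a cluster-wide cap such as spark.cores.max.
# On a 4-node cluster with 8 cores per node, the setting below limits the
# application to 8 cores in total across the cluster, not 8 cores per node.
export SPARK_JAVA_OPTS="-Dspark.cores.max=8"
```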
Conflicts:
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	docs/python-programming-guide.md
SPARK-915: Segregate scripts
spark-915-segregate-scripts
Conflicts:
	bin/spark-shell
	core/pom.xml
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala
	core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala
	core/src/test/scala/org/apache/spark/DriverSuite.scala
	python/run-tests
	sbin/compute-classpath.sh
	sbin/spark-class
	sbin/stop-slaves.sh
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
Signed-off-by: shane-huang <shengsheng.huang@intel.com>
This reverts commit 79b20e4dbe3dcd8559ec8316784d3334bb55868b, reversing
changes made to 7375047d516c5aa69221611f5f7b0f1d367039af.
val => var