path: root/docs
Commit message | Author | Date | Files | Lines
* Merge branch 'master' into wip-scala-2.10 | Prashant Sharma | 2013-11-27 | 2 | -3/+27
    Conflicts:
      core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
      core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala
      core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithContextRDD.scala
      core/src/main/scala/org/apache/spark/rdd/RDD.scala
      python/pyspark/rdd.py
  * Update tuning.md | Andrew Ash | 2013-11-25 | 1 | -1/+2
      Clarify when the serializer is used, based on a recent user@ mailing list discussion.
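As a companion to the tuning.md clarification above, a minimal sketch of selecting the serializer (the property and class names follow the 0.8-era configuration docs and should be treated as assumptions; set the property before creating the SparkContext):

```scala
// The serializer configured here is used when shuffling data between nodes
// and when caching RDDs in serialized form, not for every in-memory object.
System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```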
  * Merge pull request #101 from colorant/yarn-client-scheduler | Matei Zaharia | 2013-11-25 | 1 | -2/+25
      For SPARK-527, support spark-shell when running on YARN (synced to trunk and
      resubmitted here). In the current YARN mode, the application runs inside the
      Application Master as a user program, so the whole SparkContext lives on the remote
      cluster. That approach cannot support applications that involve local interaction and
      need to run where they are launched. This pull request therefore adds a
      YarnClientClusterScheduler and backend. With this scheduler, the user application is
      launched locally, while the executors are launched by YARN on remote nodes through a
      thin Application Master that only launches executors and monitors the Driver Actor's
      status, so that when the client application finishes, it can finish the YARN
      application as well. This enables spark-shell to run on YARN, and also lets other
      Spark applications run their SparkContext locally with the master URL "yarn-client";
      for example, SparkPi can then print its result on the local console instead of in the
      logs of the remote machine where the AM runs. The docs are updated to show how to use
      this yarn-client mode.
    * Add YarnClientClusterScheduler and Backend. | Raymond Liu | 2013-11-22 | 1 | -2/+25
        With this scheduler, the user application is launched locally, while the executors
        are launched by YARN on remote nodes. This enables spark-shell to run on YARN.
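To illustrate the yarn-client mode these commits add, a minimal sketch modeled on the SparkPi example the pull request mentions (the object name and Pi logic here are illustrative, not the actual example code):

```scala
import org.apache.spark.SparkContext

// With master URL "yarn-client", the driver (and this println) run locally,
// while YARN launches the executors on remote nodes.
object LocalDriverPi {
  def main(args: Array[String]) {
    val sc = new SparkContext("yarn-client", "LocalDriverPi")
    val n = 100000
    val inside = sc.parallelize(1 to n).filter { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      x * x + y * y < 1
    }.count()
    println("Pi is roughly " + 4.0 * inside / n) // printed on the local console
    sc.stop()
  }
}
```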
* Improvements from the review comments; also followed the Boy Scout Rule. | Prashant Sharma | 2013-11-27 | 1 | -2/+2
* Documenting the newly added Spark properties. | Prashant Sharma | 2013-11-26 | 1 | -1/+22
* Merge branch 'master' into scala-2.10-wip | Prashant Sharma | 2013-11-25 | 2 | -1/+2
    Conflicts:
      core/src/main/scala/org/apache/spark/rdd/RDD.scala
      project/SparkBuild.scala
  * Merge pull request #151 from russellcardullo/add-graphite-sink | Matei Zaharia | 2013-11-24 | 1 | -0/+1
      Add graphite sink for metrics. This adds a metrics sink for Graphite. The sink must be
      configured with the host and port of a Graphite node, and may optionally be configured
      with a prefix that will be prepended to all metrics sent to Graphite.
    * Add graphite sink for metrics | Russell Cardullo | 2013-11-08 | 1 | -0/+1
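For context, a sketch of the configuration such a sink expects, expressed as Scala Properties entries purely to illustrate the key/value pairs that would go in conf/metrics.properties (the sink class path and key names follow Spark's <instance>.sink.<name>.<option> convention, but treat the exact names as assumptions):

```scala
import java.util.Properties

// Hypothetical conf/metrics.properties contents enabling the Graphite sink.
val metricsConf = new Properties()
metricsConf.setProperty("*.sink.graphite.class",
  "org.apache.spark.metrics.sink.GraphiteSink")            // assumed class path
metricsConf.setProperty("*.sink.graphite.host", "graphite.example.com")
metricsConf.setProperty("*.sink.graphite.port", "2003")
metricsConf.setProperty("*.sink.graphite.prefix", "spark") // optional prefix
```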
  * Fix Kryo Serializer buffer inconsistency | Neal Wiggins | 2013-11-20 | 1 | -1/+1
      The documentation here is inconsistent with the coded default and other documentation.
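The buffer setting in question can be pinned explicitly rather than relying on the inconsistently documented default; a one-line sketch, with the property name taken from the era's configuration docs and the value illustrative:

```scala
// Size, in MB, of Kryo's per-object serialization buffer.
System.setProperty("spark.kryoserializer.buffer.mb", "2")
```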
* Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10-temp | Prashant Sharma | 2013-11-21 | 1 | -0/+2
    Conflicts:
      core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala
      streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
  * Improve Spark on Yarn error handling | tgravescs | 2013-11-19 | 1 | -0/+2
  * Fixed typos in the CDH4 distributions version codes. | RIA-pierre-borckmans | 2013-11-14 | 1 | -2/+2
* Various merge corrections | Aaron Davidson | 2013-11-14 | 1 | -2/+2
    I've diffed this patch against my own; since they were both created independently, two
    sets of eyes have gone over all the merge conflicts that were created, so I'm feeling
    significantly more confident in the resulting PR. @rxin has looked at the changes to
    the repl and is resoundingly confident that they are correct.
* Merge branch 'master' into scala-2.10 | Raymond Liu | 2013-11-14 | 1 | -0/+1
  * Allow Spark on YARN to be run from HDFS | tgravescs | 2013-11-04 | 1 | -0/+1
      Allows the spark.jar, app.jar, and log4j.properties to be put into HDFS.
* Merge branch 'master' into scala-2.10 | Raymond Liu | 2013-11-13 | 9 | -11/+127
  * fix persistent-hdfs | Fabrizio (Misto) Milo | 2013-11-01 | 1 | -1/+1
  * Document all the URIs for addJar/addFile | Evan Chan | 2013-11-01 | 1 | -1/+13
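A sketch of the URI forms such documentation covers, assuming a live SparkContext named sc as in spark-shell (the scheme list and paths are assumptions, not quotes from the commit):

```scala
// Executors fetch each dependency according to its URI scheme.
sc.addJar("file:///path/on/driver/dep.jar")      // served by the driver's HTTP server
sc.addJar("hdfs://namenode:8020/jars/dep.jar")   // pulled directly from HDFS
sc.addFile("http://example.com/data/lookup.txt") // downloaded over HTTP
```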
  * Add a `repartition` operator. | Patrick Wendell | 2013-10-24 | 1 | -0/+4
      This patch adds an operator called repartition with more straightforward semantics
      than the current `coalesce` operator. There are a few use cases where this operator
      is useful:

      1. A user wants to increase the number of partitions in an RDD. This is more common
         now with streaming; e.g. a user is ingesting data on one node but wants to add
         more partitions to ensure parallelism of subsequent operations across threads or
         the cluster. Today they have to call rdd.coalesce(numSplits, shuffle=true), which
         is super confusing.
      2. A user has input data where the number of partitions is not known, e.g.
         sc.textFile("some file").coalesce(50). This is vague semantically (am I growing
         or shrinking this RDD?) and may not work correctly if the base RDD has fewer than
         50 partitions.

      The new operator forces a shuffle every time, so it always produces exactly the
      requested number of partitions. It also throws an exception rather than silently not
      working when given bad input. I am currently adding streaming tests (this requires
      refactoring some of the test suite to allow testing at partition granularity), so the
      patch is not ready for merge yet, but feedback is welcome.
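A minimal sketch of the semantics described above, assuming a SparkContext named sc (the repartition and coalesce calls are taken from the commit message; the partition count is illustrative):

```scala
val rdd = sc.textFile("some file") // partition count not known up front

// Always shuffles, and always yields exactly 50 partitions,
// whether that grows or shrinks the RDD.
val fifty = rdd.repartition(50)

// The older spelling the commit calls confusing, for comparison:
val same = rdd.coalesce(50, shuffle = true)
```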
  * Merge pull request #97 from ewencp/pyspark-system-properties | Matei Zaharia | 2013-10-22 | 1 | -0/+11
      Add a classmethod to SparkContext to set system properties, as is possible in
      Scala/Java. Unlike the Java/Scala implementations, there is no access to System until
      the JVM bridge is created. Since SparkContext handles that, the initialization of the
      JVM connection is moved to a separate classmethod that can safely be called repeatedly
      as long as the same instance (or no instance) is provided.
    * Add notes to python documentation about using SparkContext.setSystemProperty. | Ewen Cheslack-Postava | 2013-10-22 | 1 | -0/+11
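For reference, a sketch of the Scala/Java pattern the new PySpark classmethod mirrors (the property and values are illustrative):

```scala
import org.apache.spark.SparkContext

// System properties must be set before the SparkContext is constructed,
// the same ordering constraint the PySpark classmethod enforces.
System.setProperty("spark.executor.memory", "2g") // illustrative property
val sc = new SparkContext("local", "PropertiesDemo")
```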
  * Docs: Fix links to RDD API documentation | Aaron Davidson | 2013-10-22 | 1 | -3/+3
  * Merge pull request #76 from pwendell/master | Reynold Xin | 2013-10-18 | 1 | -1/+1
      Clarify compression property.
    * Clarify compression property. | Patrick Wendell | 2013-10-18 | 1 | -1/+1
        Clarifies that this governs compression of internal data, not input data or
        output data.
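An illustration of the clarified scope, assuming a SparkContext named sc; the codec property name and class are inferred from the era's configuration docs and should be treated as assumptions:

```scala
// Governs Spark's *internal* data, such as serialized RDD partitions and
// shuffle output; it does not touch job input or output.
System.setProperty("spark.io.compression.codec",
  "org.apache.spark.io.LZFCompressionCodec")

val data = sc.textFile("hdfs:///input") // input files are read as-is
data.saveAsTextFile("hdfs:///output")   // output is written uncompressed
```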
  * Code styling. Updated doc. | Mosharaf Chowdhury | 2013-10-17 | 1 | -0/+8
  * Merge remote-tracking branch 'tgravescs/sparkYarnDistCache' | Matei Zaharia | 2013-10-10 | 1 | -1/+8
      Closes #11
      Conflicts:
        docs/running-on-yarn.md
        yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala
    * Adding in the --addJars option to make SparkContext.addJar work on YARN, and clean up the classpaths | tgravescs | 2013-10-03 | 1 | -0/+2
    * Support distributed cache files and archives on Spark on YARN, and attempt to clean up the staging directory on exit | Y.CORP.YAHOO.COM\tgraves | 2013-09-23 | 1 | -1/+6
  * Merge pull request #19 from aarondav/master-zk | Matei Zaharia | 2013-10-10 | 3 | -4/+78
      Standalone Scheduler fault tolerance using ZooKeeper.
      This patch implements full distributed fault tolerance for standalone scheduler
      Masters. There is only one Master Leader at a time, which actively serves scheduling
      requests. If this Leader crashes, another Master will eventually be elected,
      reconstruct the state from the first Master, and continue serving scheduling requests.

      Leader election is performed using the ZooKeeper leader election pattern. We try to
      minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there
      is a layer of retries and session monitoring on top of the ZooKeeper client. Master
      failover follows directly from single-node Master recovery via the file system
      (patch d5a96fe), except that the Master state is stored in ZooKeeper instead.

      Configuration: by default, no recovery mechanism is enabled
      (spark.deploy.recoveryMode = NONE). Setting spark.deploy.recoveryMode to ZOOKEEPER and
      spark.deploy.zookeeper.url to an appropriate ZooKeeper URL enables ZooKeeper recovery
      mode. Setting spark.deploy.recoveryMode to FILESYSTEM and
      spark.deploy.recoveryDirectory to a directory accessible by the Master keeps the
      behavior from d5a96fe.

      Additionally, anywhere a Master could be specified by a spark:// URL can now take a
      comma-delimited list to specify backup Masters. Note that this list is only used for
      registration of new Workers and application Clients; once a Worker or Client has
      registered with the Master Leader, it is "in the system" and will never need to
      register again.
    * Minor clarification and cleanup to spark-standalone.md | Aaron Davidson | 2013-10-10 | 1 | -10/+33
    * Address Matei's comments on documentation | Aaron Davidson | 2013-10-10 | 1 | -14/+21
        Updates the documentation and changes some logError()s to logWarning()s.
    * Add docs for standalone scheduler fault tolerance | Aaron Davidson | 2013-10-08 | 3 | -4/+48
        Also fixes a couple of HTML/Markdown issues in other files.
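A sketch of the recovery configuration described in pull request #19 above, using the property names from its description (host names and the ZooKeeper URL are illustrative; the properties are set before the Master starts):

```scala
import org.apache.spark.SparkContext

// On each Master candidate: enable ZooKeeper-based recovery.
System.setProperty("spark.deploy.recoveryMode", "ZOOKEEPER")
System.setProperty("spark.deploy.zookeeper.url", "zk1:2181,zk2:2181,zk3:2181")

// Workers and application clients may list every Master candidate in one URL;
// the list is only consulted when registering for the first time.
val sc = new SparkContext("spark://host1:7077,host2:7077", "FaultTolerantApp")
```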
* Merge branch 'master' of github.com:apache/incubator-spark into scala-2.10 | Prashant Sharma | 2013-10-10 | 2 | -5/+21
  * Fix PySpark docs and an overly long line of code after fdbae41e | Matei Zaharia | 2013-10-09 | 1 | -1/+1
  * Merge branch 'master' into implicit-als | Nick Pentreath | 2013-10-07 | 1 | -2/+2
  * Adding implicit feedback ALS to MLlib user guide | Nick Pentreath | 2013-10-04 | 1 | -4/+20
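A sketch of the implicit-feedback ALS usage such a guide covers, assuming a SparkContext named sc (the trainImplicit parameters and input format are assumptions about the era's MLlib API):

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// With implicit feedback, the "rating" is a confidence-weighted count of
// interactions (e.g. page views), not an explicit score.
val views = sc.textFile("hdfs:///views.csv").map { line =>
  val Array(user, product, count) = line.split(',')
  Rating(user.toInt, product.toInt, count.toDouble)
}

// rank = 10 latent factors, 20 iterations, lambda = 0.01, alpha = 1.0
val model = ALS.trainImplicit(views, 10, 20, 0.01, 1.0)
```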
* Merge branch 'master' into wip-merge-master | Prashant Sharma | 2013-10-08 | 1 | -2/+2
    Conflicts:
      bagel/pom.xml
      core/pom.xml
      core/src/test/scala/org/apache/spark/ui/UISuite.scala
      examples/pom.xml
      mllib/pom.xml
      pom.xml
      project/SparkBuild.scala
      repl/pom.xml
      streaming/pom.xml
      tools/pom.xml
    In Scala 2.10 a shorter representation is used for naming artifacts, so the artifacts
    were changed to the shorter Scala version, and it was made a property in the POM.
  * Merging build changes in from 0.8 | Patrick Wendell | 2013-10-05 | 1 | -2/+2
* Merge branch 'master' into scala-2.10 | Prashant Sharma | 2013-10-05 | 1 | -0/+1
    Conflicts:
      core/src/test/scala/org/apache/spark/DistributedSuite.scala
      project/SparkBuild.scala
  * Allow users to set the application name for Spark on Yarn | tgravescs | 2013-10-02 | 1 | -0/+1
* Merge branch 'master' into scala-2.10 | Prashant Sharma | 2013-10-01 | 1 | -2/+2
    Conflicts:
      core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
      docs/_config.yml
      project/SparkBuild.scala
      repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
  * Update build version in master | Patrick Wendell | 2013-09-24 | 1 | -2/+2
* Sync with master and some build fixes | Prashant Sharma | 2013-09-26 | 2 | -4/+4
  * Fix typo in Maven build docs | Jey Kottalam | 2013-09-15 | 1 | -2/+2
  * Merge pull request #932 from pwendell/mesos-version | Patrick Wendell | 2013-09-15 | 1 | -1/+1
    * Bumping Mesos version to 0.13.0 | Patrick Wendell | 2013-09-15 | 1 | -1/+1
  * Explain yarn.version in Maven build docs | Patrick Wendell | 2013-09-15 | 1 | -3/+3
* version changed 2.9.3 -> 2.10 in shell script. | Prashant Sharma | 2013-09-15 | 2 | -2/+2
* More updates to Spark on Mesos documentation. | Benjamin Hindman | 2013-09-11 | 1 | -2/+2