path: root/project
Each entry: commit message (Author, Date, Files changed, Lines -deleted/+added)
* Merge branch 'master' into akka-bug-fix (Prashant Sharma, 2013-12-11, 1 file, -7/+23)
    Conflicts:
      core/pom.xml
      core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
      pom.xml
      project/SparkBuild.scala
      streaming/pom.xml
      yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
| * Use published "org.spark-project.akka-*" in sbt build for Hadoop-2.2 dependencies. (Harvey Feng, 2013-12-03, 1 file, -13/+15)
    This also includes:
      - Change `isNewYarn` to `isNewHadoop`, since the protobuf-2.5 dependency is from Hadoop-2.2 itself.
      - Regexp bugfix
    Credits to @alig for this patch.
| * Merge remote-tracking branch 'origin/master' into yarn-2.2 (Harvey Feng, 2013-11-26, 1 file, -0/+1)
    Conflicts:
      yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
| * Add optional Hadoop 2.2 settings in sbt build. (Harvey Feng, 2013-11-26, 1 file, -9/+23)
    If the Hadoop version in use is 2.2 or derived from it, then Spark will be
    compiled against protobuf-2.5 and a protobuf-2.5 build of Akka 2.0.5.
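For context, a hypothetical build.sbt-style sketch of the switch these two commits describe. The flag name `isNewHadoop` comes from the commit text, but the version wiring and the `org.spark-project.akka` coordinates are assumptions, not the actual SparkBuild.scala code:

```scala
// Sketch only: pick protobuf and Akka builds based on the target Hadoop version.
// Hadoop 2.2 itself depends on protobuf-2.5, which is incompatible with 2.4,
// so Akka must come from rebuilds published against protobuf-2.5.
val hadoopVersion = sys.props.getOrElse("hadoop.version", "1.0.4")
val isNewHadoop = hadoopVersion.startsWith("2.2.")

val protobufVersion = if (isNewHadoop) "2.5.0" else "2.4.1"
val akkaGroup = if (isNewHadoop) "org.spark-project.akka" else "com.typesafe.akka"

libraryDependencies ++= Seq(
  "com.google.protobuf" % "protobuf-java" % protobufVersion,
  akkaGroup % "akka-actor" % "2.0.5"
)
```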
* Merge branch 'master' into scala-2.10-wip (Prashant Sharma, 2013-11-25, 1 file, -1/+2)
    Conflicts:
      core/src/main/scala/org/apache/spark/rdd/RDD.scala
      project/SparkBuild.scala
| * Merge pull request #151 from russellcardullo/add-graphite-sink (Matei Zaharia, 2013-11-24, 1 file, -0/+1)
    Add graphite sink for metrics.
    This adds a metrics sink for Graphite. The sink must be configured with the
    host and port of a Graphite node, and may optionally be configured with a
    prefix that will be prepended to all metrics sent to Graphite.
| | * Add graphite sink for metrics (Russell Cardullo, 2013-11-08, 1 file, -0/+1)
    This adds a metrics sink for Graphite. The sink must be configured with the
    host and port of a Graphite node, and may optionally be configured with a
    prefix that will be prepended to all metrics sent to Graphite.
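The project/ change here is the new dependency; a build.sbt-style sketch of what that one added line plausibly looks like (group and version are guesses for the 2013 timeframe):

```scala
// Coda Hale metrics reporter for Graphite (coordinates assumed).
libraryDependencies += "com.codahale.metrics" % "metrics-graphite" % "3.0.0"
```

The sink itself is then configured at runtime (via Spark's metrics.properties file) with the Graphite host, port, and optional prefix described above.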
* Use Kafka 2.10 (again) (Aaron Davidson, 2013-11-14, 1 file, -2/+3)
* Various merge corrections (Aaron Davidson, 2013-11-14, 1 file, -9/+2)
    I've diffed this patch against my own -- since they were both created
    independently, two sets of eyes have gone over all the merge conflicts that
    were created, so I'm feeling significantly more confident in the resulting
    PR. @rxin has looked at the changes to the repl and is resoundingly
    confident that they are correct.
* Some fixes for previous master merge commits (Raymond Liu, 2013-11-15, 1 file, -0/+1)
* Merge branch 'master' into scala-2.10 (Raymond Liu, 2013-11-14, 2 files, -3/+4)
| * Merge pull request #165 from NathanHowell/kerberos-master (Matei Zaharia, 2013-11-13, 2 files, -2/+2)
    spark-assembly.jar fails to authenticate with YARN ResourceManager.
    The META-INF/services/ sbt MergeStrategy was discarding support for
    Kerberos, among others. This pull request changes to a merge strategy
    similar to sbt-assembly's default. I've also included an update to
    sbt-assembly 0.9.2 and a minor fix to its zip file handling.
| | * Upgrade to sbt-assembly 0.9.2 (Nathan Howell, 2013-11-12, 1 file, -1/+1)
| | * spark-assembly.jar fails to authenticate with YARN ResourceManager (Nathan Howell, 2013-11-12, 1 file, -1/+1)
    sbt-assembly is set up to pick the first
    META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of
    merging them. This causes Kerberos authentication to fail; it manifests
    itself in the "info:null" debug log statements:

      DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
      DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
      ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
      DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583)
      WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
      ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

    Previously the merged service file contained just a single class:

      $ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
      Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
        inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
      org.apache.hadoop.security.AnnotatedSecurityInfo

    And now it has the full list of classes:

      $ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
      Archive:  assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
        inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
      org.apache.hadoop.security.AnnotatedSecurityInfo
      org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo
      org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo
      org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo
      org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo
      org.apache.hadoop.yarn.security.SchedulerSecurityInfo
      org.apache.hadoop.yarn.security.admin.AdminSecurityInfo
      org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
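A hedged sketch of the kind of fix the pull request describes, in sbt-assembly 0.9.x / sbt 0.12 syntax; the exact pattern Spark adopted may differ:

```scala
import sbtassembly.Plugin._
import AssemblyKeys._

mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    // Service registration files list one implementation class per line, so
    // keep every distinct line instead of taking the first file encountered.
    case x if x.startsWith("META-INF/services/") => MergeStrategy.filterDistinctLines
    case x => old(x)
  }
}
```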
| * Merge pull request #137 from tgravescs/sparkYarnJarsHdfsRebase (Matei Zaharia, 2013-11-12, 1 file, -1/+2)
    Allow Spark on YARN to be run from HDFS. Allows the spark.jar, app.jar, and
    log4j.properties to be put into HDFS. You can specify the files on a
    different HDFS cluster and it will copy them over. It makes sure
    permissions are correct and puts things into the public distributed cache
    so they can be reused among users if their permissions are appropriate.
    Also adds a bit of error handling for missing arguments.
| | * Add mockito to the sbt build (tgravescs, 2013-11-11, 1 file, -1/+2)
| * Add spark-tools assembly to spark-class classpath. (Josh Rosen, 2013-11-09, 1 file, -1/+1)
    This allows the JavaAPICompletenessChecker to be run with Spark 0.8+.
* Merge branch 'master' into scala-2.10 (Raymond Liu, 2013-11-13, 1 file, -5/+30)
| * Exclude jopt from kafka dependency. (Patrick Wendell, 2013-10-25, 1 file, -0/+1)
    Kafka uses an older version of jopt that causes bad conflicts with the
    version used by spark-perf. It's not easy to remove this downstream because
    of the way spark-perf uses Spark (by including a Spark assembly as an
    unmanaged jar). This fixes the problem at its source by simply never
    including it.
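A build.sbt-style sketch of the exclusion described (the Kafka and jopt coordinates are assumptions):

```scala
// Drop Kafka's transitive jopt-simple so it cannot clash with spark-perf's copy.
libraryDependencies += ("org.apache.kafka" % "kafka" % "0.8.0-beta1")
  .exclude("net.sf.jopt-simple", "jopt-simple")
```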
| * Fix Maven build to use MQTT repository (Matei Zaharia, 2013-10-23, 1 file, -3/+3)
| * Merge pull request #64 from prabeesh/master (Matei Zaharia, 2013-10-23, 1 file, -1/+5)
    MQTT Adapter for Spark Streaming

    MQTT is a machine-to-machine (M2M)/Internet of Things connectivity
    protocol. It was designed as an extremely lightweight publish/subscribe
    messaging transport. You may read more about it at http://mqtt.org/

    Message Queue Telemetry Transport (MQTT) is an open message protocol for
    M2M communications. It enables the transfer of telemetry-style data in the
    form of messages from devices like sensors and actuators to mobile phones,
    embedded systems on vehicles, or laptops and full-scale computers. The
    protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of
    Cirrus Link Solutions.

    This protocol enables a publish/subscribe messaging model in an extremely
    lightweight way. It is useful for connections with remote locations where
    code footprint and network bandwidth are constraints. MQTT is one of the
    most widely used protocols for the Internet of Things, and it is attracting
    much attention as anything and everything gets connected to the internet
    and produces data. Researchers and companies predict some 25 billion
    devices will be connected to the internet by 2015. Plugins/support for MQTT
    are available in popular message queues like RabbitMQ and ActiveMQ.

    Support for MQTT in Spark will help people with Internet of Things (IoT)
    projects use Spark Streaming for their real-time data processing needs
    (from sensors and other embedded devices, etc.).
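The sbt side of this adapter (see the "added mqtt adapter library dependencies" commit below) plausibly looks like the following sketch; the Eclipse Paho repository URL and client coordinates are assumptions:

```scala
// Paho artifacts were not on Maven Central at the time, so add the Eclipse repo.
resolvers += "Eclipse Paho Releases" at "https://repo.eclipse.org/content/repositories/paho-releases/"

libraryDependencies += "org.eclipse.paho" % "mqtt-client" % "0.4.0"
```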
| | * remove unused dependency (prabeesh, 2013-10-17, 1 file, -2/+0)
| | * added mqtt adapter library dependencies (prabeesh, 2013-10-16, 1 file, -1/+7)
| * Merge pull request #56 from jerryshao/kafka-0.8-dev (Matei Zaharia, 2013-10-21, 1 file, -3/+6)
    Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming.
    Conflicts:
      streaming/pom.xml
| | * Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming (jerryshao, 2013-10-12, 1 file, -3/+6)
| * Merge pull request #66 from shivaram/sbt-assembly-deps (Matei Zaharia, 2013-10-18, 1 file, -3/+10)
    Add SBT target to assemble dependencies.
    This pull request is an attempt to address the long assembly build times
    during development. Instead of rebuilding the assembly jar for every Spark
    change, this pull request adds a new SBT target `spark` that packages all
    the Spark modules and builds an assembly of the dependencies. So the
    workflow that should work now would be something like

    ```
    ./sbt/sbt spark # Doing this once should suffice
    ## Make changes
    ./sbt/sbt compile
    ./sbt/sbt test or ./spark-shell
    ```
| | * Rename SBT target to assemble-deps. (Shivaram Venkataraman, 2013-10-16, 1 file, -5/+5)
| | * Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps (Shivaram Venkataraman, 2013-10-15, 2 files, -11/+44)
| | * Add a comment and exclude tools (Shivaram Venkataraman, 2013-10-11, 1 file, -1/+2)
| | * Add new SBT target for dependency assembly (Shivaram Venkataraman, 2013-10-09, 1 file, -1/+7)
| * Fixing spark streaming example and a bug in examples build. (Patrick Wendell, 2013-10-15, 1 file, -0/+1)
    - Examples assembly included a log4j.properties which clobbered Spark's
    - Example had an error where some classes weren't serializable
    - Did some other clean-up in this example
| * Merge pull request #19 from aarondav/master-zk (Matei Zaharia, 2013-10-10, 1 file, -0/+1)
    Standalone Scheduler fault tolerance using ZooKeeper

    This patch implements full distributed fault tolerance for standalone
    scheduler Masters. There is only one master Leader at a time, which is
    actively serving scheduling requests. If this Leader crashes, another
    master will eventually be elected, reconstruct the state from the first
    Master, and continue serving scheduling requests.

    Leader election is performed using the ZooKeeper leader election pattern.
    We try to minimize the use of ZooKeeper and the assumptions about
    ZooKeeper's behavior, so there is a layer of retries and session monitoring
    on top of the ZooKeeper client.

    Master failover follows directly from the single-node Master recovery via
    the file system (patch d5a96fe), save that the Master state is stored in
    ZooKeeper instead.

    Configuration:
      By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE).
      By setting spark.deploy.recoveryMode to ZOOKEEPER and setting
      spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper
      recovery mode is enabled.
      By setting spark.deploy.recoveryMode to FILESYSTEM and setting
      spark.deploy.recoveryDirectory to an appropriate directory accessible by
      the Master, we keep the behavior from d5a96fe.

    Additionally, places where a Master could be specified by a spark:// URL
    can now take comma-delimited lists to specify backup masters. Note that
    this is only used for registration of NEW Workers and application Clients.
    Once a Worker or Client has registered with the Master Leader, it is "in
    the system" and will never need to register again.
| | * Standalone Scheduler fault tolerance using ZooKeeper (Aaron Davidson, 2013-09-26, 1 file, -0/+1)
    (Same description as pull request #19 above, except that Master failover is
    described relative to the single-node recovery patch 194ba4b8.)

    Forthcoming: Documentation, tests (! - only ad hoc testing has been
    performed so far). I do not intend for this commit to be merged until tests
    are added, but this patch should still be mostly reviewable until then.
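In the Spark 0.8-era configuration style, enabling the recovery modes described above amounts to setting system properties before the Master starts. A minimal sketch; the property names come from the commit message, while the ZooKeeper hosts and directory are placeholders:

```scala
// ZooKeeper-backed recovery: one elected leader; standbys take over on failure.
System.setProperty("spark.deploy.recoveryMode", "ZOOKEEPER")
System.setProperty("spark.deploy.zookeeper.url", "zk1:2181,zk2:2181,zk3:2181")

// Alternatively, single-node recovery through a directory the Master can reach:
// System.setProperty("spark.deploy.recoveryMode", "FILESYSTEM")
// System.setProperty("spark.deploy.recoveryDirectory", "/var/spark/recovery")
```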
* Updating to latest akka 2.2.3, which fixes our only failing Driver Suite (Prashant Sharma, 2013-10-24, 1 file, -4/+4)
* Merge branch 'scala-2.10' of github.com:ScrapCodes/spark into scala-2.10 (Prashant Sharma, 2013-10-10, 1 file, -8/+14)
    Conflicts:
      core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala
      project/SparkBuild.scala
| * Merge branch 'master' into wip-merge-master (Prashant Sharma, 2013-10-08, 1 file, -6/+8)
    Conflicts:
      bagel/pom.xml
      core/pom.xml
      core/src/test/scala/org/apache/spark/ui/UISuite.scala
      examples/pom.xml
      mllib/pom.xml
      pom.xml
      project/SparkBuild.scala
      repl/pom.xml
      streaming/pom.xml
      tools/pom.xml

    In Scala 2.10 a shorter representation is used for naming artifacts, so the
    artifacts were changed to use the shorter Scala version string, which is
    also made a property in the POM.
| | * Merge pull request #31 from sundeepn/branch-0.8 (Reynold Xin, 2013-10-07, 1 file, -5/+8)
    Resolving package conflicts with hadoop 0.23.9.
    Hadoop 0.23.9 has a package conflict with easymock's dependencies.
    (cherry picked from commit 023e3fdf008b3194a36985a07923df9aaf64e520)
    Signed-off-by: Reynold Xin <rxin@apache.org>
| * Merge branch 'master' into scala-2.10 (Prashant Sharma, 2013-10-05, 1 file, -0/+3)
    Conflicts:
      core/src/test/scala/org/apache/spark/DistributedSuite.scala
      project/SparkBuild.scala
| | * ask ivy/sbt to check local maven repo under ~/.m2 (Du Li, 2013-10-01, 1 file, -0/+3)
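A sketch of the resolver addition in sbt 0.12 syntax (the exact line in SparkBuild.scala may differ):

```scala
// Let Ivy resolve artifacts installed locally with `mvn install`.
resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"
```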
| * Merge branch 'master' into scala-2.10 (Prashant Sharma, 2013-10-01, 1 file, -2/+3)
    Conflicts:
      core/src/main/scala/org/apache/spark/ui/jobs/JobProgressUI.scala
      docs/_config.yml
      project/SparkBuild.scala
      repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
| | * Removed scala -optimize flag. (Reynold Xin, 2013-09-26, 1 file, -1/+1)
| | * Merge pull request #930 from holdenk/master (Reynold Xin, 2013-09-26, 1 file, -1/+1)
    Add mapPartitionsWithIndex
| | | * Fix build on ubuntu (Holden Karau, 2013-09-14, 1 file, -1/+1)
| | * Update build version in master (Patrick Wendell, 2013-09-24, 1 file, -1/+1)
* scala 2.10 requires Java 1.6 (Martin Weindel, 2013-10-05, 1 file, -3/+3)
    Using Scala 2.10.3; resolved maven-scala-plugin warning.
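The corresponding build settings plausibly look like this build.sbt-style sketch (the javac flags are assumed; the Scala version comes from the commit message):

```scala
scalaVersion := "2.10.3"

// Scala 2.10 emits Java 6 bytecode, so compile Java sources to match.
javacOptions ++= Seq("-source", "1.6", "-target", "1.6")
```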
* Sync with master and some build fixes (Prashant Sharma, 2013-09-26, 1 file, -8/+8)
| * Bumping Mesos version to 0.13.0 (Patrick Wendell, 2013-09-15, 1 file, -1/+1)
* fixed maven build for scala 2.10 (Prashant Sharma, 2013-09-26, 1 file, -2/+1)
* Akka 2.2 migration (Prashant Sharma, 2013-09-22, 1 file, -9/+9)
* Merge branch 'master' of git://github.com/mesos/spark into scala-2.10 (Prashant Sharma, 2013-09-15, 2 files, -6/+31)
    Conflicts:
      core/src/main/scala/org/apache/spark/SparkContext.scala
      project/SparkBuild.scala