spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
*	Use published "org.spark-project.akka-*" in sbt build for Hadoop-2.2 dependencies.	Harvey Feng	2013-12-03	1	-13/+15
\|	This also includes:
\|	- Change `isNewYarn` to `isNewHadoop`, since the protobuf-2.5 dependency is from Hadoop-2.2 itself.
\|	- Regexp bugfix.
\|	Credits to @alig for this patch.
*	Merge remote-tracking branch 'origin/master' into yarn-2.2	Harvey Feng	2013-11-26	1	-0/+1
\|\	Conflicts: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
\| *	Merge pull request #151 from russellcardullo/add-graphite-sink	Matei Zaharia	2013-11-24	1	-0/+1
\| \|\	Add graphite sink for metrics. This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
\| \| *	Add graphite sink for metrics	Russell Cardullo	2013-11-08	1	-0/+1
\| \| \|	This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
* \| \|	Add optional Hadoop 2.2 settings in sbt build.	Harvey Feng	2013-11-26	1	-9/+23
\|/ /	If the Hadoop used is version 2.2 or derived from it, then Spark will be compiled against protobuf-2.5 and a protobuf-2.5 version of Akka 2.0.5.
* \|	Merge pull request #165 from NathanHowell/kerberos-master	Matei Zaharia	2013-11-13	2	-2/+2
\|\ \	spark-assembly.jar fails to authenticate with YARN ResourceManager. The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, a minor fix to its zip file handling.
\| * \|	Upgrade to sbt-assembly 0.9.2	Nathan Howell	2013-11-12	1	-1/+1
\| \| \|
\| * \|	spark-assembly.jar fails to authenticate with YARN ResourceManager	Nathan Howell	2013-11-12	1	-1/+1
\| \| \|	sbt-assembly is set up to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, which manifests itself in the "info:null" debug log statement:
\| \| \|	DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
\| \| \|	DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null
\| \| \|	ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
\| \| \|	DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583)
\| \| \|	WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
\| \| \|	ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
\| \| \|	The services file in the assembly previously contained just a single class:
\| \| \|	$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
\| \| \|	Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
\| \| \|	inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
\| \| \|	org.apache.hadoop.security.AnnotatedSecurityInfo
\| \| \|	And now has the full list of classes:
\| \| \|	$ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo
\| \| \|	Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar
\| \| \|	inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo
\| \| \|	org.apache.hadoop.security.AnnotatedSecurityInfo
\| \| \|	org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo
\| \| \|	org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo
\| \| \|	org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo
\| \| \|	org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo
\| \| \|	org.apache.hadoop.yarn.security.SchedulerSecurityInfo
\| \| \|	org.apache.hadoop.yarn.security.admin.AdminSecurityInfo
\| \| \|	org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
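The difference between picking the first service file and merging all copies can be illustrated with a small Python sketch. This is not sbt-assembly code: the helper names are hypothetical, and the way the provider classes are split across two jars is an assumption made for the example (only the class names come from the commit message).

```python
# Illustrative only: these helpers are hypothetical and are not sbt-assembly APIs.

def pick_first(copies):
    """Keep the first copy of a service file and drop the rest."""
    return list(copies[0])


def merge_distinct_lines(copies):
    """Concatenate all copies, keeping each provider class once in first-seen order."""
    seen = set()
    merged = []
    for lines in copies:
        for line in lines:
            entry = line.strip()
            if entry and entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged


# Assumed split of the SecurityInfo providers across two jars (the jar contents
# are made up for this example; the class names come from the commit message).
copies = [
    ["org.apache.hadoop.security.AnnotatedSecurityInfo"],
    [
        "org.apache.hadoop.security.AnnotatedSecurityInfo",
        "org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo",
        "org.apache.hadoop.yarn.security.SchedulerSecurityInfo",
    ],
]

first_only = pick_first(copies)
merged = merge_distinct_lines(copies)

print(len(first_only))  # 1
print(len(merged))      # 3
```

With the first-copy behavior the YARN providers never reach the assembled file, which is consistent with the "info:null" symptom described above; merging keeps all of them.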
* \| \|	Merge pull request #137 from tgravescs/sparkYarnJarsHdfsRebase	Matei Zaharia	2013-11-12	1	-1/+2
\|\ \ \ \| \|/ / \|/\| \|	Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs. Allows you to specify the files on a different hdfs cluster and it will copy them over. It makes sure permissions are correct and makes sure to put things into public distributed cache so they can be reused amongst users if their permissions are appropriate. Also add a bit of error handling for missing arguments.
\| * \|	Add mockito to the sbt build	tgravescs	2013-11-11	1	-1/+2
\| \|/
* /	Add spark-tools assembly to spark-class classpath.	Josh Rosen	2013-11-09	1	-1/+1
\|/	This allows the JavaAPICompletenessChecker to be run with Spark 0.8+.
*	Exclude jopt from kafka dependency.	Patrick Wendell	2013-10-25	1	-0/+1
\|	Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.
*	Fix Maven build to use MQTT repository	Matei Zaharia	2013-10-23	1	-3/+3
\|
*	Merge pull request #64 from prabeesh/master	Matei Zaharia	2013-10-23	1	-1/+5
\|\	MQTT Adapter for Spark Streaming. MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it here: http://mqtt.org/ Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators to mobile phones, embedded systems on vehicles, or laptops and full-scale computers. The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions. This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code size and network bandwidth are constrained. MQTT is one of the widely used protocols for the 'Internet of Things'. This protocol is attracting much attention as anything and everything is getting connected to the internet and all of it produces data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015. Plugin/support for MQTT is available in popular MQs like RabbitMQ, ActiveMQ, etc. Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
\| *	remove unused dependency	prabeesh	2013-10-17	1	-2/+0
\| \|
\| *	added mqtt adapter library dependencies	prabeesh	2013-10-16	1	-1/+7
\| \|
* \|	Merge pull request #56 from jerryshao/kafka-0.8-dev	Matei Zaharia	2013-10-21	1	-3/+6
\|\ \	Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming. Conflicts: streaming/pom.xml
\| * \|	Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming	jerryshao	2013-10-12	1	-3/+6
\| \| \|
* \| \|	Merge pull request #66 from shivaram/sbt-assembly-deps	Matei Zaharia	2013-10-18	1	-3/+10
\|\ \ \	Add SBT target to assemble dependencies. This pull request is an attempt to address the long assembly build times during development. Instead of rebuilding the assembly jar for every Spark change, this pull request adds a new SBT target `spark` that packages all the Spark modules and builds an assembly of the dependencies. So the workflow that should work now would be something like
\| \| \| \|	```
\| \| \| \|	./sbt/sbt spark # Doing this once should suffice
\| \| \| \|	## Make changes
\| \| \| \|	./sbt/sbt compile
\| \| \| \|	./sbt/sbt test or ./spark-shell
\| \| \| \|	```
\| * \| \|	Rename SBT target to assemble-deps.	Shivaram Venkataraman	2013-10-16	1	-5/+5
\| \| \| \|
\| * \| \|	Merge branch 'master' of https://github.com/apache/incubator-spark into sbt-assembly-deps	Shivaram Venkataraman	2013-10-15	2	-11/+44
\| \|\\| \|
\| * \| \|	Add a comment and exclude tools	Shivaram Venkataraman	2013-10-11	1	-1/+2
\| \| \| \|
\| * \| \|	Add new SBT target for dependency assembly	Shivaram Venkataraman	2013-10-09	1	-1/+7
\| \| \|/ \| \|/\|
* \| \|	Fixing spark streaming example and a bug in examples build.	Patrick Wendell	2013-10-15	1	-0/+1
\| \|/ \|/\|	- Examples assembly included a log4j.properties which clobbered Spark's
\| \|	- Example had an error where some classes weren't serializable
\| \|	- Did some other clean-up in this example
* \|	Merge pull request #19 from aarondav/master-zk	Matei Zaharia	2013-10-10	1	-0/+1
\|\ \	Standalone Scheduler fault tolerance using ZooKeeper. This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch d5a96fe), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from d5a96fe. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again.
\| * \|	Standalone Scheduler fault tolerance using ZooKeeper	Aaron Davidson	2013-09-26	1	-0/+1
\| \| \|	This patch implements full distributed fault tolerance for standalone scheduler Masters. There is only one master Leader at a time, which is actively serving scheduling requests. If this Leader crashes, another master will eventually be elected, reconstruct the state from the first Master, and continue serving scheduling requests. Leader election is performed using the ZooKeeper leader election pattern. We try to minimize the use of ZooKeeper and the assumptions about ZooKeeper's behavior, so there is a layer of retries and session monitoring on top of the ZooKeeper client. Master failover follows directly from the single-node Master recovery via the file system (patch 194ba4b8), save that the Master state is stored in ZooKeeper instead. Configuration: By default, no recovery mechanism is enabled (spark.deploy.recoveryMode = NONE). By setting spark.deploy.recoveryMode to ZOOKEEPER and setting spark.deploy.zookeeper.url to an appropriate ZooKeeper URL, ZooKeeper recovery mode is enabled. By setting spark.deploy.recoveryMode to FILESYSTEM and setting spark.deploy.recoveryDirectory to an appropriate directory accessible by the Master, we will keep the behavior from 194ba4b8. Additionally, places where a Master could be specified by a spark:// url can now take comma-delimited lists to specify backup masters. Note that this is only used for registration of NEW Workers and application Clients. Once a Worker or Client has registered with the Master Leader, it is "in the system" and will never need to register again. Forthcoming: Documentation, tests (! - only ad hoc testing has been performed so far). I do not intend for this commit to be merged until tests are added, but this patch should still be mostly reviewable until then.
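The configuration rules described in this message can be sketched as a small Python check. This is not Spark code: the helper names are hypothetical, the exact `spark://host1:port1,host2:port2` layout of the comma-delimited master list is an assumption, and the host names are placeholders. Only the property names and mode values come from the commit message.

```python
# Hypothetical helpers; the property names come from the commit message,
# the URL layout and host names are assumptions for this sketch.

REQUIRED_COMPANION = {
    "NONE": None,
    "ZOOKEEPER": "spark.deploy.zookeeper.url",
    "FILESYSTEM": "spark.deploy.recoveryDirectory",
}


def check_recovery_config(conf):
    """Return the recovery mode, raising if its companion setting is missing."""
    mode = conf.get("spark.deploy.recoveryMode", "NONE")
    if mode not in REQUIRED_COMPANION:
        raise ValueError("unknown recovery mode: " + mode)
    companion = REQUIRED_COMPANION[mode]
    if companion is not None and not conf.get(companion):
        raise ValueError(mode + " mode requires " + companion)
    return mode


def parse_master_list(url):
    """Split an assumed 'spark://host1:port1,host2:port2' string into (host, port) pairs."""
    scheme = "spark://"
    if not url.startswith(scheme):
        raise ValueError("expected a spark:// URL: " + url)
    masters = []
    for address in url[len(scheme):].split(","):
        host, sep, port = address.strip().rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError("expected host:port, got: " + address)
        masters.append((host, int(port)))
    return masters


conf = {
    "spark.deploy.recoveryMode": "ZOOKEEPER",
    "spark.deploy.zookeeper.url": "zk1.example.com:2181",
}
mode = check_recovery_config(conf)
masters = parse_master_list("spark://master1.example.com:7077,master2.example.com:7077")
print(mode, masters)
```

An empty configuration falls back to NONE, matching the stated default, while ZOOKEEPER or FILESYSTEM without the companion setting is rejected.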
* \| \|	Merge pull request #31 from sundeepn/branch-0.8	Reynold Xin	2013-10-07	1	-5/+8
\| \| \|	Resolving package conflicts with hadoop 0.23.9. Hadoop 0.23.9 has a package conflict with easymock's dependencies. (cherry picked from commit 023e3fdf008b3194a36985a07923df9aaf64e520) Signed-off-by: Reynold Xin <rxin@apache.org>
* \| \|	ask ivy/sbt to check local maven repo under ~/.m2	Du Li	2013-10-01	1	-0/+3
\| \| \|
* \| \|	Removed scala -optimize flag.	Reynold Xin	2013-09-26	1	-1/+1
\|/ /
* \|	Merge pull request #930 from holdenk/master	Reynold Xin	2013-09-26	1	-1/+1
\|\ \	Add mapPartitionsWithIndex
\| * \|	Fix build on ubuntu	Holden Karau	2013-09-14	1	-1/+1
\| \| \|
* \| \|	Update build version in master	Patrick Wendell	2013-09-24	1	-1/+1
\| \| \|
* \| \|	Bumping Mesos version to 0.13.0	Patrick Wendell	2013-09-15	1	-1/+1
\|/ /
* \|	Merge pull request #919 from mateiz/jets3t	Patrick Wendell	2013-09-11	1	-0/+1
\|\ \	Add explicit jets3t dependency, which is excluded in hadoop-client
\| * \|	Add explicit jets3t dependency, which is excluded in hadoop-client	Matei Zaharia	2013-09-10	1	-0/+1
\| \| \|
* \| \|	Fix HDFS access bug with assembly build.	Patrick Wendell	2013-09-10	1	-0/+1
\|/ /	Due to this change in HDFS: https://issues.apache.org/jira/browse/HADOOP-7549 there is a bug when using the new assembly builds. The symptom is that any HDFS access results in an exception saying "No filesystem for scheme 'hdfs'". This adds a merge strategy in the assembly build which fixes the problem.
* \|	Merge pull request #906 from pwendell/ganglia-sink	Patrick Wendell	2013-09-08	1	-0/+1
\|\ \	Clean-up of Metrics Code/Docs and Add Ganglia Sink
\| * \|	Ganglia sink	Patrick Wendell	2013-09-08	1	-0/+1
\| \| \|
* \| \|	Merge pull request #908 from pwendell/master	Matei Zaharia	2013-09-08	1	-1/+7
\|\ \ \	Fix target JVM version in scala build
\| * \| \|	Fix target JVM version in scala build	Patrick Wendell	2013-09-08	1	-1/+7
\| \|/ /
* \| \|	Merge pull request #904 from pwendell/master	Patrick Wendell	2013-09-07	1	-1/+18
\|\\| \|	Adding Apache license to two files
\| * \|	Adding Apache license to two files	Patrick Wendell	2013-09-07	1	-1/+18
\| \|/
* /	Minor YARN build cleanups	Jey Kottalam	2013-09-06	1	-2/+2
\|/
*	Add Apache parent POM	Matei Zaharia	2013-09-02	1	-0/+5
\|
*	Fix some URLs	Matei Zaharia	2013-09-01	1	-2/+2
\|
*	Initial work to rename package to org.apache.spark	Matei Zaharia	2013-09-01	1	-4/+8
\|
*	Update Maven build to create assemblies expected by new scripts	Matei Zaharia	2013-08-29	1	-2/+2
\|	This includes the following changes:
\|	- The "assembly" package now builds in Maven by default, and creates an assembly containing both hadoop-client and Spark, unlike the old BigTop distribution assembly that skipped hadoop-client
\|	- There is now a bigtop-dist package to build the old BigTop assembly
\|	- The repl-bin package is no longer built by default since the scripts don't rely on it; instead it can be enabled with -Prepl-bin
\|	- Py4J is now included in the assembly/lib folder as a local Maven repo, so that the Maven package can link to it
\|	- run-example now adds the original Spark classpath as well because the Maven examples assembly lists spark-core and such as provided
\|	- The various Maven projects add a spark-yarn dependency correctly
*	Provide more memory for tests	Matei Zaharia	2013-08-29	1	-1/+1
\|
*	Change build and run instructions to use assemblies	Matei Zaharia	2013-08-29	3	-22/+34
\|	This commit makes Spark invocation saner by using an assembly JAR to find all of Spark's dependencies instead of adding all the JARs in lib_managed. It also packages the examples into an assembly and uses that as SPARK_EXAMPLES_JAR. Finally, it replaces the old "run" script with two better-named scripts: "run-examples" for examples, and "spark-class" for Spark internal classes (e.g. REPL, master, etc). This is also designed to minimize the confusion people have in trying to use "run" to run their own classes; it's not meant to do that, but now at least if they look at it, they can modify run-examples to do a decent job for them. As part of this, Bagel's examples are also now properly moved to the examples package instead of bagel.
*	Revert "Merge pull request #841 from rxin/json"	Reynold Xin	2013-08-26	1	-0/+1
\|	This reverts commit 1fb1b0992838c8cdd57eec45793e67a0490f1a52, reversing changes made to c69c48947d5102c81a9425cb380d861c3903685c.