spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge pull request #340 from ScrapCodes/sbt-fixes	Patrick Wendell	2014-01-06	1	-5/+3
\|\	Made java options to be applied during tests so that they become self explanatory.
\| *	Made java options to be applied during tests so that they become self ↵	Prashant Sharma	2014-01-06	1	-5/+3
\| \|	explanatory.
* \|	SPARK-1005 Ning upgrade	Prashant Sharma	2014-01-06	1	-1/+1
\|/
*	Merge remote-tracking branch 'apache-github/master' into remove-binaries	Patrick Wendell	2014-01-03	1	-7/+25
\|\	Conflicts: core/src/test/scala/org/apache/spark/DriverSuite.scala docs/python-programming-guide.md
\| *	Using name yarn-alpha/yarn instead of yarn-2.0/yarn-2.2	Raymond Liu	2014-01-03	1	-8/+8
\| \|
\| *	Add yarn/common/src/test dir in building script	Raymond Liu	2014-01-03	1	-0/+7
\| \|
\| *	Use unmanaged source dir to include common yarn code	Raymond Liu	2014-01-03	1	-11/+15
\| \|
\| *	Reorganize yarn related codes into sub projects to remove duplicate files.	Raymond Liu	2014-01-03	1	-8/+15
\| \|
* \|	Changes on top of Prashant's patch.	Patrick Wendell	2014-01-03	1	-0/+1
\| \|	Closes #316
* \|	fixed review comments	Prashant Sharma	2014-01-03	1	-5/+9
\| \|
* \|	Merge branch 'master' into spark-1002-remove-jars	Prashant Sharma	2014-01-03	1	-0/+1
\|\\|
\| *	Merge remote-tracking branch 'apache/master' into conf2	Matei Zaharia	2014-01-01	1	-1/+2
\| \|\	Conflicts: project/SparkBuild.scala
\| * \	Merge remote-tracking branch 'apache/master' into conf2	Matei Zaharia	2013-12-31	1	-1/+1
\| \|\ \	Conflicts: core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala
\| * \ \	Merge remote-tracking branch 'origin/master' into conf2	Matei Zaharia	2013-12-29	1	-1/+4
\| \|\ \ \	Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala core/src/main/scala/org/apache/spark/scheduler/cluster/ClusterTaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala core/src/test/scala/org/apache/spark/scheduler/TaskResultGetterSuite.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala new-yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala streaming/src/test/scala/org/apache/spark/streaming/WindowOperationsSuite.scala
\| * \| \| \|	spark-544, introducing SparkConf and related configuration overhaul.	Prashant Sharma	2013-12-25	1	-1/+2
\| \| \| \| \|
* \| \| \| \|	Deleted py4j jar and added to assembly dependency	Prashant Sharma	2014-01-02	1	-0/+1
\| \|_\|_\|/ \|/\| \| \|
* \| \| \|	Merge pull request #73 from falaki/ApproximateDistinctCount	Reynold Xin	2013-12-31	1	-1/+2
\|\ \ \ \ \| \|_\|_\|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Approximate distinct count Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count distinct number of elements and distinct number of values per key, respectively. Both functions use HyperLogLog from stream-lib for counting. Both functions take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
\| * \| \|	Added stream 2.5.1 jar dependency	Hossein Falaki	2013-12-30	1	-1/+2
\| \| \|/ \| \|/\|
* / \|	upgrade Netty from 4.0.0.Beta2 to 4.0.13.Final	Binh Nguyen	2013-12-24	1	-1/+1
\|/ /
* /	Show full stack trace and time taken in unit tests.	Reynold Xin	2013-12-23	1	-1/+4
\|/
*	[SPARK-959] Explicitly depend on org.eclipse.jetty.orbit jar	Aaron Davidson	2013-12-18	1	-0/+2
\|	Without this, in some cases, Ivy attempts to download the wrong file and fails, stopping the whole build. See bug for more details. (This is probably also the beginning of the slow death of our recently prettified dependencies. Form follows function.)
*	Attempt with extra repositories	Patrick Wendell	2013-12-16	1	-22/+10
\|
*	Review comments on the PR for scala 2.10 migration.	Prashant Sharma	2013-12-13	1	-3/+3
\|
*	Disabled yarn 2.2 and added a message in the sbt build	Prashant Sharma	2013-12-12	1	-7/+17
\|
*	Merge branch 'akka-bug-fix' of github.com:ScrapCodes/incubator-spark into ↵	Prashant Sharma	2013-12-11	1	-1/+1
\|\	akka-bug-fix
\| *	added eclipse repository for spark streaming.	Prashant Sharma	2013-12-11	1	-1/+1
\| \|
* \|	Merge branch 'master' into akka-bug-fix	Prashant Sharma	2013-12-11	1	-7/+23
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: core/pom.xml core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala pom.xml project/SparkBuild.scala streaming/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
\| *	Use published "org.spark-project.akka-*" in sbt build for Hadoop-2.2 ↵	Harvey Feng	2013-12-03	1	-13/+15
\| \|	dependencies. This also includes: - Change `isNewYarn` to `isNewHadoop`, since the protobuf-2.5 dependency is from Hadoop-2.2 itself. - Regexp bugfix. Credits to @alig for this patch.
\| *	Merge remote-tracking branch 'origin/master' into yarn-2.2	Harvey Feng	2013-11-26	1	-0/+1
\| \|\	Conflicts: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
\| * \|	Add optional Hadoop 2.2 settings in sbt build.	Harvey Feng	2013-11-26	1	-9/+23
\| \| \|	If the Hadoop used is version 2.2 or derived from it, then Spark will be compiled against protobuf-2.5 and a protobuf-2.5 version of Akka 2.0.5.
* \| \|	Merge branch 'master' into scala-2.10-wip	Prashant Sharma	2013-11-25	1	-1/+2
\|\ \ \ \| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \|	Conflicts: core/src/main/scala/org/apache/spark/rdd/RDD.scala project/SparkBuild.scala
\| * \|	Merge pull request #151 from russellcardullo/add-graphite-sink	Matei Zaharia	2013-11-24	1	-0/+1
\| \|\ \ \| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add graphite sink for metrics This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
\| \| *	Add graphite sink for metrics	Russell Cardullo	2013-11-08	1	-0/+1
\| \| \|	This adds a metrics sink for graphite. The sink must be configured with the host and port of a graphite node and optionally may be configured with a prefix that will be prepended to all metrics that are sent to graphite.
* \| \|	Use Kafka 2.10 (again)	Aaron Davidson	2013-11-14	1	-2/+3
\| \| \|
* \| \|	Various merge corrections	Aaron Davidson	2013-11-14	1	-9/+2
\| \| \|	I've diff'd this patch against my own -- since they were both created independently, this means that two sets of eyes have gone over all the merge conflicts that were created, so I'm feeling significantly more confident in the resulting PR. @rxin has looked at the changes to the repl and is resoundingly confident that they are correct.
* \| \|	Some fixes for previous master merge commits	Raymond Liu	2013-11-15	1	-0/+1
\| \| \|
* \| \|	Merge branch 'master' into scala-2.10	Raymond Liu	2013-11-14	2	-3/+4
\|\\| \|
\| * \|	Merge pull request #165 from NathanHowell/kerberos-master	Matei Zaharia	2013-11-13	2	-2/+2
\| \|\ \	spark-assembly.jar fails to authenticate with YARN ResourceManager. The META-INF/services/ sbt MergeStrategy was discarding support for Kerberos, among others. This pull request changes to a merge strategy similar to sbt-assembly's default. I've also included an update to sbt-assembly 0.9.2, a minor fix to its zip file handling.
\| \| * \|	Upgrade to sbt-assembly 0.9.2	Nathan Howell	2013-11-12	1	-1/+1
\| \| \| \|
\| \| * \|	spark-assembly.jar fails to authenticate with YARN ResourceManager	Nathan Howell	2013-11-12	1	-1/+1
\| \| \| \|	sbt-assembly is set up to pick the first META-INF/services/org.apache.hadoop.security.SecurityInfo file instead of merging them. This causes Kerberos authentication to fail, which manifests itself in the "info:null" debug log statement: DEBUG SaslRpcClient: Get token info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null DEBUG SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB info:null ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] DEBUG UserGroupInformation: PrivilegedAction as:foo@BAR (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:583) WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] ERROR UserGroupInformation: PriviledgedActionException as:foo@BAR (auth:KERBEROS) cause:java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS] This previously would just contain a single class: $ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo org.apache.hadoop.security.AnnotatedSecurityInfo And now has the full list of classes: $ unzip -c assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar META-INF/services/org.apache.hadoop.security.SecurityInfo Archive: assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-SNAPSHOT-hadoop2.2.0.jar inflating: META-INF/services/org.apache.hadoop.security.SecurityInfo org.apache.hadoop.security.AnnotatedSecurityInfo org.apache.hadoop.mapreduce.v2.app.MRClientSecurityInfo org.apache.hadoop.mapreduce.v2.security.client.ClientHSSecurityInfo org.apache.hadoop.yarn.security.client.ClientRMSecurityInfo org.apache.hadoop.yarn.security.ContainerManagerSecurityInfo org.apache.hadoop.yarn.security.SchedulerSecurityInfo org.apache.hadoop.yarn.security.admin.AdminSecurityInfo org.apache.hadoop.yarn.server.RMNMSecurityInfoClass
\| * \| \|	Merge pull request #137 from tgravescs/sparkYarnJarsHdfsRebase	Matei Zaharia	2013-11-12	1	-1/+2
\| \|\ \ \ \| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow spark on yarn to be run from HDFS. Allows the spark.jar, app.jar, and log4j.properties to be put into hdfs. Allows you to specify the files on a different hdfs cluster and it will copy them over. It makes sure permissions are correct and makes sure to put things into public distributed cache so they can be reused amongst users if their permissions are appropriate. Also add a bit of error handling for missing arguments.
\| \| * \|	Add mockito to the sbt build	tgravescs	2013-11-11	1	-1/+2
\| \| \|/
\| * /	Add spark-tools assembly to spark-class classpath.	Josh Rosen	2013-11-09	1	-1/+1
\| \|/	This allows the JavaAPICompletenessChecker to be run with Spark 0.8+.
* \|	Merge branch 'master' into scala-2.10	Raymond Liu	2013-11-13	1	-5/+30
\|\\|
\| *	Exclude jopt from kafka dependency.	Patrick Wendell	2013-10-25	1	-0/+1
\| \|	Kafka uses an older version of jopt that causes bad conflicts with the version used by spark-perf. It's not easy to remove this downstream because of the way that spark-perf uses Spark (by including a spark assembly as an unmanaged jar). This fixes the problem at its source by just never including it.
\| *	Fix Maven build to use MQTT repository	Matei Zaharia	2013-10-23	1	-3/+3
\| \|
\| *	Merge pull request #64 from prabeesh/master	Matei Zaharia	2013-10-23	1	-1/+5
\| \|\	MQTT Adapter for Spark Streaming. MQTT is a machine-to-machine (M2M)/Internet of Things connectivity protocol. It was designed as an extremely lightweight publish/subscribe messaging transport. You may read more about it at http://mqtt.org/. Message Queue Telemetry Transport (MQTT) is an open message protocol for M2M communications. It enables the transfer of telemetry-style data in the form of messages from devices like sensors and actuators to mobile phones, embedded systems on vehicles, or laptops and full-scale computers. The protocol was invented by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link Solutions. This protocol enables a publish/subscribe messaging model in an extremely lightweight way. It is useful for connections with remote locations where code size and network bandwidth are constraints. MQTT is one of the widely used protocols for the 'Internet of Things'. It is attracting much attention as anything and everything is getting connected to the internet and producing data. Researchers and companies predict some 25 billion devices will be connected to the internet by 2015. Plugins/support for MQTT are available in popular MQs like RabbitMQ, ActiveMQ, etc. Support for MQTT in Spark will help people with Internet of Things (IoT) projects to use Spark Streaming for their real-time data processing needs (from sensors and other embedded devices, etc.).
\| \| *	remove unused dependency	prabeesh	2013-10-17	1	-2/+0
\| \| \|
\| \| *	added mqtt adapter library dependencies	prabeesh	2013-10-16	1	-1/+7
\| \| \|
\| * \|	Merge pull request #56 from jerryshao/kafka-0.8-dev	Matei Zaharia	2013-10-21	1	-3/+6
\| \|\ \	Upgrade Kafka 0.7.2 to Kafka 0.8.0-beta1 for Spark Streaming Conflicts: streaming/pom.xml