spark - Mirror of Apache Spark

Commit message	Author	Age	Files	Lines
...
Minor typo fix for yarn client	Raymond Liu	2014-01-07	2	-2/+2
Merge pull request #322 from falaki/MLLibDocumentationImprovement	Patrick Wendell	2014-01-07	1	-56/+274
    SPARK-1009 Updated MLlib docs to show how to use it in Python. In addition, added detailed examples for regression, clustering and recommendation algorithms in a separate Scala section. Fixed a few minor issues with existing documentation.
Fixed merge conflict	Hossein Falaki	2014-01-07	306	-3327/+4228
Added proper evaluation example for collaborative filtering and fixed typo	Hossein Falaki	2014-01-06	1	-4/+8
Added table of contents and minor fixes	Hossein Falaki	2014-01-03	1	-8/+16
Commented the last part of collaborative filtering examples that lead to errors	Hossein Falaki	2014-01-02	1	-5/+6
Added Scala and Python examples for mllib	Hossein Falaki	2014-01-02	1	-52/+261
Merge pull request #355 from ScrapCodes/patch-1	Patrick Wendell	2014-01-07	1	-1/+1
    Update README.md. The link does not work otherwise.
Update README.md	Prashant Sharma	2014-01-08	1	-1/+1
    The link does not work otherwise.
Merge pull request #313 from tdas/project-refactor	Patrick Wendell	2014-01-07	51	-739/+1907
    Refactored the streaming project to separate external libraries like Twitter, Kafka, Flume, etc. At a high level, these are the following changes:
    1. All the external code was put in `SPARK_HOME/external/` as separate SBT projects and Maven modules. Their artifact names are `spark-streaming-twitter`, `spark-streaming-kafka`, etc. Both SparkBuild.scala and pom.xml files have been updated. References to external libraries and repositories have been removed from the settings of the root and streaming projects/modules.
    2. To use the external functionality (say, creating a Twitter stream), the developer has to `import org.apache.spark.streaming.twitter._`. For the Scala API, the developer has to call `TwitterUtils.createStream(streamingContext, ...)`. For the Java API, the developer has to call `TwitterUtils.createStream(javaStreamingContext, ...)`.
    3. Each external project has its own Scala and Java unit tests. Note that the unit tests of each external library use classes of the streaming unit tests (`TestSuiteBase`, `LocalJavaStreamingContext`, etc.). To enable this code sharing among test classes, `dependsOn(streaming % "compile->compile,test->test")` was used in SparkBuild.scala. In streaming/pom.xml, an additional `maven-jar-plugin` was necessary to capture this dependency (see the comment inside the pom.xml for more information).
    4. Jars of the external projects have been added to the examples project but not to the assembly project.
    5. In some files, imports have been rearranged to conform to the Spark coding guidelines.
Fixed examples/pom.xml and run-example based on Patrick's suggestions.	Tathagata Das	2014-01-07	2	-12/+2
Removed XYZFunctions and added XYZUtils as a common Scala and Java interface for creating XYZ streams.	Tathagata Das	2014-01-07	35	-646/+383
Merge remote-tracking branch 'apache/master' into project-refactor	Tathagata Das	2014-01-06	302	-3381/+3981
    Conflicts:
        examples/src/main/java/org/apache/spark/streaming/examples/JavaFlumeEventCount.java
        streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
        streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
        streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java
        streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala
        streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala
Changed JavaStreamingContextWith* to Function in streaming.api.java.** package.	Tathagata Das	2014-01-06	15	-76/+67
    Also fixed packages of Flume and MQTT tests.
Merge branch 'apache-master' into project-refactor	Tathagata Das	2013-12-31	14	-67/+52
Removed extra empty lines.	Tathagata Das	2013-12-31	3	-3/+0
Removed unnecessary comments.	Tathagata Das	2013-12-31	3	-55/+8
Added pom.xml for external projects and removed unnecessary dependencies and repositories from other poms and sbt.	Tathagata Das	2013-12-31	9	-106/+548
Refactored kafka, flume, zeromq, mqtt as separate external projects, with their own self-contained scala API, java API, scala unit tests and java unit tests.	Tathagata Das	2013-12-30	50	-599/+1612
    Updated examples to use the external projects.
Refactored streaming project to separate out the twitter functionality.	Tathagata Das	2013-12-26	9	-14/+64
Merge pull request #336 from liancheng/akka-remote-lookup	Patrick Wendell	2014-01-07	8	-38/+30
    Get rid of `Either[ActorRef, ActorSelection]`. In this pull request, instead of returning an `Either[ActorRef, ActorSelection]`, `registerOrLookup` identifies the remote actor with a blocking call to obtain an `ActorRef`, or throws an exception if the remote actor doesn't exist or the lookup times out (configured by `spark.akka.lookupTimeout`). This function is only called when a `SparkEnv` is constructed (instantiating the driver or an executor), so the blocking call is considered acceptable. Executor-side `ActorSelection`s/`ActorRef`s to the driver-side `MapOutputTrackerMasterActor` and `BlockManagerMasterActor` are affected by this pull request. `ActorSelection` is dangerous and should be used with care: it is only absolutely safe to send messages via an `ActorSelection` when the remote actor is stateless, so that actor incarnation is irrelevant. But as pointed out by @ScrapCodes in the comments below, the executor exits immediately once the connection to the driver is lost, so `ActorSelection`s are not harmful in this scenario. This pull request is therefore mostly a code style patch.
Fixed test suite compilation errors	Lian, Cheng	2014-01-06	1	-3/+3
Fixed several compilation errors in test suites	Lian, Cheng	2014-01-06	2	-5/+8
Get rid of `Either[ActorRef, ActorSelection]`	Lian, Cheng	2014-01-06	6	-30/+19
    Although we can send messages via an ActorSelection, it would be better to identify the actor and obtain an ActorRef first, so that we are informed earlier if the remote actor doesn't exist, and so that we get rid of the annoying Either wrapper.
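The lookup behaviour described for PR #336 (block until the remote endpoint resolves, or throw once a lookup timeout such as `spark.akka.lookupTimeout` expires, instead of handing callers an `Either`) can be illustrated with a small Python sketch. It is only an in-process analogy under stated assumptions: `Registry`, `LookupTimeout` and `register_or_lookup` are hypothetical names, a dictionary guarded by a `threading.Condition` stands in for the remote actor system, and nothing here is Akka or Spark API.

```python
import threading


class LookupTimeout(Exception):
    """Raised when a named endpoint is not resolved before the timeout."""


class Registry:
    """Hypothetical in-process stand-in for a remote actor system."""

    def __init__(self):
        self._endpoints = {}
        self._cond = threading.Condition()

    def register(self, name, endpoint):
        with self._cond:
            self._endpoints[name] = endpoint
            self._cond.notify_all()  # wake any callers blocked in lookup()

    def lookup(self, name, timeout):
        # Block until `name` is registered; fail loudly instead of returning
        # a "maybe" value that every caller would have to unwrap.
        with self._cond:
            found = self._cond.wait_for(lambda: name in self._endpoints, timeout=timeout)
            if not found:
                raise LookupTimeout(f"could not resolve {name!r} within {timeout}s")
            return self._endpoints[name]


def register_or_lookup(registry, name, is_driver, make_endpoint, timeout):
    """The driver creates and registers the endpoint; executors resolve it."""
    if is_driver:
        endpoint = make_endpoint()
        registry.register(name, endpoint)
        return endpoint
    return registry.lookup(name, timeout)
```

Since the lookup only happens while the environment is being constructed, paying for a blocking wait here is cheap, which is the same trade-off the PR description makes.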
Merge pull request #327 from lucarosellini/master	Matei Zaharia	2014-01-08	3	-3/+73
    Added ‘-i’ command line option to Spark REPL. We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark. Any new Spark-specific command line option can now be added to the org.apache.spark.repl.SparkRunnerSettings class. Since loading a script from the command line should behave the same as loading it with the “:load” command inside the shell, the script should be loaded once the SparkContext is available, which is why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.
Added license header and removed @author tag	Luca Rosellini	2014-01-07	2	-4/+34
Added ‘-i’ command line option to spark REPL.	Luca Rosellini	2014-01-03	3	-3/+43
    We had to create a new implementation of both scala.tools.nsc.CompilerCommand and scala.tools.nsc.Settings, because using scala.tools.nsc.GenericRunnerSettings would bring in other options (-howtorun, -save and -execute) which don’t make sense in Spark. Any new Spark-specific command line option can now be added to the org.apache.spark.repl.SparkRunnerSettings class. Since loading a script from the command line should behave the same as loading it with the “:load” command inside the shell, the script should be loaded once the SparkContext is available, which is why we had to move the call to ‘loadfiles(settings)’ _after_ the call to postInitialization(). This still doesn’t work if ‘isAsync = true’.
Merge pull request #1 from apache/master	Luca Rosellini	2014-01-03	52	-1542/+785
    Merge latest Spark changes
Merge pull request #354 from hsaputra/addasfheadertosbt	Matei Zaharia	2014-01-08	1	-0/+18
    Add ASF header to the new sbt script.
Add ASF header to the new sbt script.	Henry Saputra	2014-01-07	1	-0/+18
Merge pull request #350 from mateiz/standalone-limit	Matei Zaharia	2014-01-08	13	-20/+70
    Add way to limit default # of cores used by apps in standalone mode. Also documents the spark.deploy.spreadOut option, and fixes a config option that had a dash in its name.
Address review comments	Matei Zaharia	2014-01-07	8	-8/+11
Fix unit test compilation	Matei Zaharia	2014-01-07	1	-1/+2
Add way to limit default # of cores used by applications in standalone mode	Matei Zaharia	2014-01-07	8	-14/+60
    Also documents the spark.deploy.spreadOut option.
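A rough Python sketch of how a default core limit and the spread-out option interact when the standalone master hands cores to a single application. Assumptions: `allocate_cores` and its parameters are hypothetical names; the round-robin versus fill-first behaviour reflects our reading of the standalone master's scheduling rather than code from this PR; and the default limit corresponds to what Spark's standalone documentation lists as `spark.deploy.defaultCores`, applied only when an application does not set `spark.cores.max`.

```python
def allocate_cores(free_cores, requested, default_cores=None, spread_out=True):
    """Return the number of cores assigned on each usable worker.

    free_cores    -- free cores per usable worker (a list of ints)
    requested     -- cores the app asked for, or None if it set no limit
    default_cores -- cap used when requested is None (None means no cap)
    spread_out    -- round-robin across workers instead of filling each in turn
    """
    if requested is None:
        requested = default_cores if default_cores is not None else float("inf")
    to_assign = min(requested, sum(free_cores))
    assigned = [0] * len(free_cores)
    if spread_out:
        # Hand out one core at a time, cycling over workers that still have room.
        pos = 0
        while to_assign > 0:
            if free_cores[pos] - assigned[pos] > 0:
                assigned[pos] += 1
                to_assign -= 1
            pos = (pos + 1) % len(free_cores)
    else:
        # Consolidate: take as much as possible from each worker in order.
        for i, free in enumerate(free_cores):
            take = min(free, to_assign)
            assigned[i] = take
            to_assign -= take
    return assigned
```

For example, `allocate_cores([4, 4, 4], 6)` spreads the request as two cores per worker, while `spread_out=False` packs it onto the first two workers. Spark's docs note that spreading out is usually better for data locality in HDFS, whereas consolidating is more efficient for compute-intensive workloads.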
Merge pull request #352 from markhamstra/oldArch	Patrick Wendell	2014-01-07	1	-8/+2
    Don't leave os.arch unset after BlockManagerSuite. Recent SparkConf changes meant that BlockManagerSuite was now leaving the os.arch system property unset. That's a problem for any subsequent tests that rely upon having a valid os.arch. This is true for CompressionCodecSuite in the usual Maven build test order, even though it isn't usually true for the sbt build.
Fix BlockManagerSuite#after	Mark Hamstra	2014-01-07	1	-8/+2
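The fix in PR #352 comes down to restoring `os.arch` after the suite instead of leaving it cleared. The Python sketch below shows the same save-and-restore discipline, using an environment variable as a stand-in for a JVM system property; `preserved_env` is a hypothetical helper, not part of Spark or its test utilities.

```python
import os
from contextlib import contextmanager


@contextmanager
def preserved_env(name):
    """Restore `name` to its prior value (or prior absence) when the block exits."""
    saved = os.environ.get(name)
    try:
        yield
    finally:
        if saved is None:
            os.environ.pop(name, None)  # it was unset before, so unset it again
        else:
            os.environ[name] = saved  # put back the original value
```

Restoring the saved value, rather than simply clearing the key, is what keeps later suites (here, `CompressionCodecSuite` in the Maven test order) from seeing an unset value.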
Merge pull request #328 from falaki/MatrixFactorizationModel-fix	Patrick Wendell	2014-01-07	5	-4/+134
    SPARK-1012: DAGScheduler Exception Fix. Added a predict method to MatrixFactorizationModel to enable bulk prediction. This method takes an RDD[(Int, Int)] of users and products and returns an RDD with one Rating element for each element in the input RDD. Also added Python bindings to the new bulk prediction methods to address the SPARK-1011 issue. This is ready to be merged now.
Merge branch 'master' into MatrixFactorizationModel-fix	Hossein Falaki	2014-01-07	170	-1931/+1342
Added predictAll python function to MatrixFactorizationModel	Hossein Falaki	2014-01-06	1	-4/+6
Added Rating deserializer	Hossein Falaki	2014-01-06	2	-4/+26
Added serializing method for Rating object	Hossein Falaki	2014-01-06	1	-4/+16
Added python binding for bulk recommendation	Hossein Falaki	2014-01-04	4	-2/+46
Removed unnecessary blank line	Hossein Falaki	2014-01-03	1	-1/+0
Added unit tests for bulk prediction in MatrixFactorizationModel	Hossein Falaki	2014-01-03	1	-2/+31
Added a method to enable bulk prediction	Hossein Falaki	2014-01-03	1	-1/+23
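To make the bulk-prediction idea in PR #328 concrete, here is a local, Spark-free Python sketch. Each (user, product) pair yields one `Rating` whose value is the dot product of the user's and product's latent feature vectors, which is how a matrix factorization model scores a pair. The `Rating` tuple and `predict_all` function are illustrative stand-ins; per the commits above, the PySpark method is `MatrixFactorizationModel.predictAll`, which works on an RDD of pairs rather than a Python list.

```python
from typing import Dict, Iterable, List, NamedTuple, Sequence, Tuple


class Rating(NamedTuple):
    user: int
    product: int
    rating: float


def predict_all(
    user_features: Dict[int, Sequence[float]],
    product_features: Dict[int, Sequence[float]],
    users_products: Iterable[Tuple[int, int]],
) -> List[Rating]:
    """Score every (user, product) pair in one pass; one Rating per input pair."""
    ratings = []
    for user, product in users_products:
        # Assumes both ids are known to the model; unknown ids raise KeyError.
        u = user_features[user]
        p = product_features[product]
        score = sum(a * b for a, b in zip(u, p))  # dot product of latent factors
        ratings.append(Rating(user, product, score))
    return ratings
```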
Merge pull request #351 from pwendell/maven-fix	Patrick Wendell	2014-01-07	4	-7/+12
    Add log4j exclusion rule to maven. To make this work I had to rename the defaults file. Otherwise maven's pattern matching rules included it when trying to match other log4j.properties files. I also fixed a bug in the existing maven build where two <transformers> tags were present in assembly/pom.xml such that one overwrote the other.
Add log4j exclusion rule to maven.	Patrick Wendell	2014-01-07	4	-7/+12
    To make this work I had to rename the defaults file. Otherwise maven's pattern matching rules included it when trying to match other log4j.properties files. I also fixed a bug in the existing maven build where two <transformers> tags were present in assembly/pom.xml such that one overwrote the other.
Merge pull request #337 from yinxusen/mllib-16-bugfix	Reynold Xin	2014-01-07	2	-2/+118
    Mllib 16 bugfix
    Bug fix: https://spark-project.atlassian.net/browse/MLLIB-16
    Hi, I fixed the bug and added a test suite for `GradientDescent`. There are 2 checks in the test case. First, the final loss must be lower than the initial one. Second, the trend of the loss sequence should be decreasing, i.e., at least 80% of iterations have lower losses than their prior iterations. Thanks!
Added GradientDescentSuite	Xusen Yin	2014-01-06	1	-0/+116
fix logistic loss bug	Xusen Yin	2014-01-06	1	-2/+2
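The two checks described for `GradientDescentSuite` can be sketched in plain Python. This is not the MLlib test. It assumes a one-feature logistic regression trained with full-batch gradient descent on synthetic data (MLlib's `GradientDescent` samples mini-batches instead), and `run_gradient_descent` and `loss_checks` are hypothetical helpers; only the 80% threshold comes from the PR description.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss_and_grad(w, b, xs, ys):
    """Mean log loss for labels in {0, 1}, plus its gradient w.r.t. (w, b)."""
    loss = grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        z = w * x + b
        # log(1 + e^z) - y*z, written to avoid overflow for large |z|
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
        p = sigmoid(z)
        grad_w += (p - y) * x
        grad_b += p - y
    n = len(xs)
    return loss / n, grad_w / n, grad_b / n


def run_gradient_descent(xs, ys, step=0.5, iterations=50):
    """Return the loss recorded at the start of each iteration."""
    w = b = 0.0
    history = []
    for _ in range(iterations):
        loss, grad_w, grad_b = logistic_loss_and_grad(w, b, xs, ys)
        history.append(loss)
        w -= step * grad_w
        b -= step * grad_b
    return history


def loss_checks(history, min_fraction=0.8):
    """(1) final loss below the initial loss; (2) at least `min_fraction`
    of iterations have a lower loss than the iteration before them."""
    final_below_initial = history[-1] < history[0]
    drops = sum(1 for prev, cur in zip(history, history[1:]) if cur < prev)
    mostly_decreasing = drops >= min_fraction * (len(history) - 1)
    return final_below_initial and mostly_decreasing


# Noisy synthetic data with a fixed seed so runs are repeatable.
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
ys = [1 if x + rng.gauss(0.0, 0.3) > 0 else 0 for x in xs]
history = run_gradient_descent(xs, ys)
```

With full-batch steps and a small enough step size every iteration lowers the loss; the 80% tolerance in the second check is what allows a stochastic optimizer to pass despite occasional upticks.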