aboutsummaryrefslogtreecommitdiff
path: root/examples
Commit message (Collapse)AuthorAgeFilesLines
* SPARK-1121: Include avro for yarn-alpha buildsPatrick Wendell2014-03-021-0/+14
| | | | | | | | | | | | | | | | | This lets us explicitly include Avro based on a profile for 0.23.X builds. It makes me sad how convoluted it is to express this logic in Maven. @tgraves and @sryza curious if this works for you. I'm also considering just reverting to how it was before. The only real problem was that Spark advertised a dependency on Avro even though it only really depends transitively on Avro through other deps. Author: Patrick Wendell <pwendell@gmail.com> Closes #49 from pwendell/avro-build-fix and squashes the following commits: 8d6ee92 [Patrick Wendell] SPARK-1121: Add avro to yarn-alpha profile
* SPARK-1084.2 (resubmitted)Sean Owen2014-03-021-8/+0
| | | | | | | | | | | | | (Ported from https://github.com/apache/incubator-spark/pull/650 ) This adds one more change though, to fix the scala version warning introduced by json4s recently. Author: Sean Owen <sowen@cloudera.com> Closes #32 from srowen/SPARK-1084.2 and squashes the following commits: 9240abd [Sean Owen] Avoid scala version conflict in scalap induced by json4s dependency 1561cec [Sean Owen] Remove "exclude *" dependencies that are causing Maven warnings, and that are apparently unneeded anyway
* Remove remaining references to incubationPatrick Wendell2014-03-021-2/+2
| | | | | | | | | | This removes some loose ends not caught by the other (incubating -> tlp) patches. @markhamstra this updates the version as you mentioned earlier. Author: Patrick Wendell <pwendell@gmail.com> Closes #51 from pwendell/tlp and squashes the following commits: d553b1b [Patrick Wendell] Remove remaining references to incubation
* SPARK-1071: Tidy logging strategy and use of log4jSean Owen2014-02-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prompted by a recent thread on the mailing list, I tried and failed to see if Spark can be made independent of log4j. There are a few cases where control of the underlying logging is pretty useful, and to do that, you have to bind to a specific logger. Instead I propose some tidying that leaves Spark's use of log4j, but gets rid of warnings and should still enable downstream users to switch. The idea is to pipe everything (except log4j) through SLF4J, and have Spark use SLF4J directly when logging, and where Spark needs to output info (REPL and tests), bind from SLF4J to log4j. This leaves the same behavior in Spark. It means that downstream users who want to use something except log4j should: - Exclude dependencies on log4j, slf4j-log4j12 from Spark - Include dependency on log4j-over-slf4j - Include dependency on another logger X, and another slf4j-X - Recreate any log config that Spark does, that is needed, in the other logger's config That sounds about right. Here are the key changes: - Include the jcl-over-slf4j shim everywhere by depending on it in core. - Exclude dependencies on commons-logging from third-party libraries. - Include the jul-to-slf4j shim everywhere by depending on it in core. - Exclude slf4j-* dependencies from third-party libraries to prevent collision or warnings - Added missing slf4j-log4j12 binding to GraphX, Bagel module tests And minor/incidental changes: - Update to SLF4J 1.7.5, which happily matches Hadoop 2’s version and is a recommended update over 1.7.2 - (Remove a duplicate HBase dependency declaration in SparkBuild.scala) - (Remove a duplicate mockito dependency declaration that was causing warnings and bugging me) Author: Sean Owen <sowen@cloudera.com> Closes #570 from srowen/SPARK-1071 and squashes the following commits: 52eac9f [Sean Owen] Add slf4j-over-log4j12 dependency to core (non-test) and remove it from things that depend on core. 77a7fa9 [Sean Owen] SPARK-1071: Tidy logging strategy and use of log4j
* Merge pull request #567 from ScrapCodes/style2.Prashant Sharma2014-02-0920-108/+144
| | | | | | | | | | | | | | | | SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2 Continuation of PR #557 With this all scala style errors are fixed across the code base !! The reason for creating a separate PR was to not interrupt an already reviewed and ready to merge PR. Hope this gets reviewed soon and merged too. Author: Prashant Sharma <prashant.s@imaginea.com> Closes #567 and squashes the following commits: 3b1ec30 [Prashant Sharma] scala style fixes
* Merge pull request #557 from ScrapCodes/style. Closes #557.Patrick Wendell2014-02-094-14/+25
| | | | | | | | | | | | | | | | | | | | | SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Author: Patrick Wendell <pwendell@gmail.com> Author: Prashant Sharma <scrapcodes@gmail.com> == Merge branch commits == commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4 Author: Prashant Sharma <scrapcodes@gmail.com> Date: Sun Feb 9 17:39:07 2014 +0530 scala style fixes commit f91709887a8e0b608c5c2b282db19b8a44d53a43 Author: Patrick Wendell <pwendell@gmail.com> Date: Fri Jan 24 11:22:53 2014 -0800 Adding scalastyle snapshot
* Merge pull request #542 from markhamstra/versionBump. Closes #542.Mark Hamstra2014-02-081-1/+1
| | | | | | | | | | | | | | | | | | Version number to 1.0.0-SNAPSHOT Since 0.9.0-incubating is done and out the door, we shouldn't be building 0.9.0-incubating-SNAPSHOT anymore. @pwendell Author: Mark Hamstra <markhamstra@gmail.com> == Merge branch commits == commit 1b00a8a7c1a7f251b4bb3774b84b9e64758eaa71 Author: Mark Hamstra <markhamstra@gmail.com> Date: Wed Feb 5 09:30:32 2014 -0800 Version number to 1.0.0-SNAPSHOT
* Merge pull request #540 from sslavic/patch-3. Closes #540.Stevo Slavić2014-02-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | Fix line end character stripping for Windows LogQuery Spark example would produce unwanted result when run on Windows platform because of different, platform specific trailing line end characters (not only \n but \r too). This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform specific stuff. Author: Stevo Slavić <sslavic@gmail.com> == Merge branch commits == commit 1e43ba0ea773cc005cf0aef78b6c1755f8e88b27 Author: Stevo Slavić <sslavic@gmail.com> Date: Wed Feb 5 14:48:29 2014 +0100 Fix line end character stripping for Windows LogQuery Spark example would produce unwanted result when run on Windows platform because of different, platform specific trailing line end characters (not only \n but \r too). This fix makes use of Scala's standard library string functions to properly strip all trailing line end characters, letting Scala handle the platform specific stuff.
* Merge pull request #529 from hsaputra/cleanup_right_arrowop_scalaHenry Saputra2014-02-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency Looks like there are some ⇒ Unicode character (maybe from scalariform) in Scala code. This PR is to change it to => to get some consistency on the Scala code. If we want to use ⇒ as default we could use sbt plugin scalariform to make sure all Scala code has ⇒ instead of => And remove unused imports found in TwitterInputDStream.scala while I was there =) Author: Henry Saputra <hsaputra@apache.org> == Merge branch commits == commit 29c1771d346dff901b0b778f764e6b4409900234 Author: Henry Saputra <hsaputra@apache.org> Date: Sat Feb 1 22:05:16 2014 -0800 Change the ⇒ character (maybe from scalariform) to => in Scala code for style consistency.
* Merge pull request #492 from skicavs/masterPatrick Wendell2014-01-221-2/+2
|\ | | | | | | fixed job name and usage information for the JavaSparkPi example
| * fixed job name and usage information for the JavaSparkPi exampleKevin Mader2014-01-221-2/+2
| |
* | Merge pull request #315 from rezazadeh/sparsesvdMatei Zaharia2014-01-221-0/+59
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sparse SVD # Singular Value Decomposition Given an *m x n* matrix *A*, compute matrices *U, S, V* such that *A = U * S * V^T* There is no restriction on m, but we require n^2 doubles to fit in memory. Further, n should be less than m. The decomposition is computed by first computing *A^TA = V S^2 V^T*, computing svd locally on that (since n x n is small), from which we recover S and V. Then we compute U via easy matrix multiplication as *U = A * V * S^-1* Only singular vectors associated with the largest k singular values If there are k such values, then the dimensions of the return will be: * *S* is *k x k* and diagonal, holding the singular values on diagonal. * *U* is *m x k* and satisfies U^T*U = eye(k). * *V* is *n x k* and satisfies V^TV = eye(k). All input and output is expected in sparse matrix format, 0-indexed as tuples of the form ((i,j),value) all in RDDs. # Testing Tests included. They test: - Decomposition promise (A = USV^T) - For small matrices, output is compared to that of jblas - Rank 1 matrix test included - Full Rank matrix test included - Middle-rank matrix forced via k included # Example Usage import org.apache.spark.SparkContext import org.apache.spark.mllib.linalg.SVD import org.apache.spark.mllib.linalg.SparseMatrix import org.apache.spark.mllib.linalg.MatrixyEntry // Load and parse the data file val data = sc.textFile("mllib/data/als/test.data").map { line => val parts = line.split(',') MatrixEntry(parts(0).toInt, parts(1).toInt, parts(2).toDouble) } val m = 4 val n = 4 // recover top 1 singular vector val decomposed = SVD.sparseSVD(SparseMatrix(data, m, n), 1) println("singular values = " + decomposed.S.data.toArray.mkString) # Documentation Added to docs/mllib-guide.md
| * Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-173-1/+25
| |\
| * | make example 0-indexedReza Zadeh2014-01-171-1/+1
| | |
| * | changes from PRReza Zadeh2014-01-171-1/+1
| | |
| * | Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-1314-26/+75
| |\ \
| * \ \ Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-1120-36/+200
| |\ \ \
| * | | | add dimension parameters to exampleReza Zadeh2014-01-101-5/+5
| | | | |
| * | | | Merge remote-tracking branch 'upstream/master' into sparsesvdReza Zadeh2014-01-0947-184/+304
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: docs/mllib-guide.md
| * | | | | fix exampleReza Zadeh2014-01-091-2/+2
| | | | | |
| * | | | | use SparseMatrix everywhereReza Zadeh2014-01-041-4/+5
| | | | | |
| * | | | | new example fileReza Zadeh2014-01-041-0/+58
| | | | | |
* | | | | | Added StreamingContext.awaitTermination to streaming examples.Tathagata Das2014-01-2017-0/+17
| | | | | |
* | | | | | Updated java API docs for streaming, along with very minor changes in the ↵Tathagata Das2014-01-162-3/+2
| |_|_|_|/ |/| | | | | | | | | | | | | | code examples.
* | | | | Add GraphX dependency to examples/pom.xmlAnkur Dave2014-01-141-0/+6
| | | | |
* | | | | Add missing header filesPatrick Wendell2014-01-141-0/+17
| | | | |
* | | | | Merge remote-tracking branch 'apache/master' into filestream-fixTathagata Das2014-01-131-0/+49
|\ \ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
| * | | | Merge branch 'master' into graphxReynold Xin2014-01-1328-61/+271
| |\ \ \ \
| * | | | | Add LiveJournalPageRank exampleAnkur Dave2014-01-131-0/+49
| | | | | |
| * | | | | Revert changes to examples/.../PageRankUtils.scalaAnkur Dave2014-01-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | Reverts to 04d83fc37f9eef89c20331c85291a0a169f75e6d:examples/src/main/scala/org/apache/spark/examples/bagel/PageRankUtils.scala.
| * | | | | Merge remote-tracking branch 'spark-upstream/master' into HEADAnkur Dave2014-01-0848-213/+295
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: README.md core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala pom.xml project/SparkBuild.scala repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala
| * \ \ \ \ \ Merge branch 'master' of github.com:apache/incubator-sparkReynold Xin2013-11-256-16/+19
| |\ \ \ \ \ \
| * \ \ \ \ \ \ Merge branch 'master' of github.com:apache/incubator-spark into mergemergeReynold Xin2013-11-041-1/+2
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: README.md core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala
| * \ \ \ \ \ \ \ Merge remote-tracking branch 'spark-upstream/master'Ankur Dave2013-10-306-23/+263
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: project/SparkBuild.scala
| * \ \ \ \ \ \ \ \ Merge branch 'master' of https://github.com/apache/incubator-spark into ↵Joseph E. Gonzalez2013-10-181-4/+9
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | indexedrdd_graphx
| * \ \ \ \ \ \ \ \ \ merged with upstream changesJoseph E. Gonzalez2013-10-142-11/+23
| |\ \ \ \ \ \ \ \ \ \
| * \ \ \ \ \ \ \ \ \ \ Merging latest changes from spark main branchJoseph E. Gonzalez2013-09-1753-353/+1167
| |\ \ \ \ \ \ \ \ \ \ \
* | | | | | | | | | | | | Removed StreamingContext.registerInputStream and registerOutputStream - they ↵Tathagata Das2014-01-131-1/+2
| |_|_|_|_|_|_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | were useless as InputDStream has been made to register itself. Also made DStream.register() private[streaming] - not useful to expose the confusing function. Updated a lot of documentation.
* | | | | | | | | | | | Merge pull request #400 from tdas/dstream-movePatrick Wendell2014-01-131-1/+1
|\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moved DStream and PairDSream to org.apache.spark.streaming.dstream Similar to the package location of `org.apache.spark.rdd.RDD`, `DStream` has been moved from `org.apache.spark.streaming.DStream` to `org.apache.spark.streaming.dstream.DStream`. I know that the package name is a little long, but I think its better to keep it consistent with Spark's structure. Also fixed persistence of windowed DStream. The RDDs generated generated by windowed DStream are essentially unions of underlying RDDs, and persistent these union RDDs would store numerous copies of the underlying data. Instead setting the persistence level on the windowed DStream is made to set the persistence level of the underlying DStream.
| * \ \ \ \ \ \ \ \ \ \ \ Merge remote-tracking branch 'apache/master' into dstream-moveTathagata Das2014-01-126-9/+9
| |\ \ \ \ \ \ \ \ \ \ \ \ | | |_|_|_|_|_|_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala
* | | | | | | | | | | | | Merge pull request #395 from hsaputra/remove_simpleredundantreturn_scalaPatrick Wendell2014-01-127-17/+17
|\ \ \ \ \ \ \ \ \ \ \ \ \ | |_|/ / / / / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove simple redundant return statements for Scala methods/functions Remove simple redundant return statements for Scala methods/functions: -) Only change simple return statements at the end of method -) Ignore the complex if-else check -) Ignore the ones inside synchronized -) Add small changes to making var to val if possible and remove () for simple get This hopefully makes the review simpler =) Pass compile and tests.
| * | | | | | | | | | | | Merge branch 'master' into remove_simpleredundantreturn_scalaHenry Saputra2014-01-1221-36/+246
| |\| | | | | | | | | | |
| * | | | | | | | | | | | Remove simple redundant return statement for Scala methods/functions:Henry Saputra2014-01-127-17/+17
| | |_|_|_|_|_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -) Only change simple return statements at the end of method -) Ignore the complex if-else check -) Ignore the ones inside synchronized
* | | | | | | | | | | | Rename DStream.foreach to DStream.foreachRDDPatrick Wendell2014-01-125-8/+8
| |/ / / / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `foreachRDD` makes it clear that the granularity of this operator is per-RDD. As it stands, `foreach` is inconsistent with with `map`, `filter`, and the other DStream operators which get pushed down to individual records within each RDD.
* | | | | | | | | | | Merge remote-tracking branch 'apache/master' into driver-testTathagata Das2014-01-1019-30/+76
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala
| * \ \ \ \ \ \ \ \ \ \ Merge pull request #363 from pwendell/streaming-logsPatrick Wendell2014-01-0919-30/+76
| |\ \ \ \ \ \ \ \ \ \ \ | | |_|_|_|_|_|_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set default logging to WARN for Spark streaming examples. This programatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.
| | * | | | | | | | | | Minor clean-upPatrick Wendell2014-01-091-1/+1
| | | | | | | | | | | |
| | * | | | | | | | | | Set default logging to WARN for Spark streaming examples.Patrick Wendell2014-01-0919-29/+75
| | |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This programatically sets the log level to WARN by default for streaming tests. If the user has already specified a log4j.properties file, the user's file will take precedence over this default.
* | | | | | | | | | | Updated docs based on Patrick's comments in PR 383.Tathagata Das2014-01-102-12/+40
| | | | | | | | | | |
* | | | | | | | | | | Merge branch 'standalone-driver' into driver-testTathagata Das2014-01-0948-191/+272
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala examples/src/main/java/org/apache/spark/streaming/examples/JavaNetworkWordCount.java streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobGenerator.scala