aboutsummaryrefslogtreecommitdiff
path: root/examples
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using ↵Dongjoon Hyun2016-02-2611-0/+733
| | | | | | | | | | | | | | | | | | | | | include_example ## What changes were proposed in this pull request? This PR replaces example codes in `mllib-linear-methods.md` using `include_example` by doing the followings: * Extracts the example codes(Scala,Java,Python) as files in `example` module. * Merges some dialog-style examples into a single file. * Hide redundant codes in HTML for the consistency with other docs. ## How was the this patch tested? manual test. This PR can be tested by document generations, `SKIP_API=1 jekyll build`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11320 from dongjoon-hyun/SPARK-11381.
* [SPARK-13457][SQL] Removes DataFrame RDD operationsCheng Lian2016-02-276-8/+9
| | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This is another try of PR #11323. This PR removes DataFrame RDD operations except for `foreach` and `foreachPartitions` (they are actions rather than transformations). Original calls are now replaced by calls to methods of `DataFrame.rdd`. PR #11323 was reverted because it introduced a regression: both `DataFrame.foreach` and `DataFrame.foreachPartitions` wrap underlying RDD operations with `withNewExecutionId` to track Spark jobs. But they are removed in #11323. ## How was the this patch tested? No extra tests are added. Existing tests should do the work. Author: Cheng Lian <lian@databricks.com> Closes #11388 from liancheng/remove-df-rdd-ops.
* Revert "[SPARK-13457][SQL] Removes DataFrame RDD operations"Davies Liu2016-02-256-9/+8
| | | | This reverts commit 157fe64f3ecbd13b7286560286e50235eecfe30e.
* [SPARK-13457][SQL] Removes DataFrame RDD operationsCheng Lian2016-02-256-8/+9
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR removes DataFrame RDD operations. Original calls are now replaced by calls to methods of `DataFrame.rdd`. ## How was the this patch tested? No extra tests are added. Existing tests should do the work. Author: Cheng Lian <lian@databricks.com> Closes #11323 from liancheng/remove-df-rdd-ops.
* [SPARK-10759][ML] update cross validator with include_exampleJeremyNixon2016-02-231-1/+4
| | | | | | | | This pull request uses {%include_example%} to add an example for the python cross validator to ml-guide. Author: JeremyNixon <jnixon2@gmail.com> Closes #11240 from JeremyNixon/pipeline_include_example.
* [SPARK-13257][IMPROVEMENT] Refine naive Bayes example by checking model ↵movelikeriver2016-02-221-2/+15
| | | | | | | | | | after loading it Refine naive Bayes example by checking model after loading it Author: movelikeriver <mars.lenjoy@gmail.com> Closes #11125 from movelikeriver/naive_bayes.
* [SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using ↵Devaraj K2016-02-2212-0/+1015
| | | | | | | | | | include_example Replaced example code in ml-guide.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11053 from devaraj-kavali/SPARK-13012.
* [SPARK-13016][DOCUMENTATION] Replace example code in ↵Devaraj K2016-02-225-0/+311
| | | | | | | | | | | mllib-dimensionality-reduction.md using include_example Replaced example example code in mllib-dimensionality-reduction.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11132 from devaraj-kavali/SPARK-13016.
* [SPARK-13248][STREAMING] Remove deprecated Streaming APIs.Luciano Resende2016-02-211-9/+9
| | | | | | | | Remove deprecated Streaming APIs and adjust sample applications. Author: Luciano Resende <lresende@apache.org> Closes #11139 from lresende/streaming-deprecated-apis.
* [SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.xSean Owen2016-02-171-3/+3
| | | | | | | | Phase 1: update plugin versions, test dependencies, some example and third-party versions Author: Sean Owen <sowen@cloudera.com> Closes #11206 from srowen/SPARK-13324.
* [SPARK-12953][EXAMPLES] RDDRelation writer set overwrite modeshijinkui2016-02-171-4/+3
| | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-12953 fix error when run RDDRelation.main(): "path file:/Users/sjk/pair.parquet already exists" Set DataFrameWriter's mode to SaveMode.Overwrite Author: shijinkui <shijinkui666@163.com> Closes #10864 from shijinkui/set_mode.
* [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative ↵BenFradet2016-02-164-182/+264
| | | | | | | | | | filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.
* [SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using ↵Xin Ren2016-02-151-0/+59
| | | | | | | | | | | | | | | | | | | | | include_example Replace example code in mllib-pmml-model-export.md using include_example https://issues.apache.org/jira/browse/SPARK-13018 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11126 from keypointt/SPARK-13018.
* [SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed testLiang-Chi Hsieh2016-02-131-32/+21
| | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue is pointed by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests will be failed. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. By setting `TripletFields.All` in `mapTriplets` it can work. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #10539 from viirya/fix-poweriter.
* [SPARK-13170][STREAMING] Investigate replacing SynchronizedQueue as it is ↵Sean Owen2016-02-091-3/+5
| | | | | | | | | | deprecated Replace SynchronizeQueue with synchronized access to a Queue Author: Sean Owen <sowen@cloudera.com> Closes #11111 from srowen/SPARK-13170.
* [SPARK-13177][EXAMPLES] Update ActorWordCount example to not directly use ↵sachin aggarwal2016-02-091-4/+4
| | | | | | | | low level linked list as it is deprecated. Author: sachin aggarwal <different.sachin@gmail.com> Closes #11113 from agsachin/master.
* [SPARK-6363][BUILD] Make Scala 2.11 the default Scala versionJosh Rosen2016-01-301-2/+2
| | | | | | | | | | | | This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.
* [SPARK-3369][CORE][STREAMING] Java mapPartitions Iterator->Iterable is ↵Sean Owen2016-01-2611-41/+44
| | | | | | | | | | | | inconsistent with Scala's Iterator->Iterator Fix Java function API methods for flatMap and mapPartitions to require producing only an Iterator, not Iterable. Also fix DStream.flatMap to require a function producing TraversableOnce only, not Traversable. CC rxin pwendell for API change; tdas since it also touches streaming. Author: Sean Owen <sowen@cloudera.com> Closes #10413 from srowen/SPARK-3369.
* [SPARK-12960] [PYTHON] Some examples are missing support for python2Mark Grover2016-01-211-0/+1
| | | | | | | | Without importing the print_function, the lines later on like ```print("Usage: direct_kafka_wordcount.py <broker_list> <topic>", file=sys.stderr)``` fail when using python2.*. Import fixes that problem and doesn't break anything on python3 either. Author: Mark Grover <mark@apache.org> Closes #10872 from markgrover/python2_compat.
* [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" projectShixiong Zhu2016-01-204-25/+44
| | | | | | | | | | | | | Include the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream" 3. Update the ActorWordCount example and add the JavaActorWordCount example 4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly Author: Shixiong Zhu <shixiong@databricks.com> Closes #10744 from zsxwing/streaming-akka-2.
* [SPARK-12667] Remove block manager's internal "external block store" APIReynold Xin2016-01-152-143/+0
| | | | | | | | | | This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems. There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests. Author: Reynold Xin <rxin@databricks.com> Closes #10752 from rxin/remove-offheap.
* [SPARK-12830] Java style: disallow trailing whitespaces.Reynold Xin2016-01-142-2/+2
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10764 from rxin/SPARK-12830.
* [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-111-7/+7
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10685 from sarutak/SPARK-12692-followup-streaming.
* [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support ↵Yanbo Liang2016-01-112-0/+10
| | | | | | | | | | single instance predict/predictSoft PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like Scala do. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10552 from yanboliang/spark-12603.
* removed lambda from sortByKey()Udo Klein2016-01-111-1/+1
| | | | | | | | According to the documentation the sortByKey method does not take a lambda as an argument, thus the example is flawed. Removed the argument completely as this will default to ascending sort. Author: Udo Klein <git@blinkenlight.net> Closes #10640 from udoklein/patch-1.
* [SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusionsJosh Rosen2016-01-111-4/+0
| | | | | | | | | | This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact. While the diff here looks somewhat large, note that this is only a revert of a some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036 Author: Josh Rosen <joshrosen@databricks.com> Closes #10693 from JoshRosen/netty-hotfix.
* [SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent ↵Josh Rosen2016-01-101-0/+4
| | | | | | | | | | | | | | | future bugs Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse). This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds. /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath. Author: Josh Rosen <rosenville@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #10672 from JoshRosen/enforce-netty-exclusions.
* [SPARK-12692][BUILD][MLLIB] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-103-3/+3
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10684 from sarutak/SPARK-12692-followup-mllib.
* [SPARK-4819] Remove Guava's "Optional" from public APISean Owen2016-01-081-12/+8
| | | | | | | | | | Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`) See also https://github.com/apache/spark/pull/10512 Author: Sean Owen <sowen@cloudera.com> Closes #10513 from srowen/SPARK-4819.
* fixed numVertices in transitive closure exampleUdo Klein2016-01-081-2/+2
| | | | | | Author: Udo Klein <git@blinkenlight.net> Closes #10642 from udoklein/patch-2.
* [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 editionSean Owen2016-01-087-39/+43
| | | | | | | | Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs. Author: Sean Owen <sowen@cloudera.com> Closes #10570 from srowen/SPARK-12618.
* [SPARK-12510][STREAMING] Refactor ActorReceiver to support JavaShixiong Zhu2016-01-072-5/+141
| | | | | | | | | | | | | This PR includes the following changes: 1. Rename `ActorReceiver` to `ActorReceiverSupervisor` 2. Remove `ActorHelper` 3. Add a new `ActorReceiver` for Scala and `JavaActorReceiver` for Java 4. Add `JavaActorWordCount` example Author: Shixiong Zhu <shixiong@databricks.com> Closes #10457 from zsxwing/java-actor-stream.
* [STREAMING][DOCS][EXAMPLES] Minor fixesJacek Laskowski2016-01-071-6/+4
| | | | | | Author: Jacek Laskowski <jacek@japila.pl> Closes #10603 from jaceklaskowski/streaming-actor-custom-receiver.
* [SPARK-12615] Remove some deprecated APIs in RDD/SparkContextReynold Xin2016-01-053-17/+8
| | | | | | | | I looked at each case individually and it looks like they can all be removed. The only one that I had to think twice was toArray (I even thought about un-deprecating it, until I realized it was a problem in Java to have toArray returning java.util.List). Author: Reynold Xin <rxin@databricks.com> Closes #10569 from rxin/SPARK-12615.
* [SPARK-3873][EXAMPLES] Import ordering fixes.Marcelo Vanzin2016-01-04106-154/+147
| | | | | | Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10575 from vanzin/SPARK-3873-examples.
* [SPARK-12481][CORE][STREAMING][SQL] Remove usage of Hadoop deprecated APIs ↵Sean Owen2016-01-022-6/+2
| | | | | | | | | | and reflection that supported 1.x Remove use of deprecated Hadoop APIs now that 2.2+ is required Author: Sean Owen <sowen@cloudera.com> Closes #10446 from srowen/SPARK-12481.
* [SPARK-12588] Remove HttpBroadcast in Spark 2.0.Reynold Xin2015-12-301-5/+3
| | | | | | | | We switched to TorrentBroadcast in Spark 1.1, and HttpBroadcast has been undocumented since then. It's time to remove it in Spark 2.0. Author: Reynold Xin <rxin@databricks.com> Closes #10531 from rxin/SPARK-12588.
* [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for ↵Shixiong Zhu2015-12-223-10/+157
| | | | | | | | | | Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing. Author: Shixiong Zhu <shixiong@databricks.com> Closes #10385 from zsxwing/accumulator-broadcast-example.
* [SPARK-11808] Remove Bagel.Reynold Xin2015-12-191-6/+0
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10395 from rxin/SPARK-11808.
* Bump master version to 2.0.0-SNAPSHOT.Reynold Xin2015-12-191-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
* [SPARK-9057][STREAMING] Twitter example joining to static RDD of word ↵Jeff L2015-12-183-0/+353
| | | | | | | | | | sentiment values Example of joining a static RDD of word sentiments to a streaming RDD of Tweets in order to demo the usage of the transform() method. Author: Jeff L <sha0lin@alumni.carnegiemellon.edu> Closes #8431 from Agent007/SPARK-9057.
* [SPARK-12410][STREAMING] Fix places that use '.' and '|' directly in splitShixiong Zhu2015-12-171-1/+1
| | | | | | | | String.split accepts a regular expression, so we should escape "." and "|". Author: Shixiong Zhu <shixiong@databricks.com> Closes #10361 from zsxwing/reg-bug.
* [SPARK-12364][ML][SPARKR] Add ML example for SparkRYanbo Liang2015-12-161-0/+54
| | | | | | | | | | We have DataFrame example for SparkR, we also need to add ML example under ```examples/src/main/r```. cc mengxr jkbradley shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10324 from yanboliang/spark-12364.
* [SPARK-6518][MLLIB][EXAMPLE][DOC] Add example code and user guide for ↵Yu ISHIKAWA2015-12-162-0/+129
| | | | | | | | | | | bisecting k-means This PR includes only an example code in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9952 from yu-iskw/SPARK-6518.
* [SPARK-12215][ML][DOC] User guide section for KMeans in spark.mlYu ISHIKAWA2015-12-162-28/+29
| | | | | | | | cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #10244 from yu-iskw/SPARK-12215.
* [MINOR][ML] Rename weights to coefficients for examples/DeveloperApiExampleYanbo Liang2015-12-152-19/+19
| | | | | | | | | | Rename ```weights``` to ```coefficients``` for examples/DeveloperApiExample. cc mengxr jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10280 from yanboliang/spark-coefficients.
* [SPARK-12199][DOC] Follow-up: Refine example code in ml-features.mdXusen Yin2015-12-123-4/+4
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-12199 Follow-up PR of SPARK-11551. Fix some errors in ml-features.md mengxr Author: Xusen Yin <yinxusen@gmail.com> Closes #10193 from yinxusen/SPARK-12199.
* [SPARK-11978][ML] Move dataset_example.py to examples/ml and rename to ↵Yanbo Liang2015-12-112-26/+38
| | | | | | | | | | | | | | dataframe_example.py Since ```Dataset``` has a new meaning in Spark 1.6, we should rename it to avoid confusion. #9873 finished the work of Scala example, here we focus on the Python one. Move dataset_example.py to ```examples/ml``` and rename to ```dataframe_example.py```. BTW, fix minor missing issues of #9873. cc mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9957 from yanboliang/SPARK-11978.
* [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input filesYanbo Liang2015-12-111-1/+1
| | | | | | | | | | | | | | | | | * ```jsonFile``` should support multiple input files, such as: ```R jsonFile(sqlContext, c(“path1”, “path2”)) # character vector as arguments jsonFile(sqlContext, “path1,path2”) ``` * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed at Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` at SparkR side. * Replace all ```jsonFile``` with ```read.json``` at test_sparkSQL.R, but still keep jsonFile test case. * If this PR is accepted, we should also make almost the same change for ```parquetFile```. cc felixcheung sun-rui shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10145 from yanboliang/spark-12146.
* [SPARK-11713] [PYSPARK] [STREAMING] Initial RDD updateStateByKey for PySparkBryan Cutler2015-12-101-1/+4
| | | | | | | | Adding ability to define an initial state RDD for use with updateStateByKey PySpark. Added unit test and changed stateful_network_wordcount example to use initial RDD. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #10082 from BryanCutler/initial-rdd-updateStateByKey-SPARK-11713.