aboutsummaryrefslogtreecommitdiff
path: root/examples/src/main/scala
Commit message (Collapse)AuthorAgeFilesLines
...
* [SPARK-14635][ML] Documentation and Examples for TF-IDF only refer to HashingTFYuhao Yang2016-04-201-0/+2
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Currently, the docs for TF-IDF only refer to using HashingTF with IDF. However, CountVectorizer can also be used. We should probably amend the user guide and examples to show this. ## How was this patch tested? unit tests and doc generation Author: Yuhao Yang <hhbyyh@gmail.com> Closes #12454 from hhbyyh/tfdoc.
* [MINOR] Revert removing explicit typing (changed in some examples and ↵hyukjinkwon2016-04-182-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | StatFunctions) ## What changes were proposed in this pull request? This PR reverts some changes in https://github.com/apache/spark/pull/12413. (please see the discussion in that PR). from ```scala words.foreachRDD { (rdd, time) => ... ``` to ```scala words.foreachRDD { (rdd: RDD[String], time: Time) => ... ``` Also, this was discussed in dev-mailing list, [here](http://apache-spark-developers-list.1001551.n3.nabble.com/Question-about-Scala-style-explicit-typing-within-transformation-functions-and-anonymous-val-td17173.html) ## How was this patch tested? This was tested with `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12452 from HyukjinKwon/revert-explicit-typing.
* [SPARK-14299][EXAMPLES] Remove duplications for scala.examples.mlXusen Yin2016-04-184-192/+17
| | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14299 Delete duplications in scala/examples/ml. TrainValidationSplitExample.scala --> ModelSelectionViaTrainValidationSplitExample CrossValidatorExample.scala --> ModelSelectionViaCrossValidationExample ## How was this patch tested? Existing tests passed. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Xusen Yin <yinxusen@gmail.com> Closes #12366 from yinxusen/SPARK-14299-2.
* [MINOR] Remove inappropriate type notation and extra anonymous closure ↵hyukjinkwon2016-04-163-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | within functional transformations ## What changes were proposed in this pull request? This PR removes - Inappropriate type notations For example, from ```scala words.foreachRDD { (rdd: RDD[String], time: Time) => ... ``` to ```scala words.foreachRDD { (rdd, time) => ... ``` - Extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` and corrects some obvious style nits. ## How was this patch tested? This was tested after adding rules in `scalastyle-config.xml`, which ended up with not finding all perfectly. The rules applied were below: - For the first correction, ```xml <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">(?m)\.[a-zA-Z_][a-zA-Z0-9]*\(\s*[^,]+s*=>\s*\{[^\}]+\}\s*\)</parameter></parameters> </check> ``` ```xml <check customId="NoExtraClosure" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]*\s*[\{|\(]([^\n>,]+=>)?\s*\{([^()]|(?R))*\}^[,]</parameter></parameters> </check> ``` - For the second correction ```xml <check customId="TypeNotation" level="error" class="org.scalastyle.file.RegexChecker" enabled="true"> <parameters><parameter name="regex">\.[a-zA-Z_][a-zA-Z0-9]*\s*[\{|\(]\s*\([^):]*:R))*\}^[,]</parameter></parameters> </check> ``` **Those rules were not added** Author: hyukjinkwon <gurwls223@gmail.com> Closes #12413 from HyukjinKwon/SPARK-style.
* [MINOR][SQL] Remove extra anonymous closure within functional transformationshyukjinkwon2016-04-142-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR removes extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` ## How was this patch tested? Related unit tests and `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12382 from HyukjinKwon/minor-extra-closers.
* [SPARK-13089][ML] [Doc] spark.ml Naive Bayes user guide and examplesYuhao Yang2016-04-131-0/+58
| | | | | | | | | | jira: https://issues.apache.org/jira/browse/SPARK-13089 Add section in ml-classification.md for NaiveBayes DataFrame-based API, plus example code (using include_example to clip code from examples/ folder files). Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11015 from hhbyyh/naiveBayesDoc.
* [SPARK-14508][BUILD] Add a new ScalaStyle Rule `OmitBracesInCase`Dongjoon Hyun2016-04-125-20/+10
| | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [Scala Style Guide](http://docs.scala-lang.org/style/control-structures.html#curlybraces), we had better enforce the following rule. ``` case: Always omit braces in case clauses. ``` This PR makes a new ScalaStyle rule, 'OmitBracesInCase', and enforces it to the code. ## How was this patch tested? Pass the Jenkins tests (including Scala style checking) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12280 from dongjoon-hyun/SPARK-14508.
* [SPARK-14500] [ML] Accept Dataset[_] instead of DataFrame in MLlib APIsXiangrui Meng2016-04-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR updates MLlib APIs to accept `Dataset[_]` as input where `DataFrame` was the input type. This PR doesn't change the output type. In Java, `Dataset[_]` maps to `Dataset<?>`, which includes `Dataset<Row>`. Some implementations were changed in order to return `DataFrame`. Tests and examples were updated. Note that this is a breaking change for subclasses of Transformer/Estimator. Lol, we don't have to rename the input argument, which has been `dataset` since Spark 1.2. TODOs: - [x] update MiMaExcludes (seems all covered by explicit filters from SPARK-13920) - [x] Python - [x] add a new test to accept Dataset[LabeledPoint] - [x] remove unused imports of Dataset ## How was this patch tested? Exiting unit tests with some modifications. cc: rxin jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #12274 from mengxr/SPARK-14500.
* Update KMeansExample.scalaÖrjan Lundberg2016-04-101-1/+1
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? example does not work wo DataFrame import ## How was this patch tested? example doc only example does not work wo DataFrame import Author: Örjan Lundberg <orjan.lundberg@gmail.com> Closes #12277 from oluies/patch-1.
* [SPARK-14444][BUILD] Add a new scalastyle `NoScalaDoc` to prevent ↵Dongjoon Hyun2016-04-061-2/+4
| | | | | | | | | | | | | | | | | | | | | ScalaDoc-style multiline comments ## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation), this PR adds a new scalastyle rule to prevent the followings. ``` /** In Spark, we don't use the ScalaDoc style so this * is not correct. */ ``` ## How was this patch tested? Pass the Jenkins tests (including `lint-scala`). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12221 from dongjoon-hyun/SPARK-14444.
* [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.Dongjoon Hyun2016-04-028-41/+43
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR aims to fix all Scala-Style multiline comments into Java-Style multiline comments in Scala codes. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
* [MINOR] Typo fixesJacek Laskowski2016-04-021-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Typo fixes. No functional changes. ## How was this patch tested? Built the sources and ran with samples. Author: Jacek Laskowski <jacek@japila.pl> Closes #11802 from jaceklaskowski/typo-fixes.
* [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to SparkShixiong Zhu2016-03-252-0/+137
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR moves flume back to Spark as per the discussion in the dev mail-list. ## How was this patch tested? Existing Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11895 from zsxwing/move-flume-back.
* [SPARK-13017][DOCS] Replace example code in mllib-feature-extraction.md ↵Xin Ren2016-03-247-0/+431
| | | | | | | | | | | | | | | | | | | | | using include_example Replace example code in mllib-feature-extraction.md using include_example https://issues.apache.org/jira/browse/SPARK-13017 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/TFIDFExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/TFIDFExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11142 from keypointt/SPARK-13017.
* [SPARK-13019][DOCS] fix for scala-2.10 build: Replace example code in ↵Xin Ren2016-03-246-0/+356
| | | | | | | | | | | | | | | | | | | | | | mllib-statistics.md using include_example ## What changes were proposed in this pull request? This PR for ticket SPARK-13019 is based on previous PR(https://github.com/apache/spark/pull/11108). Since PR(https://github.com/apache/spark/pull/11108) is breaking scala-2.10 build, more work is needed to fix build errors. What I did new in this PR is adding keyword argument for 'fractions': ` val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)` ` val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)` I reopened ticket on JIRA but sorry I don't know how to reopen a GitHub pull request, so I just submitting a new pull request. ## How was this patch tested? Manual build testing on local machine, build based on scala-2.10. Author: Xin Ren <iamshrek@126.com> Closes #11901 from keypointt/SPARK-13019.
* Revert "[SPARK-13019][DOCS] Replace example code in mllib-statistics.md ↵Xiangrui Meng2016-03-216-356/+0
| | | | | | using include_example" This reverts commit 1af8de200c4d3357bcb09e7bbc6deece00e885f2.
* [SPARK-13019][DOCS] Replace example code in mllib-statistics.md using ↵Xin Ren2016-03-216-0/+356
| | | | | | | | | | | | | | | | | | | | include_example https://issues.apache.org/jira/browse/SPARK-13019 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11108 from keypointt/SPARK-13019.
* [SPARK-13928] Move org.apache.spark.Logging into ↵Wenchen Fan2016-03-173-2/+4
| | | | | | | | | | | | | | | | org.apache.spark.internal.Logging ## What changes were proposed in this pull request? Logging was made private in Spark 2.0. If we move it, then users would be able to create a Logging trait themselves to avoid changing their own code. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #11764 from cloud-fan/logger.
* [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, ↵Shixiong Zhu2016-03-149-927/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | streaming-zeromq, streaming-akka, streaming-twitter to Spark packages ## What changes were proposed in this pull request? Currently there are a few sub-projects, each for integrating with different external sources for Streaming. Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages - streaming-flume - streaming-akka - streaming-mqtt - streaming-zeromq - streaming-twitter They are just some ancillary packages and considering the overhead of maintenance, running tests and PR failures, it's better to maintain them out of Spark. In addition, these projects can have their different release cycles and we can release them faster. I have already copied these projects to https://github.com/spark-packages ## How was this patch tested? Jenkins tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11672 from zsxwing/remove-external-pkg.
* [MINOR][DOCS] Fix more typos in comments/strings.Dongjoon Hyun2016-03-141-1/+1
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes 135 typos over 107 files: * 121 typos in comments * 11 typos in testcase name * 3 typos in log messages ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11689 from dongjoon-hyun/fix_more_typos.
* [SPARK-13823][CORE][STREAMING][SQL] Always specify Charset in String <-> ↵Sean Owen2016-03-131-1/+3
| | | | | | | | | | | | | | | | | | | | byte[] conversions (and remaining Coverity items) ## What changes were proposed in this pull request? - Fixes calls to `new String(byte[])` or `String.getBytes()` that rely on platform default encoding, to use UTF-8 - Same for `InputStreamReader` and `OutputStreamWriter` constructors - Standardizes on UTF-8 everywhere - Standardizes specifying the encoding with `StandardCharsets.UTF-8`, not the Guava constant or "UTF-8" (which means handling `UnuspportedEncodingException`) - (also addresses the other remaining Coverity scan issues, which are pretty trivial; these are separated into commit https://github.com/srowen/spark/commit/1deecd8d9ca986d8adb1a42d315890ce5349d29c ) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #11657 from srowen/SPARK-13823.
* [SPARK-13512][ML] add example and doc for MaxAbsScalerYuhao Yang2016-03-111-0/+49
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? jira: https://issues.apache.org/jira/browse/SPARK-13512 Add example and doc for ml.feature.MaxAbsScaler. ## How was this patch tested? unit tests Author: Yuhao Yang <hhbyyh@gmail.com> Closes #11392 from hhbyyh/maxabsdoc.
* [SPARK-3854][BUILD] Scala style: require spaces before `{`.Dongjoon Hyun2016-03-106-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Since the opening curly brace, '{', has many usages as discussed in [SPARK-3854](https://issues.apache.org/jira/browse/SPARK-3854), this PR adds a ScalaStyle rule to prevent '){' pattern for the following majority pattern and fixes the code accordingly. If we enforce this in ScalaStyle from now, it will improve the Scala code quality and reduce review time. ``` // Correct: if (true) { println("Wow!") } // Incorrect: if (true){ println("Wow!") } ``` IntelliJ also shows new warnings based on this. ## How was this patch tested? Pass the Jenkins ScalaStyle test. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11637 from dongjoon-hyun/SPARK-3854.
* Fixing the type of the sentiment happiness valueYury Liavitski2016-03-071-2/+2
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Added the conversion to int for the 'happiness value' read from the file. Otherwise, later on line 75 the multiplication will multiply a string by a number, yielding values like "-2-2" instead of -4. ## How was this patch tested? Tested manually. Author: Yury Liavitski <seconds.before@gmail.com> Author: Yury Liavitski <yury.liavitski@il111.ice.local> Closes #11540 from heliocentrist/fix-sentiment-value-type.
* [MINOR] Fix typos in comments and testcase name of codeDongjoon Hyun2016-03-035-6/+6
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes typos in comments and testcase name of code. ## How was this patch tested? manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11481 from dongjoon-hyun/minor_fix_typos_in_code.
* [SPARK-13013][DOCS] Replace example code in mllib-clustering.md using ↵Xin Ren2016-03-035-3/+186
| | | | | | | | | | | | | | | | | | | | | include_example Replace example code in mllib-clustering.md using include_example https://issues.apache.org/jira/browse/SPARK-13013 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/KMeansExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/KMeansExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11116 from keypointt/SPARK-13013.
* [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle ruleDongjoon Hyun2016-03-0323-31/+3
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.
* [SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.xSean Owen2016-03-037-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly: - Inner class should be static - Mismatched hashCode/equals - Overflow in compareTo - Unchecked warnings - Misuse of assert, vs junit.assert - get(a) + getOrElse(b) -> getOrElse(a,b) - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions - Dead code - tailrec - exists(_ == ) -> contains find + nonEmpty -> exists filter + size -> count - reduce(_+_) -> sum map + flatten -> map The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places. ## How was the this patch tested? Existing Jenkins unit tests. Author: Sean Owen <sowen@cloudera.com> Closes #11292 from srowen/SPARK-13423.
* [HOT-FIX] Recover some deprecations for 2.10 compatibility.Dongjoon Hyun2016-03-031-1/+1
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? #11479 [SPARK-13627] broke 2.10 compatibility: [2.10-Build](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-scala-2.10/292/console) At this moment, we need to support both 2.10 and 2.11. This PR recovers some deprecated methods which were replace by [SPARK-13627]. ## How was this patch tested? Jenkins build: Both 2.10, 2.11. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11488 from dongjoon-hyun/hotfix_compatibility_with_2.10.
* [SPARK-13627][SQL][YARN] Fix simple deprecation warnings.Dongjoon Hyun2016-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR aims to fix the following deprecation warnings. * MethodSymbolApi.paramss--> paramLists * AnnotationApi.tpe -> tree.tpe * BufferLike.readOnly -> toList. * StandardNames.nme -> termNames * scala.tools.nsc.interpreter.AbstractFileClassLoader -> scala.reflect.internal.util.AbstractFileClassLoader * TypeApi.declarations-> decls ## How was this patch tested? Check the compile build log and pass the tests. ``` ./build/sbt ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11479 from dongjoon-hyun/SPARK-13627.
* [MINOR][STREAMING] Replace deprecated `apply` with `create` in example.Dongjoon Hyun2016-03-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Twitter Algebird deprecated `apply` in HyperLogLog.scala. ``` deprecated("Use toHLL", since = "0.10.0 / 2015-05") def apply[T <% Array[Byte]](t: T) = create(t) ``` This PR replace the deprecated usage `apply` with new `create` according to the upstream change. ## How was this patch tested? manual. ``` /bin/spark-submit --class org.apache.spark.examples.streaming.TwitterAlgebirdHLL examples/target/scala-2.11/spark-examples-2.0.0-SNAPSHOT-hadoop2.2.0.jar ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11451 from dongjoon-hyun/replace_deprecated_hll_apply.
* [SPARK-11381][DOCS] Replace example code in mllib-linear-methods.md using ↵Dongjoon Hyun2016-02-264-0/+261
| | | | | | | | | | | | | | | | | | | | | include_example ## What changes were proposed in this pull request? This PR replaces example codes in `mllib-linear-methods.md` using `include_example` by doing the followings: * Extracts the example codes(Scala,Java,Python) as files in `example` module. * Merges some dialog-style examples into a single file. * Hide redundant codes in HTML for the consistency with other docs. ## How was the this patch tested? manual test. This PR can be tested by document generations, `SKIP_API=1 jekyll build`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11320 from dongjoon-hyun/SPARK-11381.
* [SPARK-13457][SQL] Removes DataFrame RDD operationsCheng Lian2016-02-276-8/+9
| | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This is another try of PR #11323. This PR removes DataFrame RDD operations except for `foreach` and `foreachPartitions` (they are actions rather than transformations). Original calls are now replaced by calls to methods of `DataFrame.rdd`. PR #11323 was reverted because it introduced a regression: both `DataFrame.foreach` and `DataFrame.foreachPartitions` wrap underlying RDD operations with `withNewExecutionId` to track Spark jobs. But they are removed in #11323. ## How was the this patch tested? No extra tests are added. Existing tests should do the work. Author: Cheng Lian <lian@databricks.com> Closes #11388 from liancheng/remove-df-rdd-ops.
* Revert "[SPARK-13457][SQL] Removes DataFrame RDD operations"Davies Liu2016-02-256-9/+8
| | | | This reverts commit 157fe64f3ecbd13b7286560286e50235eecfe30e.
* [SPARK-13457][SQL] Removes DataFrame RDD operationsCheng Lian2016-02-256-8/+9
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR removes DataFrame RDD operations. Original calls are now replaced by calls to methods of `DataFrame.rdd`. ## How was the this patch tested? No extra tests are added. Existing tests should do the work. Author: Cheng Lian <lian@databricks.com> Closes #11323 from liancheng/remove-df-rdd-ops.
* [SPARK-13012][DOCUMENTATION] Replace example code in ml-guide.md using ↵Devaraj K2016-02-224-0/+376
| | | | | | | | | | include_example Replaced example code in ml-guide.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11053 from devaraj-kavali/SPARK-13012.
* [SPARK-13016][DOCUMENTATION] Replace example code in ↵Devaraj K2016-02-223-0/+176
| | | | | | | | | | | mllib-dimensionality-reduction.md using include_example Replaced example example code in mllib-dimensionality-reduction.md using include_example Author: Devaraj K <devaraj@apache.org> Closes #11132 from devaraj-kavali/SPARK-13016.
* [SPARK-12953][EXAMPLES] RDDRelation writer set overwrite modeshijinkui2016-02-171-4/+3
| | | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-12953 fix error when run RDDRelation.main(): "path file:/Users/sjk/pair.parquet already exists" Set DataFrameWriter's mode to SaveMode.Overwrite Author: shijinkui <shijinkui666@163.com> Closes #10864 from shijinkui/set_mode.
* [SPARK-12247][ML][DOC] Documentation for spark.ml's ALS and collaborative ↵BenFradet2016-02-162-182/+82
| | | | | | | | | | filtering in general This documents the implementation of ALS in `spark.ml` with example code in scala, java and python. Author: BenFradet <benjamin.fradet@gmail.com> Closes #10411 from BenFradet/SPARK-12247.
* [SPARK-13018][DOCS] Replace example code in mllib-pmml-model-export.md using ↵Xin Ren2016-02-151-0/+59
| | | | | | | | | | | | | | | | | | | | | include_example Replace example code in mllib-pmml-model-export.md using include_example https://issues.apache.org/jira/browse/SPARK-13018 The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6. Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example. `{% include_example scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala %}` Jekyll will find `examples/src/main/scala/org/apache/spark/examples/mllib/PMMLModelExportExample.scala` and pick code blocks marked "example" and replace code block in `{% highlight %}` in the markdown. See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337 Author: Xin Ren <iamshrek@126.com> Closes #11126 from keypointt/SPARK-13018.
* [SPARK-12363][MLLIB] Remove setRun and fix PowerIterationClustering failed testLiang-Chi Hsieh2016-02-131-32/+21
| | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-12363 This issue is pointed by yanboliang. When `setRuns` is removed from PowerIterationClustering, one of the tests will be failed. I found that some `dstAttr`s of the normalized graph are not correct values but 0.0. By setting `TripletFields.All` in `mapTriplets` it can work. Author: Liang-Chi Hsieh <viirya@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #10539 from viirya/fix-poweriter.
* [SPARK-13170][STREAMING] Investigate replacing SynchronizedQueue as it is ↵Sean Owen2016-02-091-3/+5
| | | | | | | | | | deprecated Replace SynchronizeQueue with synchronized access to a Queue Author: Sean Owen <sowen@cloudera.com> Closes #11111 from srowen/SPARK-13170.
* [SPARK-13177][EXAMPLES] Update ActorWordCount example to not directly use ↵sachin aggarwal2016-02-091-4/+4
| | | | | | | | low level linked list as it is deprecated. Author: sachin aggarwal <different.sachin@gmail.com> Closes #11113 from agsachin/master.
* [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" projectShixiong Zhu2016-01-202-20/+30
| | | | | | | | | | | | | Include the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream" 3. Update the ActorWordCount example and add the JavaActorWordCount example 4. Make "streaming-zeromq" depend on "streaming-akka" and update the codes accordingly Author: Shixiong Zhu <shixiong@databricks.com> Closes #10744 from zsxwing/streaming-akka-2.
* [SPARK-12667] Remove block manager's internal "external block store" APIReynold Xin2016-01-152-143/+0
| | | | | | | | | | This pull request removes the external block store API. This is rarely used, and the file system interface is actually a better, more standard way to interact with external storage systems. There are some other things to remove also, as pointed out by JoshRosen. We will do those as follow-up pull requests. Author: Reynold Xin <rxin@databricks.com> Closes #10752 from rxin/remove-offheap.
* [SPARK-12692][BUILD][STREAMING] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-111-7/+7
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10685 from sarutak/SPARK-12692-followup-streaming.
* [SPARK-12603][MLLIB] PySpark MLlib GaussianMixtureModel should support ↵Yanbo Liang2016-01-111-0/+6
| | | | | | | | | | single instance predict/predictSoft PySpark MLlib ```GaussianMixtureModel``` should support single instance ```predict/predictSoft``` just like Scala do. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10552 from yanboliang/spark-12603.
* [SPARK-12692][BUILD][MLLIB] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-103-3/+3
| | | | | | | | | | | before "," or ":") Fix the style violation (space before , and :). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10684 from sarutak/SPARK-12692-followup-mllib.
* [SPARK-12618][CORE][STREAMING][SQL] Clean up build warnings: 2.0.0 editionSean Owen2016-01-082-2/+2
| | | | | | | | Fix most build warnings: mostly deprecated API usages. I'll annotate some of the changes below. CC rxin who is leading the charge to remove the deprecated APIs. Author: Sean Owen <sowen@cloudera.com> Closes #10570 from srowen/SPARK-12618.
* [SPARK-12510][STREAMING] Refactor ActorReceiver to support JavaShixiong Zhu2016-01-071-5/+4
| | | | | | | | | | | | | This PR includes the following changes: 1. Rename `ActorReceiver` to `ActorReceiverSupervisor` 2. Remove `ActorHelper` 3. Add a new `ActorReceiver` for Scala and `JavaActorReceiver` for Java 4. Add `JavaActorWordCount` example Author: Shixiong Zhu <shixiong@databricks.com> Closes #10457 from zsxwing/java-actor-stream.