aboutsummaryrefslogtreecommitdiff
path: root/graphx/src/main/scala/org
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix ↵Dongjoon Hyun2016-04-242-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | lint-java errors ## What changes were proposed in this pull request? Spark uses `NewLineAtEofChecker` rule in Scala by ScalaStyle. And, most Java code also comply with the rule. This PR aims to enforce the same rule `NewlineAtEndOfFile` by CheckStyle explicitly. Also, this fixes lint-java errors since SPARK-14465. The followings are the items. - Adds a new line at the end of the files (19 files) - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder) ## How was this patch tested? After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins dose not run lint-java.) ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12632 from dongjoon-hyun/SPARK-14868.
* Added omitted word in error messageVictor Chima2016-04-061-1/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Added an omitted word in the error message displayed by the Graphx Pregel API when `maxIterations <= 0` ## How was this patch tested? Manual test Author: Victor Chima <blazy2k9@gmail.com> Closes #12205 from blazy2k9/hotfix/pregel-error-message.
* [MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.Dongjoon Hyun2016-04-022-14/+14
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR aims to fix all Scala-Style multiline comments into Java-Style multiline comments in Scala codes. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
* [SPARK-14219][GRAPHX] Fix `pickRandomVertex` not to fall into infinite loops ↵Dongjoon Hyun2016-03-281-1/+1
| | | | | | | | | | | | | | | | | | | | for graphs with one vertex ## What changes were proposed in this pull request? Currently, `GraphOps.pickRandomVertex()` falls into infinite loops for graphs having only one vertex. This PR fixes it by modifying the following termination-checking condition. ```scala - if (selectedVertices.count > 1) { + if (selectedVertices.count > 0) { ``` ## How was this patch tested? Pass the Jenkins tests (including new test case). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12018 from dongjoon-hyun/SPARK-14219.
* [MINOR] Fix newly added java-lint errorsDongjoon Hyun2016-03-261-3/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes some newly added java-lint errors(unused-imports, line-lengsth). ## How was this patch tested? Pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11968 from dongjoon-hyun/SPARK-14167.
* [SPARK-13928] Move org.apache.spark.Logging into ↵Wenchen Fan2016-03-175-4/+6
| | | | | | | | | | | | | | | | org.apache.spark.internal.Logging ## What changes were proposed in this pull request? Logging was made private in Spark 2.0. If we move it, then users would be able to create a Logging trait themselves to avoid changing their own code. ## How was this patch tested? existing tests. Author: Wenchen Fan <wenchen@databricks.com> Closes #11764 from cloud-fan/logger.
* [SPARK-13816][GRAPHX] Add parameter checks for algorithms in GraphxZheng RuiFeng2016-03-166-2/+25
| | | | | | | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-13816 ## What changes were proposed in this pull request? Add parameter checks for algorithms in Graphx: Pregel,LabelPropagation,PageRank,SVDPlusPlus ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11655 from zhengruifeng/graphx_param_check.
* [MINOR][DOCS] Fix more typos in comments/strings.Dongjoon Hyun2016-03-146-8/+8
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR fixes 135 typos over 107 files: * 121 typos in comments * 11 typos in testcase name * 3 typos in log messages ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11689 from dongjoon-hyun/fix_more_typos.
* [SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle ruleDongjoon Hyun2016-03-039-22/+2
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? After SPARK-6990, `dev/lint-java` keeps Java code healthy and helps PR review by saving much time. This issue aims remove unused imports from Java/Scala code and add `UnusedImports` checkstyle rule to help developers. ## How was this patch tested? ``` ./dev/lint-java ./build/sbt compile ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11438 from dongjoon-hyun/SPARK-13583.
* [SPARK-13423][WIP][CORE][SQL][STREAMING] Static analysis fixes for 2.xSean Owen2016-03-039-24/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Make some cross-cutting code improvements according to static analysis. These are individually up for discussion since they exist in separate commits that can be reverted. The changes are broadly: - Inner class should be static - Mismatched hashCode/equals - Overflow in compareTo - Unchecked warnings - Misuse of assert, vs junit.assert - get(a) + getOrElse(b) -> getOrElse(a,b) - Array/String .size -> .length (occasionally, -> .isEmpty / .nonEmpty) to avoid implicit conversions - Dead code - tailrec - exists(_ == ) -> contains find + nonEmpty -> exists filter + size -> count - reduce(_+_) -> sum map + flatten -> map The most controversial may be .size -> .length simply because of its size. It is intended to avoid implicits that might be expensive in some places. ## How was the this patch tested? Existing Jenkins unit tests. Author: Sean Owen <sowen@cloudera.com> Closes #11292 from srowen/SPARK-13423.
* [MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns ↵Dongjoon Hyun2016-02-222-2/+2
| | | | | | | | | | | | | | | | | in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was the this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.
* [SPARK-3650][GRAPHX] Triangle Count handles reverse edges incorrectlyRobin East2016-02-212-20/+57
| | | | | | | | | | | | | | | | | jegonzal ankurdave please could you review ## What changes were proposed in this pull request? Reworking of jegonzal PR #2495 to address the issue identified in SPARK-3650. Code amended to use the convertToCanonicalEdges method. ## How was the this patch tested? Patch was tested using the unit tests created in PR #2495 Author: Robin East <robin.east@xense.co.uk> Author: Joseph E. Gonzalez <joseph.e.gonzalez@gmail.com> Closes #11290 from insidedctm/spark-3650.
* [SPARK-13416][GraphX] Add positive check for option 'numIter' in ↵Zheng RuiFeng2016-02-211-1/+1
| | | | | | | | | | | | | | | | | | StronglyConnectedComponents JIRA: https://issues.apache.org/jira/browse/SPARK-13416 ## What changes were proposed in this pull request? The output of StronglyConnectedComponents with numIter no greater than 1 may make no sense. So I just add require check in it. ## How was the this patch tested? unit tests passed Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11284 from zhengruifeng/scccheck.
* [SPARK-13386][GRAPHX] ConnectedComponents should support maxIteration optionZheng RuiFeng2016-02-202-6/+32
| | | | | | | | | | | | | | | | JIRA: https://issues.apache.org/jira/browse/SPARK-13386 ## What changes were proposed in this pull request? add maxIteration option for ConnectedComponents algorithm ## How was the this patch tested? unit tests passed Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #11268 from zhengruifeng/ccwithmax.
* [SPARK-12995][GRAPHX] Remove deprecate APIs from PregelTakeshi YAMAMURO2016-02-155-88/+30
| | | | | | Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #10918 from maropu/RemoveDeprecateInPregel.
* [SPARK-12655][GRAPHX] GraphX does not unpersist RDDsJason Lee2016-01-152-2/+4
| | | | | | | | | | Some VertexRDD and EdgeRDD are created during the intermediate step of g.connectedComponents() but unnecessarily left cached after the method is done. The fix is to unpersist these RDDs once they are no longer in use. A test case is added to confirm the fix for the reported bug. Author: Jason Lee <cjlee@us.ibm.com> Closes #10713 from jasoncl/SPARK-12655.
* [SPARK-12692][BUILD][GRAPHX] Scala style: Fix the style violation (Space ↵Kousuke Saruta2016-01-105-9/+8
| | | | | | | | | | | before "," or ":") Fix the style violation (space before `,` and `:`). This PR is a followup for #10643. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10683 from sarutak/SPARK-12692-followup-graphx.
* [SPARK-12665][CORE][GRAPHX] Remove Vector, VectorSuite and ↵Kousuke Saruta2016-01-061-49/+0
| | | | | | | | | | GraphKryoRegistrator which are deprecated and no longer used Whole code of Vector.scala, VectorSuite.scala and GraphKryoRegistrator.scala are no longer used so it's time to remove them in Spark 2.0. Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp> Closes #10613 from sarutak/SPARK-12665.
* [SPARK-3873][GRAPHX] Import order fixes.Marcelo Vanzin2015-12-3023-50/+33
| | | | | | | | There's one warning left, caused by a bug in the checker. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10537 from vanzin/SPARK-3873-graphx.
* Fixed error in scaladoc of convertToCanonicalEdgesGaurav Kumar2015-11-121-1/+1
| | | | | | | | The code convertToCanonicalEdges is such that srcIds are smaller than dstIds but the scaladoc suggested otherwise. Have fixed the same. Author: Gaurav Kumar <gauravkumar37@gmail.com> Closes #9666 from gauravkumar37/patch-1.
* [SPARK-6152] Use shaded ASM5 to support closure cleaning of Java 8 compiled ↵Josh Rosen2015-11-111-8/+8
| | | | | | | | | | | | | | | | classes This patch modifies Spark's closure cleaner (and a few other places) to use ASM 5, which is necessary in order to support cleaning of closures that were compiled by Java 8. In order to avoid ASM dependency conflicts, Spark excludes ASM from all of its dependencies and uses a shaded version of ASM 4 that comes from `reflectasm` (see [SPARK-782](https://issues.apache.org/jira/browse/SPARK-782) and #232). This patch updates Spark to use a shaded version of ASM 5.0.4 that was published by the Apache XBean project; the POM used to create the shaded artifact can be found at https://github.com/apache/geronimo-xbean/blob/xbean-4.4/xbean-asm5-shaded/pom.xml. http://movingfulcrum.tumblr.com/post/80826553604/asm-framework-50-the-missing-migration-guide was a useful resource while upgrading the code to use the new ASM5 opcodes. I also added a new regression tests in the `java8-tests` subproject; the existing tests were insufficient to catch this bug, which only affected Scala 2.11 user code which was compiled targeting Java 8. Author: Josh Rosen <joshrosen@databricks.com> Closes #9512 from JoshRosen/SPARK-6152.
* [SPARK-11432][GRAPHX] Personalized PageRank shouldn't use uniform initializationYves Raimond2015-11-021-11/+18
| | | | | | | | Changes the personalized pagerank initialization to be non-uniform. Author: Yves Raimond <yraimond@netflix.com> Closes #9386 from moustaki/personalized-pagerank-init.
* [SPARK-10598] [DOCS]Robin East2015-09-141-1/+1
| | | | | | | | | | | Comments preceding toMessage method state: "The edge partition is encoded in the lower * 30 bytes of the Int, and the position is encoded in the upper 2 bytes of the Int.". References to bytes should be changed to bits. This contribution is my original work and I license the work to the Spark project under it's open source license. Author: Robin East <robin.east@xense.co.uk> Closes #8756 from insidedctm/master.
* [SPARK-10576] [BUILD] Move .java files out of src/main/scalaSean Owen2015-09-142-106/+0
| | | | | | | | Move .java files in `src/main/scala` to `src/main/java` root, except for `package-info.java` (to stay next to package.scala) Author: Sean Owen <sowen@cloudera.com> Closes #8736 from srowen/SPARK-10576.
* [SPARK-10227] fatal warnings with sbt on Scala 2.11Luc Bourlier2015-09-093-7/+7
| | | | | | | | | | | The bulk of the changes are on `transient` annotation on class parameter. Often the compiler doesn't generate a field for this parameters, so the the transient annotation would be unnecessary. But if the class parameter are used in methods, then fields are created. So it is safer to keep the annotations. The remainder are some potential bugs, and deprecated syntax. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes #8433 from skyluc/issue/sbt-2.11.
* [SPARK-9960] [GRAPHX] sendMessage type fix in LabelPropagation.scalazc he2015-08-141-1/+1
| | | | | | Author: zc he <farseer90718@gmail.com> Closes #8188 from farseer90718/farseer-patch-1.
* [SPARK-3190] [GRAPHX] Fix VertexRDD.count() overflow regressionAnkur Dave2015-08-031-1/+1
| | | | | | | | | | SPARK-3190 was originally fixed by 96df92906978c5f58e0cc8ff5eebe5b35a08be3b, but a5ef58113667ff73562ce6db381cff96a0b354b0 introduced a regression during refactoring. This commit fixes the regression. Author: Ankur Dave <ankurdave@gmail.com> Closes #7923 from ankurdave/SPARK-3190-reopening and squashes the following commits: a3e1b23 [Ankur Dave] Fix VertexRDD.count() overflow regression
* [SPARK-9436] [GRAPHX] Pregel simplification patchAlexander Ulanov2015-07-291-13/+10
| | | | | | | | | | | | | | | | | Pregel code contains two consecutive joins: ``` g.vertices.innerJoin(messages)(vprog) ... g = g.outerJoinVertices(newVerts) { (vid, old, newOpt) => newOpt.getOrElse(old) } ``` This can be simplified with one join. ankurdave proposed a patch based on our discussion in the mailing list: https://www.mail-archive.com/devspark.apache.org/msg10316.html Author: Alexander Ulanov <nashb@yandex.ru> Closes #7749 from avulanov/SPARK-9436-pregel and squashes the following commits: 8568e06 [Alexander Ulanov] Pregel simplification patch
* [SPARK-9109] [GRAPHX] Keep the cached edge in the graphtien-dungle2015-07-171-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The change here is to keep the cached RDDs in the graph object so that when the graph.unpersist() is called these RDDs are correctly unpersisted. ```java import org.apache.spark.graphx._ import org.apache.spark.rdd.RDD import org.slf4j.LoggerFactory import org.apache.spark.graphx.util.GraphGenerators // Create an RDD for the vertices val users: RDD[(VertexId, (String, String))] = sc.parallelize(Array((3L, ("rxin", "student")), (7L, ("jgonzal", "postdoc")), (5L, ("franklin", "prof")), (2L, ("istoica", "prof")))) // Create an RDD for edges val relationships: RDD[Edge[String]] = sc.parallelize(Array(Edge(3L, 7L, "collab"), Edge(5L, 3L, "advisor"), Edge(2L, 5L, "colleague"), Edge(5L, 7L, "pi"))) // Define a default user in case there are relationship with missing user val defaultUser = ("John Doe", "Missing") // Build the initial Graph val graph = Graph(users, relationships, defaultUser) graph.cache().numEdges graph.unpersist() sc.getPersistentRDDs.foreach( r => println( r._2.toString)) ``` Author: tien-dungle <tien-dung.le@realimpactanalytics.com> Closes #7469 from tien-dungle/SPARK-9109_Graphx-unpersist and squashes the following commits: 8d87997 [tien-dungle] Keep the cached edge in the graph
* [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix ↵Josh Rosen2015-07-141-1/+1
| | | | | | | | | | | | | | | | | | | | | existing uses This pull request adds a Scalastyle regex rule which fails the style check if `Class.forName` is used directly. `Class.forName` always loads classes from the default / system classloader, but in a majority of cases, we should be using Spark's own `Utils.classForName` instead, which tries to load classes from the current thread's context classloader and falls back to the classloader which loaded Spark when the context classloader is not defined. <!-- Reviewable:start --> [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/7350) <!-- Reviewable:end --> Author: Josh Rosen <joshrosen@databricks.com> Closes #7350 from JoshRosen/ban-Class.forName and squashes the following commits: e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName c0b7885 [Josh Rosen] Hopefully fix the last two cases d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first cleanup pass 046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into ban-Class.forName 62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion. d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName
* [SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number ↵Andrew Ray2015-07-141-11/+21
| | | | | | | | | | | | | | | | | | | | of partitions See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb Author: Andrew Ray <ray.andrew@gmail.com> Closes #7104 from aray/edge-partition-2d-improvement and squashes the following commits: 3729f84 [Andrew Ray] correct bounds and remove unneeded comments 97f8464 [Andrew Ray] change less 5141ab4 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement 925fd2c [Andrew Ray] use new interface for partitioning 001bfd0 [Andrew Ray] Refactor PartitionStrategy so that we can return a prtition function for a given number of parts. To keep compatibility we define default methods that translate between the two implementation options. Made EdgePartition2D use old strategy when we have a perfect square and implement new interface. 5d42105 [Andrew Ray] % -> / 3560084 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement f006364 [Andrew Ray] remove unneeded comments cfa2c5e [Andrew Ray] Modifications to EdgePartition2D so that it works for non perfect squares.
* [SPARK-7977] [BUILD] Disallowing printlnJonathan Alter2015-07-102-3/+2
| | | | | | | | | | | | | | | | | | | | | | | Author: Jonathan Alter <jonalter@users.noreply.github.com> Closes #7093 from jonalter/SPARK-7977 and squashes the following commits: ccd44cc [Jonathan Alter] Changed println to log in ThreadingSuite 7fcac3e [Jonathan Alter] Reverting to println in ThreadingSuite 10724b6 [Jonathan Alter] Changing some printlns to logs in tests eeec1e7 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 0b1dcb4 [Jonathan Alter] More println cleanup aedaf80 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 925fd98 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 0c16fa3 [Jonathan Alter] Replacing some printlns with logs 45c7e05 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 5c8e283 [Jonathan Alter] Allowing println in audit-release examples 5b50da1 [Jonathan Alter] Allowing printlns in example files ca4b477 [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 83ab635 [Jonathan Alter] Fixing new printlns 54b131f [Jonathan Alter] Merge branch 'master' of github.com:apache/spark into SPARK-7977 1cd8a81 [Jonathan Alter] Removing some unnecessary comments and printlns b837c3a [Jonathan Alter] Disallowing println
* [SPARK-7979] Enforce structural type checker.Reynold Xin2015-05-311-1/+3
| | | | | | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #6536 from rxin/structural-type-checker and squashes the following commits: f833151 [Reynold Xin] Fixed compilation. 633f9a1 [Reynold Xin] Fixed typo. d1fa804 [Reynold Xin] [SPARK-7979] Enforce structural type checker.
* [SPARK-7927] whitespace fixes for GraphX.Reynold Xin2015-05-2810-26/+27
| | | | | | | | | | | So we can enable a whitespace enforcement rule in the style checker to save code review time. Author: Reynold Xin <rxin@databricks.com> Closes #6474 from rxin/whitespace-graphx and squashes the following commits: 4d3cd26 [Reynold Xin] Fixed tests. 869dde4 [Reynold Xin] [SPARK-7927] whitespace fixes for GraphX.
* [SPARK-5854] personalized page rankDan McClary2015-05-012-6/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Here's a modification to PageRank which does personalized PageRank. The approach is basically similar to that outlined by Bahmani et al. from 2010 (http://arxiv.org/pdf/1006.2880.pdf). I'm sure this needs tuning up or other considerations, so let me know how I can improve this. Author: Dan McClary <dan.mcclary@gmail.com> Author: dwmclary <dan.mcclary@gmail.com> Closes #4774 from dwmclary/SPARK-5854-Personalized-PageRank and squashes the following commits: 8b907db [dwmclary] fixed scalastyle errors in PageRankSuite 2c20e5d [dwmclary] merged with upstream master d6cebac [dwmclary] updated as per style requests 7d00c23 [Dan McClary] fixed line overrun in personalizedVertexPageRank d711677 [Dan McClary] updated vertexProgram to restore binary compatibility for inner method bb8d507 [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank fba0edd [Dan McClary] fixed silly mistakes de51be2 [Dan McClary] cleaned up whitespace between comments and methods 0c30d0c [Dan McClary] updated to maintain binary compatibility aaf0b4b [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank 76773f6 [Dan McClary] Merge branch 'master' of https://github.com/apache/spark into SPARK-5854-Personalized-PageRank 44ada8e [Dan McClary] updated tolerance on chain PPR 1ffed95 [Dan McClary] updated tolerance on chain PPR b67ac69 [Dan McClary] updated tolerance on chain PPR a560942 [Dan McClary] rolled PPR into pregel code for PageRank 6dc2c29 [Dan McClary] initial implementation of personalized page rank
* SPARK-6710 GraphX Fixed Wrong initial bias in GraphX SVDPlusPlusMichael Malak2015-04-111-1/+1
| | | | | | | | Author: Michael Malak <michaelmalak@yahoo.com> Closes #5464 from michaelmalak/master and squashes the following commits: 9d942ba [Michael Malak] SPARK-6710 GraphX Fixed Wrong initial bias in GraphX SVDPlusPlus
* [SPARK-6736][GraphX][Doc]Example of Graph#aggregateMessages has errorSasaki Toru2015-04-071-1/+1
| | | | | | | | | | | Example of Graph#aggregateMessages has error. Since aggregateMessages is a method of Graph, It should be written "rawGraph.aggregateMessages" Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #5388 from sasakitoa/aggregateMessagesExample and squashes the following commits: b1d631b [Sasaki Toru] Example of Graph#aggregateMessages has error
* [SPARK-6428] Turn on explicit type checking for public methods.Reynold Xin2015-04-0311-29/+28
| | | | | | | | | | | | | | This builds on my earlier pull requests and turns on the explicit type checking in scalastyle. Author: Reynold Xin <rxin@databricks.com> Closes #5342 from rxin/SPARK-6428 and squashes the following commits: 7b531ab [Reynold Xin] import ordering 2d9a8a5 [Reynold Xin] jl e668b1c [Reynold Xin] override 9b9e119 [Reynold Xin] Parenthesis. 82e0cf5 [Reynold Xin] [SPARK-6428] Turn on explicit type checking for public methods.
* [SPARK-6510][GraphX]: Add Graph#minus method to act as Set#differenceBrennon York2015-03-263-0/+56
| | | | | | | | | | | | | | | | | | | | | Adds a `Graph#minus` method which will return only unique `VertexId`'s from the calling `VertexRDD`. To demonstrate a basic example with pseudocode: ``` Set((0L,0),(1L,1)).minus(Set((1L,1),(2L,2))) > Set((0L,0)) ``` Author: Brennon York <brennon.york@capitalone.com> Closes #5175 from brennonyork/SPARK-6510 and squashes the following commits: 248d5c8 [Brennon York] added minus(VertexRDD[VD]) method to avoid createUsingIndex and updated the mask operations to simplify with andNot call 3fb7cce [Brennon York] updated graphx doc to reflect the addition of minus method 6575d92 [Brennon York] updated mima exclude aaa030b [Brennon York] completed graph#minus functionality 7227c0f [Brennon York] beginning work on minus functionality
* [HOTFIX] Build break due to https://github.com/apache/spark/pull/5128Reynold Xin2015-03-221-2/+2
|
* [SPARK-6455] [docs] Correct some mistakes and typosHangchen Yu2015-03-223-6/+6
| | | | | | | | | | | | Correct some typos. Correct a mistake in lib/PageRank.scala. The first PageRank implementation uses standalone Graph interface, but the second uses Pregel interface. It may mislead the code viewers. Author: Hangchen Yu <yuhc@gitcafe.com> Closes #5128 from yuhc/master and squashes the following commits: 53e5432 [Hangchen Yu] Merge branch 'master' of https://github.com/yuhc/spark 67b77b5 [Hangchen Yu] [SPARK-6455] [docs] Correct some mistakes and typos 206f2dc [Hangchen Yu] Correct some mistakes and typos.
* [SPARK-6357][GraphX] Add unapply in EdgeContextTakeshi YAMAMURO2015-03-161-0/+17
| | | | | | | | | | This extractor is mainly used for Graph#aggregateMessages*. Author: Takeshi YAMAMURO <linguin.m.s@gmail.com> Closes #5047 from maropu/AddUnapplyInEdgeContext and squashes the following commits: 87e04df [Takeshi YAMAMURO] Add unapply in EdgeContext
* [SPARK-5922][GraphX]: Add diff(other: RDD[VertexId, VD]) in VertexRDDBrennon York2015-03-162-0/+13
| | | | | | | | | | | | | | | | | | | Changed method invocation of 'diff' to match that of 'innerJoin' and 'leftJoin' from VertexRDD[VD] to RDD[(VertexId, VD)]. This change maintains backwards compatibility and better unifies the VertexRDD methods to match each other. Author: Brennon York <brennon.york@capitalone.com> Closes #4733 from brennonyork/SPARK-5922 and squashes the following commits: e800f08 [Brennon York] fixed merge conflicts b9274af [Brennon York] fixed merge conflicts f86375c [Brennon York] fixed minor include line 398ddb4 [Brennon York] fixed merge conflicts aac1810 [Brennon York] updated to aggregateUsingIndex and added test to ensure that method works properly 2af0b88 [Brennon York] removed deprecation line 753c963 [Brennon York] fixed merge conflicts and set preference to use the diff(other: VertexRDD[VD]) method 2c678c6 [Brennon York] added mima exclude to exclude new public diff method from VertexRDD 93186f3 [Brennon York] added back the original diff method to sustain binary compatibility f18356e [Brennon York] changed method invocation of 'diff' to match that of 'innerJoin' and 'leftJoin' from VertexRDD[VD] to RDD[(VertexId, VD)]
* [SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not workBrennon York2015-03-131-2/+5
| | | | | | | | | | | | | | | | | | | Turns out, per the [convo on the JIRA](https://issues.apache.org/jira/browse/SPARK-4600), `diff` is acting exactly as should. It became a large misconception as I thought it meant set difference, when in fact it does not. To that extent I merely updated the `diff` documentation to, hopefully, better reflect its true intentions moving forward. Author: Brennon York <brennon.york@capitalone.com> Closes #5015 from brennonyork/SPARK-4600 and squashes the following commits: 1e1d1e5 [Brennon York] reverted internal diff docs 92288f7 [Brennon York] reverted both the test suite and the diff function back to its origin functionality f428623 [Brennon York] updated diff documentation to better represent its function cc16d65 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600 66818b9 [Brennon York] added small secondary diff test 99ad412 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600 74b8c95 [Brennon York] corrected method by leveraging bitmask operations to correctly return only the portions of that are different from the calling VertexRDD 9717120 [Brennon York] updated diff impl to cause fewer objects to be created 710a21c [Brennon York] working diff given test case aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'
* [SPARK-5814][MLLIB][GRAPHX] Remove JBLAS from runtimeXiangrui Meng2015-03-121-37/+59
| | | | | | | | | | | | | | | | | The issue is discussed in https://issues.apache.org/jira/browse/SPARK-5669. Replacing all JBLAS usage by netlib-java gives us a simpler dependency tree and less license issues to worry about. I didn't touch the test scope in this PR. The user guide is not modified to avoid merge conflicts with branch-1.3. srowen ankurdave pwendell Author: Xiangrui Meng <meng@databricks.com> Closes #4699 from mengxr/SPARK-5814 and squashes the following commits: 48635c6 [Xiangrui Meng] move netlib-java version to parent pom ca21c74 [Xiangrui Meng] remove jblas from ml-guide 5f7767a [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-5814 c5c4183 [Xiangrui Meng] merge master 0f20cad [Xiangrui Meng] add mima excludes e53e9f4 [Xiangrui Meng] remove jblas from mllib runtime ceaa14d [Xiangrui Meng] replace jblas by netlib-java in graphx fa7c2ca [Xiangrui Meng] move jblas to test scope
* [SPARK-6103][Graphx]remove unused class to import in EdgeRDDImplLianhui Wang2015-03-021-1/+1
| | | | | | | | | | Class TaskContext is unused in EdgeRDDImpl, so we need to remove it from import list. Author: Lianhui Wang <lianhuiwang09@gmail.com> Closes #4846 from lianhuiwang/SPARK-6103 and squashes the following commits: 31aed64 [Lianhui Wang] remove unused class to import in EdgeRDDImpl
* [SPARK-1955][GraphX]: VertexRDD can incorrectly assume index sharingBrennon York2015-02-251-3/+9
| | | | | | | | | | Fixes the issue whereby when VertexRDD's are `diff`ed, `innerJoin`ed, or `leftJoin`ed and have different partition sizes they fail under the `zipPartitions` method. This fix tests whether the partitions are equal or not and, if not, will repartition the other to match the partition size of the calling VertexRDD. Author: Brennon York <brennon.york@capitalone.com> Closes #4705 from brennonyork/SPARK-1955 and squashes the following commits: 0882590 [Brennon York] updated to properly handle differently-partitioned vertexRDDs
* SPARK-5815 [MLLIB] Part 2. Deprecate SVDPlusPlus APIs that expose ↵Sean Owen2015-02-161-27/+15
| | | | | | | | | | | | DoubleMatrix from JBLAS Now, deprecated runSVDPlusPlus and update run, for 1.4.0 / master only Author: Sean Owen <sowen@cloudera.com> Closes #4625 from srowen/SPARK-5815.2 and squashes the following commits: 6fd2ca5 [Sean Owen] Now, deprecated runSVDPlusPlus and update run, for 1.4.0 / master only
* SPARK-5815 [MLLIB] Deprecate SVDPlusPlus APIs that expose DoubleMatrix from ↵Sean Owen2015-02-151-0/+25
| | | | | | | | | | | | | | | JBLAS Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with return type that doesn't include DoubleMatrix CC mengxr Author: Sean Owen <sowen@cloudera.com> Closes #4614 from srowen/SPARK-5815 and squashes the following commits: 288cb05 [Sean Owen] Clarify deprecation plans in scaladoc 497458e [Sean Owen] Deprecate SVDPlusPlus.run and introduce SVDPlusPlus.runSVDPlusPlus with return type that doesn't include DoubleMatrix
* SPARK-3290 [GRAPHX] No unpersist callls in SVDPlusPlusSean Owen2015-02-131-8/+32
| | | | | | | | | | This just unpersist()s each RDD in this code that was cache()ed. Author: Sean Owen <sowen@cloudera.com> Closes #4234 from srowen/SPARK-3290 and squashes the following commits: 66c1e11 [Sean Owen] unpersist() each RDD that was cache()ed