author | Brennon York <brennon.york@capitalone.com> | 2015-03-13 18:48:31 +0000 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2015-03-13 18:48:31 +0000 |
commit | b943f5d907df0607ecffb729f2bccfa436438d7e (patch) | |
tree | 8f420c83bd960b8ee0befb66fc71efd698122b25 /graphx/src/main | |
parent | 7f13434a5c52b815c584ec773ab0e5df1a35ea86 (diff) | |
[SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work
Turns out, per the [convo on the JIRA](https://issues.apache.org/jira/browse/SPARK-4600), `diff` is acting exactly as it should. I had misread it as set difference, which it is not. This change therefore only updates the `diff` documentation so that it better reflects the method's actual behavior going forward.
Author: Brennon York <brennon.york@capitalone.com>
Closes #5015 from brennonyork/SPARK-4600 and squashes the following commits:
1e1d1e5 [Brennon York] reverted internal diff docs
92288f7 [Brennon York] reverted both the test suite and the diff function back to its origin functionality
f428623 [Brennon York] updated diff documentation to better represent its function
cc16d65 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
66818b9 [Brennon York] added small secondary diff test
99ad412 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
74b8c95 [Brennon York] corrected method by leveraging bitmask operations to correctly return only the portions of that are different from the calling VertexRDD
9717120 [Brennon York] updated diff impl to cause fewer objects to be created
710a21c [Brennon York] working diff given test case
aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'
Diffstat (limited to 'graphx/src/main')
-rw-r--r-- | graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
index 09ae3f9f6c..40ecff7107 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
@@ -122,8 +122,11 @@ abstract class VertexRDD[VD](
   def mapValues[VD2: ClassTag](f: (VertexId, VD) => VD2): VertexRDD[VD2]

   /**
-   * Hides vertices that are the same between `this` and `other`; for vertices that are different,
-   * keeps the values from `other`.
+   * For each vertex present in both `this` and `other`, `diff` returns only those vertices with
+   * differing values; for values that are different, keeps the values from `other`. This is
+   * only guaranteed to work if the VertexRDDs share a common ancestor.
+   *
+   * @param other the other VertexRDD with which to diff against.
    */
   def diff(other: VertexRDD[VD]): VertexRDD[VD]
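
To make the documented behavior concrete, here is a minimal sketch that models the `diff` semantics with plain Scala `Map`s instead of a real `VertexRDD`. The object `DiffSemantics` and the helper `diffModel` are hypothetical names introduced only for illustration and are not part of GraphX. The sketch ignores partitioning and indexing, so it says nothing about the "common ancestor" caveat. It only shows how the result differs from a set difference.

```scala
// Plain-Scala model of the documented `diff` semantics.
// Assumption: a Map[VertexId, VD] stands in for a VertexRDD[VD];
// `diffModel` is a hypothetical helper, not a GraphX API.
object DiffSemantics {
  type VertexId = Long

  // Keep only vertices present in BOTH maps whose values differ,
  // and take the value from `other`.
  def diffModel[VD](self: Map[VertexId, VD], other: Map[VertexId, VD]): Map[VertexId, VD] =
    other.filter { case (id, value) => self.get(id).exists(_ != value) }

  def main(args: Array[String]): Unit = {
    val a = Map(0L -> 0, 1L -> 1, 2L -> 2)
    val b = Map(2L -> 4, 3L -> 5)

    // Vertex 2 is in both maps with different values, so it survives with b's value.
    // Vertices 0 and 1 (only in a) and vertex 3 (only in b) are dropped.
    println(diffModel(a, b)) // Map(2 -> 4)

    // A set difference on the ids would instead give the vertices unique to a.
    println(a.keySet -- b.keySet) // Set(0, 1)

    // Identical inputs produce an empty result.
    println(diffModel(a, a).isEmpty) // true
  }
}
```

Under this model, `a.diff(b)` answers "which shared vertices changed, and what are their new values", which matches the updated Scaladoc rather than the set-difference reading described in the JIRA.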