author    Ankur Dave <ankurdave@gmail.com>  2014-08-28 15:17:01 -0700
committer Josh Rosen <joshrosen@apache.org>  2014-08-28 15:17:01 -0700
commit    96df92906978c5f58e0cc8ff5eebe5b35a08be3b (patch)
tree      983908ea591adc8147530f3d1431eeaa21a5db9a /graphx/src/main/scala/org/apache
parent    39012452daa0746fe5d218493b85f9b5f96190c1 (diff)
[SPARK-3190] Avoid overflow in VertexRDD.count()
VertexRDDs with more than 4 billion elements are counted incorrectly due to integer overflow when summing partition sizes. This PR fixes the issue by converting partition sizes to Longs before summing them.

The following code previously returned -10000000. After applying this PR, it returns the correct answer of 5000000000 (5 billion).

```scala
val pairs = sc.parallelize(0L until 500L).map(_ * 10000000)
  .flatMap(start => start until (start + 10000000)).map(x => (x, x))
VertexRDD(pairs).count()
```

Author: Ankur Dave <ankurdave@gmail.com>

Closes #2106 from ankurdave/SPARK-3190 and squashes the following commits:

641f468 [Ankur Dave] Avoid overflow in VertexRDD.count()
Diffstat (limited to 'graphx/src/main/scala/org/apache')
-rw-r--r--graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala2
1 file changed, 1 insertion, 1 deletion
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
index 4825d12fc2..04fbc9dbab 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
@@ -108,7 +108,7 @@ class VertexRDD[@specialized VD: ClassTag](
/** The number of vertices in the RDD. */
override def count(): Long = {
- partitionsRDD.map(_.size).reduce(_ + _)
+ partitionsRDD.map(_.size.toLong).reduce(_ + _)
}
/**
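The overflow this one-line change prevents can be reproduced outside Spark with plain Scala collections. The following is a minimal, hypothetical sketch (no Spark required; the object and value names are illustrative) contrasting the buggy `Int` sum with the patched `Long` sum:

```scala
// Minimal sketch (hypothetical, no Spark required) of the Int overflow
// fixed by this patch. Summing Int partition sizes wraps around once the
// running total exceeds Int.MaxValue (2147483647).
object CountOverflowDemo {
  def main(args: Array[String]): Unit = {
    // Three partitions of one billion elements each: 3 billion total.
    val partitionSizes: Seq[Int] = Seq.fill(3)(1000000000)

    // Buggy pattern: Int addition silently wraps around at 2^31.
    val buggy: Int = partitionSizes.reduce(_ + _)

    // Fixed pattern, as in this patch: widen each size to Long first.
    val fixed: Long = partitionSizes.map(_.toLong).reduce(_ + _)

    println(buggy) // -1294967296 (wrapped)
    println(fixed) // 3000000000
  }
}
```

Widening inside `map` (rather than calling `.toLong` on the final result) is the key point: each pairwise addition in `reduce` must already be performed in `Long` arithmetic, because the wraparound happens during the intermediate sums, not at the end.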