Do not re-use objects in the EdgePartition/EdgeTriplet iterators.

This avoids a silent data corruption issue (https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance impact by my measurements. It also simplifies the code. As far as I can tell the object re-use was nothing but premature optimization.

I did actual benchmarks for all the included changes, and there is no performance difference. I am not sure where to put the benchmarks. Does Spark not have a benchmark suite?

This is an example benchmark I did:

    test("benchmark") {
      val builder = new EdgePartitionBuilder[Int]
      for (i <- (1 to 10000000)) {
        builder.add(i.toLong, i.toLong, i)
      }
      val p = builder.toEdgePartition
      p.map(_.attr + 1).iterator.toList
    }

It ran for 10 seconds both before and after this change.

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #276 from darabos/spark-1188 and squashes the following commits:

574302b [Daniel Darabos] Restore "manual" copying in EdgePartition.map(Iterator). Add comment to discourage novices like myself from trying to simplify the code.
4117a64 [Daniel Darabos] Revert EdgePartitionSuite.
4955697 [Daniel Darabos] Create a copy of the Edge objects in EdgeRDD.compute(). This avoids exposing the object re-use, while still enables the more efficient behavior for internal code.
4ec77f8 [Daniel Darabos] Add comments about object re-use to the affected functions.
2da5e87 [Daniel Darabos] Restore object re-use in EdgePartition.
0182f2b [Daniel Darabos] Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (SPARK-1188) and has no performance impact in my measurements. It also simplifies the code.
c55f52f [Daniel Darabos] Tests that reproduce the problems from SPARK-1188.
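To illustrate the kind of silent corruption described above, here is a minimal, self-contained sketch. It does not use GraphX: `MutableEdge`, `ReuseDemo`, `reusing`, `copyOnExpose` and `ids` are hypothetical names invented for this illustration, and the real iterators may differ in detail. It only assumes that an iterator hands out the same mutable object on every call to `next()`, and shows why materializing such an iterator (for example with `toList`) loses data, while copying each element before exposing it does not.

```scala
// Hypothetical stand-in for a mutable edge object; not the GraphX Edge class.
final class MutableEdge(var srcId: Long = 0L, var dstId: Long = 0L)

object ReuseDemo {
  // Returns the SAME object for every element, overwriting its fields each time.
  def reusing(pairs: Seq[(Long, Long)]): Iterator[MutableEdge] = {
    val shared = new MutableEdge()
    pairs.iterator.map { case (s, d) =>
      shared.srcId = s
      shared.dstId = d
      shared
    }
  }

  // Wraps the re-using iterator and copies each element before handing it out,
  // so callers never observe the shared object.
  def copyOnExpose(pairs: Seq[(Long, Long)]): Iterator[MutableEdge] =
    reusing(pairs).map(e => new MutableEdge(e.srcId, e.dstId))

  // Materializes the iterator first, then reads the ids from the stored objects.
  def ids(it: Iterator[MutableEdge]): List[(Long, Long)] =
    it.toList.map(e => (e.srcId, e.dstId))
}
```

With the input `Seq((1L, 2L), (1L, 3L), (1L, 4L))`, `ReuseDemo.ids(ReuseDemo.reusing(...))` yields three copies of `(1, 4)`, because the list holds three references to one object in its final state. `ReuseDemo.ids(ReuseDemo.copyOnExpose(...))` yields the three distinct pairs. Code that consumes the re-using iterator strictly one element at a time is unaffected, which is why the re-use can remain for internal callers.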
author: Daniel Darabos <darabos.daniel@gmail.com> 2014-04-02 12:27:37 -0700
committer: Reynold Xin <rxin@apache.org> 2014-04-02 12:27:37 -0700
commit: 78236334e4ca7518b6d7d9b38464dbbda854a777 (patch)
tree: 77c0f27b6cdc04b5b25a3ad7761e068206264ab5 /graphx/src/test
parent: de8eefa804e229635eaa29a78b9e9ce161ac58e1 (diff)
download: spark-78236334e4ca7518b6d7d9b38464dbbda854a777.tar.gz
spark-78236334e4ca7518b6d7d9b38464dbbda854a777.tar.bz2
spark-78236334e4ca7518b6d7d9b38464dbbda854a777.zip
1 files changed, 43 insertions, 0 deletions
diff --git a/graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala b/graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala
new file mode 100644
index 0000000000..9cbb2d2acd
--- /dev/null
+++ b/graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.graphx.impl
+
+import scala.reflect.ClassTag
+import scala.util.Random
+
+import org.scalatest.FunSuite
+
+import org.apache.spark.graphx._
+
+class EdgeTripletIteratorSuite extends FunSuite {
+  test("iterator.toList") {
+    val builder = new EdgePartitionBuilder[Int]
+    builder.add(1, 2, 0)
+    builder.add(1, 3, 0)
+    builder.add(1, 4, 0)
+    val vidmap = new VertexIdToIndexMap
+    vidmap.add(1)
+    vidmap.add(2)
+    vidmap.add(3)
+    vidmap.add(4)
+    val vs = Array.fill(vidmap.capacity)(0)
+    val iter = new EdgeTripletIterator[Int, Int](vidmap, vs, builder.toEdgePartition)
+    val result = iter.toList.map(et => (et.srcId, et.dstId))
+    assert(result === Seq((1, 2), (1, 3), (1, 4)))
+  }
+}