diff options
author | Daniel Darabos <darabos.daniel@gmail.com> | 2014-04-02 12:27:37 -0700 |
---|---|---|
committer | Reynold Xin <rxin@apache.org> | 2014-04-02 12:27:37 -0700 |
commit | 78236334e4ca7518b6d7d9b38464dbbda854a777 (patch) | |
tree | 77c0f27b6cdc04b5b25a3ad7761e068206264ab5 /graphx/src/test/scala/org | |
parent | de8eefa804e229635eaa29a78b9e9ce161ac58e1 (diff) | |
download | spark-78236334e4ca7518b6d7d9b38464dbbda854a777.tar.gz spark-78236334e4ca7518b6d7d9b38464dbbda854a777.tar.bz2 spark-78236334e4ca7518b6d7d9b38464dbbda854a777.zip |
Do not re-use objects in the EdgePartition/EdgeTriplet iterators.
This avoids a silent data corruption issue (https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance impact by my measurements. It also simplifies the code. As far as I can tell the object re-use was nothing but premature optimization.
I did actual benchmarks for all the included changes, and there is no performance difference. I am not sure where to put the benchmarks. Does Spark not have a benchmark suite?
This is an example benchmark I did:
test("benchmark") {
val builder = new EdgePartitionBuilder[Int]
for (i <- (1 to 10000000)) {
builder.add(i.toLong, i.toLong, i)
}
val p = builder.toEdgePartition
p.map(_.attr + 1).iterator.toList
}
It ran for 10 seconds both before and after this change.
Author: Daniel Darabos <darabos.daniel@gmail.com>
Closes #276 from darabos/spark-1188 and squashes the following commits:
574302b [Daniel Darabos] Restore "manual" copying in EdgePartition.map(Iterator). Add comment to discourage novices like myself from trying to simplify the code.
4117a64 [Daniel Darabos] Revert EdgePartitionSuite.
4955697 [Daniel Darabos] Create a copy of the Edge objects in EdgeRDD.compute(). This avoids exposing the object re-use, while still enables the more efficient behavior for internal code.
4ec77f8 [Daniel Darabos] Add comments about object re-use to the affected functions.
2da5e87 [Daniel Darabos] Restore object re-use in EdgePartition.
0182f2b [Daniel Darabos] Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (SPARK-1188) and has no performance impact in my measurements. It also simplifies the code.
c55f52f [Daniel Darabos] Tests that reproduce the problems from SPARK-1188.
Diffstat (limited to 'graphx/src/test/scala/org')
-rw-r--r-- | graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala | 43 |
1 files changed, 43 insertions, 0 deletions
diff --git a/graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala b/graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala new file mode 100644 index 0000000000..9cbb2d2acd --- /dev/null +++ b/graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.graphx.impl + +import scala.reflect.ClassTag +import scala.util.Random + +import org.scalatest.FunSuite + +import org.apache.spark.graphx._ + +class EdgeTripletIteratorSuite extends FunSuite { + test("iterator.toList") { + val builder = new EdgePartitionBuilder[Int] + builder.add(1, 2, 0) + builder.add(1, 3, 0) + builder.add(1, 4, 0) + val vidmap = new VertexIdToIndexMap + vidmap.add(1) + vidmap.add(2) + vidmap.add(3) + vidmap.add(4) + val vs = Array.fill(vidmap.capacity)(0) + val iter = new EdgeTripletIterator[Int, Int](vidmap, vs, builder.toEdgePartition) + val result = iter.toList.map(et => (et.srcId, et.dstId)) + assert(result === Seq((1, 2), (1, 3), (1, 4))) + } +} |