author: Josh Rosen <joshrosen@databricks.com> 2016-04-27 17:34:55 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-04-27 17:34:55 -0700
commit: 8c49cebce572330fc84362662a9e3e8f7625bf5d
tree: 9630ea3f1504f7be9655663ca23c6a54f12b160f
parent: f5ebb18c45ffdee2756a80f64239cb9158df1a11
[SPARK-14966] SizeEstimator should ignore classes in the scala.reflect package

In local profiling, I noticed SizeEstimator spending tons of time estimating the size of objects which contain TypeTag or ClassTag fields. The problem with these tags is that they reference global Scala reflection objects, which, in turn, reference many singletons, such as TestHive. This throws off the accuracy of the size estimation and wastes tons of time traversing a huge object graph.

As a result, I think that SizeEstimator should ignore any classes in the `scala.reflect` package.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #12741 from JoshRosen/ignore-scala-reflect-in-size-estimator.
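
The class-name check described above can be sketched in isolation. This is a minimal illustration, not the SizeEstimator code itself: the object `SkipScalaReflectSketch`, the helper `isScalaReflectClass`, and the `Holder` class are hypothetical names introduced only for this example. It shows that a `ClassTag` carried as a field has a runtime class whose name starts with `scala.reflect`, which is the condition the patch tests before declining to traverse the object.

```scala
import scala.reflect.ClassTag

object SkipScalaReflectSketch {

  // Hypothetical helper: the same prefix test on the runtime class name
  // that the patch adds as a new branch in the estimator's object visitor.
  def isScalaReflectClass(obj: AnyRef): Boolean =
    obj.getClass.getName.startsWith("scala.reflect")

  // Hypothetical user class that keeps a ClassTag as a field, the kind of
  // object the commit message says was slow to estimate.
  class Holder[T](val values: Seq[T])(implicit val tag: ClassTag[T])

  def main(args: Array[String]): Unit = {
    val holder = new Holder(Seq("a", "b", "c"))

    // The tag lives in scala.reflect, so a traversal using this check
    // would stop here instead of following it into global reflection state.
    println(isScalaReflectClass(holder.tag)) // prints "true"

    // The holder itself is an ordinary user class and is still visited.
    println(isScalaReflectClass(holder))     // prints "false"
  }
}
```

Note that a prefix test on the class name is coarse: it skips every class whose fully qualified name begins with `scala.reflect`, which is the behavior the commit message asks for.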

-rw-r--r-- core/src/main/scala/org/apache/spark/util/SizeEstimator.scala | 3
1 files changed, 3 insertions, 0 deletions
diff --git a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
index 6861a75612..386fdfd218 100644
--- a/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
+++ b/core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
@@ -207,6 +207,9 @@ object SizeEstimator extends Logging {
     val cls = obj.getClass
     if (cls.isArray) {
       visitArray(obj, cls, state)
+    } else if (cls.getName.startsWith("scala.reflect")) {
+      // Many objects in the scala.reflect package reference global reflection objects which, in
+      // turn, reference many other large global objects. Do nothing in this case.
     } else if (obj.isInstanceOf[ClassLoader] || obj.isInstanceOf[Class[_]]) {
       // Hadoop JobConfs created in the interpreter have a ClassLoader, which greatly confuses
       // the size estimator since it references the whole REPL. Do nothing in this case. In