author: Josh Rosen <joshrosen@databricks.com> 2016-06-05 16:51:00 -0700
committer: Reynold Xin <rxin@databricks.com> 2016-06-05 16:51:00 -0700
commit: 26c1089c37149061f838129bb53330ded68ff4c9
tree: 3f83d6015f5bf704ea274cccd84f5d25b9f53c9d /mllib
parent: 30c4774f33fed63b7d400d220d710fb432f599a8
[SPARK-15748][SQL] Replace inefficient foldLeft() call with flatMap() in PartitionStatistics
`PartitionStatistics` uses `foldLeft` and list concatenation (`++`) to flatten an iterator of lists, but this is extremely inefficient compared to simply doing `flatMap`/`flatten` because it performs many unnecessary object allocations. Simply replacing this `foldLeft` by a `flatMap` results in decent performance gains when constructing `PartitionStatistics` instances for tables with many columns.

This patch fixes this and also makes two similar changes in MLlib and streaming to try to fix all known occurrences of this pattern.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #13491 from JoshRosen/foldleft-to-flatmap.
Diffstat (limited to 'mllib')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala | 2
1 file changed, 1 insertion, 1 deletion
diff --git a/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala b/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
index 94d1b83ec2..8ed40c379c 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala
@@ -422,7 +422,7 @@ private[ml] object MetaAlgorithmReadWrite {
       case rformModel: RFormulaModel => Array(rformModel.pipelineModel)
       case _: Params => Array()
     }
-    val subStageMaps = subStages.map(getUidMapImpl).foldLeft(List.empty[(String, Params)])(_ ++ _)
+    val subStageMaps = subStages.flatMap(getUidMapImpl)
     List((instance.uid, instance)) ++ subStageMaps
   }
 }
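
For illustration only, the hunk above can be reduced to a small standalone sketch. `Node`, `FlattenSketch`, and both method names are hypothetical stand-ins and not part of Spark; `Node` merely plays the role of `Params` with nested sub-stages. The sketch assumes plain Scala `List`s, where `acc ++ next` has to copy the accumulated left-hand list on every fold step, while `flatMap` builds the flattened result in a single pass.

```scala
// Hypothetical stand-in for a stage that may contain sub-stages (not Spark's Params).
final case class Node(uid: String, children: List[Node] = Nil)

object FlattenSketch {
  // Old shape: build one list per child, then fold them together with ++.
  // Each ++ copies the accumulator built so far, so intermediate lists pile up.
  def uidsWithFold(n: Node): List[(String, Node)] = {
    val sub = n.children.map(uidsWithFold).foldLeft(List.empty[(String, Node)])(_ ++ _)
    List((n.uid, n)) ++ sub
  }

  // New shape: flatMap concatenates the per-child lists directly.
  def uidsWithFlatMap(n: Node): List[(String, Node)] = {
    val sub = n.children.flatMap(uidsWithFlatMap)
    List((n.uid, n)) ++ sub
  }

  def main(args: Array[String]): Unit = {
    val tree = Node("a", List(Node("b", List(Node("c"))), Node("d")))
    // Both variants visit the stages in the same pre-order.
    println(uidsWithFold(tree).map(_._1))    // List(a, b, c, d)
    println(uidsWithFlatMap(tree).map(_._1)) // List(a, b, c, d)
  }
}
```

Both variants return the same pairs in the same order, so under these assumptions the change only affects how the result is built, not what it contains.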