author     Jim Carroll <jim@dontcallme.com>  2014-11-14 15:11:53 -0800
committer  Michael Armbrust <michael@databricks.com>  2014-11-14 15:11:53 -0800
commit     f76b9683706232c3d4e8e6e61627b8188dcb79dc
tree       ef205b638b3bfae61aa666b55533ad0564f72b12 /assembly/pom.xml
parent     0c7b66bd449093bb5d2dafaf91d54e63e601e320
[SPARK-4386] Improve performance when writing Parquet files.
If you profile the writing of a Parquet file, the single most time-consuming call inside org.apache.spark.sql.parquet.MutableRowWriteSupport.write is actually the scala.collection.AbstractSeq.size call. This is because the size call ends up COUNTING the elements in scala.collection.LinearSeqOptimized.length ("optimized?"). This doesn't need to be done: "size" is called repeatedly wherever it is needed, when it could be called once at the top of the method and its result stored in a 'val'.

Author: Jim Carroll <jim@dontcallme.com>

Closes #3254 from jimfcarroll/parquet-perf and squashes the following commits:

30cc0b5 [Jim Carroll] Improve performance when writing Parquet files.
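A minimal Scala sketch of the pattern described above, assuming a simplified writer loop over a row's field names. This is not the actual MutableRowWriteSupport code; `HoistSize`, `writeBefore`, `writeAfter`, and the `emit` callback are hypothetical names used only for illustration. The first loop re-evaluates `size` in its condition on every iteration, which for a `List` means a full traversal each time; the second computes it once into a `val`.

```scala
// Hypothetical illustration of hoisting `size` out of a loop condition.
// Not Spark's code: HoistSize, writeBefore, writeAfter and emit are made up.
object HoistSize {
  // Before: `fieldNames.size` is evaluated on every loop test. For a List
  // (a LinearSeq), size walks the whole list, so n fields cost n + 1 full
  // traversals just to compute the loop bound.
  def writeBefore(fieldNames: Seq[String], emit: (Int, String) => Unit): Unit = {
    var index = 0
    // Read elements through an iterator, because indexed access on a List
    // is also linear and would hide the effect being illustrated.
    val it = fieldNames.iterator
    while (index < fieldNames.size) {
      emit(index, it.next())
      index += 1
    }
  }

  // After: size is computed once at the top of the method and kept in a val.
  def writeAfter(fieldNames: Seq[String], emit: (Int, String) => Unit): Unit = {
    val numFields = fieldNames.size // single traversal
    var index = 0
    val it = fieldNames.iterator
    while (index < numFields) {
      emit(index, it.next())
      index += 1
    }
  }

  def main(args: Array[String]): Unit = {
    val fields = List("id", "name", "score")
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    writeAfter(fields, (i, f) => out += s"$i:$f")
    println(out.mkString(",")) // 0:id,1:name,2:score
  }
}
```

Both methods emit the same sequence; only the number of traversals needed to compute the loop bound differs. For an IndexedSeq such as Vector or Array, `size` is constant time and hoisting changes little, which is why the cost only appears when the collection is a linear sequence.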
Diffstat (limited to 'assembly/pom.xml')
0 files changed, 0 insertions, 0 deletions