aboutsummaryrefslogtreecommitdiff
path: root/project
diff options
context:
space:
mode:
authorMichael Armbrust <michael@databricks.com>2014-04-07 00:14:00 -0700
committerReynold Xin <rxin@apache.org>2014-04-07 00:14:00 -0700
commitaccd0999f9cb6a449434d3fc5274dd469eeecab2 (patch)
tree6c1a73644c3af4aee9a60872bc58d0a1a2bc531a /project
parent87d0928a3301835705652c24a26096546597e156 (diff)
downloadspark-accd0999f9cb6a449434d3fc5274dd469eeecab2.tar.gz
spark-accd0999f9cb6a449434d3fc5274dd469eeecab2.tar.bz2
spark-accd0999f9cb6a449434d3fc5274dd469eeecab2.zip
[SQL] SPARK-1371 Hash Aggregation Improvements
Given: ```scala case class Data(a: Int, b: Int) val rdd = sparkContext .parallelize(1 to 200) .flatMap(_ => (1 to 50000).map(i => Data(i % 100, i))) rdd.registerAsTable("data") cacheTable("data") ``` Before: ``` SELECT COUNT(*) FROM data:[10000000] 16795.567ms SELECT a, SUM(b) FROM data GROUP BY a 7536.436ms SELECT SUM(b) FROM data 10954.1ms ``` After: ``` SELECT COUNT(*) FROM data:[10000000] 1372.175ms SELECT a, SUM(b) FROM data GROUP BY a 2070.446ms SELECT SUM(b) FROM data 958.969ms ``` Author: Michael Armbrust <michael@databricks.com> Closes #295 from marmbrus/hashAgg and squashes the following commits: ec63575 [Michael Armbrust] Add comment. d0495a9 [Michael Armbrust] Use scaladoc instead. b4a6887 [Michael Armbrust] Address review comments. a2d90ba [Michael Armbrust] Capture child output statically to avoid issues with generators and serialization. 7c13112 [Michael Armbrust] Rewrite Aggregate operator to stream input and use projections. Remove unused local RDD functions implicits. 5096f99 [Michael Armbrust] Make HiveUDAF fields transient since object inspectors are not serializable. 6a4b671 [Michael Armbrust] Add option to avoid binding operators expressions automatically. 92cca08 [Michael Armbrust] Always include serialization debug info when running tests. 1279df2 [Michael Armbrust] Increase default number of partitions.
Diffstat (limited to 'project')
-rw-r--r--project/SparkBuild.scala1
1 files changed, 1 insertions, 0 deletions
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index d1e4b8b964..6b8740d9f2 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -178,6 +178,7 @@ object SparkBuild extends Build {
fork := true,
javaOptions in Test += "-Dspark.home=" + sparkHome,
javaOptions in Test += "-Dspark.testing=1",
+ javaOptions in Test += "-Dsun.io.serialization.extendedDebugInfo=true",
javaOptions in Test ++= System.getProperties.filter(_._1 startsWith "spark").map { case (k,v) => s"-D$k=$v" }.toSeq,
javaOptions += "-Xmx3g",
// Show full stack trace and duration in test cases.