author: Tejas Patil <tejasp@fb.com> 2016-10-04 18:59:31 -0700
committer: Herman van Hovell <hvanhovell@databricks.com> 2016-10-04 18:59:31 -0700
commit: a99743d053e84f695dc3034550939555297b0a05
tree: 566a00324e1d3fdabc416e31efd3c25a3e6cf2cb /sql/hive/compatibility/src/test/scala
parent: 8d969a2125d915da1506c17833aa98da614a257f
[SPARK-17495][SQL] Add Hash capability semantically equivalent to Hive's
## What changes were proposed in this pull request?

Jira: https://issues.apache.org/jira/browse/SPARK-17495

Spark internally uses Murmur3Hash for partitioning. This is different from the hash used by Hive. For queries that use bucketing, this leads to different results when the same query is run on both engines. We want users to have backward compatibility, so that one can switch parts of an application across the engines without observing regressions.

This PR includes `HiveHash`, `HiveHashFunction`, and `HiveHasher`, which mimic Hive's hashing at https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638

I am intentionally not introducing any usages of this hash function in the rest of the code, to keep this PR small. My eventual goal is to have Hive bucketing support in Spark. Once this PR gets in, I will make the hash function pluggable in the relevant areas (e.g. `HashPartitioning`'s `partitionIdExpression` has Murmur3 hardcoded: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala#L265)

## How was this patch tested?

Added `HiveHashSuite`

Author: Tejas Patil <tejasp@fb.com>

Closes #15047 from tejasapatil/SPARK-17495_hive_hash.
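For readers unfamiliar with the Hive-style hashing this change mimics, here is a minimal Java sketch. It is not the code added by this PR. The class `HiveStyleHashSketch` and all of its method names are hypothetical. The hashing rules are an assumption based on the linked `ObjectInspectorUtils` code: an int hashes to itself, a long folds its high and low words together, bytes are combined with a multiplier of 31, and per-column hashes are combined the same way. The bucket formula `(hash & Integer.MAX_VALUE) % numBuckets` is likewise an assumption about Hive's convention.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the HiveHash/HiveHasher classes from this PR.
public final class HiveStyleHashSketch {
    private HiveStyleHashSketch() {}

    // Assumed rule: an int hashes to its own value.
    static int hashInt(int value) {
        return value;
    }

    // Assumed rule: fold the high 32 bits into the low 32 bits.
    static int hashLong(long value) {
        return (int) ((value >>> 32) ^ value);
    }

    // Assumed rule: rolling hash over signed bytes with multiplier 31.
    static int hashBytes(byte[] bytes) {
        int result = 0;
        for (byte b : bytes) {
            result = result * 31 + (int) b;
        }
        return result;
    }

    // Strings are hashed over their UTF-8 bytes in this sketch.
    static int hashString(String s) {
        return hashBytes(s.getBytes(StandardCharsets.UTF_8));
    }

    // Combine per-column hashes of a multi-column key, starting from 0.
    static int combine(int... columnHashes) {
        int result = 0;
        for (int h : columnHashes) {
            result = 31 * result + h;
        }
        return result;
    }

    // Assumed bucket convention: clear the sign bit, then take the modulus.
    static int bucket(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int keyHash = combine(hashInt(7), hashString("spark"), hashLong(1L << 40));
        System.out.println("hash=" + keyHash + " bucket=" + bucket(keyHash, 8));
    }
}
```

Because Murmur3 and a scheme like this one assign the same key to different buckets, a table bucketed by one engine cannot be read as bucketed by the other unless both use the same hash function. That is the mismatch this PR starts to address.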
Diffstat (limited to 'sql/hive/compatibility/src/test/scala')
0 files changed, 0 insertions, 0 deletions