author: gagan taneja <tanejagagan@gagans-MacBook-Pro.local> 2017-02-07 14:05:22 +0100
committer: Herman van Hovell <hvanhovell@databricks.com> 2017-02-07 14:05:22 +0100
commit: e99e34d0f370211a7c7b96d144cc932b2fc71d10
tree: 06cd312cf7437f0b221937664ea34c983a0faf3b /conf
parent: 3d314d08c9420e74b4bb687603cdd11394eccab5
[SPARK-19118][SQL] Percentile support for frequency distribution table
## What changes were proposed in this pull request?

I have a frequency distribution table with the following entries:

    Age, No. of persons
    21, 10
    22, 15
    23, 18
    ..
    ..
    30, 14

It is common to have data in frequency distribution format and to need further statistics such as the percentile or median from it. With the current implementation it would be very difficult and complex to find the percentile, because each value would first have to be expanded into as many rows as its frequency. Therefore I am proposing an enhancement to the current Percentile and Approx Percentile implementations so that they take a frequency distribution column into consideration.

## How was this patch tested?

1) Enhanced /sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileSuite.scala to cover the additional functionality.
2) Ran some performance benchmark tests with 20 million rows in a local environment and did not see any performance degradation.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: gagan taneja <tanejagagan@gagans-MacBook-Pro.local>

Closes #16497 from tanejagagan/branch-18940.
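To make the proposal concrete, here is a minimal Scala sketch (not the code from this patch) of computing an exact percentile directly from (value, frequency) pairs, without expanding each value into one row per occurrence. It assumes linear interpolation between the two closest ranks at position (N - 1) * p, where N is the total frequency; that convention, and the names `FrequencyPercentile` and `percentile`, are hypothetical choices for illustration only.

```scala
// Hypothetical helper, NOT part of Spark: percentile of a frequency
// distribution given as (value, frequency) pairs.
object FrequencyPercentile {

  /** Returns the percentile at `p` (0.0 to 1.0) of the data described by
    * `table`, where each entry is (value, number of occurrences).
    * Assumed convention: linear interpolation between the two closest
    * ranks at position (N - 1) * p, with N the total frequency. */
  def percentile(table: Seq[(Double, Long)], p: Double): Double = {
    require(p >= 0.0 && p <= 1.0, "percentage must be between 0.0 and 1.0")
    require(table.forall(_._2 >= 0), "frequencies must be non-negative")

    // Drop empty buckets and sort by value.
    val sorted = table.filter(_._2 > 0).sortWith(_._1 < _._1)
    require(sorted.nonEmpty, "total frequency must be positive")

    // cumulative(i) = number of observations with value <= sorted(i)._1
    val cumulative = sorted.map(_._2).scanLeft(0L)(_ + _).tail
    val total = cumulative.last

    // Value at 0-based rank r in the (never materialized) expanded data:
    // the first bucket whose cumulative count exceeds r.
    def valueAtRank(r: Long): Double = sorted(cumulative.indexWhere(_ > r))._1

    val position = (total - 1) * p
    val lower = math.floor(position).toLong
    val higher = math.ceil(position).toLong
    val lowerValue = valueAtRank(lower)
    val higherValue = valueAtRank(higher)

    // When lower == higher the interpolation weight is 0 and the
    // result is simply lowerValue.
    lowerValue + (position - lower) * (higherValue - lowerValue)
  }
}
```

Using only the first three rows of the frequency table in the commit message, `FrequencyPercentile.percentile(Seq((21.0, 10L), (22.0, 15L), (23.0, 18L)), 0.5)` returns 22.0, the same median as the 43 expanded rows. A linear `indexWhere` scan keeps the sketch short; a binary search over `cumulative` would be preferable for tables with many distinct values.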
Diffstat (limited to 'conf')
0 files changed, 0 insertions, 0 deletions