[SPARK-19118][SQL] Percentile support for frequency distribution table - spark


author	gagan taneja <tanejagagan@gagans-MacBook-Pro.local>	2017-02-07 14:05:22 +0100
committer	Herman van Hovell <hvanhovell@databricks.com>	2017-02-07 14:05:22 +0100
commit	e99e34d0f370211a7c7b96d144cc932b2fc71d10 (patch)
tree	06cd312cf7437f0b221937664ea34c983a0faf3b /R/pkg/inst
parent	3d314d08c9420e74b4bb687603cdd11394eccab5 (diff)

[SPARK-19118][SQL] Percentile support for frequency distribution table

## What changes were proposed in this pull request?

I have a frequency distribution table with the following entries:

Age, No of person
21, 10
22, 15
23, 18
..
..
30, 14

It is common to have data in frequency distribution format and to calculate the percentile or median from it. With the current implementation, finding the percentile of such data is difficult and complex. Therefore I am proposing an enhancement to the current Percentile and Approx Percentile implementations to take a frequency distribution column into consideration.

## How was this patch tested?

1) Enhanced /sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/PercentileSuite.scala to cover the additional functionality.
2) Ran performance benchmark tests with 20 million rows in a local environment and did not see any performance degradation.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: gagan taneja <tanejagagan@gagans-MacBook-Pro.local>

Closes #16497 from tanejagagan/branch-18940.
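
To illustrate the idea of computing a percentile directly from a frequency distribution, here is a minimal plain-Python sketch. It is not the code from this patch: the function name `percentile_from_frequencies` is hypothetical, and the interpolation rule (linear interpolation at position `(n - 1) * p`) is an assumption about what an exact percentile should return. The point is only that the result matches what you would get by expanding every value `frequency` times, without ever materializing the expanded data.

```python
from bisect import bisect_left
from itertools import accumulate
from math import ceil, floor


def percentile_from_frequencies(rows, p):
    """Percentile of the data set described by (value, frequency) rows.

    Hypothetical helper for illustration only. It behaves as if each value
    were repeated `frequency` times, sorted, and linearly interpolated at
    position (n - 1) * p, where n is the sum of all frequencies.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")

    # Ignore rows that contribute no observations, then sort by value.
    rows = sorted((value, freq) for value, freq in rows if freq > 0)
    if not rows:
        return None

    values = [value for value, _ in rows]
    # cumulative[i] = number of observations with value <= values[i]
    cumulative = list(accumulate(freq for _, freq in rows))
    total = cumulative[-1]

    position = (total - 1) * p
    lower, upper = floor(position), ceil(position)

    def value_at(index):
        # The observation at 0-based `index` belongs to the first bucket
        # whose cumulative count is at least index + 1.
        return values[bisect_left(cumulative, index + 1)]

    low_value, high_value = value_at(lower), value_at(upper)
    if lower == upper:
        return float(low_value)
    return (upper - position) * low_value + (position - lower) * high_value


# Only the three rows spelled out in the description; the elided rows are omitted.
ages = [(21, 10), (22, 15), (23, 18)]
median_age = percentile_from_frequencies(ages, 0.5)
```

With these three rows there are 43 observations, so the median is the observation at 0-based index 21, which falls in the bucket for age 22.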

Diffstat (limited to 'R/pkg/inst')

0 files changed, 0 insertions, 0 deletions

