author Sean Owen <sowen@cloudera.com> 2016-10-08 11:31:12 +0100
committer Sean Owen <sowen@cloudera.com> 2016-10-08 11:31:12 +0100
commit 4201ddcc07ca2e9af78bf4a74fdb3900c1783347
tree ae50667b9ae7e8e8b57ccf431ad08181c40baaac /sql/hive
parent 362ba4b6f8e8fc2355368742c5adced7573fec00
[SPARK-17768][CORE] Small (Sum,Count,Mean)Evaluator problems and suboptimalities
## What changes were proposed in this pull request?

Fix:

- GroupedMeanEvaluator and GroupedSumEvaluator are unused, as is the StudentTCacher support class
- CountEvaluator can return a lower bound < 0, when counts can't be negative
- MeanEvaluator will actually fail on exactly 1 datum (yields t-test with 0 DOF)
- CountEvaluator uses a normal distribution, which may be an inappropriate approximation (leading to above)
- Test for SumEvaluator asserts incorrect expected sums – e.g. after observing 10% of data has sum of 2, expectation should be 20, not 38
- CountEvaluator, MeanEvaluator have no unit tests to catch these
- Duplication of distribution code across CountEvaluator, GroupedCountEvaluator
- The stats in each could use a bit of documentation as I had to guess at them
- (Code could use a few cleanups and optimizations too)

## How was this patch tested?

Existing and new tests

Author: Sean Owen <sowen@cloudera.com>

Closes #15341 from srowen/SPARK-17768.
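
Three of the numerical points in the list of fixes (the expected sum after seeing 10% of the data, the negative lower bound on a count, and the mean of a single datum) can be illustrated with a small sketch. This is not the Spark code: the function names `extrapolate_sum`, `normal_interval`, `clamp_count_interval` and `t_degrees_of_freedom` are hypothetical, and the normal interval is only an assumed stand-in for whatever approximation an evaluator uses.

```python
from typing import Optional, Tuple


def extrapolate_sum(observed_sum: float, fraction_seen: float) -> float:
    """Scale a partial sum to the whole data set, assuming the part seen is representative."""
    if not 0.0 < fraction_seen <= 1.0:
        raise ValueError("fraction_seen must be in (0, 1]")
    return observed_sum / fraction_seen


def normal_interval(estimate: float, std_dev: float, z: float = 1.96) -> Tuple[float, float]:
    """Symmetric normal-approximation interval (illustrative only, not Spark's formula)."""
    return estimate - z * std_dev, estimate + z * std_dev


def clamp_count_interval(low: float, high: float) -> Tuple[float, float]:
    """A count cannot be negative, so clip both bounds at zero."""
    return max(0.0, low), max(0.0, high)


def t_degrees_of_freedom(n: int) -> Optional[int]:
    """Degrees of freedom for a t-based interval on a mean of n values.

    Returns None when n <= 1, where n - 1 would be 0 (or negative) and no
    t-based interval can be formed.
    """
    return n - 1 if n > 1 else None


if __name__ == "__main__":
    # Sum of 2 after observing 10% of the data.
    print(extrapolate_sum(2.0, 0.1))

    # A small estimate with a wide spread gives a negative lower bound
    # under the normal approximation; clamping keeps it at zero.
    low, high = normal_interval(3.0, 5.0)
    print(clamp_count_interval(low, high))

    # One datum leaves no degrees of freedom for a t-based interval.
    print(t_degrees_of_freedom(1))
```

The sketch handles the single-datum case with an explicit guard instead of attempting a t-test with 0 degrees of freedom. How the patch itself handles that case is not stated in the commit message.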
Diffstat (limited to 'sql/hive')
0 files changed, 0 insertions, 0 deletions