aboutsummaryrefslogtreecommitdiff
path: root/sql
diff options
context:
space:
mode:
authorSean Owen <sowen@cloudera.com>2016-12-07 17:34:45 +0800
committerSean Owen <sowen@cloudera.com>2016-12-07 17:34:45 +0800
commit79f5f281bb69cb2de9f64006180abd753e8ae427 (patch)
treea722ebc403cc4655e96e48b9e3de7502e04271a0 /sql
parentb8280271396eb74638da6546d76bbb2d06c7011b (diff)
downloadspark-79f5f281bb69cb2de9f64006180abd753e8ae427.tar.gz
spark-79f5f281bb69cb2de9f64006180abd753e8ae427.tar.bz2
spark-79f5f281bb69cb2de9f64006180abd753e8ae427.zip
[SPARK-18678][ML] Skewed reservoir sampling in SamplingUtils
## What changes were proposed in this pull request? Fix reservoir sampling bias for small k. An off-by-one error meant that the probability of replacement was slightly too high -- k/(l-1) after l element instead of k/l, which matters for small k. ## How was this patch tested? Existing test plus new test case. Author: Sean Owen <sowen@cloudera.com> Closes #16129 from srowen/SPARK-18678.
Diffstat (limited to 'sql')
0 files changed, 0 insertions, 0 deletions