path: root/bin/load-spark-env.cmd
author Liang-Chi Hsieh <simonh@tw.ibm.com> 2016-03-28 09:58:47 -0700
committer Davies Liu <davies.liu@gmail.com> 2016-03-28 09:58:47 -0700
commit 68c0c460bfc51d7f69d09b613c49c212dd0b375c
tree 3c1ee9e3d56e7c8e41c54b70a6bf8472c0078d54 /bin/load-spark-env.cmd
parent c8388297c436691a236520d2396deaf556aedb0e
[SPARK-13742] [CORE] Add non-iterator interface to RandomSampler
JIRA: https://issues.apache.org/jira/browse/SPARK-13742

## What changes were proposed in this pull request?

`RandomSampler.sample` currently accepts an iterator as input and outputs another iterator. This makes it inappropriate for use in the wholestage codegen of the `Sampler` operator (#11517). This change adds a non-iterator interface to `RandomSampler`.

Specifically, it adds a new method `def sample(): Int` to the trait `RandomSampler`. Because we don't need to know the actual values of the items being sampled, the new method takes no arguments. It decides whether to sample the next item and returns how many times that item will be sampled.

For `BernoulliSampler` and `BernoulliCellSampler`, the returned count can only be 0 or 1, which simply indicates whether to sample the next item. For `PoissonSampler`, the returned value can be greater than 1, meaning the next item will be sampled multiple times.

## How was this patch tested?

Tests are added to `RandomSamplerSuite`.

Author: Liang-Chi Hsieh <simonh@tw.ibm.com>
Author: Liang-Chi Hsieh <viirya@appier.com>
Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #11578 from viirya/random-sampler-no-iterator.
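
For illustration only, here is a minimal sketch of how a no-argument `sample(): Int` contract like the one described above might be consumed. It is not the code from this patch: `SimpleSampler`, `SimpleBernoulliSampler`, `SimplePoissonSampler`, and `sampleWith` are hypothetical names, and the Poisson draw uses Knuth's multiplication method as an assumed stand-in for whatever Spark uses internally.

```scala
import scala.util.Random

// Hypothetical, simplified stand-in for the trait described in the commit message.
trait SimpleSampler {
  /** Returns how many times the next item should be sampled. */
  def sample(): Int
}

// Bernoulli-style sampling: each item is kept at most once, so the result is 0 or 1.
class SimpleBernoulliSampler(fraction: Double, seed: Long) extends SimpleSampler {
  private val rng = new Random(seed)

  override def sample(): Int =
    if (fraction <= 0.0) 0
    else if (fraction >= 1.0) 1
    else if (rng.nextDouble() < fraction) 1
    else 0
}

// Poisson-style sampling (with replacement): the result may exceed 1.
// Assumption: Knuth's multiplication method, adequate for small means.
class SimplePoissonSampler(mean: Double, seed: Long) extends SimpleSampler {
  private val rng = new Random(seed)
  private val limit = math.exp(-mean)

  override def sample(): Int =
    if (mean <= 0.0) 0
    else {
      var count = 0
      var product = rng.nextDouble()
      // Multiply uniforms until the product drops to e^(-mean) or below.
      while (product > limit) {
        count += 1
        product *= rng.nextDouble()
      }
      count
    }
}

// The caller owns the iteration; the sampler never sees the item values.
def sampleWith[T](items: Seq[T], sampler: SimpleSampler): Seq[T] =
  items.flatMap { item =>
    val times = sampler.sample()
    Seq.fill(times)(item)
  }
```

Because the sampler only returns a count, the surrounding loop can be generated inline by a code generator instead of being wrapped in an iterator, which is the motivation stated in the commit message.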
Diffstat (limited to 'bin/load-spark-env.cmd')
0 files changed, 0 insertions, 0 deletions