path: root/run-example.cmd
author: Reynold Xin <rxin@apache.org> 2013-12-31 17:48:24 -0800
committer: Reynold Xin <rxin@apache.org> 2013-12-31 17:48:24 -0800
commit: 8b8e70ebde880d08ebb3816b2f4003247559c7f8
tree: aa984e1263c1e825b50c80e6651a35d686bf2c7d /run-example.cmd
parent: 63b411dd8664c27ac55586d8345733afad80961f
parent: bee445c927586136673f39259f23642a5a6e8efe
Merge pull request #73 from falaki/ApproximateDistinctCount
Approximate distinct count

Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting, and both take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
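
To illustrate the accuracy/memory trade-off mentioned above, here is a minimal, self-contained HyperLogLog sketch in Python. It is not the stream-lib implementation and not the Spark API; the function name `hll_estimate` and the parameter `p` are hypothetical, chosen for this illustration. With `m = 2**p` registers, the standard error of a HyperLogLog estimate is roughly `1.04 / sqrt(m)`, so more registers (more memory) give a tighter estimate.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Estimate the number of distinct items using 2**p registers.

    Hypothetical helper for illustration only; assumes 7 <= p <= 16,
    because the bias constant below is the one used for m >= 128.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Derive a 64-bit hash from the item's string form.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                    # top p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        # Rank: position of the leftmost 1 bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    data = [i % 50000 for i in range(200000)]  # 50,000 distinct values
    for p in (8, 12, 16):
        print(p, round(hll_estimate(data, p)))
```

Each register holds a single small integer, so memory grows linearly with `2**p` while the relative error shrinks only with its square root. That is the trade-off a precision parameter exposes.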
Diffstat (limited to 'run-example.cmd')
0 files changed, 0 insertions, 0 deletions