author Reynold Xin <rxin@apache.org> 2013-12-31 17:48:24 -0800
committer Reynold Xin <rxin@apache.org> 2013-12-31 17:48:24 -0800
commit 8b8e70ebde880d08ebb3816b2f4003247559c7f8
tree aa984e1263c1e825b50c80e6651a35d686bf2c7d /pom.xml
parent 63b411dd8664c27ac55586d8345733afad80961f
parent bee445c927586136673f39259f23642a5a6e8efe
Merge pull request #73 from falaki/ApproximateDistinctCount
Approximate distinct count

Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting, and both take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scala docs and test suites for both methods.
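
To illustrate the accuracy/memory trade-off mentioned above, here is a minimal, self-contained HyperLogLog sketch in Python. It is not the stream-lib implementation this merge depends on, and it is not Spark's API: the class name `HyperLogLog`, the parameter name `relative_sd`, and the use of SHA-1 as the hash are assumptions made only for illustration. It relies on the standard result that the relative error of HyperLogLog is roughly 1.04 / sqrt(m) for m registers, so a smaller target error means more registers and more memory.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (hypothetical; not stream-lib's code)."""

    def __init__(self, relative_sd=0.05):
        # Pick the register count m = 2**p so that 1.04 / sqrt(m) <= relative_sd.
        self.p = max(4, math.ceil(math.log2((1.04 / relative_sd) ** 2)))
        self.m = 1 << self.p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item (SHA-1 is an assumption, chosen for determinism).
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                  # top p bits select a register
        rest = h & ((1 << width) - 1)     # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        m = self.m
        if m == 16:
            alpha = 0.673
        elif m == 32:
            alpha = 0.697
        elif m == 64:
            alpha = 0.709
        else:
            alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(m * math.log(m / zeros))
        return round(raw)


def approx_distinct(items, relative_sd=0.05):
    """Approximate the number of distinct items in an iterable."""
    hll = HyperLogLog(relative_sd)
    for item in items:
        hll.add(item)
    return hll.count()
```

With these assumptions, `relative_sd=0.05` uses 512 registers and `relative_sd=0.01` uses 16384, so tightening the error bound by a factor of five costs 32 times the memory in this sketch.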
Diffstat (limited to 'pom.xml')
-rw-r--r-- pom.xml 5
1 file changed, 5 insertions, 0 deletions
diff --git a/pom.xml b/pom.xml
index 0936ae53b4..6545c82b31 100644
--- a/pom.xml
+++ b/pom.xml
@@ -206,6 +206,11 @@
            to explicitly bump the version when building with YARN. It would be nice to figure
            out why Maven can't resolve this correctly (like SBT does). -->
       <dependency>
+        <groupId>com.clearspring.analytics</groupId>
+        <artifactId>stream</artifactId>
+        <version>2.4.0</version>
+      </dependency>
+      <dependency>
         <groupId>com.google.protobuf</groupId>
         <artifactId>protobuf-java</artifactId>
         <version>${protobuf.version}</version>