aboutsummaryrefslogtreecommitdiff
path: root/pom.xml
diff options
context:
space:
mode:
authorDoris Xin <doris.s.xin@gmail.com>2014-07-29 12:49:44 -0700
committerXiangrui Meng <meng@databricks.com>2014-07-29 12:49:44 -0700
commitdc9653641f8806960d79652afa043c3fb84f25d2 (patch)
tree41f6b4bfeb517fe307fd1a6a7e711d6a9cd3a124 /pom.xml
parentf0d880e288eba97c86dceb1b5edab4f3a935943b (diff)
downloadspark-dc9653641f8806960d79652afa043c3fb84f25d2.tar.gz
spark-dc9653641f8806960d79652afa043c3fb84f25d2.tar.bz2
spark-dc9653641f8806960d79652afa043c3fb84f25d2.zip
[SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size
Implemented stratified sampling that guarantees exact sample size using ScaRSR with two passes over the RDD for sampling without replacement and three passes for sampling with replacement. Author: Doris Xin <doris.s.xin@gmail.com> Author: Xiangrui Meng <meng@databricks.com> Closes #1025 from dorx/stratified and squashes the following commits: 245439e [Doris Xin] moved minSamplingRate to getUpperBound eaf5771 [Doris Xin] bug fixes. 17a381b [Doris Xin] fixed a merge issue and a failed unit ea7d27f [Doris Xin] merge master b223529 [Xiangrui Meng] use approx bounds for poisson fix poisson mean for waitlisting add unit tests for Java b3013a4 [Xiangrui Meng] move math3 back to test scope eecee5f [Doris Xin] Merge branch 'master' into stratified f4c21f3 [Doris Xin] Reviewer comments a10e68d [Doris Xin] style fix a2bf756 [Doris Xin] Merge branch 'master' into stratified 680b677 [Doris Xin] use mapPartitionWithIndex instead 9884a9f [Doris Xin] style fix bbfb8c9 [Doris Xin] Merge branch 'master' into stratified ee9d260 [Doris Xin] addressed reviewer comments 6b5b10b [Doris Xin] Merge branch 'master' into stratified 254e03c [Doris Xin] minor fixes and Java API. 4ad516b [Doris Xin] remove unused imports from PairRDDFunctions bd9dc6e [Doris Xin] unit bug and style violation fixed 1fe1cff [Doris Xin] Changed fractionByKey to a map to enable arg check 944a10c [Doris Xin] [SPARK-2145] Add lower bound on sampling rate 0214a76 [Doris Xin] cleanUp 90d94c0 [Doris Xin] merge master 9e74ab5 [Doris Xin] Separated out most of the logic in sampleByKey 7327611 [Doris Xin] merge master 50581fc [Doris Xin] added a TODO for logging in python 46f6c8c [Doris Xin] fixed the NPE caused by closures being cleaned before being passed into the aggregate function 7e1a481 [Doris Xin] changed the permission on SamplingUtil 1d413ce [Doris Xin] fixed checkstyle issues 9ee94ee [Doris Xin] [SPARK-2082] stratified sampling in PairRDDFunctions that guarantees exact sample size e3fd6a6 [Doris Xin] Merge branch 'master' into takeSample 7cab53a [Doris Xin] fixed import bug in rdd.py ffea61a [Doris Xin] SPARK-1939: Refactor takeSample method in RDD 1441977 [Doris Xin] SPARK-1939 Refactor takeSample method in RDD to use ScaSRS
Diffstat (limited to 'pom.xml')
-rw-r--r--pom.xml6
1 files changed, 6 insertions, 0 deletions
diff --git a/pom.xml b/pom.xml
index 8b1435cfe5..39538f9660 100644
--- a/pom.xml
+++ b/pom.xml
@@ -258,6 +258,12 @@
<version>1.5</version>
</dependency>
<dependency>
+ <groupId>org.apache.commons</groupId>
+ <artifactId>commons-math3</artifactId>
+ <version>3.3</version>
+ <scope>test</scope>
+ </dependency>
+ <dependency>
<groupId>com.google.code.findbugs</groupId>
<artifactId>jsr305</artifactId>
<version>1.3.9</version>