author | Davies Liu <davies.liu@gmail.com> | 2014-08-23 19:33:34 -0700 |
---|---|---|
committer | Josh Rosen <joshrosen@apache.org> | 2014-08-23 19:33:34 -0700 |
commit | 8df4dad4951ca6e687df1288331909878922a55f (patch) | |
tree | 85712756692df718f235514431bd85321f0b7653 /LICENSE | |
parent | db436e36c4e20893de708a0bc07a5a8877c49563 (diff) | |
[SPARK-2871] [PySpark] add approx API for RDD
RDD.countApprox(self, timeout, confidence=0.95)
:: Experimental ::
Approximate version of count() that returns a potentially incomplete
result within a timeout, even if not all tasks have finished.
>>> rdd = sc.parallelize(range(1000), 10)
>>> rdd.countApprox(1000, 1.0)
1000
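countApprox returns whatever has been counted when the timeout (in milliseconds) expires, even if some tasks are still running. The sketch below imitates that behaviour in plain Python with `concurrent.futures`. It assumes that each list element stands in for one partition and that the unfinished partitions resemble the finished ones. `count_within_timeout` is a hypothetical helper, not a PySpark API, and it leaves out the confidence bounds Spark attaches to its result.

```python
import concurrent.futures as cf


def count_within_timeout(partitions, timeout_ms, max_workers=4):
    """Count each partition in a thread pool and stop waiting at a deadline.

    Hypothetical helper (not PySpark): returns (estimate, finished, total),
    where the estimate scales the finished partitions' count up to all of them.
    """
    total = len(partitions)
    pool = cf.ThreadPoolExecutor(max_workers=max_workers)
    # One task per partition; p=p binds the current partition to each lambda.
    futures = [pool.submit(lambda p=p: sum(1 for _ in p)) for p in partitions]
    # wait() returns once every future is done or the timeout expires.
    done, _pending = cf.wait(futures, timeout=timeout_ms / 1000.0)
    # Do not block on stragglers; they keep running in the background.
    pool.shutdown(wait=False)
    counts = [f.result() for f in done]
    if not counts:
        return 0.0, 0, total  # nothing finished in time
    estimate = sum(counts) * total / len(counts)
    return estimate, len(counts), total


# Mirrors sc.parallelize(range(1000), 10): ten partitions of 100 elements.
parts = [range(i * 100, (i + 1) * 100) for i in range(10)]
estimate, finished, total = count_within_timeout(parts, timeout_ms=1000)
# Ten tiny partitions normally all finish well within one second.
```

When every partition finishes before the deadline, the scaled estimate equals the exact count, which matches the doctest's `1000`.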
RDD.sumApprox(self, timeout, confidence=0.95)
Approximate operation that returns the sum within a timeout
or once the confidence is met.
>>> rdd = sc.parallelize(range(1000), 10)
>>> r = sum(xrange(1000))
>>> (rdd.sumApprox(1000) - r) / r < 0.05
True
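The `confidence` parameter controls the bounds reported around the estimate. One standard way to derive such bounds is shown here purely as an illustration, not as Spark's own evaluator. It treats the finished partitions as a simple random sample of all partitions and applies the survey-sampling estimator for a population total, with a finite population correction. `approx_sum` is a hypothetical helper. When some partitions are still missing, `confidence` must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist, mean, stdev


def approx_sum(partition_sums, total_partitions, confidence=0.95):
    """Estimate the total from the sums of the partitions finished so far.

    Hypothetical helper (not PySpark): assumes the finished partitions are a
    simple random sample of all partitions. Returns (estimate, low, high).
    """
    n, big_n = len(partition_sums), total_partitions
    if n == big_n:
        exact = float(sum(partition_sums))
        return exact, exact, exact  # every task finished: no uncertainty left
    if n < 2:
        # Fewer than two partitions give no variance estimate: bounds unknown.
        return float(big_n * sum(partition_sums)), -math.inf, math.inf
    estimate = big_n * mean(partition_sums)
    fpc = math.sqrt(1 - n / big_n)  # finite population correction
    # Standard error of an estimated population total.
    stderr = big_n * stdev(partition_sums) / math.sqrt(n) * fpc
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    return estimate, estimate - z * stderr, estimate + z * stderr


# Per-partition sums for range(1000) split into 10 contiguous partitions.
sums = [sum(range(i * 100, (i + 1) * 100)) for i in range(10)]
```

With all ten sums the result is exact (499500.0 for estimate and both bounds). With only the first five, the estimate is 249500 and the interval excludes the true total of 499500. The normal-approximation bounds are only meaningful if the finished partitions are representative, and contiguous `range()` slices are not.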
RDD.meanApprox(self, timeout, confidence=0.95)
:: Experimental ::
Approximate operation that returns the mean within a timeout
or once the confidence is met.
>>> rdd = sc.parallelize(range(1000), 10)
>>> r = sum(xrange(1000)) / 1000.0
>>> (rdd.meanApprox(1000) - r) / r < 0.05
True
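meanApprox follows the same pattern for the mean. The minimal sketch below assumes that the elements processed before the timeout are a random sample of the RDD and uses a normal-approximation interval. It also shows that the doctests' `(approx - r) / r < 0.05` check is one-sided: any estimate below `r` passes, however far off, so a two-sided check takes `abs()` of the difference. `approx_mean` and `within_relative_error` are hypothetical names, not PySpark APIs.

```python
import math
from statistics import NormalDist, fmean, stdev


def approx_mean(seen_values, confidence=0.95):
    """Estimate the mean from the elements processed so far.

    Hypothetical helper (not PySpark): assumes the processed elements are a
    random sample; confidence must lie strictly between 0 and 1.
    Returns (estimate, low, high).
    """
    n = len(seen_values)
    if n < 2:
        return math.nan, -math.inf, math.inf  # too little data for a bound
    estimate = fmean(seen_values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    half_width = z * stdev(seen_values) / math.sqrt(n)
    return estimate, estimate - half_width, estimate + half_width


def within_relative_error(approx, exact, tolerance=0.05):
    """Two-sided form of the doctests' `(approx - r) / r < 0.05` check."""
    return abs(approx - exact) / abs(exact) < tolerance


# Pretend only the even elements of range(1000) were processed in time.
est, lo, hi = approx_mean(list(range(0, 1000, 2)))
```

Here the estimate is 499.0 against a true mean of 499.5, which passes the two-sided check. An estimate of 0.0 fails `within_relative_error` but would still satisfy the one-sided doctest expression.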
Author: Davies Liu <davies.liu@gmail.com>
Closes #2095 from davies/approx and squashes the following commits:
e8c252b [Davies Liu] add approx API for RDD