author: Feynman Liang <feynman.liang@gmail.com> 2015-11-30 15:38:44 -0800
committer: Xiangrui Meng <meng@databricks.com> 2015-11-30 15:38:44 -0800
commit: 55358889309cf2d856b72e72e0f3081dfdf61cfa
tree: c52e01ffa7276e514bfb622f704afa6b3be264d3 /docs/mllib-statistics.md
parent: de64b65f7cf2ac58c1abc310ba547637fdbb8557
[SPARK-11960][MLLIB][DOC] User guide for streaming tests
CC jkbradley mengxr josepablocam

Author: Feynman Liang <feynman.liang@gmail.com>

Closes #10005 from feynmanliang/streaming-test-user-guide.
Diffstat (limited to 'docs/mllib-statistics.md')
-rw-r--r-- docs/mllib-statistics.md | 25
1 files changed, 25 insertions, 0 deletions
diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md
index ade5b0768a..de209f68e1 100644
--- a/docs/mllib-statistics.md
+++ b/docs/mllib-statistics.md
@@ -521,6 +521,31 @@ print(testResult) # summary of the test including the p-value, test statistic,
</div>
</div>
+### Streaming Significance Testing
+MLlib provides online implementations of some tests to support use cases
+like A/B testing. These tests may be performed on a Spark Streaming
+`DStream[(Boolean,Double)]`, where the first element of each tuple
+indicates the control group (`false`) or the treatment group (`true`) and
+the second element is the value of an observation.
+
+Streaming significance testing supports the following parameters:
+
+* `peacePeriod` - The number of initial data points from the stream to
+ignore, used to mitigate novelty effects.
+* `windowSize` - The number of past batches to perform hypothesis
+testing over. Setting it to `0` performs cumulative processing over
+all prior batches.
+
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+[`StreamingTest`](api/scala/index.html#org.apache.spark.mllib.stat.test.StreamingTest)
+provides streaming hypothesis testing.
+
+{% include_example scala/org/apache/spark/examples/mllib/StreamingTestExample.scala %}
+</div>
+</div>
+
## Random data generation
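
Note on the added section: the `peacePeriod` and `windowSize` semantics described in the patch can be illustrated without Spark. The following is a minimal plain-Python sketch, not the `StreamingTest` implementation. It assumes a stream is a list of batches, each batch a list of `(is_treatment, value)` tuples; the helper names `drop_peace_period`, `window`, and `welch_t` are hypothetical, and Welch's t statistic (treatment minus control) is used only as an example test.

```python
from math import sqrt
from statistics import mean, variance


def drop_peace_period(batches, peace_period):
    """Ignore the first `peace_period` observations of the stream."""
    remaining = peace_period
    kept = []
    for batch in batches:
        skipped = min(remaining, len(batch))
        remaining -= skipped
        kept.append(batch[skipped:])
    return kept


def window(batches, window_size):
    """Keep the last `window_size` batches; 0 means all prior batches."""
    return batches if window_size == 0 else batches[-window_size:]


def welch_t(batches):
    """Welch's t statistic (illustrative choice of test, treatment - control)."""
    observations = [obs for batch in batches for obs in batch]
    control = [v for is_treatment, v in observations if not is_treatment]
    treatment = [v for is_treatment, v in observations if is_treatment]
    # Standard error with unequal variances; needs >= 2 points per group.
    se = sqrt(variance(control) / len(control)
              + variance(treatment) / len(treatment))
    return (mean(treatment) - mean(control)) / se


# Two batches of (is_treatment, value) observations.
batches = [
    [(False, 1.0), (True, 5.0)],
    [(False, 1.0), (False, 3.0), (True, 4.0), (True, 6.0)],
]

# Skip the first two observations, then test over the most recent batch only.
kept = window(drop_peace_period(batches, peace_period=2), window_size=1)
t_stat = welch_t(kept)
```

With `window_size=0` the same call would test over every batch that survives the peace period, which corresponds to the cumulative mode the patch describes.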