diff options
Diffstat (limited to 'docs/mllib-statistics.md')
-rw-r--r-- | docs/mllib-statistics.md | 25 |
1 files changed, 25 insertions, 0 deletions
diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md index ade5b0768a..de209f68e1 100644 --- a/docs/mllib-statistics.md +++ b/docs/mllib-statistics.md @@ -521,6 +521,31 @@ print(testResult) # summary of the test including the p-value, test statistic, </div> </div> +### Streaming Significance Testing +MLlib provides online implementations of some tests to support use cases +like A/B testing. These tests may be performed on a Spark Streaming +`DStream[(Boolean,Double)]` where the first element of each tuple +indicates control group (`false`) or treatment group (`true`) and the +second element is the value of an observation. + +Streaming significance testing supports the following parameters: + +* `peacePeriod` - The number of initial data points from the stream to +ignore, used to mitigate novelty effects. +* `windowSize` - The number of past batches to perform hypothesis +testing over. Setting to `0` will perform cumulative processing using +all prior batches. + + +<div class="codetabs"> +<div data-lang="scala" markdown="1"> +[`StreamingTest`](api/scala/index.html#org.apache.spark.mllib.stat.test.StreamingTest) +provides streaming hypothesis testing. + +{% include_example scala/org/apache/spark/examples/mllib/StreamingTestExample.scala %} +</div> +</div> + ## Random data generation |