[SPARK-3964] [MLlib] [PySpark] add Hypothesis test Python API - spark


author	Davies Liu <davies@databricks.com>	2014-11-04 21:35:52 -0800
committer	Xiangrui Meng <meng@databricks.com>	2014-11-04 21:36:05 -0800
commit	f225b3cc18698b2ee8a94c8ffa0b6aca2fce7cf9 (patch)
tree	0daca2a2f01192eeca8e834d697b0193c1394ae9 /python/docs/make.bat
parent	e5c7869f20139832ad9e636eaeb5e77da7297456 (diff)
download	spark-f225b3cc18698b2ee8a94c8ffa0b6aca2fce7cf9.tar.gz spark-f225b3cc18698b2ee8a94c8ffa0b6aca2fce7cf9.tar.bz2 spark-f225b3cc18698b2ee8a94c8ffa0b6aca2fce7cf9.zip

[SPARK-3964] [MLlib] [PySpark] add Hypothesis test Python API

```
pyspark.mllib.stat.Statistics.chiSqTest(observed, expected=None)
    :: Experimental ::

    If `observed` is a Vector, conduct Pearson's chi-squared goodness
    of fit test of the observed data against the expected distribution,
    or against the uniform distribution (by default), with each category
    having an expected frequency of `1 / len(observed)`.
    (Note: `observed` cannot contain negative values.)

    If `observed` is a matrix, conduct Pearson's independence test on the
    input contingency matrix, which cannot contain negative entries or
    columns or rows that sum up to 0.

    If `observed` is an RDD of LabeledPoint, conduct Pearson's independence
    test for every feature against the label across the input RDD.
    For each feature, the (feature, label) pairs are converted into a
    contingency matrix for which the chi-squared statistic is computed.
    All label and feature values must be categorical.

    :param observed: a vector containing the observed categorical
                     counts/relative frequencies, or the contingency matrix
                     (containing either counts or relative frequencies),
                     or an RDD of LabeledPoint containing the labeled dataset
                     with categorical features. Real-valued features will be
                     treated as categorical for each distinct value.
    :param expected: Vector containing the expected categorical
                     counts/relative frequencies. `expected` is rescaled if
                     the `expected` sum differs from the `observed` sum.
    :return: ChiSquaredTest object containing the test statistic, degrees
             of freedom, p-value, the method used, and the null hypothesis.
```

Author: Davies Liu <davies@databricks.com>

Closes #3091 from davies/his and squashes the following commits:

145d16c [Davies Liu] address comments
0ab0764 [Davies Liu] fix float
5097d54 [Davies Liu] add Hypothesis test Python API

(cherry picked from commit c8abddc5164d8cf11cdede6ab3d5d1ea08028708)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
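
The docstring describes two computations: a goodness of fit statistic (with a uniform default and rescaling of `expected`) and an independence statistic on a contingency matrix. The following is a minimal plain-Python sketch of that arithmetic, not the code added by this commit. The function names `chi_sq_goodness_of_fit` and `chi_sq_independence` are hypothetical, inputs are assumed to be plain lists, and the p-value is omitted because it needs a chi-squared CDF.

```python
def chi_sq_goodness_of_fit(observed, expected=None):
    """Return (statistic, degrees_of_freedom) for Pearson's goodness of fit test.

    Hypothetical helper for illustration; not part of pyspark.
    """
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two categories")
    if any(o < 0 for o in observed):
        raise ValueError("observed cannot contain negative values")
    if expected is None:
        # Default: uniform distribution, each category has frequency 1 / n.
        expected = [1.0 / n] * n
    if len(expected) != n:
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected values must be positive")
    obs_sum = float(sum(observed))
    if obs_sum == 0:
        raise ValueError("observed must not sum to 0")
    # Rescale expected so that its sum matches the observed sum.
    scale = obs_sum / float(sum(expected))
    statistic = sum(
        (o - e * scale) ** 2 / (e * scale) for o, e in zip(observed, expected)
    )
    return statistic, n - 1


def chi_sq_independence(matrix):
    """Return (statistic, degrees_of_freedom) for a contingency matrix.

    `matrix` is assumed to be a list of equal-length rows (hypothetical layout).
    """
    row_sums = [float(sum(row)) for row in matrix]
    col_sums = [float(sum(col)) for col in zip(*matrix)]
    if any(v < 0 for row in matrix for v in row):
        raise ValueError("matrix cannot contain negative entries")
    if any(s == 0 for s in row_sums + col_sums):
        raise ValueError("rows and columns must not sum to 0")
    total = sum(row_sums)
    statistic = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Expected count under independence of rows and columns.
            exp = row_sums[i] * col_sums[j] / total
            statistic += (value - exp) ** 2 / exp
    dof = (len(row_sums) - 1) * (len(col_sums) - 1)
    return statistic, dof


if __name__ == "__main__":
    print(chi_sq_goodness_of_fit([10, 20, 30]))
    print(chi_sq_independence([[10, 20], [20, 10]]))
```

Because `expected` is rescaled to the observed total, passing `[1, 1, 1]` and passing nothing give the same statistic for three categories.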

Diffstat (limited to 'python/docs/make.bat')

0 files changed, 0 insertions, 0 deletions

