diff options
author | Bryan Cutler <bjcutler@us.ibm.com> | 2015-11-23 17:11:51 -0800 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2015-11-23 17:11:51 -0800 |
commit | 105745645b12afbbc2a350518cb5853a88944183 (patch) | |
tree | 258d010b9eb9702a4e809ef84680e4adbfa390fd /python/pyspark/mllib/regression.py | |
parent | 9db5f601facfdaba6e4333a6b2d2e4a9f009c788 (diff) | |
download | spark-105745645b12afbbc2a350518cb5853a88944183.tar.gz spark-105745645b12afbbc2a350518cb5853a88944183.tar.bz2 spark-105745645b12afbbc2a350518cb5853a88944183.zip |
[SPARK-10560][PYSPARK][MLLIB][DOCS] Make StreamingLogisticRegressionWithSGD Python API equal to Scala one
This is to bring the API documentation of StreamingLogisticReressionWithSGD and StreamingLinearRegressionWithSGC in line with the Scala versions.
-Fixed the algorithm descriptions
-Added default values to parameter descriptions
-Changed StreamingLogisticRegressionWithSGD regParam to default to 0, as in the Scala version
Author: Bryan Cutler <bjcutler@us.ibm.com>
Closes #9141 from BryanCutler/StreamingLogisticRegressionWithSGD-python-api-sync.
Diffstat (limited to 'python/pyspark/mllib/regression.py')
-rw-r--r-- | python/pyspark/mllib/regression.py | 32 |
1 files changed, 21 insertions, 11 deletions
diff --git a/python/pyspark/mllib/regression.py b/python/pyspark/mllib/regression.py index 6f00d1df20..13b3397501 100644 --- a/python/pyspark/mllib/regression.py +++ b/python/pyspark/mllib/regression.py @@ -734,17 +734,27 @@ class StreamingLinearAlgorithm(object): @inherit_doc class StreamingLinearRegressionWithSGD(StreamingLinearAlgorithm): """ - Run LinearRegression with SGD on a batch of data. - - The problem minimized is (1 / n_samples) * (y - weights'X)**2. - After training on a batch of data, the weights obtained at the end of - training are used as initial weights for the next batch. - - :param stepSize: Step size for each iteration of gradient descent. - :param numIterations: Total number of iterations run. - :param miniBatchFraction: Fraction of data on which SGD is run for each - iteration. - :param convergenceTol: A condition which decides iteration termination. + Train or predict a linear regression model on streaming data. Training uses + Stochastic Gradient Descent to update the model based on each new batch of + incoming data from a DStream (see `LinearRegressionWithSGD` for model equation). + + Each batch of data is assumed to be an RDD of LabeledPoints. + The number of data points per batch can vary, but the number + of features must be constant. An initial weight + vector must be provided. + + :param stepSize: + Step size for each iteration of gradient descent. + (default: 0.1) + :param numIterations: + Number of iterations run for each batch of data. + (default: 50) + :param miniBatchFraction: + Fraction of each batch of data to use for updates. + (default: 1.0) + :param convergenceTol: + Value used to determine when to terminate iterations. + (default: 0.001) .. versionadded:: 1.5.0 """ |