author: Yanbo Liang <ybliang8@gmail.com> 2016-06-07 15:25:36 -0700
committer: Yanbo Liang <ybliang8@gmail.com> 2016-06-07 15:25:36 -0700
commit: 6ecedf39b44c9acd58cdddf1a31cf11e8e24428c
tree: 480604299bd07f81c1166d80214b8a1433ff95fd /docs/ml-classification-regression.md
parent: 890baaca5078df0b50c0054f55a2c33023f7fd67
[SPARK-13590][ML][DOC] Document spark.ml LiR, LoR and AFTSurvivalRegression behavior difference
## What changes were proposed in this pull request?
When fitting `LinearRegressionModel` (with the "l-bfgs" solver) and `LogisticRegressionModel` without an intercept on a dataset with a constant nonzero column, spark.ml produces the same model as R glmnet but a different one from LIBSVM. When fitting `AFTSurvivalRegressionModel` without an intercept on a dataset with a constant nonzero column, spark.ml produces a different model from R survival::survreg. We should output a warning message and clarify this condition in the documentation.

## How was this patch tested?
Document change, no unit test.

cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #12731 from yanboliang/spark-13590.
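The difference described above can be illustrated with a small standalone sketch. It is not Spark code: it assumes, for illustration only, that a solver scales each feature by its standard deviation and drops zero-variance columns, which is one way a constant nonzero column can end up with a zero coefficient when no intercept is fitted. The function names (`solve`, `least_squares_no_intercept`, `fit_with_std_scaling`) and the toy data are hypothetical.

```python
from statistics import stdev


def solve(a, b):
    """Solve a small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest entry of this column onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares_no_intercept(columns, y):
    """Minimize ||y - X w||^2 without an intercept, via the normal equations."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * t for p, t in zip(ci, y)) for ci in columns]
    return solve(gram, rhs)


def fit_with_std_scaling(columns, y):
    """Assumed mechanism: scale by standard deviation, drop zero-variance columns."""
    stds = [stdev(c) for c in columns]
    active = [i for i, s in enumerate(stds) if s != 0.0]
    scaled = [[v / stds[i] for v in columns[i]] for i in active]
    w_scaled = least_squares_no_intercept(scaled, y)
    coefs = [0.0] * len(columns)  # dropped (constant) columns keep coefficient 0
    for i, w in zip(active, w_scaled):
        coefs[i] = w / stds[i]  # map back to the original feature scale
    return coefs


# Toy data: y = 3 * x1 + 4, with a constant nonzero column equal to 2.
x1 = [1.0, 2.0, 3.0, 4.0]
const = [2.0, 2.0, 2.0, 2.0]
y = [7.0, 10.0, 13.0, 16.0]

plain = least_squares_no_intercept([x1, const], y)
scaled = fit_with_std_scaling([x1, const], y)

print("no scaling:  ", plain)
print("with scaling:", scaled)
```

Without scaling, the constant column acts as a stand-in for the intercept and receives a nonzero weight. Under the assumed scaling, its coefficient is exactly zero and the remaining coefficient absorbs the offset, so the two fits disagree on the same data.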
Diffstat (limited to 'docs/ml-classification-regression.md')
-rw-r--r-- docs/ml-classification-regression.md | 6
1 files changed, 6 insertions, 0 deletions
diff --git a/docs/ml-classification-regression.md b/docs/ml-classification-regression.md
index ff8dec6d2d..88457d4bb1 100644
--- a/docs/ml-classification-regression.md
+++ b/docs/ml-classification-regression.md
@@ -62,6 +62,8 @@ For more background and more details about the implementation, refer to the docu
> The current implementation of logistic regression in `spark.ml` only supports binary classes. Support for multiclass regression will be added in the future.
+ > When fitting LogisticRegressionModel without intercept on dataset with constant nonzero column, Spark MLlib outputs zero coefficients for constant nonzero columns. This behavior is the same as R glmnet but different from LIBSVM.
+
**Example**
The following example shows how to train a logistic regression model
@@ -351,6 +353,8 @@ Refer to the [Python API docs](api/python/pyspark.ml.html#pyspark.ml.classificat
The interface for working with linear regression models and model
summaries is similar to the logistic regression case.
+ > When fitting LinearRegressionModel without intercept on dataset with constant nonzero column by "l-bfgs" solver, Spark MLlib outputs zero coefficients for constant nonzero columns. This behavior is the same as R glmnet but different from LIBSVM.
+
**Example**
The following
@@ -666,6 +670,8 @@ The optimization algorithm underlying the implementation is L-BFGS.
The implementation matches the result from R's survival function
[survreg](https://stat.ethz.ch/R-manual/R-devel/library/survival/html/survreg.html)
+ > When fitting AFTSurvivalRegressionModel without intercept on dataset with constant nonzero column, Spark MLlib outputs zero coefficients for constant nonzero columns. This behavior is different from R survival::survreg.
+
**Example**
<div class="codetabs">