diff options
author | actuaryzhang <actuaryzhang10@gmail.com> | 2017-04-07 12:29:45 -0700 |
---|---|---|
committer | Felix Cheung <felixcheung@apache.org> | 2017-04-07 12:29:45 -0700 |
commit | 1ad73f0a21d8007d8466ef8756f751c0ab6a9d1f (patch) | |
tree | b367a05a51cb42e4c1e02a1be1f54986e26d7922 /examples/src/main/r/ml/glm.R | |
parent | 8feb799af0bb67618310947342e3e4d2a77aae13 (diff) | |
download | spark-1ad73f0a21d8007d8466ef8756f751c0ab6a9d1f.tar.gz spark-1ad73f0a21d8007d8466ef8756f751c0ab6a9d1f.tar.bz2 spark-1ad73f0a21d8007d8466ef8756f751c0ab6a9d1f.zip |
[SPARK-20258][DOC][SPARKR] Fix SparkR logistic regression example in programming guide (did not converge)
## What changes were proposed in this pull request?
SparkR logistic regression example did not converge in programming guide (for IRWLS). All estimates are essentially zero:
```
training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
df_list2 <- randomSplit(training2, c(7,3), 2)
binomialDF <- df_list2[[1]]
binomialTestDF <- df_list2[[2]]
binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial")
17/04/07 11:42:03 WARN WeightedLeastSquares: Cholesky solver failed due to singular covariance matrix. Retrying with Quasi-Newton solver.
> summary(binomialGLM)
Coefficients:
Estimate
(Intercept) 9.0255e+00
features_0 0.0000e+00
features_1 0.0000e+00
features_2 0.0000e+00
features_3 0.0000e+00
features_4 0.0000e+00
features_5 0.0000e+00
features_6 0.0000e+00
features_7 0.0000e+00
```
Author: actuaryzhang <actuaryzhang10@gmail.com>
Closes #17571 from actuaryzhang/programGuide2.
Diffstat (limited to 'examples/src/main/r/ml/glm.R')
-rw-r--r-- | examples/src/main/r/ml/glm.R | 7 |
1 files changed, 4 insertions, 3 deletions
diff --git a/examples/src/main/r/ml/glm.R b/examples/src/main/r/ml/glm.R index 23141b57df..68787f9aa9 100644 --- a/examples/src/main/r/ml/glm.R +++ b/examples/src/main/r/ml/glm.R @@ -27,7 +27,7 @@ sparkR.session(appName = "SparkR-ML-glm-example") # $example on$ training <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") # Fit a generalized linear model of family "gaussian" with spark.glm -df_list <- randomSplit(training, c(7,3), 2) +df_list <- randomSplit(training, c(7, 3), 2) gaussianDF <- df_list[[1]] gaussianTestDF <- df_list[[2]] gaussianGLM <- spark.glm(gaussianDF, label ~ features, family = "gaussian") @@ -44,8 +44,9 @@ gaussianGLM2 <- glm(label ~ features, gaussianDF, family = "gaussian") summary(gaussianGLM2) # Fit a generalized linear model of family "binomial" with spark.glm -training2 <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm") -df_list2 <- randomSplit(training2, c(7,3), 2) +training2 <- read.df("data/mllib/sample_multiclass_classification_data.txt", source = "libsvm") +training2 <- transform(training2, label = cast(training2$label > 1, "integer")) +df_list2 <- randomSplit(training2, c(7, 3), 2) binomialDF <- df_list2[[1]] binomialTestDF <- df_list2[[2]] binomialGLM <- spark.glm(binomialDF, label ~ features, family = "binomial") |