author | Xusen Yin <yinxusen@gmail.com> | 2016-05-01 12:29:01 -0700 |
---|---|---|
committer | Joseph K. Bradley <joseph@databricks.com> | 2016-05-01 12:29:01 -0700 |
commit | a6428292f78fd594f41a4a7bf254d40268f46305 (patch) | |
tree | 4abbc07b299f0b05e563e21bcfdcc42afdfc4b2b /python/pyspark/ml/wrapper.py | |
parent | cdf9e9753df4e7f2fa4e972d1bfded4e22943c27 (diff) | |
[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update
## What changes were proposed in this pull request?
This PR is an update for [https://github.com/apache/spark/pull/12738] which:
* Adds a generic unit test for JavaParams wrappers in pyspark.ml that checks default Param values against the defaults on the Scala side
* Various fixes for bugs found
  * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.
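The weightCol point above can be illustrated with a minimal sketch in plain Python. It does not use Spark: `effective_weight_col` is a hypothetical helper, and the dict is only a stand-in for the Params a user has explicitly set on an estimator.

```python
from typing import Mapping, Optional


def effective_weight_col(set_params: Mapping[str, str]) -> Optional[str]:
    """Return the weight column name, or None when instances are unweighted.

    Hypothetical helper: a missing key models an unset Param, and an
    empty string is treated exactly like an unset Param.
    """
    weight_col = set_params.get("weightCol")
    if weight_col is None or weight_col == "":
        return None
    return weight_col


unset = effective_weight_col({})                 # Param never set
empty = effective_weight_col({"weightCol": ""})  # Param set to ""
named = effective_weight_col({"weightCol": "w"}) # Param set to a real column
```

Under this assumption, callers only need to branch on `None`, so the old empty-string default and the new unset default behave the same way.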
Defaults changed:
* Scala
  * LogisticRegression: weightCol defaults to not set (instead of empty string)
  * StringIndexer: labels default to not set (instead of empty array)
  * GeneralizedLinearRegression:
    * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
    * weightCol defaults to not set (instead of empty string)
  * LinearRegression: weightCol defaults to not set (instead of empty string)
* Python
  * MultilayerPerceptron: layers default to not set (instead of [1,1])
  * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)
## How was this patch tested?
Generic unit test. Manually verified the unit test by changing defaults and confirming that the test then failed.
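The idea behind such a default-value check can be sketched in plain Python. This is not the test added by the PR, which works against the JavaParams wrappers and the JVM; `find_default_mismatches` and the dict-based representation of defaults are assumptions made for illustration.

```python
# Marker for "this side has no default"; assumed never to be a real default.
NO_DEFAULT = "<no default>"


def find_default_mismatches(python_defaults, scala_defaults):
    """Compare two {param_name: default_value} maps.

    A name missing from one map models a Param without a default on
    that side. Returns {name: (python_value, scala_value)} for every
    Param whose defaults disagree.
    """
    mismatches = {}
    for name in sorted(set(python_defaults) | set(scala_defaults)):
        py_value = python_defaults.get(name, NO_DEFAULT)
        scala_value = scala_defaults.get(name, NO_DEFAULT)
        if py_value != scala_value:
            mismatches[name] = (py_value, scala_value)
    return mismatches


# ChiSqSelector example from the list of changed defaults:
# the Python side used to have no default for numTopFeatures.
scala_side = {"numTopFeatures": 50}
before_fix = find_default_mismatches({}, scala_side)
after_fix = find_default_mismatches({"numTopFeatures": 50}, scala_side)
```

Changing a default on either side makes the corresponding entry show up in the result, which mirrors the manual check described above.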
Author: Joseph K. Bradley <joseph@databricks.com>
Author: yinxusen <yinxusen@gmail.com>
Closes #12816 from jkbradley/yinxusen-SPARK-14931.
Diffstat (limited to 'python/pyspark/ml/wrapper.py')
-rw-r--r-- | python/pyspark/ml/wrapper.py | 3 |
1 files changed, 2 insertions, 1 deletions
diff --git a/python/pyspark/ml/wrapper.py b/python/pyspark/ml/wrapper.py
index fef626c7fa..fef0040faf 100644
--- a/python/pyspark/ml/wrapper.py
+++ b/python/pyspark/ml/wrapper.py
@@ -110,7 +110,8 @@ class JavaParams(JavaWrapper, Params):
         for param in self.params:
             if self._java_obj.hasParam(param.name):
                 java_param = self._java_obj.getParam(param.name)
-                if self._java_obj.isDefined(java_param):
+                # SPARK-14931: Only check set params back to avoid default params mismatch.
+                if self._java_obj.isSet(java_param):
                     value = _java2py(sc, self._java_obj.getOrDefault(java_param))
                     self._set(**{param.name: value})
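The difference between the two checks can be shown with a toy model in plain Python. `ToyParams` and `transfer` are hypothetical stand-ins, not the pyspark or Spark classes; they only borrow the method names `isSet`, `isDefined`, and `getOrDefault`, and the `regParam` value is made up for the example.

```python
class ToyParams:
    """Toy stand-in for a Params object; not the real pyspark class."""

    def __init__(self, defaults=None):
        self._defaults = dict(defaults or {})
        self._set_values = {}

    def set(self, name, value):
        self._set_values[name] = value

    def isSet(self, name):
        # True only when a value was explicitly set.
        return name in self._set_values

    def isDefined(self, name):
        # True when a value was explicitly set OR a default exists.
        return name in self._set_values or name in self._defaults

    def getOrDefault(self, name):
        if name in self._set_values:
            return self._set_values[name]
        return self._defaults[name]


def transfer(java_side, python_side, names, only_set):
    """Copy values from java_side to python_side, as in the loop above."""
    check = java_side.isSet if only_set else java_side.isDefined
    for name in names:
        if check(name):
            python_side.set(name, java_side.getOrDefault(name))


java_obj = ToyParams(defaults={"maxIter": 25})
java_obj.set("regParam", 0.1)  # the only explicitly set value

old_behavior = ToyParams(defaults={"maxIter": 25})
transfer(java_obj, old_behavior, ["maxIter", "regParam"], only_set=False)

new_behavior = ToyParams(defaults={"maxIter": 25})
transfer(java_obj, new_behavior, ["maxIter", "regParam"], only_set=True)
```

In this model the `isDefined` variant turns the Java-side default for `maxIter` into an explicitly set value on the Python side, while the `isSet` variant copies only `regParam` and leaves each side's own defaults in place.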