[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update

## What changes were proposed in this pull request?

This PR is an update for [https://github.com/apache/spark/pull/12738] which:
* Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
* Various fixes for bugs found
  * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.

Defaults changed:
* Scala
  * LogisticRegression: weightCol defaults to not set (instead of empty string)
  * StringIndexer: labels default to not set (instead of empty array)
  * GeneralizedLinearRegression:
    * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
    * weightCol defaults to not set (instead of empty string)
  * LinearRegression: weightCol defaults to not set (instead of empty string)
* Python
  * MultilayerPerceptron: layers default to not set (instead of [1,1])
  * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)

## How was this patch tested?

Generic unit test. Manually tested that unit test by changing defaults and verifying that broke the test.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: yinxusen <yinxusen@gmail.com>

Closes #12816 from jkbradley/yinxusen-SPARK-14931.
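
The idea behind the generic default check described above can be sketched in plain Python. This is not the test added by the PR: the function name `find_default_mismatches`, the `NOT_SET` sentinel, and the literal dictionaries are assumptions made for illustration. In the real test the values would be read from the PySpark wrapper and from the wrapped JVM object.

```python
# Hypothetical stand-ins: literal dicts replace the defaults that a real test
# would read from the Python wrapper and from the JVM-side object.
NOT_SET = object()  # sentinel meaning "no default is registered"


def find_default_mismatches(python_defaults, scala_defaults):
    """Return {param_name: (python_value, scala_value)} for every mismatch."""
    mismatches = {}
    for name in sorted(set(python_defaults) | set(scala_defaults)):
        py_value = python_defaults.get(name, NOT_SET)
        scala_value = scala_defaults.get(name, NOT_SET)
        if py_value != scala_value:
            mismatches[name] = (py_value, scala_value)
    return mismatches


# Illustrative ChiSqSelector defaults before the patch: the Python side has
# no default for numTopFeatures, while the Scala side defaults to 50.
before = find_default_mismatches(
    {"featuresCol": "features"},
    {"featuresCol": "features", "numTopFeatures": 50},
)

# After the patch both sides agree.
after = find_default_mismatches(
    {"featuresCol": "features", "numTopFeatures": 50},
    {"featuresCol": "features", "numTopFeatures": 50},
)

print(sorted(before))  # ['numTopFeatures']
print(after)           # {}
```

A param that is missing on one side is reported as a mismatch in the same way as a differing value, which matches the "not set" versus "set" cases listed under "Defaults changed".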
author: Xusen Yin <yinxusen@gmail.com> 2016-05-01 12:29:01 -0700
committer: Joseph K. Bradley <joseph@databricks.com> 2016-05-01 12:29:01 -0700
commit: a6428292f78fd594f41a4a7bf254d40268f46305 (patch)
tree: 4abbc07b299f0b05e563e21bcfdcc42afdfc4b2b /python/pyspark/ml/feature.py
parent: cdf9e9753df4e7f2fa4e972d1bfded4e22943c27 (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 1b059a7199..b95d288198 100644
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -2617,6 +2617,7 @@ class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol, Ja
         """
         super(ChiSqSelector, self).__init__()
         self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.ChiSqSelector", self.uid)
+        self._setDefault(numTopFeatures=50)
         kwargs = self.__init__._input_kwargs
         self.setParams(**kwargs)
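
The effect of the added `_setDefault(numTopFeatures=50)` line can be illustrated with a toy model of default resolution. `ToyParams` and `ToySelector` are hypothetical classes written for this sketch, not PySpark's `Params` implementation; the assumption is only that an explicitly set value takes precedence over a registered default.

```python
class ToyParams:
    """Hypothetical, simplified params holder (not pyspark's Params class)."""

    def __init__(self):
        self._defaults = {}  # values registered as defaults
        self._explicit = {}  # values set explicitly by the caller

    def set_default(self, **kwargs):
        self._defaults.update(kwargs)

    def set_params(self, **kwargs):
        self._explicit.update(kwargs)

    def is_defined(self, name):
        # A param is defined if it has an explicit value or a default.
        return name in self._explicit or name in self._defaults

    def get_or_default(self, name):
        if name in self._explicit:
            return self._explicit[name]
        return self._defaults[name]  # KeyError if no default was registered


class ToySelector(ToyParams):
    def __init__(self, **kwargs):
        super().__init__()
        self.set_default(numTopFeatures=50)  # mirrors the line added in the diff
        self.set_params(**kwargs)            # explicit arguments override defaults


print(ToySelector().get_or_default("numTopFeatures"))                   # 50
print(ToySelector(numTopFeatures=10).get_or_default("numTopFeatures"))  # 10
```

Without the `set_default` call, `ToySelector().is_defined("numTopFeatures")` would be `False` in this toy model, which corresponds to the "not set" state the commit message describes for the Python side before the change.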