author Peng, Meng <peng.meng@intel.com> 2017-01-10 13:09:58 +0000
committer Sean Owen <sowen@cloudera.com> 2017-01-10 13:09:58 +0000
commit 32286ba68af03af6b9ff50d5dece050e5417307a
tree 85d945c4bc531e91ae05bda2c85559660b6d02c8 /mllib/src/main/scala/org/apache
parent acfc5f354332107cc744fb636e3730f6fc48b2fe
[SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change
## What changes were proposed in this pull request?

Add an FDR test case in ml/feature/ChiSqSelectorSuite.
Improve some comments in the code.
This is a follow-up PR for #15212.

## How was this patch tested?

Unit tests.

Author: Peng, Meng <peng.meng@intel.com>

Closes #16434 from mpjlu/fdr_fwe_update.
Diffstat (limited to 'mllib/src/main/scala/org/apache')
-rw-r--r-- mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala 6
-rw-r--r-- mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala 6
2 files changed, 6 insertions, 6 deletions
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
index 353bd186da..16abc4949d 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala
@@ -143,13 +143,13 @@ private[feature] trait ChiSqSelectorParams extends Params
* `fdr`, `fwe`.
* - `numTopFeatures` chooses a fixed number of top features according to a chi-squared test.
* - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
- * - `fpr` chooses all features whose p-value is below a threshold, thus controlling the false
+ * - `fpr` chooses all features whose p-values are below a threshold, thus controlling the false
* positive rate of selection.
* - `fdr` uses the [Benjamini-Hochberg procedure]
* (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
* to choose all features whose false discovery rate is below a threshold.
- * - `fwe` chooses all features whose p-values is below a threshold,
- * thus controlling the family-wise error rate of selection.
+ * - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
+ * 1/numFeatures, thus controlling the family-wise error rate of selection.
* By default, the selection method is `numTopFeatures`, with the default number of top features
* set to 50.
*/
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
index 9dea3c3e84..862be6f37e 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala
@@ -175,13 +175,13 @@ object ChiSqSelectorModel extends Loader[ChiSqSelectorModel] {
* `fdr`, `fwe`.
* - `numTopFeatures` chooses a fixed number of top features according to a chi-squared test.
* - `percentile` is similar but chooses a fraction of all features instead of a fixed number.
- * - `fpr` chooses all features whose p-value is below a threshold, thus controlling the false
+ * - `fpr` chooses all features whose p-values are below a threshold, thus controlling the false
* positive rate of selection.
* - `fdr` uses the [Benjamini-Hochberg procedure]
* (https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure)
* to choose all features whose false discovery rate is below a threshold.
- * - `fwe` chooses all features whose p-values is below a threshold,
- * thus controlling the family-wise error rate of selection.
+ * - `fwe` chooses all features whose p-values are below a threshold. The threshold is scaled by
+ * 1/numFeatures, thus controlling the family-wise error rate of selection.
* By default, the selection method is `numTopFeatures`, with the default number of top features
* set to 50.
*/
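The five selector types listed in the doc comment of both hunks can be illustrated with a small standalone sketch. It assumes the chi-squared p-values have already been computed, one per feature. The object `ChiSqSelectionSketch`, its method names, and the choice of strict `<` for `fpr`/`fwe` versus `<=` for the Benjamini-Hochberg bound are assumptions made for illustration; this is not Spark's implementation or API.

```scala
// Hypothetical standalone sketch of the five ChiSqSelector selection methods.
// Assumes one precomputed chi-squared p-value per feature; this is NOT Spark's code.
object ChiSqSelectionSketch {

  // Feature indices ordered from most to least significant (ascending p-value).
  private def byPValue(pValues: Array[Double]): Array[Int] =
    pValues.indices.toArray.sortWith((a, b) => pValues(a) < pValues(b))

  // `numTopFeatures`: keep a fixed number of the most significant features.
  def numTopFeatures(pValues: Array[Double], n: Int): Array[Int] =
    byPValue(pValues).take(n).sorted

  // `percentile`: keep a fraction of all features instead of a fixed number.
  def percentile(pValues: Array[Double], fraction: Double): Array[Int] =
    numTopFeatures(pValues, (pValues.length * fraction).toInt)

  // `fpr`: keep every feature whose p-value is below alpha.
  def fpr(pValues: Array[Double], alpha: Double): Array[Int] =
    pValues.indices.filter(i => pValues(i) < alpha).toArray

  // `fdr`: Benjamini-Hochberg step-up procedure. With p-values sorted ascending,
  // find the largest 1-based rank k such that p_(k) <= k * alpha / m, then keep
  // the k most significant features (even if some smaller rank missed its bound).
  def fdr(pValues: Array[Double], alpha: Double): Array[Int] = {
    val m = pValues.length
    val ranked = byPValue(pValues)
    // lastIndexWhere returns -1 when no rank qualifies, which yields k = 0.
    val lastRank = ranked.indices.lastIndexWhere(r => pValues(ranked(r)) <= (r + 1) * alpha / m)
    val k = lastRank + 1
    ranked.take(k).sorted
  }

  // `fwe`: same rule as `fpr`, but the threshold is scaled by 1/numFeatures
  // (a Bonferroni correction), which bounds the family-wise error rate.
  def fwe(pValues: Array[Double], alpha: Double): Array[Int] =
    fpr(pValues, alpha / pValues.length)

  def main(args: Array[String]): Unit = {
    // Made-up p-values for five features, for illustration only.
    val p = Array(0.001, 0.045, 0.02, 0.5, 0.008)
    println(s"fpr: ${fpr(p, 0.05).mkString(", ")}") // fpr: 0, 1, 2, 4
    println(s"fdr: ${fdr(p, 0.05).mkString(", ")}") // fdr: 0, 2, 4
    println(s"fwe: ${fwe(p, 0.05).mkString(", ")}") // fwe: 0, 4
  }
}
```

With these made-up p-values and a threshold of 0.05, `fwe` is the strictest rule: scaling the threshold by 1/5 gives 0.01, so only features 0 and 4 survive. `fdr` also admits feature 2, because its p-value (0.02) sits at rank 3, where the Benjamini-Hochberg bound is 3 × 0.05 / 5 = 0.03.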