diff options
Diffstat (limited to 'docs')
-rw-r--r-- | docs/mllib-feature-extraction.md | 2 | ||||
-rw-r--r-- | docs/mllib-statistics.md | 2 |
2 files changed, 2 insertions, 2 deletions
diff --git a/docs/mllib-feature-extraction.md b/docs/mllib-feature-extraction.md index 1511ae6dda..1162241449 100644 --- a/docs/mllib-feature-extraction.md +++ b/docs/mllib-feature-extraction.md @@ -83,7 +83,7 @@ val idf = new IDF().fit(tf) val tfidf: RDD[Vector] = idf.transform(tf) {% endhighlight %} -MLLib's IDF implementation provides an option for ignoring terms which occur in less than a +MLlib's IDF implementation provides an option for ignoring terms which occur in less than a minimum number of documents. In such cases, the IDF for these terms is set to 0. This feature can be used by passing the `minDocFreq` value to the IDF constructor. diff --git a/docs/mllib-statistics.md b/docs/mllib-statistics.md index c463241399..10a5131c07 100644 --- a/docs/mllib-statistics.md +++ b/docs/mllib-statistics.md @@ -197,7 +197,7 @@ print Statistics.corr(data, method="pearson") ## Stratified sampling -Unlike the other statistics functions, which reside in MLLib, stratified sampling methods, +Unlike the other statistics functions, which reside in MLlib, stratified sampling methods, `sampleByKey` and `sampleByKeyExact`, can be performed on RDD's of key-value pairs. For stratified sampling, the keys can be thought of as a label and the value as a specific attribute. For example the key can be man or woman, or document ids, and the respective values can be the list of ages |