author: Yuhao Yang <hhbyyh@gmail.com> 2016-05-17 20:44:19 +0200
committer: Nick Pentreath <nickp@za.ibm.com> 2016-05-17 20:44:19 +0200
commit: 3308a862ba0983268c9d5acf9e2a7d2b62d3ec27
tree: 408b6dcc04bd77e835e445aedb062818a4ce98d8 /docs/mllib-feature-extraction.md
parent: 8d05a7a98bdbd3ce7c81d273e05a375877ebe68f
[SPARK-15182][ML] Copy MLlib doc to ML: ml.feature.tf, idf
## What changes were proposed in this pull request?

We should now begin copying algorithm details from the spark.mllib guide to spark.ml as needed, rather than just linking back to the corresponding algorithms in the spark.mllib user guide.

## How was this patch tested?

manual review for doc.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Yuhao Yang <yuhao.yang@intel.com>

Closes #12957 from hhbyyh/tfidfdoc.
Diffstat (limited to 'docs/mllib-feature-extraction.md')
-rw-r--r-- docs/mllib-feature-extraction.md 3
1 file changed, 3 insertions, 0 deletions
diff --git a/docs/mllib-feature-extraction.md b/docs/mllib-feature-extraction.md
index 7a97285032..4c027c84ec 100644
--- a/docs/mllib-feature-extraction.md
+++ b/docs/mllib-feature-extraction.md
@@ -10,6 +10,9 @@ displayTitle: Feature Extraction and Transformation - spark.mllib
## TF-IDF
+**Note** We recommend using the DataFrame-based API, which is detailed in the [ML user guide on
+TF-IDF](ml-features.html#tf-idf).
+
[Term frequency-inverse document frequency (TF-IDF)](http://en.wikipedia.org/wiki/Tf%E2%80%93idf) is a feature
vectorization method widely used in text mining to reflect the importance of a term to a document in the corpus.
Denote a term by `$t$`, a document by `$d$`, and the corpus by `$D$`.
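
The context lines closing this hunk introduce TF-IDF in terms of a term `$t$`, a document `$d$`, and a corpus `$D$`. As a rough illustration of what that weighting computes, here is a minimal plain-Python sketch. It is not the Spark API, and the function name `tf_idf` is hypothetical. It assumes raw term counts for TF and a smoothed inverse document frequency of the form `log((|D| + 1) / (DF(t, D) + 1))`; that formula is an assumption here, since the hunk does not show it.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Compute TF-IDF weights for a corpus given as a list of token lists.

    Hypothetical helper for illustration only; not part of Spark.
    Returns one {term: weight} dict per document.
    """
    n_docs = len(corpus)
    # DF(t, D): number of documents in the corpus that contain term t.
    df = Counter(term for doc in corpus for term in set(doc))
    # Assumed smoothed IDF: log((|D| + 1) / (DF(t, D) + 1)).
    idf = {t: math.log((n_docs + 1) / (count + 1)) for t, count in df.items()}
    weights = []
    for doc in corpus:
        # TF(t, d): raw count of term t in document d.
        tf = Counter(doc)
        weights.append({t: tf[t] * idf[t] for t in tf})
    return weights


corpus = [["spark", "mllib", "spark"], ["spark", "ml"], ["idf", "ml"]]
weights = tf_idf(corpus)
```

With this toy corpus, "spark" appears in two of three documents and so gets a lower IDF than "mllib", which appears in only one; its higher count in the first document partly offsets that.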