From ed9d80385486cd39a84a689ef467795262af919a Mon Sep 17 00:00:00 2001
From: Yuhao Yang <hhbyyh@gmail.com>
Date: Wed, 20 Apr 2016 11:45:08 +0100
Subject: [SPARK-14635][ML] Documentation and Examples for TF-IDF only refer to
 HashingTF

## What changes were proposed in this pull request?

Currently, the docs for TF-IDF only refer to using HashingTF with IDF, although CountVectorizer can also produce the term frequency vectors that IDF consumes. This patch amends the user guide and examples to mention that alternative.
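
To illustrate the difference between the two ways of obtaining term frequency vectors, here is a minimal plain-Java sketch. It is not Spark's implementation: the class and method names (`TermFrequencySketch`, `hashingTermFrequencies`, `buildVocabulary`, `countTermFrequencies`) are hypothetical, `String.hashCode()` stands in for whatever hash function HashingTF uses, and the vocabulary is kept in first-seen order purely for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TermFrequencySketch {

    // Hashing approach: the index comes from the term's hash, so no vocabulary
    // has to be stored, but distinct terms may collide in the same bucket.
    public static double[] hashingTermFrequencies(List<String> words, int numFeatures) {
        double[] tf = new double[numFeatures];
        for (String word : words) {
            int index = Math.floorMod(word.hashCode(), numFeatures);
            tf[index] += 1.0;
        }
        return tf;
    }

    // Vocabulary approach, step 1: collect the distinct terms of the corpus.
    // Assumption: first-seen order, chosen here only to keep the sketch short.
    public static List<String> buildVocabulary(List<List<String>> documents) {
        List<String> vocabulary = new ArrayList<>();
        for (List<String> document : documents) {
            for (String word : document) {
                if (!vocabulary.contains(word)) {
                    vocabulary.add(word);
                }
            }
        }
        return vocabulary;
    }

    // Vocabulary approach, step 2: count each term at its vocabulary index.
    // Terms missing from the vocabulary are ignored.
    public static double[] countTermFrequencies(List<String> words, List<String> vocabulary) {
        double[] tf = new double[vocabulary.size()];
        for (String word : words) {
            int index = vocabulary.indexOf(word);
            if (index >= 0) {
                tf[index] += 1.0;
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("spark", "ml", "spark"),
            Arrays.asList("tf", "idf", "ml"));
        List<String> vocabulary = buildVocabulary(docs);
        System.out.println(vocabulary);
        System.out.println(Arrays.toString(countTermFrequencies(docs.get(0), vocabulary)));
        System.out.println(Arrays.toString(hashingTermFrequencies(docs.get(0), 8)));
    }
}
```

Either kind of vector can then be rescaled by IDF. The vocabulary-based variant keeps each index traceable back to a term, at the cost of an extra pass over the corpus to build the vocabulary.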

## How was this patch tested?

Unit tests and doc generation.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #12454 from hhbyyh/tfdoc.
---
 .../src/main/java/org/apache/spark/examples/ml/JavaTfIdfExample.java    | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'examples/src/main/java')
diff --git a/examples/src/main/java/org/apache/spark/examples/ml/JavaTfIdfExample.java b/examples/src/main/java/org/apache/spark/examples/ml/JavaTfIdfExample.java
index 37a3d0d84d..107c835f2e 100644
--- a/examples/src/main/java/org/apache/spark/examples/ml/JavaTfIdfExample.java
+++ b/examples/src/main/java/org/apache/spark/examples/ml/JavaTfIdfExample.java
@@ -63,6 +63,8 @@ public class JavaTfIdfExample {
       .setOutputCol("rawFeatures")
       .setNumFeatures(numFeatures);
     Dataset<Row> featurizedData = hashingTF.transform(wordsData);
+    // Alternatively, CountVectorizer can be used to get term frequency vectors.
+
     IDF idf = new IDF().setInputCol("rawFeatures").setOutputCol("features");
     IDFModel idfModel = idf.fit(featurizedData);
     Dataset<Row> rescaledData = idfModel.transform(featurizedData);
-- 
cgit v1.2.3