author: Yuhao Yang <hhbyyh@gmail.com>, 2015-09-08 22:33:23 -0700
committer: Xiangrui Meng <meng@databricks.com>, 2015-09-08 22:33:23 -0700
commit: 91a577d2778ab5946f0c40cb80c89de24e3d10e8
tree: 07cc7944e7ad8c6995161660e4ebc7226645cf2a /docs/ml-features.md
parent: 2f6fd5256c6650868916a3eefaa0beb091187cbb
[SPARK-10249] [ML] [DOC] Add Python Code Example to StopWordsRemover User Guide
jira: https://issues.apache.org/jira/browse/SPARK-10249

Update the user guide now that Python support has been added.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #8620 from hhbyyh/swPyDocExample.
Diffstat (limited to 'docs/ml-features.md')
-rw-r--r-- docs/ml-features.md | 19
1 file changed, 19 insertions, 0 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 90654d1e5a..58b31a5a5c 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -512,6 +512,25 @@ DataFrame dataset = jsql.createDataFrame(rdd, schema);
remover.transform(dataset).show();
{% endhighlight %}
</div>
+
+<div data-lang="python" markdown="1">
+[`StopWordsRemover`](api/python/pyspark.ml.html#pyspark.ml.feature.StopWordsRemover)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default).
+
+{% highlight python %}
+from pyspark.ml.feature import StopWordsRemover
+
+sentenceData = sqlContext.createDataFrame([
+ (0, ["I", "saw", "the", "red", "balloon"]),
+ (1, ["Mary", "had", "a", "little", "lamb"])
+], ["label", "raw"])
+
+remover = StopWordsRemover(inputCol="raw", outputCol="filtered")
+remover.transform(sentenceData).show(truncate=False)
+{% endhighlight %}
+</div>
</div>
## $n$-gram
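
The Python section added by this patch describes four parameters: an input column, an output column, a list of stop words, and a case-sensitivity flag that is false by default. To make the filtering semantics concrete without a Spark cluster, here is a minimal plain-Python sketch. It is not Spark's implementation: the function name `remove_stop_words` and the short `STOP_WORDS` list are hypothetical, chosen only for illustration, and the real transformer uses its own default stop word list.

```python
def remove_stop_words(tokens, stop_words, case_sensitive=False):
    """Return the tokens that do not appear in stop_words.

    Hypothetical helper illustrating the described behaviour;
    it is not part of pyspark.
    """
    if case_sensitive:
        blocked = set(stop_words)
        return [t for t in tokens if t not in blocked]
    # Case-insensitive matching: compare lowercased forms on both sides.
    blocked = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in blocked]


# Assumed stop word list for this sketch only (not Spark's default list).
STOP_WORDS = ["i", "the", "had", "a"]

# Rows modeled on the (label, raw) pairs in the documentation example.
rows = [
    (0, ["I", "saw", "the", "red", "balloon"]),
    (1, ["Mary", "had", "a", "little", "lamb"]),
]

# Analogue of writing the result to a separate "filtered" output column.
filtered = [(label, remove_stop_words(raw, STOP_WORDS)) for label, raw in rows]

for label, words in filtered:
    print(label, words)
```

With case-insensitive matching, the capitalized token "I" is dropped because it matches the lowercase entry "i". Passing `case_sensitive=True` would keep it, since only exact matches are removed in that mode.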