path: root/docs/ml-features.md
author: Bryan Cutler <cutlerb@gmail.com> 2016-05-07 11:20:38 +0200
committer: Nick Pentreath <nickp@za.ibm.com> 2016-05-07 11:20:38 +0200
commit: 5d188a6970ef97d11656ab39255109fefc42203d
tree: 04bf61f390905d03cdc5b1dd7fbf906b80449b7f /docs/ml-features.md
parent: b0cafdb6ccff9add89dc31c45adf87c8fa906aac
[DOC][MINOR] Fixed minor errors in feature.ml user guide doc
## What changes were proposed in this pull request?

Fixed some minor errors found when reviewing feature.ml user guide

## How was this patch tested?

built docs locally

Author: Bryan Cutler <cutlerb@gmail.com>

Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
Diffstat (limited to 'docs/ml-features.md')
-rw-r--r-- docs/ml-features.md | 8
1 files changed, 5 insertions, 3 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 237e93ae90..c79bcac461 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -127,7 +127,7 @@ Assume that we have the following DataFrame with columns `id` and `texts`:
1 | Array("a", "b", "b", "c", "a")
~~~~
-each row in`texts` is a document of type Array[String].
+each row in `texts` is a document of type Array[String].
Invoking fit of `CountVectorizer` produces a `CountVectorizerModel` with vocabulary (a, b, c),
then the output column "vector" after transformation contains:
@@ -185,7 +185,7 @@ for more details on the API.
<div data-lang="scala" markdown="1">
Refer to the [Tokenizer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
-and the [RegexTokenizer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
+and the [RegexTokenizer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.RegexTokenizer)
for more details on the API.
{% include_example scala/org/apache/spark/examples/ml/TokenizerExample.scala %}
@@ -775,7 +775,7 @@ The rescaled value for a feature E is calculated as,
\end{equation}`
For the case `E_{max} == E_{min}`, `Rescaled(e_i) = 0.5 * (max + min)`
-Note that since zero values will probably be transformed to non-zero values, output of the transformer will be DenseVector even for sparse input.
+Note that since zero values will probably be transformed to non-zero values, output of the transformer will be `DenseVector` even for sparse input.
The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [0, 1].
@@ -801,6 +801,7 @@ for more details on the API.
<div data-lang="python" markdown="1">
Refer to the [MinMaxScaler Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MinMaxScaler)
+and the [MinMaxScalerModel Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MinMaxScalerModel)
for more details on the API.
{% include_example python/ml/min_max_scaler_example.py %}
@@ -841,6 +842,7 @@ for more details on the API.
<div data-lang="python" markdown="1">
Refer to the [MaxAbsScaler Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MaxAbsScaler)
+and the [MaxAbsScalerModel Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MaxAbsScalerModel)
for more details on the API.
{% include_example python/ml/max_abs_scaler_example.py %}
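
As a reading aid for the `MinMaxScaler` hunk above, here is a minimal plain-Python sketch of the rescaling it describes. It does not use the Spark API: the function name `min_max_rescale` and its `lo`/`hi` parameters are hypothetical, and the general formula is assumed to be the usual min-max rescaling, since the diff context only shows the `E_{max} == E_{min}` case. It illustrates why zero inputs generally become non-zero, which is the reason the guide says the output is a `DenseVector` even for sparse input.

```python
from typing import List


def min_max_rescale(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Rescale one feature column to [lo, hi] (illustrative only, not the Spark API)."""
    e_min, e_max = min(values), max(values)
    if e_max == e_min:
        # Degenerate case quoted in the guide: Rescaled(e_i) = 0.5 * (max + min).
        return [0.5 * (hi + lo) for _ in values]
    # Assumed general case: standard min-max rescaling into [lo, hi].
    scale = (hi - lo) / (e_max - e_min)
    return [(v - e_min) * scale + lo for v in values]


sparse_column = [0.0, 0.0, 2.0, 4.0]
print(min_max_rescale(sparse_column))            # zeros stay zero only because E_min is 0 and lo is 0
print(min_max_rescale(sparse_column, 1.0, 3.0))  # zeros map to lo = 1.0, so the result is dense
print(min_max_rescale([5.0, 5.0, 5.0]))          # constant column: every entry becomes 0.5
```

With any target range whose lower bound is non-zero, or any column whose minimum is non-zero, the zero entries move away from zero, so a sparse representation would no longer save space.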