author | Bryan Cutler <cutlerb@gmail.com> | 2016-05-07 11:20:38 +0200 |
---|---|---|
committer | Nick Pentreath <nickp@za.ibm.com> | 2016-05-07 11:20:38 +0200 |
commit | 5d188a6970ef97d11656ab39255109fefc42203d (patch) | |
tree | 04bf61f390905d03cdc5b1dd7fbf906b80449b7f /docs/ml-features.md | |
parent | b0cafdb6ccff9add89dc31c45adf87c8fa906aac (diff) | |
[DOC][MINOR] Fixed minor errors in feature.ml user guide doc
## What changes were proposed in this pull request?
Fixed some minor errors found while reviewing the feature.ml user guide.
## How was this patch tested?
Built the docs locally.
Author: Bryan Cutler <cutlerb@gmail.com>
Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
Diffstat (limited to 'docs/ml-features.md')
-rw-r--r-- | docs/ml-features.md | 8 |
1 files changed, 5 insertions, 3 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 237e93ae90..c79bcac461 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -127,7 +127,7 @@ Assume that we have the following DataFrame with columns `id` and `texts`:
 1 | Array("a", "b", "b", "c", "a")
 ~~~~
 
-each row in`texts` is a document of type Array[String].
+each row in `texts` is a document of type Array[String].
 Invoking fit of `CountVectorizer` produces a `CountVectorizerModel` with vocabulary (a, b, c),
 then the output column "vector" after transformation contains:
 
@@ -185,7 +185,7 @@ for more details on the API.
 <div data-lang="scala" markdown="1">
 
 Refer to the [Tokenizer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
-and the [RegexTokenizer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.Tokenizer)
+and the [RegexTokenizer Scala docs](api/scala/index.html#org.apache.spark.ml.feature.RegexTokenizer)
 for more details on the API.
 
 {% include_example scala/org/apache/spark/examples/ml/TokenizerExample.scala %}
@@ -775,7 +775,7 @@ The rescaled value for a feature E is calculated as,
 \end{equation}`
 For the case `E_{max} == E_{min}`, `Rescaled(e_i) = 0.5 * (max + min)`
 
-Note that since zero values will probably be transformed to non-zero values, output of the transformer will be DenseVector even for sparse input.
+Note that since zero values will probably be transformed to non-zero values, output of the transformer will be `DenseVector` even for sparse input.
 
 The following example demonstrates how to load a dataset in libsvm format and then rescale each feature to [0, 1].
 
@@ -801,6 +801,7 @@ for more details on the API.
 <div data-lang="python" markdown="1">
 
 Refer to the [MinMaxScaler Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MinMaxScaler)
+and the [MinMaxScalerModel Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MinMaxScalerModel)
 for more details on the API.
 
 {% include_example python/ml/min_max_scaler_example.py %}
@@ -841,6 +842,7 @@ for more details on the API.
 <div data-lang="python" markdown="1">
 
 Refer to the [MaxAbsScaler Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MaxAbsScaler)
+and the [MaxAbsScalerModel Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.MaxAbsScalerModel)
 for more details on the API.
 
 {% include_example python/ml/max_abs_scaler_example.py %}
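
The MinMaxScaler hunk above touches the note that zero values will probably become non-zero after rescaling. The following is a minimal plain-Python sketch of that behaviour, not Spark's implementation. The function name `min_max_rescale` and the parameters `lo` and `hi` (standing in for the doc's `min` and `max`) are hypothetical. The general-case formula is not visible in the hunk and is assumed here to be the usual min-max rescaling; only the `E_{max} == E_{min}` rule is taken directly from the quoted context.

```python
from typing import List


def min_max_rescale(column: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Rescale one feature column to the range [lo, hi] (illustrative sketch)."""
    if not column:
        return []
    e_min, e_max = min(column), max(column)
    if e_max == e_min:
        # Degenerate case quoted in the diff context:
        # Rescaled(e_i) = 0.5 * (max + min)
        return [0.5 * (hi + lo)] * len(column)
    # Assumed general case: standard min-max rescaling.
    scale = (hi - lo) / (e_max - e_min)
    return [(e - e_min) * scale + lo for e in column]


# A mostly-zero column whose minimum is negative: the zeros do not stay zero.
print(min_max_rescale([-2.0, 0.0, 0.0, 2.0]))  # [0.0, 0.5, 0.5, 1.0]
```

Because the zeros map to 0.5 in this example, a sparse encoding of the output would save nothing, which is consistent with the note that the transformer emits a `DenseVector` even for sparse input.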