author    | Dongjoon Hyun <dongjoon@apache.org> | 2016-02-22 09:52:07 +0000
committer | Sean Owen <sowen@cloudera.com> | 2016-02-22 09:52:07 +0000
commit    | 024482bf51e8158eed08a7dc0758f585baf86e1f (patch)
tree      | e51f2c53b027178bb4e485d2781e266d96ff6e3d /docs/ml-features.md
parent    | 1b144455b620861d8cc790d3fc69902717f14524 (diff)
[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments
## What changes were proposed in this pull request?
This PR tries to fix all typos in the markdown files under the `docs` module, and fixes similar typos in other comments, too.
## How was this patch tested?
Manual tests.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #11300 from dongjoon-hyun/minor_fix_typos.
Diffstat (limited to 'docs/ml-features.md')
-rw-r--r-- | docs/ml-features.md | 6
1 file changed, 3 insertions, 3 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 5809f65d63..68d3ea2971 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -185,7 +185,7 @@ for more details on the API.
 <div data-lang="python" markdown="1">
 Refer to the [Tokenizer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.Tokenizer) and
-the the [RegexTokenizer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.RegexTokenizer)
+the [RegexTokenizer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.RegexTokenizer)
 for more details on the API.
 {% include_example python/ml/tokenizer_example.py %}
@@ -459,7 +459,7 @@ column, we should get the following:
 "a" gets index `0` because it is the most frequent, followed by "c" with index `1` and "b" with index `2`.
-Additionaly, there are two strategies regarding how `StringIndexer` will handle
+Additionally, there are two strategies regarding how `StringIndexer` will handle
 unseen labels when you have fit a `StringIndexer` on one dataset and then use it to transform another:
@@ -779,7 +779,7 @@ for more details on the API.
 * `splits`: Parameter for mapping continuous features into buckets. With n+1 splits, there are n buckets. A bucket defined by splits x,y holds values in the range [x,y) except the last bucket, which also includes y. Splits should be strictly increasing. Values at -inf, inf must be explicitly provided to cover all Double values; Otherwise, values outside the splits specified will be treated as errors. Two examples of `splits` are `Array(Double.NegativeInfinity, 0.0, 1.0, Double.PositiveInfinity)` and `Array(0.0, 1.0, 2.0)`.
-Note that if you have no idea of the upper bound and lower bound of the targeted column, you would better add the `Double.NegativeInfinity` and `Double.PositiveInfinity` as the bounds of your splits to prevent a potenial out of Bucketizer bounds exception.
+Note that if you have no idea of the upper bound and lower bound of the targeted column, you would better add the `Double.NegativeInfinity` and `Double.PositiveInfinity` as the bounds of your splits to prevent a potential out of Bucketizer bounds exception.
 Note also that the splits that you provided have to be in strictly increasing order, i.e. `s0 < s1 < s2 < ... < sn`.
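The `Bucketizer` passage touched by the last hunk describes how the `splits` parameter maps continuous values to buckets. As a minimal, hypothetical PySpark sketch of that behaviour (not part of this commit; it assumes Spark 2.x where `SparkSession` is available, and the column names and sample data are made up for illustration), open-ended splits bounded by `-inf` and `+inf` let every value fall into some bucket, so no out-of-bounds exception is raised:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import Bucketizer

spark = SparkSession.builder.appName("bucketizer-splits-sketch").getOrCreate()

# With n+1 splits there are n buckets; each bucket [x, y) is half-open,
# except the last one, which also includes its upper bound.
splits = [float("-inf"), 0.0, 1.0, float("inf")]

# Hypothetical sample data; the extreme values would fall outside a
# bounded split range, but are covered here by the -inf/+inf endpoints.
df = spark.createDataFrame(
    [(-999.9,), (-0.5,), (0.3,), (1.0,), (999.9,)], ["features"])

bucketizer = Bucketizer(splits=splits, inputCol="features", outputCol="bucketed")

# Every Double value lands in some bucket, so transform() succeeds
# instead of raising an out-of-Bucketizer-bounds error.
bucketizer.transform(df).show()
```

With these splits, -999.9 and -0.5 would land in bucket 0.0, 0.3 in bucket 1.0, and both 1.0 and 999.9 in bucket 2.0, since the last bucket also includes its upper bound.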