author Dongjoon Hyun <dongjoon@apache.org> 2016-02-22 09:52:07 +0000
committer Sean Owen <sowen@cloudera.com> 2016-02-22 09:52:07 +0000
commit 024482bf51e8158eed08a7dc0758f585baf86e1f
tree e51f2c53b027178bb4e485d2781e266d96ff6e3d /docs/ml-features.md
parent 1b144455b620861d8cc790d3fc69902717f14524
[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns in other comments
## What changes were proposed in this pull request?

This PR tries to fix all typos in all markdown files under the `docs` module, and fixes similar typos in other comments, too.

## How was this patch tested?

Manual tests.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #11300 from dongjoon-hyun/minor_fix_typos.
Diffstat (limited to 'docs/ml-features.md')
-rw-r--r-- docs/ml-features.md 6
1 files changed, 3 insertions, 3 deletions
diff --git a/docs/ml-features.md b/docs/ml-features.md
index 5809f65d63..68d3ea2971 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -185,7 +185,7 @@ for more details on the API.
<div data-lang="python" markdown="1">
Refer to the [Tokenizer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.Tokenizer) and
-the the [RegexTokenizer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.RegexTokenizer)
+the [RegexTokenizer Python docs](api/python/pyspark.ml.html#pyspark.ml.feature.RegexTokenizer)
for more details on the API.
{% include_example python/ml/tokenizer_example.py %}
@@ -459,7 +459,7 @@ column, we should get the following:
"a" gets index `0` because it is the most frequent, followed by "c" with index `1` and "b" with
index `2`.
-Additionaly, there are two strategies regarding how `StringIndexer` will handle
+Additionally, there are two strategies regarding how `StringIndexer` will handle
unseen labels when you have fit a `StringIndexer` on one dataset and then use it
to transform another:
@@ -779,7 +779,7 @@ for more details on the API.
* `splits`: Parameter for mapping continuous features into buckets. With n+1 splits, there are n buckets. A bucket defined by splits x,y holds values in the range [x,y) except the last bucket, which also includes y. Splits should be strictly increasing. Values at -inf, inf must be explicitly provided to cover all Double values; Otherwise, values outside the splits specified will be treated as errors. Two examples of `splits` are `Array(Double.NegativeInfinity, 0.0, 1.0, Double.PositiveInfinity)` and `Array(0.0, 1.0, 2.0)`.
-Note that if you have no idea of the upper bound and lower bound of the targeted column, you would better add the `Double.NegativeInfinity` and `Double.PositiveInfinity` as the bounds of your splits to prevent a potenial out of Bucketizer bounds exception.
+Note that if you have no idea of the upper bound and lower bound of the targeted column, you would better add the `Double.NegativeInfinity` and `Double.PositiveInfinity` as the bounds of your splits to prevent a potential out of Bucketizer bounds exception.
Note also that the splits that you provided have to be in strictly increasing order, i.e. `s0 < s1 < s2 < ... < sn`.