Diffstat (limited to 'docs/mllib-classification-regression.md')
-rw-r--r-- docs/mllib-classification-regression.md | 4
1 files changed, 2 insertions, 2 deletions
diff --git a/docs/mllib-classification-regression.md b/docs/mllib-classification-regression.md
index 18a3e8e075..d5bd8042ca 100644
--- a/docs/mllib-classification-regression.md
+++ b/docs/mllib-classification-regression.md
@@ -77,8 +77,8 @@ between the two goals of small loss and small model complexity.
**Distributed Datasets.**
For all currently implemented optimization methods for classification, the data must be
-distributed between the worker machines *by examples*. Every machine holds a consecutive block of
-the `$n$` example/label pairs `$(\x_i,y_i)$`.
+distributed between processes on the worker machines *by examples*. Machines hold consecutive
+blocks of the `$n$` example/label pairs `$(\x_i,y_i)$`.
In other words, the input distributed dataset
([RDD](scala-programming-guide.html#resilient-distributed-datasets-rdds)) must be the set of
vectors `$\x_i\in\R^d$`.