Diffstat (limited to 'docs/mllib-classification-regression.md')
-rw-r--r-- | docs/mllib-classification-regression.md | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/docs/mllib-classification-regression.md b/docs/mllib-classification-regression.md
index 18a3e8e075..d5bd8042ca 100644
--- a/docs/mllib-classification-regression.md
+++ b/docs/mllib-classification-regression.md
@@ -77,8 +77,8 @@ between the two goals of small loss and small model complexity.
 
 **Distributed Datasets.**
 For all currently implemented optimization methods for classification, the data must be
-distributed between the worker machines *by examples*. Every machine holds a consecutive block of
-the `$n$` example/label pairs `$(\x_i,y_i)$`.
+distributed between processes on the worker machines *by examples*. Machines hold consecutive
+blocks of the `$n$` example/label pairs `$(\x_i,y_i)$`.
 In other words, the input distributed dataset
 ([RDD](scala-programming-guide.html#resilient-distributed-datasets-rdds)) must be the set of
 vectors `$\x_i\in\R^d$`.