Diffstat (limited to 'docs/mllib-optimization.md')
-rw-r--r--  docs/mllib-optimization.md  25
1 files changed, 12 insertions, 13 deletions
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
index c79cc3d944..bec3912b55 100644
--- a/docs/mllib-optimization.md
+++ b/docs/mllib-optimization.md
@@ -1,6 +1,6 @@
 ---
 layout: global
-title: MLlib - Optimization
+title: <a href="mllib-guide.html">MLlib</a> - Optimization
 ---
 
 * Table of contents
@@ -25,9 +25,10 @@ title: MLlib - Optimization
 
 
 
-# Mathematical Description
+## Mathematical description
+
+### Gradient descent
 
-## (Sub)Gradient Descent
 The simplest method to solve optimization problems of the form `$\min_{\wv \in\R^d} \; f(\wv)$`
 is [gradient descent](http://en.wikipedia.org/wiki/Gradient_descent).
 Such first-order optimization methods (including gradient descent and stochastic variants
@@ -38,14 +39,14 @@ the direction of steepest descent, which is the negative of the derivative (call
 [gradient](http://en.wikipedia.org/wiki/Gradient)) of the function at the current point, i.e., at
 the current parameter value.
 If the objective function `$f$` is not differentiable at all arguments, but still convex, then a
-*subgradient*
+*sub-gradient*
 is the natural generalization of the gradient, and assumes the role of the step direction.
-In any case, computing a gradient or subgradient of `$f$` is expensive --- it requires a full
+In any case, computing a gradient or sub-gradient of `$f$` is expensive --- it requires a full
 pass through the complete dataset, in order to compute the contributions from all loss terms.
 
-## Stochastic (Sub)Gradient Descent (SGD)
+### Stochastic gradient descent (SGD)
 Optimization problems whose objective function `$f$` is written as a sum are particularly
-suitable to be solved using *stochastic subgradient descent (SGD)*.
+suitable to be solved using *stochastic gradient descent (SGD)*.
 In our case, for the optimization formulations commonly used in <a
 href="mllib-classification-regression.html">supervised machine learning</a>,
 `\begin{equation}
@@ -98,7 +99,7 @@ For the L1-regularizer, the proximal operator is given by soft thresholding, as
 [L1Updater](api/scala/index.html#org.apache.spark.mllib.optimization.L1Updater).
 
 
-## Update Schemes for Distributed SGD
+### Update schemes for distributed SGD
 The SGD implementation in
 [GradientDescent](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent) uses a
 simple (distributed) sampling of the data examples.
@@ -129,12 +130,12 @@ point.
 
 
 
-# Implementation in MLlib
+## Implementation in MLlib
 
 Gradient descent methods including stochastic subgradient descent (SGD) as
 included as a low-level primitive in `MLlib`, upon which various ML algorithms
 are developed, see the
-<a href="mllib-classification-regression.html">classification and regression</a>
+<a href="mllib-linear-methods.html">linear methods</a>
 section for example.
 
 The SGD method
@@ -161,6 +162,4 @@ each iteration, to compute the gradient direction.
 
 Available algorithms for gradient descent:
 
-* [GradientDescent.runMiniBatchSGD](api/scala/index.html#org.apache.spark.mllib.optimization.GradientDescent)
-
-
+* [GradientDescent.runMiniBatchSGD](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
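The documentation touched by this commit explains that a full gradient or sub-gradient needs a pass over every loss term, while stochastic gradient descent works from a sampled mini-batch. The following is a minimal Python sketch of mini-batch stochastic sub-gradient descent under stated assumptions. It is not MLlib's `GradientDescent.runMiniBatchSGD`. The hinge loss, the labels in {-1, +1}, the per-example Bernoulli sampling controlled by `mini_batch_fraction`, and the decaying step size `step_size / sqrt(t)` are illustrative choices, and `hinge_subgradient` and `minibatch_sgd` are hypothetical names.

```python
import math
import random


def hinge_subgradient(w, x, y):
    """Return one sub-gradient of max(0, 1 - y * <w, x>) with respect to w.

    Labels y are assumed to be -1 or +1. The hinge loss is not differentiable
    where the margin equals 1. At that point, any vector between 0 and -y * x
    is a valid sub-gradient, and this sketch picks 0.
    """
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    if margin < 1.0:
        return [-y * xj for xj in x]
    return [0.0] * len(x)


def minibatch_sgd(data, num_iterations, step_size, mini_batch_fraction, seed=42):
    """Minimize the average hinge loss over `data` with stochastic sub-gradient steps.

    Each iteration keeps every example independently with probability
    `mini_batch_fraction`, averages the sub-gradients of the sampled examples,
    and steps against that average with step size step_size / sqrt(t).
    """
    rng = random.Random(seed)
    dim = len(data[0][0])
    w = [0.0] * dim
    for t in range(1, num_iterations + 1):
        batch = [(x, y) for x, y in data if rng.random() < mini_batch_fraction]
        if not batch:
            continue  # nothing was sampled this round, so skip the update
        g = [0.0] * dim
        for x, y in batch:
            for j, gj in enumerate(hinge_subgradient(w, x, y)):
                g[j] += gj
        gamma = step_size / math.sqrt(t)
        w = [wj - gamma * gj / len(batch) for wj, gj in zip(w, g)]
    return w


if __name__ == "__main__":
    # Tiny separable dataset; the first feature is a constant bias term.
    data = [([1.0, 2.0], 1), ([1.0, 1.5], 1), ([1.0, -2.0], -1), ([1.0, -1.5], -1)]
    # With mini_batch_fraction=1.0 every example is sampled (full sub-gradient).
    print(minibatch_sgd(data, num_iterations=50, step_size=1.0, mini_batch_fraction=1.0))
    # [0.0, 1.75]
```

With `mini_batch_fraction=1.0` the sketch reduces to plain (full-pass) sub-gradient descent. Smaller fractions make each iteration cheaper at the cost of a noisier step direction, which is the trade-off the SGD section of the document describes.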
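The context of the L1 hunk notes that the proximal operator of the L1 regularizer is soft thresholding. As a sketch (not the code of `L1Updater`), assume the regularizer is `reg_param * ||w||_1` and the current step size is `step`. A proximal gradient step then takes an ordinary gradient step on the loss and shrinks each coordinate toward zero: w_j -> sign(w_j) * max(|w_j| - step * reg_param, 0). The names `soft_threshold` and `l1_proximal_step` are hypothetical.

```python
def soft_threshold(w, shrinkage):
    """Proximal operator of shrinkage * ||w||_1, applied coordinate-wise."""
    if shrinkage < 0:
        raise ValueError("shrinkage must be non-negative")
    result = []
    for wj in w:
        if wj > shrinkage:
            result.append(wj - shrinkage)  # shrink positive entries toward zero
        elif wj < -shrinkage:
            result.append(wj + shrinkage)  # shrink negative entries toward zero
        else:
            result.append(0.0)  # entries within [-shrinkage, shrinkage] become exactly 0
    return result


def l1_proximal_step(w, grad, step, reg_param):
    """One proximal gradient step for loss(w) + reg_param * ||w||_1 (hypothetical helper)."""
    # Gradient step on the differentiable loss part only.
    intermediate = [wj - step * gj for wj, gj in zip(w, grad)]
    # Proximal step on the L1 part: soft thresholding by step * reg_param.
    return soft_threshold(intermediate, step * reg_param)


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, 0.2, -2.0], 1.0))
    # [2.0, 0.0, 0.0, -1.0]
```

Coordinates whose magnitude falls below the threshold are set exactly to zero rather than merely made small, which is why L1 regularization tends to produce sparse weight vectors.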