author    Martin Jaggi <m.jaggi@gmail.com>    2014-02-08 11:39:13 -0800
committer Patrick Wendell <pwendell@gmail.com>    2014-02-08 11:39:13 -0800
commit    fabf1749995103841e6a3975892572f376ee48d0 (patch)
tree      a9c03486cce6cc4f390405f33266a31861ebe3d4 /docs/mllib-optimization.md
parent    3a9d82cc9e85accb5c1577cf4718aa44c8d5038c (diff)
Merge pull request #552 from martinjaggi/master. Closes #552.
TeX formulas in the documentation using MathJax, and splitting the MLlib
documentation by techniques.

See JIRA https://spark-project.atlassian.net/browse/MLLIB-19 and
https://github.com/shivaram/spark/compare/mathjax

Author: Martin Jaggi <m.jaggi@gmail.com>

== Merge branch commits ==

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi <m.jaggi@gmail.com>
Date:   Fri Feb 7 03:19:38 2014 +0100

    minor polishing, as suggested by @pwendell

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi <m.jaggi@gmail.com>
Date:   Thu Feb 6 18:04:26 2014 +0100

    enabling inline LaTeX formulas with $.$
    same MathJax configuration as used in math.stackexchange.com
    sample usage in the linear algebra (SVD) documentation

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi <m.jaggi@gmail.com>
Date:   Thu Feb 6 17:31:29 2014 +0100

    split MLlib documentation by techniques and linked from the main
    mllib-guide.md site

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi <m.jaggi@gmail.com>
Date:   Thu Feb 6 16:59:43 2014 +0100

    enable MathJax formulas in the .md documentation files
    code by @shivaram

commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi <m.jaggi@gmail.com>
Date:   Thu Feb 6 16:57:23 2014 +0100

    minor update on how to compile the documentation
Diffstat (limited to 'docs/mllib-optimization.md')
-rw-r--r--  docs/mllib-optimization.md | 40
1 file changed, 40 insertions, 0 deletions
diff --git a/docs/mllib-optimization.md b/docs/mllib-optimization.md
new file mode 100644
index 0000000000..428284ef29
--- /dev/null
+++ b/docs/mllib-optimization.md
@@ -0,0 +1,40 @@
+---
+layout: global
+title: MLlib - Optimization
+---
+
+* Table of contents
+{:toc}
+
+
+# Gradient Descent Primitive
+
+[Gradient descent](http://en.wikipedia.org/wiki/Gradient_descent) and its
+stochastic variants are first-order optimization methods that are well-suited
+for large-scale and distributed computation. Gradient descent aims to find a
+local minimum of a function by iteratively taking steps in the direction of
+the negative gradient of the function at the current point, i.e., at the
+current parameter value. Gradient descent is included as a low-level primitive
+in MLlib, upon which various ML algorithms are built. It has the following
+parameters (see the sketch after this list for how they fit together):
+
+* *gradient* is a class that computes the stochastic gradient of the function
+being optimized, i.e., the gradient with respect to a single training example,
+evaluated at the current parameter value. MLlib includes gradient classes for
+common loss functions, e.g., hinge, logistic, and least-squares. The gradient
+class takes as input a training example, its label, and the current parameter
+value.
+* *updater* is a class that updates weights in each iteration of gradient
+descent. MLlib includes updaters for cases without regularization, as well as
+L1 and L2 regularizers.
+* *stepSize* is a scalar value denoting the initial step size for gradient
+descent. All updaters in MLlib use a step size at the $t$-th step equal to
+$\mathrm{stepSize} / \sqrt{t}$.
+* *numIterations* is the number of iterations to run.
+* *regParam* is the regularization parameter when using L1 or L2 regularization.
+* *miniBatchFraction* is the fraction of the data used to compute the gradient
+at each iteration.
+
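+In symbols, with step size $\gamma_t = \mathrm{stepSize} / \sqrt{t}$ as above,
+each iteration performs the update $w_{t+1} := w_t - \gamma_t \, \nabla f(w_t)$,
+with the gradient averaged over the sampled mini-batch. The self-contained
+sketch below illustrates how these parameters fit together. It is an
+illustration only, not the MLlib implementation; the names `runMiniBatchGD`
+and `leastSquaresGradient` are invented for this example, which uses the
+least-squares loss and no regularization:
+
+{% highlight scala %}
+import scala.util.Random
+
+object GradientDescentSketch {
+
+  // Gradient of the squared loss (w^T x - y)^2 / 2 for a single example,
+  // evaluated at the current weights w -- the role of the *gradient* class.
+  def leastSquaresGradient(x: Array[Double], y: Double,
+      w: Array[Double]): Array[Double] = {
+    val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
+    x.map(_ * err)
+  }
+
+  // Mini-batch gradient descent with the stepSize / sqrt(t) schedule.
+  def runMiniBatchGD(
+      data: Seq[(Double, Array[Double])],  // (label, features) pairs
+      stepSize: Double,
+      numIterations: Int,
+      miniBatchFraction: Double,
+      initialWeights: Array[Double]): Array[Double] = {
+    val w = initialWeights.clone()
+    val rand = new Random(42)
+    for (t <- 1 to numIterations) {
+      // Sample roughly a miniBatchFraction of the data for this iteration.
+      val batch = data.filter(_ => rand.nextDouble() < miniBatchFraction)
+      if (batch.nonEmpty) {
+        // Average the per-example gradients over the mini-batch.
+        val grad = new Array[Double](w.length)
+        for ((y, x) <- batch) {
+          val g = leastSquaresGradient(x, y, w)
+          for (i <- g.indices) grad(i) += g(i) / batch.size
+        }
+        // Step in the negative gradient direction with step size
+        // stepSize / sqrt(t) -- the *updater*'s job, no regularization here.
+        val gamma = stepSize / math.sqrt(t)
+        for (i <- w.indices) w(i) -= gamma * grad(i)
+      }
+    }
+    w
+  }
+
+  def main(args: Array[String]): Unit = {
+    // Noisy samples of y = 2x, with x scaled to [0, 1].
+    val data = (1 to 200).map { i =>
+      val x = i / 200.0
+      (2.0 * x + 0.1 * Random.nextGaussian(), Array(x))
+    }
+    val w = runMiniBatchGD(data, stepSize = 1.0, numIterations = 100,
+      miniBatchFraction = 0.1, initialWeights = Array(0.0))
+    println(s"learned weight: ${w(0)}")  // should end up close to 2.0
+  }
+}
+{% endhighlight %}
+
+Setting *miniBatchFraction* to 1.0 recovers deterministic gradient descent on
+the full data set; smaller fractions trade gradient accuracy for cheaper
+iterations. In MLlib the same idea is applied to data distributed as an RDD.
+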
+Available algorithms for gradient descent:
+
+* [GradientDescent](api/mllib/index.html#org.apache.spark.mllib.optimization.GradientDescent)
+
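+As a usage sketch only: the call below assumes that the `GradientDescent`
+object exposes a `runMiniBatchSGD` method taking the parameters above in the
+order (data, gradient, updater, stepSize, numIterations, regParam,
+miniBatchFraction, initialWeights) and returning the final weights together
+with the loss at each iteration. The gradient and updater classes and the
+input path are assumptions for illustration; check the API link above for the
+exact signature in your Spark version.
+
+{% highlight scala %}
+import org.apache.spark.SparkContext
+import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SquaredL2Updater}
+import org.apache.spark.mllib.util.MLUtils
+
+val sc = new SparkContext("local", "GradientDescentExample")
+
+// Load (label, features) pairs; the data path is a placeholder.
+val training = MLUtils.loadLabeledData(sc, "mllib/data/sample_data.txt")
+  .map(p => (p.label, p.features)).cache()
+val numFeatures = training.first()._2.length
+
+val (weights, lossHistory) = GradientDescent.runMiniBatchSGD(
+  training,
+  new LogisticGradient(),       // gradient of the logistic loss
+  new SquaredL2Updater(),       // updates with L2 regularization
+  1.0,                          // stepSize
+  100,                          // numIterations
+  0.1,                          // regParam
+  1.0,                          // miniBatchFraction: use all of the data
+  Array.fill(numFeatures)(0.0)) // initialWeights
+
+println(s"final loss: ${lossHistory.last}")
+{% endhighlight %}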