--- layout: global title: Machine Learning Library (MLlib) --- MLlib is a Spark implementation of some common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives: * [Basics](mllib-basics.html) * data types * summary statistics * Classification and regression * [linear support vector machine (SVM)](mllib-linear-methods.html#linear-support-vector-machine-svm) * [logistic regression](mllib-linear-methods.html#logistic-regression) * [linear least squares, Lasso, and ridge regression](mllib-linear-methods.html#linear-least-squares-lasso-and-ridge-regression) * [decision tree](mllib-decision-tree.html) * [naive Bayes](mllib-naive-bayes.html) * [Collaborative filtering](mllib-collaborative-filtering.html) * alternating least squares (ALS) * [Clustering](mllib-clustering.html) * k-means * [Dimensionality reduction](mllib-dimensionality-reduction.html) * singular value decomposition (SVD) * principal component analysis (PCA) * [Optimization](mllib-optimization.html) * stochastic gradient descent * limited-memory BFGS (L-BFGS) MLlib is a new component under active development. The APIs marked `Experimental`/`DeveloperApi` may change in future releases, and we will provide migration guide between releases. # Dependencies MLlib uses linear algebra packages [Breeze](http://www.scalanlp.org/), which depends on [netlib-java](https://github.com/fommil/netlib-java), and [jblas](https://github.com/mikiobraun/jblas). `netlib-java` and `jblas` depend on native Fortran routines. You need to install the [gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries) if it is not already present on your nodes. MLlib will throw a linking error if it cannot detect these libraries automatically. Due to license issues, we do not include `netlib-java`'s native libraries in MLlib's dependency set. If no native library is available at runtime, you will see a warning message. To use native libraries from `netlib-java`, please include artifact `com.github.fommil.netlib:all:1.1.2` as a dependency of your project or build your own (see [instructions](https://github.com/fommil/netlib-java/blob/master/README.md#machine-optimised-system-libraries)). To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer. --- # Migration Guide ## From 0.9 to 1.0 In MLlib v1.0, we support both dense and sparse input in a unified way, which introduces a few breaking changes. If your data is sparse, please store it in a sparse format instead of dense to take advantage of sparsity in both storage and computation.