From 673dcddb721241a6d7eef2d773a170a1e1a38202 Mon Sep 17 00:00:00 2001
From: Matei Alexandru Zaharia
Date: Wed, 22 Jan 2014 20:33:24 +0000
Subject: Update site look and add pages for Streaming and MLlib

This monster commit does a variety of things:

- Update the site look and feel to be cleaner
- Add top-level points to front page
- Add a listing of related projects, and pages for those included in Spark
- Reorganize docs and community pages
- Make sure the site scales properly on mobile devices
- Add tabs to let users view the examples in any programming language

It's just a start, but should be a step towards a better web presence.
---
 mllib/index.md | 141 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)
 create mode 100644 mllib/index.md

diff --git a/mllib/index.md b/mllib/index.md
new file mode 100644
index 000000000..694fa544f
--- /dev/null
+++ b/mllib/index.md
@@ -0,0 +1,141 @@
+---
+layout: global
+type: "page singular"
+title: MLlib
+subproject: MLlib
+---
+
+
+ MLlib is Apache Spark's scalable machine learning library.
+
+ +
+
+

Ease of Use

+

+ Usable in Java, Scala and Python.
+

+

+ MLlib fits into Spark's
+ APIs and interoperates with NumPy in Python (starting in Spark 0.9).
+ You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it
+ easy to plug into Hadoop workflows.
+

+
+
+ +
+
+ val points = spark.textFile("hdfs://...")
+                   .map(parsePoint)
+
+ val model = KMeans.train(points, k, maxIterations)
+
+
Calling MLlib in Scala
+
+
+
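The snippet above leaves `parsePoint` undefined and hands the clustering itself to `KMeans.train`. As a rough illustration of the kind of iterative work such a call performs, here is a minimal single-machine sketch of Lloyd's algorithm (the standard k-means iteration), written as a plain Scala script over a local collection rather than a Spark RDD. It is not MLlib's implementation; `parsePoint` (assumed here to read whitespace-separated numbers), `kMeans`, the sample points, and the choice of starting centers are all hypothetical.

```scala
// Hypothetical, single-machine sketch of Lloyd's k-means iteration.
// A distributed implementation runs the same two steps over a partitioned dataset.
type Point = Vector[Double]

// Hypothetical parser: one point per line, coordinates separated by whitespace.
def parsePoint(line: String): Point =
  line.trim.split("\\s+").map(_.toDouble).toVector

def squaredDistance(a: Point, b: Point): Double =
  a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

// Index of the center nearest to p.
def closest(p: Point, centers: Seq[Point]): Int =
  centers.indices.minBy(i => squaredDistance(p, centers(i)))

// Coordinate-wise mean of a non-empty group of points.
def mean(ps: Seq[Point]): Point =
  ps.transpose.map(_.sum / ps.size).toVector

@annotation.tailrec
def kMeans(points: Seq[Point], centers: Seq[Point], iterationsLeft: Int): Seq[Point] = {
  // Assignment step: group every point with its nearest center.
  val clusters = points.groupBy(p => closest(p, centers))
  // Update step: move each center to the mean of its group;
  // a center that attracted no points stays where it was.
  val updated = centers.indices.map(i => clusters.get(i).map(mean).getOrElse(centers(i)))
  // Stop once the centers no longer move or the iteration budget is spent.
  if (updated == centers || iterationsLeft <= 1) updated
  else kMeans(points, updated, iterationsLeft - 1)
}

val points = Seq("0 0", "0 1", "1 0", "9 9", "9 10", "10 9").map(parsePoint)
// Start from two of the input points (real implementations choose more carefully).
val centers = kMeans(points, Seq(points(0), points(3)), iterationsLeft = 20)
println(centers)
```

The recursion stops as soon as an iteration leaves every center unchanged, so `iterationsLeft` only caps the worst case. Note that every iteration re-reads all of the points.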
+ +
+
+

Performance

+

+ High-quality algorithms, 100x faster than MapReduce.
+

+

+ Spark excels at iterative computation, enabling MLlib to run fast.
+ At the same time, we care about algorithmic performance:
+ MLlib contains high-quality algorithms that leverage iteration, and
+ can yield better results than the one-pass approximations sometimes used on MapReduce.
+

+
+
+
+ +
Logistic regression in Hadoop and Spark
+
+
+
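The caption "Logistic regression in Hadoop and Spark" refers to a typical iterative workload: each step of batch gradient descent makes a full pass over the same training data, so keeping that data in memory between passes avoids re-reading it from storage every time, as a chain of MapReduce jobs typically must. The plain Scala script below is a minimal single-machine sketch of that loop under stated assumptions (labels of -1 or +1, a constant 1.0 feature standing in for the bias term, and an illustrative step size and iteration count). `Sample`, `train`, and the toy data are hypothetical; this is not MLlib's implementation and reproduces nothing from the benchmark.

```scala
import scala.math.exp

// Hypothetical training example: label is -1.0 or +1.0.
final case class Sample(label: Double, features: Vector[Double])

def dot(a: Vector[Double], b: Vector[Double]): Double =
  a.zip(b).map { case (x, y) => x * y }.sum

// Batch gradient descent on the logistic loss log(1 + exp(-y * w.x)).
def train(data: Seq[Sample], iterations: Int, stepSize: Double): Vector[Double] = {
  val initial = Vector.fill(data.head.features.size)(0.0)
  (1 to iterations).foldLeft(initial) { (w, _) =>
    // Every iteration scans the whole dataset again; this repeated pass is
    // the access pattern that in-memory caching speeds up.
    val gradient = data
      .map { s =>
        // d/dw log(1 + exp(-y w.x)) = x * (sigmoid(y w.x) - 1) * y
        val scale = (1.0 / (1.0 + exp(-s.label * dot(w, s.features))) - 1.0) * s.label
        s.features.map(_ * scale)
      }
      .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    w.zip(gradient).map { case (wi, gi) => wi - stepSize * gi }
  }
}

// Hypothetical toy data: feature 0 is a constant bias term, feature 1 the input.
val data = Seq(
  Sample(-1.0, Vector(1.0, -2.0)),
  Sample(-1.0, Vector(1.0, -1.0)),
  Sample(1.0, Vector(1.0, 1.0)),
  Sample(1.0, Vector(1.0, 2.0)))

val weights = train(data, iterations = 100, stepSize = 0.1)
println(weights)
```

The `foldLeft` threads the weight vector through the iterations without mutable state. A distributed version would express the same `map` and `reduce` over a cached RDD, so only the small gradient vector, not the dataset, is recomputed from scratch on each pass.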
+ +
+
+

Easy to Deploy

+

+ Runs on existing Hadoop clusters and data.
+

+

+ If you have a Hadoop 2 cluster, you can run Spark and MLlib without any pre-installation.
+ Otherwise, Spark is easy to run standalone,
+ on EC2, or on Mesos.
+ You can read from HDFS, HBase, or any Hadoop data source.
+

+
+
+ +
+
+
+{% extra %}
+
+
+
+

Algorithms

+

+ MLlib 0.8.1 contains the following algorithms:
+

+ +

Refer to the MLlib guide for usage examples.

+
+ +
+

Community

+

+ MLlib is developed as part of the Apache Spark project. It thus gets
+ tested and updated with each Spark release.
+

+

+ If you have questions about the library, ask on the
+ Spark mailing lists.
+

+

+ MLlib is still a young project and welcomes contributions. If you'd like to submit an algorithm to MLlib,
+ read how to
+ contribute to Spark and send us a patch!
+

+
+ +
+

Getting Started

+

+ To get started with MLlib:
+

+
+
  • Download Spark. MLlib is included as a module.
  • Read the MLlib guide, which includes various usage examples.
  • Learn how to deploy Spark on a cluster if you'd like to run in distributed mode.
    You can also run locally on a multicore machine without any setup.
+
+
+
+ +
+ +
+
+{% endextra %}