Update site look and add pages for Streaming and MLlib

This monster commit does a variety of things:

- Update the site look and feel to be cleaner
- Add top-level points to front page
- Add a listing of related projects, and pages for those included in Spark
- Reorganize docs and community pages
- Make sure the site scales properly on mobile devices
- Add tabs to let users view the examples in any programming language

It's just a start, but should be a step towards a better web presence.
author: Matei Alexandru Zaharia <matei@apache.org> 2014-01-22 20:33:24 +0000
committer: Matei Alexandru Zaharia <matei@apache.org> 2014-01-22 20:33:24 +0000
commit: 673dcddb721241a6d7eef2d773a170a1e1a38202
tree: 95e99582a87f471bea589487965b639323a0e05d /mllib/index.md
parent: e42e6e2bef38ca1d6fb92c27a7556f30be940574
1 files changed, 141 insertions, 0 deletions
diff --git a/mllib/index.md b/mllib/index.md
new file mode 100644
index 000000000..694fa544f
--- /dev/null
+++ b/mllib/index.md
@@ -0,0 +1,141 @@
+---
+layout: global
+type: "page singular"
+title: MLlib
+subproject: MLlib
+---
+
+<div class="jumbotron">
+  <b>MLlib</b> is Apache Spark's scalable machine learning library.
+</div>
+
+<div class="row row-padded">
+  <div class="col-md-7 col-sm-7">
+    <h2>Ease of Use</h2>
+    <p class="lead">
+      Usable in Java, Scala and Python.
+    </p>
+    <p>
+      MLlib fits into <a href="{{site.url}}">Spark</a>'s
+      APIs and interoperates with <a href="http://www.numpy.org">NumPy</a> in Python (starting in Spark 0.9).
+      You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it
+      easy to plug into Hadoop workflows.
+    </p>
+  </div>
+  <div class="col-md-5 col-sm-5 col-padded-top col-center">
+
+    <div style="margin-top: 15px; text-align: left; display: inline-block;">
+      <div class="code">
+        val points = spark.textFile(<span class="string">"hdfs://..."</span>)<br/>
+        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span class="sparkop">map</span>(<span class="closure">parsePoint</span>)<br/>
+        <br/>
+        val model = KMeans.<span class="sparkop">train</span>(points, k = 10, maxIterations = 20)
+      </div>
+      <div class="caption">Calling MLlib in Scala</div>
+    </div>
+  </div>
+</div>
+
+<div class="row row-padded">
+  <div class="col-md-7 col-sm-7">
+    <h2>Performance</h2>
+    <p class="lead">
+      High-quality algorithms, up to 100x faster than MapReduce.
+    </p>
+    <p>
+      Spark excels at iterative computation because it can cache data in memory across iterations, letting MLlib run fast.
+      At the same time, we care about algorithmic performance:
+      MLlib contains high-quality algorithms that leverage iteration, and
+      can yield better results than the one-pass approximations sometimes used on MapReduce.
+    </p>
+  </div>
+  <div class="col-md-5 col-sm-5 col-padded-top col-center">
+    <div style="width: 100%; max-width: 272px; display: inline-block; text-align: center;">
+      <img src="{{site.url}}images/logistic-regression.png" style="width: 100%; max-width: 250px;">
+      <div class="caption" style="min-width: 272px;">Running time of logistic regression in Hadoop and Spark</div>
+    </div>
+  </div>
+</div>
+
+<div class="row row-padded" style="margin-bottom: 15px;">
+  <div class="col-md-7 col-sm-7">
+    <h2>Easy to Deploy</h2>
+    <p class="lead">
+      Runs on existing Hadoop clusters and data.
+    </p>
+    <p>
+      If you have a Hadoop 2 cluster, you can run Spark and MLlib on it without installing anything on the cluster nodes.
+      Otherwise, Spark is easy to run in <a href="{{site.url}}docs/latest/spark-standalone.html">standalone mode</a>,
+      on <a href="{{site.url}}docs/latest/ec2-scripts.html">EC2</a>, or on <a href="http://mesos.apache.org">Mesos</a>.
+      You can read from <a href="http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html">HDFS</a>, <a href="http://hbase.apache.org">HBase</a>, or any Hadoop data source.
+    </p>
+  </div>
+  <div class="col-md-5 col-sm-5 col-padded-top col-center">
+    <img src="{{site.url}}images/hadoop.jpg" style="width: 100%; max-width: 280px;">
+  </div>
+</div>
+
+{% extra %}
+
+
+<div class="row">
+  <div class="col-md-4 col-padded">
+    <h3>Algorithms</h3>
+    <p>
+      MLlib 0.8.1 contains the following algorithms:
+    </p>
+    <ul class="list-narrow">
+      <li>K-means clustering with <a href="http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf">K-means|| initialization</a>.</li>
+      <li>L<sub>1</sub>- and L<sub>2</sub>-regularized <a href="http://en.wikipedia.org/wiki/Linear_regression">linear regression</a>.</li>
+      <li>L<sub>1</sub>- and L<sub>2</sub>-regularized <a href="http://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>.</li>
+      <li><a href="http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf">Alternating least squares</a> collaborative filtering, with explicit
+      ratings or <a href="http://www2.research.att.com/~yifanhu/PUB/cf.pdf">implicit feedback</a>.</li>
+      <!--<li><a href="http://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive Bayes</a> multinomial classification.</li>-->
+      <li>Stochastic gradient descent, used to train the linear models above.</li>
+    </ul>
+    <p>Refer to the <a href="{{site.url}}docs/latest/mllib-guide.html">MLlib guide</a> for usage examples.</p>
+  </div>
+
+  <div class="col-md-4 col-padded">
+    <h3>Community</h3>
+    <p>
+      MLlib is developed as part of the Apache Spark project, so it is
+      tested and updated with each Spark release.
+    </p>
+    <p>
+      If you have questions about the library, ask on the
+      <a href="{{site.url}}community.html#mailing-lists">Spark mailing lists</a>.
+    </p>
+    <p>
+      MLlib is still a young project and welcomes contributions. If you'd like to submit an algorithm to MLlib,
+      read <a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">how to
+      contribute to Spark</a> and send us a patch!
+    </p>
+  </div>
+
+  <div class="col-md-4 col-padded">
+    <h3>Getting Started</h3>
+    <p>
+      To get started with MLlib:
+    </p>
+    <ul class="list-narrow">
+      <li><a href="{{site.url}}downloads.html">Download Spark</a>. MLlib is included as a module.</li>
+      <li>Read the <a href="{{site.url}}docs/latest/mllib-guide.html">MLlib guide</a>, which includes
+      various usage examples.</li>
+      <li>Learn how to <a href="{{site.url}}docs/latest/#launching-on-a-cluster">deploy</a> Spark on a cluster
+        if you'd like to run in distributed mode. You can also run locally on a multicore machine
+        without any setup.
+      </li>
+    </ul>
+  </div>
+</div>
+
+<div class="row">
+  <div class="col-sm-12 col-center">
+    <a href="{{site.url}}downloads.html" class="btn btn-success btn-lg btn-multiline">
+      Download Spark<br/><span class="small">Includes MLlib</span>
+    </a>
+  </div>
+</div>
+
+{% endextra %}
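The Algorithms column of the page above lists K-means clustering, and the Performance section argues that MLlib benefits from Spark's support for iterative computation. The following minimal sketch shows why iteration matters: Lloyd's K-means in plain Scala over an in-memory `Seq`, with no Spark dependency, where every iteration is a full pass over the data. `KMeansSketch` and its methods are hypothetical names. The sketch also starts from caller-supplied centers rather than the K-means|| initialization MLlib uses, so it is an illustration of the technique, not MLlib's implementation.

```scala
// Hypothetical sketch (not MLlib's code): Lloyd's K-means in plain Scala.
object KMeansSketch {
  type Point = Array[Double]

  // Squared Euclidean distance between two points of the same dimension.
  def sqDist(a: Point, b: Point): Double =
    a.indices.map(i => (a(i) - b(i)) * (a(i) - b(i))).sum

  // Index of the center nearest to p.
  def closest(p: Point, centers: Seq[Point]): Int =
    centers.indices.minBy(i => sqDist(p, centers(i)))

  // Runs up to maxIterations passes, starting from caller-supplied centers
  // (assumption: MLlib would choose them with K-means|| instead).
  def train(points: Seq[Point], initial: Seq[Point], maxIterations: Int): Seq[Point] = {
    var centers = initial
    var iter = 0
    var moved = true
    while (iter < maxIterations && moved) {
      // Assignment step: one full pass over the data.
      val clusters = points.groupBy(p => closest(p, centers))
      // Update step: each center moves to the mean of its assigned points;
      // a center that received no points stays where it was.
      val updated = centers.indices.map { i =>
        clusters.get(i) match {
          case Some(ps) =>
            Array.tabulate(ps.head.length)(d => ps.map(p => p(d)).sum / ps.size)
          case None => centers(i)
        }
      }
      // Stop early once no center moves.
      moved = centers.zip(updated).exists { case (a, b) => sqDist(a, b) > 1e-12 }
      centers = updated
      iter += 1
    }
    centers
  }
}
```

For example, with points (0, 0), (0, 1), (10, 10), (10, 11) and initial centers (0, 0) and (10, 10), the loop stops after two passes with centers (0, 0.5) and (10, 10.5). On a cluster, keeping `points` cached in memory is what makes the repeated passes cheap.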
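The Algorithms column of the page above lists L<sub>2</sub>-regularized logistic regression and stochastic gradient descent. The sketch below combines them in plain Scala with no Spark dependency. It assumes 0/1 labels and dense features, and it runs per-example SGD with a fixed step size on the L2-regularized log loss. The names `LogisticSgdSketch`, `Example`, `train`, and `predict` are hypothetical. MLlib's own optimizer may sample data and schedule step sizes differently, so treat this as a sketch of the technique rather than MLlib's code.

```scala
import scala.util.Random

// Hypothetical sketch (not MLlib's code): L2-regularized logistic
// regression trained with per-example SGD and a fixed step size.
object LogisticSgdSketch {
  // Assumption: labels are 0.0 or 1.0 and features are dense.
  final case class Example(label: Double, features: Array[Double])

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def dot(w: Array[Double], x: Array[Double]): Double =
    w.indices.map(i => w(i) * x(i)).sum

  // Minimizes mean log loss + (regParam / 2) * ||w||^2.
  // Each epoch is one full pass over the data in a shuffled order.
  def train(data: IndexedSeq[Example], numEpochs: Int, stepSize: Double,
            regParam: Double, seed: Long = 42L): Array[Double] = {
    val rng = new Random(seed)
    val w = Array.fill(data.head.features.length)(0.0)
    for (_ <- 0 until numEpochs; p <- rng.shuffle(data)) {
      // Per-example gradient: (sigmoid(w . x) - y) * x + regParam * w.
      val err = sigmoid(dot(w, p.features)) - p.label
      for (i <- w.indices)
        w(i) -= stepSize * (err * p.features(i) + regParam * w(i))
    }
    w
  }

  // Predicts 1.0 when the estimated probability is at least 0.5.
  def predict(w: Array[Double], x: Array[Double]): Double =
    if (sigmoid(dot(w, x)) >= 0.5) 1.0 else 0.0
}
```

On a small separable dataset such as x = -2, -1 (label 0) and x = 1, 2 (label 1), each stored as `Array(1.0, x)` so that the first weight acts as an intercept, the learned weight on x comes out positive and all four points are classified correctly. For simplicity the intercept is regularized along with the other weights here, which a production implementation would usually avoid.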