author | Matei Alexandru Zaharia <matei@apache.org> | 2013-08-23 22:17:13 +0000 |
---|---|---|
committer | Matei Alexandru Zaharia <matei@apache.org> | 2013-08-23 22:17:13 +0000 |
commit | f510ed111eeb92c389b807713fbcd4932c5cf802 | |
tree | 141918d5744e170623aec93fb4306a3cae840941 /site | |
parent | e89c82408dc971b175d7c402a9b631d67848e3cb | |
Front page
Diffstat (limited to 'site')
-rw-r--r-- | site/index.html | 26 |
1 files changed, 19 insertions, 7 deletions
diff --git a/site/index.html b/site/index.html
index 03dd14f64..ea2145a6f 100644
--- a/site/index.html
+++ b/site/index.html
@@ -105,17 +105,29 @@ <article class="page type-page status-publish hentry">
 <h2 id="what-is-apache-spark">What is Apache Spark?</h2>
-<p>Apache Spark is an open source cluster computing system that aims to make data analytics <em>fast</em> — both fast to run and fast to write.
-To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much more quickly than with disk-based systems like Hadoop MapReduce.
-To make programming faster, Spark provides clean, concise APIs in <a href="http://www.scala-lang.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.scala-lang.org']);">Scala</a>, <a href="/docs/latest/quick-start.html#a-standalone-job-in-java">Java</a> and <a href="/docs/latest/quick-start.html#a-standalone-job-in-python">Python</a>. You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.</p>
+
+<p>Apache Spark is an open source cluster computing system that aims to make data analytics <em>fast</em> — both fast to run and fast to write.</p>
+
+<p>To run programs faster, Spark offers a general execution model that can optimize arbitrary operator graphs, and supports in-memory computing, which lets it query data faster than disk-based engines like Hadoop.</p>
+
+<p>To make programming faster, Spark provides clean, concise APIs in
+<a href="/docs/latest/quick-start.html#a-standalone-job-in-python">Python</a>,
+<a href="http://www.scala-lang.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.scala-lang.org']);">Scala</a> and
+<a href="/docs/latest/quick-start.html#a-standalone-job-in-java">Java</a>.
+You can also use Spark interactively from the Scala and Python shells to rapidly query big datasets.</p>
 <h2 id="what-can-it-do">What can it do?</h2>
-<p>Spark was initially developed for two applications where keeping data in memory helps: <em>iterative</em> algorithms, which are common in machine learning, and <em>interactive</em> data mining. In both cases, Spark can run up to <b>100x</b> faster than Hadoop MapReduce. However, you can use Spark for general data processing too. Check out our <a href="/examples.html">example jobs</a>.
-Spark is also the engine behind <a href="http://shark.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://shark.cs.berkeley.edu']);">Shark</a>, a fully <a href="http://hive.apache.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://hive.apache.org']);">Apache Hive</a>-compatible data warehousing system that can run 100x faster than Hive.
-While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.</p>
+
+<p>Spark was initially developed for two applications where placing data in memory helps: <em>iterative</em> algorithms, which are common in machine learning, and <em>interactive</em> data mining. In both cases, Spark can run up to <b>100x</b> faster than Hadoop MapReduce. However, you can use Spark for general data processing too.
Check out our <a href="/examples.html">example jobs</a>.</p>
+
+<p>Spark is also the engine behind <a href="http://shark.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://shark.cs.berkeley.edu']);">Shark</a>, a fully <a href="http://hive.apache.org" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://hive.apache.org']);">Apache Hive</a>-compatible data warehousing system that can run 100x faster than Hive.</p>
+
+<p>While Spark is a new engine, it can access any data source supported by Hadoop, making it easy to run over existing data.</p>
 <h2 id="who-uses-it">Who uses it?</h2>
-<p>Spark was developed in the <a href="https://amplab.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://amplab.cs.berkeley.edu']);">UC Berkeley AMPLab</a>. It’s used by several groups of researchers at Berkeley to run large-scale applications such as spam filtering and traffic prediction. It’s also used to accelerate data analytics at <a href="http://www.yahoo.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.yahoo.com']);">Yahoo!</a>, <a href="http://www.conviva.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.conviva.com']);">Conviva</a>, <a href="http://www.quantifind.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.quantifind.com']);">Quantifind</a>, and other companies — in total, 17 companies have contributed to Spark!
Spark is <a href="https://github.com/mesos/spark" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://github.com']);">open source</a> under a BSD license, so <a href="/downloads.html">download</a> it to check it out.</p>
+<p>Spark was initially developed in the <a href="https://amplab.cs.berkeley.edu" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://amplab.cs.berkeley.edu']);">UC Berkeley AMPLab</a>, but is now being used and developed at a wide array of companies, including <a href="http://www.yahoo.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.yahoo.com']);">Yahoo!</a>, <a href="http://www.conviva.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.conviva.com']);">Conviva</a>, and <a href="http://www.quantifind.com" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://www.quantifind.com']);">Quantifind</a>.
+In total, over 20 companies have contributed code to Spark.
+Spark is <a href="https://github.com/mesos/spark" onclick="javascript:_gaq.push(['_trackEvent','outbound-article','http://github.com']);">open source</a> under an Apache license, so <a href="/downloads.html">download</a> it to check it out.</p>
 <h2 id="apache-incubator-notice">Apache Incubator notice</h2>
 <p>Apache Spark is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.</p>