author | Sean Owen <sowen@cloudera.com> | 2016-09-21 08:32:16 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-09-21 08:32:16 +0100 |
commit | 7c96b646eb2de2dbe6aec91a82d86699e13c59c5 (patch) | |
tree | dc754aa5d986afb7aa4985da94a7c6c4fb8da7ab /site/research.html | |
parent | eee58685c39269c191a921c39f1520c747a42318 (diff) | |
Add Israel Spark meetup to community page per request. Use https for meetup while we're here. Pick up a recent change to paper hyperlink reflected only in markdown, not HTML
Diffstat (limited to 'site/research.html')
-rw-r--r-- | site/research.html | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/site/research.html b/site/research.html
index 73bd0ba71..42754c49b 100644
--- a/site/research.html
+++ b/site/research.html
@@ -204,7 +204,7 @@ Traditional MapReduce and DAG engines are suboptimal for these applications beca
 </p>
 
 <p>
-Spark offers an abstraction called <a href="http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf"><em>resilient distributed datasets (RDDs)</em></a> to support these applications efficiently. RDDs can be stored in memory between queries <em>without</em> requiring replication. Instead, they rebuild lost data on failure using <em>lineage</em>: each RDD remembers how it was built from other datasets (by transformations like <code>map</code>, <code>join</code> or <code>groupBy</code>) to rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. We showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine (<a href="http://shark.cs.berkeley.edu">Shark</a>).
+Spark offers an abstraction called <a href="http://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf"><em>resilient distributed datasets (RDDs)</em></a> to support these applications efficiently. RDDs can be stored in memory between queries <em>without</em> requiring replication. Instead, they rebuild lost data on failure using <em>lineage</em>: each RDD remembers how it was built from other datasets (by transformations like <code>map</code>, <code>join</code> or <code>groupBy</code>) to rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. We showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine (<a href="http://shark.cs.berkeley.edu">Shark</a>).
 </p>
 
 <p class="noskip">You can find more about the research behind Spark in the following papers:</p>