From eee58685c39269c191a921c39f1520c747a42318 Mon Sep 17 00:00:00 2001
From: Xin Ren
Date: Fri, 16 Sep 2016 16:31:23 -0700
Subject: replace with valid url to rdd paper

---
 research.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/research.md b/research.md
index 41841a1c7..ec7dd54d8 100644
--- a/research.md
+++ b/research.md
@@ -27,7 +27,7 @@ Traditional MapReduce and DAG engines are suboptimal for these applications beca

-Spark offers an abstraction called resilient distributed datasets (RDDs) to support these applications efficiently. RDDs can be stored in memory between queries without requiring replication. Instead, they rebuild lost data on failure using lineage: each RDD remembers how it was built from other datasets (by transformations like map, join or groupBy) to rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. We showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine (Shark).
+Spark offers an abstraction called resilient distributed datasets (RDDs) to support these applications efficiently. RDDs can be stored in memory between queries without requiring replication. Instead, they rebuild lost data on failure using lineage: each RDD remembers how it was built from other datasets (by transformations like map, join or groupBy) to rebuild itself. RDDs allow Spark to outperform existing models by up to 100x in multi-pass analytics. We showed that RDDs can support a wide variety of iterative algorithms, as well as interactive data mining and a highly efficient SQL engine (Shark).

You can find more about the research behind Spark in the following papers:

-- cgit v1.2.3