author: Matei Alexandru Zaharia <matei@apache.org> 2016-05-19 00:14:26 +0000
committer: Matei Alexandru Zaharia <matei@apache.org> 2016-05-19 00:14:26 +0000
commit: 7aa376d72e651ff2ca1d9b5520d381fc64ea614a
tree: b24520e908b7c2232f1c99c5979e9b1e213e2834 /faq.md
parent: e9da1193747442346691c73c82e4052e9913da97
Improvements to branding, some general updates, and EPS logos
Diffstat (limited to 'faq.md')
-rw-r--r-- faq.md 17
1 files changed, 14 insertions, 3 deletions
diff --git a/faq.md b/faq.md
index 8cd540e3b..d8de57546 100644
--- a/faq.md
+++ b/faq.md
@@ -6,16 +6,16 @@ navigation:
weight: 7
show: true
---
-<h2>Spark FAQ</h2>
+<h2>Apache Spark FAQ</h2>
-<p class="question">How does Spark relate to Hadoop?</p>
+<p class="question">How does Spark relate to Apache Hadoop?</p>
<p class="answer">
Spark is a fast and general processing engine compatible with Hadoop data. It can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.
</p>
<p class="question">Who is using Spark in production?</p>
-<p class="answer">As of early 2015, <a href="http://java.dzone.com/articles/apache-spark-survey-typesafe-0">surveys</a> show that more than 500 organizations are using Spark in production. Some of them are listed on the <a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By page</a> and at the <a href="http://spark-summit.org">Spark Summit</a>.</p>
+<p class="answer">As of 2016, surveys show that more than 1000 organizations are using Spark in production. Some of them are listed on the <a href="https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark">Powered By page</a> and at the <a href="http://spark-summit.org">Spark Summit</a>.</p>
<p class="question">How large a cluster can Spark scale to?</p>
@@ -49,6 +49,17 @@ Spark is a fast and general processing engine compatible with Hadoop data. It ca
While Spark does use a micro-batch execution model, this does not have much impact on applications, because the batches can be as short as 0.5 seconds. In most applications of streaming big data, the analytics is done over a larger window (say 10 minutes), or the latency to get data in is higher (e.g. sensors collect readings every 10 seconds). The benefit of Spark's micro-batch model is that it enables <a href="http://people.csail.mit.edu/matei/papers/2013/sosp_spark_streaming.pdf">exactly-once semantics</a>, meaning the system can recover all intermediate state and results on failure.
+<p class="question">Where can I find high-resolution versions of the Spark logo?</p>
+
+<p class="answer">We provide versions here:
+ <a href="images/spark-logo.eps">black logo</a>,
+ <a href="images/spark-logo-reverse.eps">white logo</a>.
+ Please be aware that Spark, Apache Spark and the Spark logo are
+ trademarks of the Apache Software Foundation, and follow the Foundation's
+ <a href="https://www.apache.org/foundation/marks/">trademark policy</a>
+ in all uses of these logos.
+</p>
+
<p class="question">How can I contribute to Spark?</p>
<p class="answer">See the <a href="https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark">Contributing to Spark wiki</a> for more information.</p>