From 3de586b4b4fde7aa5f20cc6d116e03615987f11a Mon Sep 17 00:00:00 2001
From: Matei Alexandru Zaharia
Date: Sun, 31 May 2015 19:04:53 +0000
Subject: Some updates to FAQ on streaming
---
 site/faq.html | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/site/faq.html b/site/faq.html
index 3f73727ac..979a55539 100644
--- a/site/faq.html
+++ b/site/faq.html
@@ -196,9 +196,6 @@ Spark is a fast and general processing engine compatible with Hadoop data. It ca

How can I access data in S3?

Use the s3n:// URI scheme (s3n://bucket/path). You will also need to set your Amazon security credentials, either by setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY before your program runs, or by setting fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey in SparkContext.hadoopConfiguration.
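
To make this concrete, here is a minimal Scala sketch of the two approaches described in that answer; the bucket name, file path, and credential values are placeholders, not real ones:

    import org.apache.spark.{SparkConf, SparkContext}

    object S3Access {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("S3Access"))

        // Option 1: export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the
        // environment before launching, and the s3n filesystem picks them up.

        // Option 2: set the Hadoop properties named in the answer directly.
        // "YOUR_ACCESS_KEY" / "YOUR_SECRET_KEY" are placeholders.
        sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "YOUR_ACCESS_KEY")
        sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "YOUR_SECRET_KEY")

        // Read with the s3n:// URI scheme; "my-bucket/data.txt" is hypothetical.
        val lines = sc.textFile("s3n://my-bucket/data.txt")
        println(lines.count())

        sc.stop()
      }
    }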

-Which languages does Spark support?
-
-Spark supports Scala, Java and Python.

Does Spark require modified versions of Scala or Python?

No. Spark requires no changes to Scala or compiler plugins. The Python API uses the standard CPython implementation, and can call into existing C libraries for Python such as NumPy.
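
As a small illustration of that point (a sketch, not part of the FAQ itself): a Spark job is ordinary Scala compiled with the stock compiler, and closures can call any existing JVM library:

    import org.apache.spark.{SparkConf, SparkContext}

    object PlainScala {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PlainScala"))

        // Plain Scala closures and standard-library calls inside Spark
        // transformations -- no forked compiler or plugins involved.
        val words = sc.parallelize(Seq("no", "modified", "scala", "needed"))
        val totalLength = words.map(_.length).reduce(_ + _)
        println(totalLength)

        sc.stop()
      }
    }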

@@ -208,9 +205,9 @@ Spark is a fast and general processing engine compatible with Hadoop data. It ca

In addition, Spark also has Java and Python APIs.

-What license is Spark under?
+I understand Spark Streaming uses micro-batching. Does this increase latency?

-Starting in version 0.8, Spark is under the Apache 2.0 license. Previous versions used the BSD license.
+While Spark does use a micro-batch execution model, this does not have much impact on applications, because the batches can be as short as 0.5 seconds. In most applications of streaming big data, the analytics is done over a larger window (say 10 minutes), or the latency to get data in is higher (e.g. sensors collect readings every 10 seconds). The benefit of Spark's micro-batch model is that it enables exactly-once semantics, meaning the system can recover all intermediate state and results on failure.
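
To make the model concrete, here is an illustrative Spark Streaming sketch (not from the FAQ); the socket source, port, and checkpoint directory are placeholders. The batch interval is 500 ms, while the analytics run over a 10-minute window:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Milliseconds, Minutes, StreamingContext}

    object MicroBatch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("MicroBatch")
        // Micro-batches every 0.5 seconds, the lower end mentioned above.
        val ssc = new StreamingContext(conf, Milliseconds(500))
        // Checkpointing backs the fault-tolerance/recovery guarantees.
        ssc.checkpoint("/tmp/checkpoints") // placeholder directory

        // Placeholder source: e.g. `nc -lk 9999` feeding text lines.
        val lines = ssc.socketTextStream("localhost", 9999)

        // Per-word counts over a sliding 10-minute window, updated each batch.
        val counts = lines.flatMap(_.split(" "))
          .map(w => (w, 1L))
          .reduceByKeyAndWindow(_ + _, Minutes(10))
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }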

How can I contribute to Spark?

--
cgit v1.2.3