author     Matei Zaharia <matei@databricks.com>    2014-05-30 00:34:33 -0700
committer  Patrick Wendell <pwendell@gmail.com>    2014-05-30 00:34:33 -0700
commit     c8bf4131bc2a2e147e977159fc90e94b85738830 (patch)
tree       a2f885df8fb6654bd7750bb344b97a6cb6889bf3 /docs/streaming-programming-guide.md
parent     eeee978a348ec2a35cc27865cea6357f9db75b74 (diff)
[SPARK-1566] consolidate programming guide, and general doc updates
This is a fairly large PR to clean up and update the docs for 1.0. The major changes are:

* A unified programming guide for all languages replaces language-specific ones and shows language-specific info in tabs
* New programming guide sections on key-value pairs, unit testing, input formats beyond text, migrating from 0.9, and passing functions to Spark
* Spark-submit guide moved to a separate page and expanded slightly
* Various cleanups of the menu system, security docs, and others
* Updated look of title bar to differentiate the docs from previous Spark versions

You can find the updated docs at http://people.apache.org/~matei/1.0-docs/_site/ and in particular http://people.apache.org/~matei/1.0-docs/_site/programming-guide.html.

Author: Matei Zaharia <matei@databricks.com>

Closes #896 from mateiz/1.0-docs and squashes the following commits:

03e6853 [Matei Zaharia] Some tweaks to configuration and YARN docs
0779508 [Matei Zaharia] tweak
ef671d4 [Matei Zaharia] Keep frames in JavaDoc links, and other small tweaks
1bf4112 [Matei Zaharia] Review comments
4414f88 [Matei Zaharia] tweaks
d04e979 [Matei Zaharia] Fix some old links to Java guide
a34ed33 [Matei Zaharia] tweak
541bb3b [Matei Zaharia] miscellaneous changes
fcefdec [Matei Zaharia] Moved submitting apps to separate doc
61d72b4 [Matei Zaharia] stuff
181f217 [Matei Zaharia] migration guide, remove old language guides
e11a0da [Matei Zaharia] Add more API functions
6a030a9 [Matei Zaharia] tweaks
8db0ae3 [Matei Zaharia] Added key-value pairs section
318d2c9 [Matei Zaharia] tweaks
1c81477 [Matei Zaharia] New section on basics and function syntax
e38f559 [Matei Zaharia] Actually added programming guide to Git
a33d6fe [Matei Zaharia] First pass at updating programming guide to support all languages, plus other tweaks throughout
3b6a876 [Matei Zaharia] More CSS tweaks
01ec8bf [Matei Zaharia] More CSS tweaks
e6d252e [Matei Zaharia] Change color of doc title bar to differentiate from 0.9.0
Diffstat (limited to 'docs/streaming-programming-guide.md')
-rw-r--r--  docs/streaming-programming-guide.md | 42
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 972b660262..b95f818d0f 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -136,7 +136,7 @@ The complete code can be found in the Spark Streaming example
<div data-lang="java" markdown="1">
First, we create a
-[JavaStreamingContext](api/java/org/apache/spark/streaming/api/java/JavaStreamingContext.html) object,
+[JavaStreamingContext](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html) object,
which is the main entry point for all streaming
functionality. Besides Spark's configuration, we specify that any DStream would be processed
in 1 second batches.
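As a rough sketch of that setup in Java (assuming the 1.0 API; the master URL and app name below are placeholders, not taken from this guide):

{% highlight java %}
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

// A SparkConf plus a 1-second batch interval gives us the streaming context.
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount");
JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(1000));
{% endhighlight %}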
@@ -215,7 +215,7 @@ jssc.awaitTermination(); // Wait for the computation to terminate
{% endhighlight %}
The complete code can be found in the Spark Streaming example
-[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/org/apache/spark/examples/streaming/JavaNetworkWordCount.java).
+[JavaNetworkWordCount]({{site.SPARK_GITHUB_URL}}/blob/master/examples/src/main/java/index.html?org/apache/spark/examples/streaming/JavaNetworkWordCount.java).
<br>
</div>
@@ -360,12 +360,12 @@ new JavaStreamingContext(master, appName, batchInterval, [sparkHome], [jars])
</div>
</div>
-The `master` parameter is a standard [Spark cluster URL](scala-programming-guide.html#master-urls)
+The `master` parameter is a standard [Spark cluster URL](programming-guide.html#master-urls)
and can be "local" for local testing. The `appName` is a name of your program,
which will be shown on your cluster's web UI. The `batchInterval` is the size of the batches,
as explained earlier. Finally, the last two parameters are needed to deploy your code to a cluster
if running in distributed mode, as described in the
- [Spark programming guide](scala-programming-guide.html#deploying-code-on-a-cluster).
+ [Spark programming guide](programming-guide.html#deploying-code-on-a-cluster).
Additionally, the underlying SparkContext can be accessed as
`ssc.sparkContext`.
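Putting those parameters together, a hedged sketch of the Java form (every argument value here is a placeholder):

{% highlight java %}
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

JavaStreamingContext jssc = new JavaStreamingContext(
    "spark://host:7077",             // master: a standard Spark cluster URL, or "local"
    "MyStreamingApp",                // appName: shown on the cluster's web UI
    new Duration(1000),              // batchInterval: the size of the batches
    System.getenv("SPARK_HOME"),     // sparkHome: needed only in distributed mode
    new String[]{"target/app.jar"}); // jars: code to ship to the cluster

// The underlying SparkContext, as noted above:
JavaSparkContext sc = jssc.sparkContext();
{% endhighlight %}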
@@ -813,10 +813,8 @@ output operators are defined:
The complete list of DStream operations is available in the API documentation. For the Scala API,
see [DStream](api/scala/index.html#org.apache.spark.streaming.dstream.DStream)
and [PairDStreamFunctions](api/scala/index.html#org.apache.spark.streaming.dstream.PairDStreamFunctions).
-For the Java API, see [JavaDStream](api/scala/index.html#org.apache.spark.streaming.api.java.dstream.DStream)
-and [JavaPairDStream](api/scala/index.html#org.apache.spark.streaming.api.java.JavaPairDStream).
-Specifically for the Java API, see [Spark's Java programming guide](java-programming-guide.html)
-for more information.
+For the Java API, see [JavaDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaDStream.html)
+and [JavaPairDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairDStream.html).
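
For a flavor of those operations chained together, here is a sketch assuming an existing `JavaDStream<String> lines` and Java 8 lambda syntax (the 1.0 function interfaces admit lambdas):

{% highlight java %}
import java.util.Arrays;
import scala.Tuple2;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;

// A JavaDStream transformation, then JavaPairDStream operations.
JavaDStream<String> words = lines.flatMap(x -> Arrays.asList(x.split(" ")));
JavaPairDStream<String, Integer> wordCounts =
    words.mapToPair(w -> new Tuple2<>(w, 1))
         .reduceByKey((a, b) -> a + b);
{% endhighlight %}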
## Persistence
Similar to RDDs, DStreams also allow developers to persist the stream's data in memory. That is,
@@ -833,7 +831,7 @@ default persistence level is set to replicate the data to two nodes for fault-to
Note that, unlike RDDs, the default persistence level of DStreams keeps the data serialized in
memory. This is further discussed in the [Performance Tuning](#memory-tuning) section. More
information on different persistence levels can be found in
-[Spark Programming Guide](scala-programming-guide.html#rdd-persistence).
+[Spark Programming Guide](programming-guide.html#rdd-persistence).
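
As a minimal sketch (reusing the hypothetical `wordCounts` stream from the sketch above):

{% highlight java %}
import org.apache.spark.storage.StorageLevel;

// Use the DStream's default persistence level (serialized in memory):
wordCounts.persist();

// Or request deserialized in-memory storage explicitly:
wordCounts.persist(StorageLevel.MEMORY_ONLY());
{% endhighlight %}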
## RDD Checkpointing
A _stateful operation_ is one which operates over multiple batches of data. This includes all
@@ -878,7 +876,7 @@ sending the data to two destinations (i.e., the earlier and upgraded application
- The existing application is shutdown gracefully (see
[`StreamingContext.stop(...)`](api/scala/index.html#org.apache.spark.streaming.StreamingContext)
-or [`JavaStreamingContext.stop(...)`](api/java/org/apache/spark/streaming/api/java/JavaStreamingContext.html)
+or [`JavaStreamingContext.stop(...)`](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html)
for graceful shutdown options) which ensure data that have been received is completely
processed before shutdown. Then the
upgraded application can be started, which will start processing from the same point where the earlier
@@ -1313,10 +1311,10 @@ This section elaborates the steps required to migrate your existing code to 1.0.
`FlumeUtils.createStream`, etc.) now returns
[InputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.InputDStream) /
[ReceiverInputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.ReceiverInputDStream)
-(instead of DStream) for Scala, and [JavaInputDStream](api/java/org/apache/spark/streaming/api/java/JavaInputDStream.html) /
-[JavaPairInputDStream](api/java/org/apache/spark/streaming/api/java/JavaPairInputDStream.html) /
-[JavaReceiverInputDStream](api/java/org/apache/spark/streaming/api/java/JavaReceiverInputDStream.html) /
-[JavaPairReceiverInputDStream](api/java/org/apache/spark/streaming/api/java/JavaPairReceiverInputDStream.html)
+(instead of DStream) for Scala, and [JavaInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaInputDStream.html) /
+[JavaPairInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairInputDStream.html) /
+[JavaReceiverInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaReceiverInputDStream.html) /
+[JavaPairReceiverInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairReceiverInputDStream.html)
(instead of JavaDStream) for Java. This ensures that functionality specific to input streams can
be added to these classes in the future without breaking binary compatibility.
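To make the type change concrete, a small hedged sketch (host and port are placeholders):

{% highlight java %}
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;

// socketTextStream now returns a JavaReceiverInputDStream rather than a
// plain JavaDStream; code that assigns the result to a JavaDStream still compiles.
JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);
{% endhighlight %}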
Note that your existing Spark Streaming applications should not require any change
@@ -1367,14 +1365,14 @@ package and renamed for better clarity.
[ZeroMQUtils](api/scala/index.html#org.apache.spark.streaming.zeromq.ZeroMQUtils$), and
[MQTTUtils](api/scala/index.html#org.apache.spark.streaming.mqtt.MQTTUtils$)
- Java docs
- * [JavaStreamingContext](api/java/org/apache/spark/streaming/api/java/JavaStreamingContext.html),
- [JavaDStream](api/java/org/apache/spark/streaming/api/java/JavaDStream.html) and
- [PairJavaDStream](api/java/org/apache/spark/streaming/api/java/PairJavaDStream.html)
- * [KafkaUtils](api/java/org/apache/spark/streaming/kafka/KafkaUtils.html),
- [FlumeUtils](api/java/org/apache/spark/streaming/flume/FlumeUtils.html),
- [TwitterUtils](api/java/org/apache/spark/streaming/twitter/TwitterUtils.html),
- [ZeroMQUtils](api/java/org/apache/spark/streaming/zeromq/ZeroMQUtils.html), and
- [MQTTUtils](api/java/org/apache/spark/streaming/mqtt/MQTTUtils.html)
+ * [JavaStreamingContext](api/java/index.html?org/apache/spark/streaming/api/java/JavaStreamingContext.html),
+ [JavaDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaDStream.html) and
+ [PairJavaDStream](api/java/index.html?org/apache/spark/streaming/api/java/PairJavaDStream.html)
+ * [KafkaUtils](api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html),
+ [FlumeUtils](api/java/index.html?org/apache/spark/streaming/flume/FlumeUtils.html),
+ [TwitterUtils](api/java/index.html?org/apache/spark/streaming/twitter/TwitterUtils.html),
+ [ZeroMQUtils](api/java/index.html?org/apache/spark/streaming/zeromq/ZeroMQUtils.html), and
+ [MQTTUtils](api/java/index.html?org/apache/spark/streaming/mqtt/MQTTUtils.html)
* More examples in [Scala]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/scala/org/apache/spark/examples/streaming)
and [Java]({{site.SPARK_GITHUB_URL}}/tree/master/examples/src/main/java/org/apache/spark/examples/streaming)
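
As one hedged example of these relocated helpers in use from Java (all Kafka endpoints and names below are placeholders):

{% highlight java %}
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// One topic, consumed with a single receiver thread.
Map<String, Integer> topics = new HashMap<String, Integer>();
topics.put("myTopic", 1);
JavaPairReceiverInputDStream<String, String> kafkaStream =
    KafkaUtils.createStream(jssc, "zkHost:2181", "myGroup", topics);
{% endhighlight %}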