From 6c1ecb191bc086290e33d56b6a5706d962e84a3a Mon Sep 17 00:00:00 2001
From: Shivansh
Date: Sun, 7 Aug 2016 09:30:18 +0100
Subject: [SPARK-16911] Fix the links in the programming guide

## What changes were proposed in this pull request?

Fix the broken links in the programming guide for the GraphX migration section and understanding closures.

## How was this patch tested?

By running the test cases and checking the links.

Author: Shivansh

Closes #14503 from shiv4nsh/SPARK-16911.
---
 docs/graphx-programming-guide.md    | 17 --------------
 docs/programming-guide.md           | 45 +------------------------------------
 docs/streaming-programming-guide.md | 45 ------------------------------------
 3 files changed, 1 insertion(+), 106 deletions(-)

diff --git a/docs/graphx-programming-guide.md b/docs/graphx-programming-guide.md
index 2e9966c0a2..bf4b968eb8 100644
--- a/docs/graphx-programming-guide.md
+++ b/docs/graphx-programming-guide.md
@@ -67,23 +67,6 @@ operators (e.g., [subgraph](#structural_operators), [joinVertices](#join_operato
 [aggregateMessages](#aggregateMessages)) as well as an optimized variant of the [Pregel](#pregel) API. In
 addition, GraphX includes a growing collection of graph [algorithms](#graph_algorithms) and
 [builders](#graph_builders) to simplify graph analytics tasks.
-
-## Migrating from Spark 1.1
-
-GraphX in Spark 1.2 contains a few user facing API changes:
-
-1. To improve performance we have introduced a new version of
-[`mapReduceTriplets`][Graph.mapReduceTriplets] called
-[`aggregateMessages`][Graph.aggregateMessages] which takes the messages previously returned from
-[`mapReduceTriplets`][Graph.mapReduceTriplets] through a callback ([`EdgeContext`][EdgeContext])
-rather than by return value.
-We are deprecating [`mapReduceTriplets`][Graph.mapReduceTriplets] and encourage users to consult
-the [transition guide](#mrTripletsTransition).
-
-2. In Spark 1.0 and 1.1, the type signature of [`EdgeRDD`][EdgeRDD] switched from
-`EdgeRDD[ED]` to `EdgeRDD[ED, VD]` to enable some caching optimizations. We have since discovered
-a more elegant solution and have restored the type signature to the more natural `EdgeRDD[ED]` type.
-
 # Getting Started
 
 To get started you first need to import Spark and GraphX into your project, as follows:
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 5fcd4d3647..f82832905e 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -1097,7 +1097,7 @@ for details.
 foreach(func)
 Run a function func on each element of the dataset. This is usually done for side effects such as updating an Accumulator or interacting with external storage systems.
-Note: modifying variables other than Accumulators outside of the foreach() may result in undefined behavior. See <a href="#ClosuresLink">Understanding closures</a> for more details.
+Note: modifying variables other than Accumulators outside of the foreach() may result in undefined behavior. See <a href="#understanding-closures-a-nameclosureslinka">Understanding closures</a> for more details.
@@ -1544,49 +1544,6 @@ and then call `SparkContext.stop()` to tear it down.
 Make sure you stop the context within a `finally` block or the test framework's `tearDown` method, as Spark
 does not support two contexts running concurrently in the same program.
 
-# Migrating from pre-1.0 Versions of Spark
-
-<div class="codetabs">
-
-<div data-lang="scala" markdown="1">
-
-Spark 1.0 freezes the API of Spark Core for the 1.X series, in that any API available today that is
-not marked "experimental" or "developer API" will be supported in future versions.
-The only change for Scala users is that the grouping operations, e.g. `groupByKey`, `cogroup` and `join`,
-have changed from returning `(Key, Seq[Value])` pairs to `(Key, Iterable[Value])`.
-
-</div>
-
-<div data-lang="java" markdown="1">
-
-Spark 1.0 freezes the API of Spark Core for the 1.X series, in that any API available today that is
-not marked "experimental" or "developer API" will be supported in future versions.
-Several changes were made to the Java API:
-
-* The Function classes in `org.apache.spark.api.java.function` became interfaces in 1.0, meaning that old
-  code that `extends Function` should `implement Function` instead.
-* New variants of the `map` transformations, like `mapToPair` and `mapToDouble`, were added to create RDDs
-  of special data types.
-* Grouping operations like `groupByKey`, `cogroup` and `join` have changed from returning
-  `(Key, List<Value>)` pairs to `(Key, Iterable<Value>)`.
-
-</div>
-
-<div data-lang="python" markdown="1">
-
-Spark 1.0 freezes the API of Spark Core for the 1.X series, in that any API available today that is
-not marked "experimental" or "developer API" will be supported in future versions.
-The only change for Python users is that the grouping operations, e.g. `groupByKey`, `cogroup` and `join`,
-have changed from returning (key, list of values) pairs to (key, iterable of values).
-
-</div>
-
-</div>
-
-Migration guides are also available for [Spark Streaming](streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x),
-[MLlib](ml-guide.html#migration-guide) and [GraphX](graphx-programming-guide.html#migrating-from-spark-091).
-
-
 # Where to Go from Here
 
 You can see some [example Spark programs](http://spark.apache.org/examples.html) on the Spark website.
diff --git a/docs/streaming-programming-guide.md b/docs/streaming-programming-guide.md
index 902df6ada8..3d40b2c313 100644
--- a/docs/streaming-programming-guide.md
+++ b/docs/streaming-programming-guide.md
@@ -2378,51 +2378,6 @@ additional effort may be necessary to achieve exactly-once semantics. There are
 ***************************************************************************************************
 ***************************************************************************************************
 
-# Migration Guide from 0.9.1 or below to 1.x
-Between Spark 0.9.1 and Spark 1.0, there were a few API changes made to ensure future API stability.
-This section elaborates the steps required to migrate your existing code to 1.0.
-
-**Input DStreams**: All operations that create an input stream (e.g., `StreamingContext.socketStream`, `FlumeUtils.createStream`, etc.) now returns
-[InputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.InputDStream) /
-[ReceiverInputDStream](api/scala/index.html#org.apache.spark.streaming.dstream.ReceiverInputDStream)
-(instead of DStream) for Scala, and [JavaInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaInputDStream.html) /
-[JavaPairInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairInputDStream.html) /
-[JavaReceiverInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaReceiverInputDStream.html) /
-[JavaPairReceiverInputDStream](api/java/index.html?org/apache/spark/streaming/api/java/JavaPairReceiverInputDStream.html)
-(instead of JavaDStream) for Java. This ensures that functionality specific to input streams can
-be added to these classes in the future without breaking binary compatibility.
-Note that your existing Spark Streaming applications should not require any change
-(as these new classes are subclasses of DStream/JavaDStream) but may require recompilation with Spark 1.0.
-
-**Custom Network Receivers**: Since the release to Spark Streaming, custom network receivers could be defined
-in Scala using the class NetworkReceiver. However, the API was limited in terms of error handling
-and reporting, and could not be used from Java. Starting Spark 1.0, this class has been
-replaced by [Receiver](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver) which has
-the following advantages.
-
-* Methods like `stop` and `restart` have been added to for better control of the lifecycle of a receiver. See
-the [custom receiver guide](streaming-custom-receivers.html) for more details.
-* Custom receivers can be implemented using both Scala and Java.
-
-To migrate your existing custom receivers from the earlier NetworkReceiver to the new Receiver, you have
-to do the following.
-
-* Make your custom receiver class extend
-[`org.apache.spark.streaming.receiver.Receiver`](api/scala/index.html#org.apache.spark.streaming.receiver.Receiver)
-instead of `org.apache.spark.streaming.dstream.NetworkReceiver`.
-* Earlier, a BlockGenerator object had to be created by the custom receiver, to which received data was
-added for being stored in Spark. It had to be explicitly started and stopped from `onStart()` and `onStop()`
-methods. The new Receiver class makes this unnecessary as it adds a set of methods named `store()`
-that can be called to store the data in Spark. So, to migrate your custom network receiver, remove any
-BlockGenerator object (does not exist any more in Spark 1.0 anyway), and use `store(...)` methods on
-received data.
-
-**Actor-based Receivers**: The Actor-based Receiver APIs have been moved to [DStream Akka](https://github.com/spark-packages/dstream-akka).
-Please refer to the project for more details.
-
-***************************************************************************************************
-***************************************************************************************************
-
 # Where to Go from Here
 * Additional guides
     - [Kafka Integration Guide](streaming-kafka-integration.html)
--
cgit v1.2.3