author | Sandeep Singh <sandeep@techaddict.me> | 2016-05-03 12:38:21 +0100 |
---|---|---|
committer | Sean Owen <sowen@cloudera.com> | 2016-05-03 12:38:21 +0100 |
commit | dfd9723dd3b3ff5d47a7f04a4330bf33ffe353ac (patch) | |
tree | d82f54527ca75689e0cc4d7df5917f136b65f121 /docs | |
parent | f10ae4b1e169495af11b8e8123c60dd96174477e (diff) | |
[MINOR][DOCS] Fix type Information in Quick Start and Programming Guide
Author: Sandeep Singh <sandeep@techaddict.me>
Closes #12841 from techaddict/improve_docs_1.
Diffstat (limited to 'docs')
-rw-r--r-- | docs/programming-guide.md | 2 |
-rw-r--r-- | docs/quick-start.md | 8 |
2 files changed, 5 insertions, 5 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index cf6f1d8914..d375926a91 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -328,7 +328,7 @@ Text file RDDs can be created using `SparkContext`'s `textFile` method. This met
 
 {% highlight scala %}
 scala> val distFile = sc.textFile("data.txt")
-distFile: RDD[String] = MappedRDD@1d4cee08
+distFile: org.apache.spark.rdd.RDD[String] = data.txt MapPartitionsRDD[10] at textFile at <console>:26
 {% endhighlight %}
 
 Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `distFile.map(s => s.length).reduce((a, b) => a + b)`.
diff --git a/docs/quick-start.md b/docs/quick-start.md
index d481fe0ea6..72372a6bc8 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -33,7 +33,7 @@ Spark's primary abstraction is a distributed collection of items called a Resili
 
 {% highlight scala %}
 scala> val textFile = sc.textFile("README.md")
-textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
+textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25
 {% endhighlight %}
 
 RDDs have _[actions](programming-guide.html#actions)_, which return values, and _[transformations](programming-guide.html#transformations)_, which return pointers to new RDDs. Let's start with a few actions:
@@ -50,7 +50,7 @@ Now let's use a transformation. We will use the [`filter`](programming-guide.htm
 
 {% highlight scala %}
 scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
-linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
+linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:27
 {% endhighlight %}
 
 We can chain together transformations and actions:
@@ -123,7 +123,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i
 
 {% highlight scala %}
 scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
-wordCounts: spark.RDD[(String, Int)] = spark.ShuffledAggregatedRDD@71f027b8
+wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:28
 {% endhighlight %}
 
 Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
@@ -181,7 +181,7 @@ Spark also supports pulling data sets into a cluster-wide in-memory cache. This
 
 {% highlight scala %}
 scala> linesWithSpark.cache()
-res7: spark.RDD[String] = spark.FilteredRDD@17e51082
+res7: linesWithSpark.type = MapPartitionsRDD[2] at filter at <console>:27
 
 scala> linesWithSpark.count()
 res8: Long = 19