author: Sandeep Singh <sandeep@techaddict.me> 2016-05-03 12:38:21 +0100
committer: Sean Owen <sowen@cloudera.com> 2016-05-03 12:38:21 +0100
commit: dfd9723dd3b3ff5d47a7f04a4330bf33ffe353ac
tree: d82f54527ca75689e0cc4d7df5917f136b65f121
parent: f10ae4b1e169495af11b8e8123c60dd96174477e
[MINOR][DOCS] Fix type Information in Quick Start and Programming Guide
Author: Sandeep Singh <sandeep@techaddict.me>
Closes #12841 from techaddict/improve_docs_1.
-rw-r--r-- docs/programming-guide.md 2
-rw-r--r-- docs/quick-start.md 8
2 files changed, 5 insertions, 5 deletions
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index cf6f1d8914..d375926a91 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -328,7 +328,7 @@ Text file RDDs can be created using `SparkContext`'s `textFile` method. This met
{% highlight scala %}
scala> val distFile = sc.textFile("data.txt")
-distFile: RDD[String] = MappedRDD@1d4cee08
+distFile: org.apache.spark.rdd.RDD[String] = data.txt MapPartitionsRDD[10] at textFile at <console>:26
{% endhighlight %}
Once created, `distFile` can be acted on by dataset operations. For example, we can add up the sizes of all the lines using the `map` and `reduce` operations as follows: `distFile.map(s => s.length).reduce((a, b) => a + b)`.
diff --git a/docs/quick-start.md b/docs/quick-start.md
index d481fe0ea6..72372a6bc8 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -33,7 +33,7 @@ Spark's primary abstraction is a distributed collection of items called a Resili
{% highlight scala %}
scala> val textFile = sc.textFile("README.md")
-textFile: spark.RDD[String] = spark.MappedRDD@2ee9b6e3
+textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:25
{% endhighlight %}
RDDs have _[actions](programming-guide.html#actions)_, which return values, and _[transformations](programming-guide.html#transformations)_, which return pointers to new RDDs. Let's start with a few actions:
@@ -50,7 +50,7 @@ Now let's use a transformation. We will use the [`filter`](programming-guide.htm
{% highlight scala %}
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
-linesWithSpark: spark.RDD[String] = spark.FilteredRDD@7dd4af09
+linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at filter at <console>:27
{% endhighlight %}
We can chain together transformations and actions:
@@ -123,7 +123,7 @@ One common data flow pattern is MapReduce, as popularized by Hadoop. Spark can i
{% highlight scala %}
scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
-wordCounts: spark.RDD[(String, Int)] = spark.ShuffledAggregatedRDD@71f027b8
+wordCounts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[8] at reduceByKey at <console>:28
{% endhighlight %}
Here, we combined the [`flatMap`](programming-guide.html#transformations), [`map`](programming-guide.html#transformations), and [`reduceByKey`](programming-guide.html#transformations) transformations to compute the per-word counts in the file as an RDD of (String, Int) pairs. To collect the word counts in our shell, we can use the [`collect`](programming-guide.html#actions) action:
@@ -181,7 +181,7 @@ Spark also supports pulling data sets into a cluster-wide in-memory cache. This
{% highlight scala %}
scala> linesWithSpark.cache()
-res7: spark.RDD[String] = spark.FilteredRDD@17e51082
+res7: linesWithSpark.type = MapPartitionsRDD[2] at filter at <console>:27
scala> linesWithSpark.count()
res8: Long = 19