[SPARKR][DOCS] fix broken url in doc

## What changes were proposed in this pull request?

Fix broken url, also,

sparkR.session.stop doc page should have it in the header, instead of saying "sparkR.stop"
![image](https://cloud.githubusercontent.com/assets/8969467/17080129/26d41308-50d9-11e6-8967-79d6c920313f.png)

Data type section is in the middle of a list of gapply/gapplyCollect subsections:
![image](https://cloud.githubusercontent.com/assets/8969467/17080122/f992d00a-50d8-11e6-8f2c-fd5786213920.png)

## How was this patch tested?

manual test

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #14329 from felixcheung/rdoclinkfix.
author: Felix Cheung <felixcheung_m@hotmail.com> 2016-07-25 11:25:41 -0700
committer: Shivaram Venkataraman <shivaram@cs.berkeley.edu> 2016-07-25 11:25:41 -0700
commit: b73defdd790cb823a4f9958ca89cec06fd198051 (patch)
tree: 32ed95b25df49c04278cb86d9ccafaee2e3fabd4 /docs/sparkr.md
parent: 7ea6d282b925819ddb3874a67b3c9da8cc41f131 (diff)
download: spark-b73defdd790cb823a4f9958ca89cec06fd198051.tar.gz
spark-b73defdd790cb823a4f9958ca89cec06fd198051.tar.bz2
spark-b73defdd790cb823a4f9958ca89cec06fd198051.zip
1 files changed, 53 insertions, 54 deletions
diff --git a/docs/sparkr.md b/docs/sparkr.md
index dfa5278ef8..4bbc362c52 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -322,8 +322,59 @@ head(ldf, 3)
 Apply a function to each group of a `SparkDataFrame`. The function is to be applied to each group of the `SparkDataFrame` and should have only two parameters: grouping key and R `data.frame` corresponding to
 that key. The groups are chosen from `SparkDataFrame`s column(s).
 The output of function should be a `data.frame`. Schema specifies the row format of the resulting
-`SparkDataFrame`. It must represent R function's output schema on the basis of Spark data types. The column names of the returned `data.frame` are set by user. Below is the data type mapping between R
-and Spark.
+`SparkDataFrame`. It must represent R function's output schema on the basis of Spark [data types](#data-type-mapping-between-r-and-spark). The column names of the returned `data.frame` are set by user.
+
+<div data-lang="r"  markdown="1">
+{% highlight r %}
+
+# Determine six waiting times with the largest eruption time in minutes.
+schema <- structType(structField("waiting", "double"), structField("max_eruption", "double"))
+result <- gapply(
+    df,
+    "waiting",
+    function(key, x) {
+        y <- data.frame(key, max(x$eruptions))
+    },
+    schema)
+head(collect(arrange(result, "max_eruption", decreasing = TRUE)))
+
+##    waiting   max_eruption
+##1      64       5.100
+##2      69       5.067
+##3      71       5.033
+##4      87       5.000
+##5      63       4.933
+##6      89       4.900
+{% endhighlight %}
+</div>
+
+##### gapplyCollect
+Like `gapply`, applies a function to each partition of a `SparkDataFrame` and collect the result back to R data.frame. The output of the function should be a `data.frame`. But, the schema is not required to be passed. Note that `gapplyCollect` can fail if the output of UDF run on all the partition cannot be pulled to the driver and fit in driver memory.
+
+<div data-lang="r"  markdown="1">
+{% highlight r %}
+
+# Determine six waiting times with the largest eruption time in minutes.
+result <- gapplyCollect(
+    df,
+    "waiting",
+    function(key, x) {
+        y <- data.frame(key, max(x$eruptions))
+        colnames(y) <- c("waiting", "max_eruption")
+        y
+    })
+head(result[order(result$max_eruption, decreasing = TRUE), ])
+
+##    waiting   max_eruption
+##1      64       5.100
+##2      69       5.067
+##3      71       5.033
+##4      87       5.000
+##5      63       4.933
+##6      89       4.900
+
+{% endhighlight %}
+</div>
 
 #### Data type mapping between R and Spark
 <table class="table">
@@ -394,58 +445,6 @@ and Spark.
 </tr>
 </table>
 
-<div data-lang="r"  markdown="1">
-{% highlight r %}
-
-# Determine six waiting times with the largest eruption time in minutes.
-schema <- structType(structField("waiting", "double"), structField("max_eruption", "double"))
-result <- gapply(
-    df,
-    "waiting",
-    function(key, x) {
-        y <- data.frame(key, max(x$eruptions))
-    },
-    schema)
-head(collect(arrange(result, "max_eruption", decreasing = TRUE)))
-
-##    waiting   max_eruption
-##1      64       5.100
-##2      69       5.067
-##3      71       5.033
-##4      87       5.000
-##5      63       4.933
-##6      89       4.900
-{% endhighlight %}
-</div>
-
-##### gapplyCollect
-Like `gapply`, applies a function to each partition of a `SparkDataFrame` and collect the result back to R data.frame. The output of the function should be a `data.frame`. But, the schema is not required to be passed. Note that `gapplyCollect` can fail if the output of UDF run on all the partition cannot be pulled to the driver and fit in driver memory.
-
-<div data-lang="r"  markdown="1">
-{% highlight r %}
-
-# Determine six waiting times with the largest eruption time in minutes.
-result <- gapplyCollect(
-    df,
-    "waiting",
-    function(key, x) {
-        y <- data.frame(key, max(x$eruptions))
-        colnames(y) <- c("waiting", "max_eruption")
-        y
-    })
-head(result[order(result$max_eruption, decreasing = TRUE), ])
-
-##    waiting   max_eruption
-##1      64       5.100
-##2      69       5.067
-##3      71       5.033
-##4      87       5.000
-##5      63       4.933
-##6      89       4.900
-
-{% endhighlight %}
-</div>
-
 #### Run local R functions distributed using `spark.lapply`
 
 ##### spark.lapply