| author | Dongjoon Hyun <dongjoon@apache.org> | 2016-04-24 22:10:27 -0700 |
|---|---|---|
| committer | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2016-04-24 22:10:27 -0700 |
| commit | 6ab4d9e0c76b69b4d6d5f39037a77bdfb042be19 (patch) | |
| tree | 494b601ba783d7b025b805504bde8f3f92b7667b /docs/sparkr.md | |
| parent | 35319d326488b3bf9235dfcf9ac4533ce846f21f (diff) | |
[SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date
## What changes were proposed in this pull request?
This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.
- Remove the wrong usage of `map`. SparkR provides `lapply` for this, but it is currently private, so the corrected example will be added later.
- Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
- Fix datatypes in `sparkr.md`.
- Update a data result in `sparkr.md`.
- Replace deprecated functions with their current equivalents to remove warnings: `jsonFile` -> `read.json`, `parquetFile` -> `read.parquet`
- Use up-to-date R-like functions: `loadDF` -> `read.df`, `saveDF` -> `write.df`, `saveAsParquetFile` -> `write.parquet`
- Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
- Other minor syntax fixes and a typo.
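The renames above can be sketched as a minimal SparkR session. This is an illustrative sketch, not part of the patch; it assumes the Spark 1.x-era SparkR API used in these docs (`sparkR.init`/`sparkRSQL.init`) and the `people.json` example file from the Spark distribution:

```r
library(SparkR)
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Deprecated (emits warnings)      ->  current replacement
# jsonFile(sqlContext, path)       ->  read.json(sqlContext, path)
# parquetFile(sqlContext, path)    ->  read.parquet(sqlContext, path)
# loadDF(sqlContext, path, source) ->  read.df(sqlContext, path, source)
# saveDF(df, path, source)         ->  write.df(df, path, source)
# saveAsParquetFile(df, path)      ->  write.parquet(df, path)

# Read JSON with the current API and write it back out as Parquet
people <- read.json(sqlContext, "examples/src/main/resources/people.json")
write.parquet(people, "people.parquet")
```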
## How was this patch tested?
Manual.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12649 from dongjoon-hyun/SPARK-14883.
Diffstat (limited to 'docs/sparkr.md')
-rw-r--r-- | docs/sparkr.md | 11 |
1 file changed, 5 insertions(+), 6 deletions(-)
```diff
diff --git a/docs/sparkr.md b/docs/sparkr.md
index a0b4f93776..760534ae14 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -141,7 +141,7 @@ head(people)
 # SparkR automatically infers the schema from the JSON file
 printSchema(people)
 # root
-# |-- age: integer (nullable = true)
+# |-- age: long (nullable = true)
 # |-- name: string (nullable = true)

 {% endhighlight %}
@@ -195,7 +195,7 @@ df <- createDataFrame(sqlContext, faithful)

 # Get basic information about the DataFrame
 df
-## DataFrame[eruptions:double, waiting:double]
+## SparkDataFrame[eruptions:double, waiting:double]

 # Select only the "eruptions" column
 head(select(df, df$eruptions))
@@ -228,14 +228,13 @@ SparkR data frames support a number of commonly used functions to aggregate data
 # We use the `n` operator to count the number of times each waiting time appears
 head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))
 ## waiting count
-##1     81    13
-##2     60     6
-##3     68     1
+##1     70     4
+##2     67     1
+##3     69     2

 # We can also sort the output from the aggregation to get the most common waiting times
 waiting_counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))
 head(arrange(waiting_counts, desc(waiting_counts$count)))
-
 ## waiting count
 ##1     78    15
 ##2     83    14
```
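The corrected aggregation example from the diff can be run end-to-end as follows. This is a sketch assuming a SparkR shell where `sqlContext` has been initialized; `faithful` is R's built-in Old Faithful dataset, and the sorted counts shown are the values from the updated docs:

```r
# Create a SparkDataFrame from R's built-in faithful dataset
df <- createDataFrame(sqlContext, faithful)
df
## SparkDataFrame[eruptions:double, waiting:double]

# Count how many times each waiting time appears
waiting_counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))

# Sort the aggregation to see the most common waiting times
head(arrange(waiting_counts, desc(waiting_counts$count)))
## waiting count
##1      78    15
##2      83    14
```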