author:    Yanbo Liang <ybliang8@gmail.com>  2016-05-09 09:58:36 -0700
committer: Davies Liu <davies.liu@gmail.com>  2016-05-09 09:58:36 -0700
commit:    ee3b1715620d48b8d22d086ddeef49ad7ff249d2 (patch)
tree:      00401e430a6d81cd2c82e6b6ba07fbbb0d87019c /docs/sparkr.md
parent:    652bbb1bf62722b08a062c7a2bf72019f85e179e (diff)
[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader
## What changes were proposed in this pull request?
* Since Spark now has a native CSV reader, it is no longer necessary to use the third-party `spark-csv` package in `examples/src/main/r/data-manipulation.R`. This patch also removes all `spark-csv` usage from SparkR.
* Running R applications through `sparkR` is not supported as of Spark 2.0, so the example is now run with `./bin/spark-submit`.
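The change described above can be sketched as a minimal SparkR snippet. This is an illustrative sketch assuming the Spark 1.6/2.0-era SparkR API (`sparkR.init` / `sparkRSQL.init`); the CSV file path is hypothetical and not from the patch:

```r
# Launched via: ./bin/spark-submit examples/src/main/r/data-manipulation.R <csv-path>
# (sketch; assumes a SparkR installation and a hypothetical "flights.csv" input)
library(SparkR)

sc <- sparkR.init(appName = "SparkR-data-manipulation-example")
sqlContext <- sparkRSQL.init(sc)

# With the native CSV reader, source = "csv" works directly --
# no com.databricks:spark-csv package on the classpath is needed.
flights <- read.df(sqlContext, "flights.csv", source = "csv", header = "true")
head(flights)

sparkR.stop()
```

The key difference from the old example is that `sparkR.init` no longer needs `sparkPackages = "com.databricks:spark-csv_2.11:..."` for CSV input.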
## How was this patch tested?
Tested offline.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #13005 from yanboliang/r-df-examples.
Diffstat (limited to 'docs/sparkr.md')
-rw-r--r-- | docs/sparkr.md | 4 |
1 file changed, 2 insertions(+), 2 deletions(-)
```diff
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 760534ae14..9b5eaa1ec7 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -115,13 +115,13 @@ head(df)
 SparkR supports operating on a variety of data sources through the `DataFrame` interface. This section describes the general methods for loading and saving data using Data Sources. You can check the Spark SQL programming guide for more [specific options](sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.
 
-The general method for creating DataFrames from data sources is `read.df`. This method takes in the `SQLContext`, the path for the file to load and the type of data source. SparkR supports reading JSON and Parquet files natively and through [Spark Packages](http://spark-packages.org/) you can find data source connectors for popular file formats like [CSV](http://spark-packages.org/package/databricks/spark-csv) and [Avro](http://spark-packages.org/package/databricks/spark-avro). These packages can either be added by
+The general method for creating DataFrames from data sources is `read.df`. This method takes in the `SQLContext`, the path for the file to load and the type of data source. SparkR supports reading JSON, CSV and Parquet files natively and through [Spark Packages](http://spark-packages.org/) you can find data source connectors for popular file formats like [Avro](http://spark-packages.org/package/databricks/spark-avro). These packages can either be added by
 specifying `--packages` with `spark-submit` or `sparkR` commands, or if creating context through `init` you can specify the packages with the `packages` argument.
 
 <div data-lang="r" markdown="1">
 {% highlight r %}
-sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
+sc <- sparkR.init(sparkPackages="com.databricks:spark-avro_2.11:2.0.1")
 sqlContext <- sparkRSQL.init(sc)
 {% endhighlight %}
 </div>
 
```