From ee3b1715620d48b8d22d086ddeef49ad7ff249d2 Mon Sep 17 00:00:00 2001
From: Yanbo Liang
Date: Mon, 9 May 2016 09:58:36 -0700
Subject: [MINOR] [SPARKR] Update data-manipulation.R to use native csv reader

## What changes were proposed in this pull request?
* Since Spark now supports a native CSV reader, it is no longer necessary to use the third-party ```spark-csv``` package in ```examples/src/main/r/data-manipulation.R```. Meanwhile, remove all ```spark-csv``` usage in SparkR.
* Running R applications through ```sparkR``` is not supported as of Spark 2.0, so we switch to ```./bin/spark-submit``` to run the example.

## How was this patch tested?
Offline test.

Author: Yanbo Liang

Closes #13005 from yanboliang/r-df-examples.
---
 docs/sparkr.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'docs')

diff --git a/docs/sparkr.md b/docs/sparkr.md
index 760534ae14..9b5eaa1ec7 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -115,13 +115,13 @@ head(df)

 SparkR supports operating on a variety of data sources through the `DataFrame` interface. This section describes the general methods for loading and saving data using Data Sources. You can check the Spark SQL programming guide for more [specific options](sql-programming-guide.html#manually-specifying-options) that are available for the built-in data sources.

-The general method for creating DataFrames from data sources is `read.df`. This method takes in the `SQLContext`, the path for the file to load and the type of data source. SparkR supports reading JSON and Parquet files natively and through [Spark Packages](http://spark-packages.org/) you can find data source connectors for popular file formats like [CSV](http://spark-packages.org/package/databricks/spark-csv) and [Avro](http://spark-packages.org/package/databricks/spark-avro). These packages can either be added by
+The general method for creating DataFrames from data sources is `read.df`. This method takes in the `SQLContext`, the path for the file to load and the type of data source. SparkR supports reading JSON, CSV and Parquet files natively and through [Spark Packages](http://spark-packages.org/) you can find data source connectors for popular file formats like [Avro](http://spark-packages.org/package/databricks/spark-avro). These packages can either be added by
 specifying `--packages` with `spark-submit` or `sparkR` commands, or if creating context through `init` you can specify the packages with the `packages` argument.
 {% highlight r %}
-sc <- sparkR.init(sparkPackages="com.databricks:spark-csv_2.11:1.0.3")
+sc <- sparkR.init(sparkPackages="com.databricks:spark-avro_2.11:2.0.1")
 sqlContext <- sparkRSQL.init(sc)
 {% endhighlight %}
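For context, here is a minimal sketch of what reading a CSV file looks like once `spark-csv` is dropped in favor of the native reader. It assumes the `SQLContext`-based `read.df` API shown in this guide; the file path `examples/src/main/resources/people.csv` and the `header`/`inferSchema` options are illustrative placeholders, not part of this patch.

{% highlight r %}
# Read a CSV file with the built-in "csv" data source instead of the
# third-party spark-csv package (path and options are hypothetical).
df <- read.df(sqlContext, "examples/src/main/resources/people.csv",
              source = "csv", header = "true", inferSchema = "true")
head(df)
{% endhighlight %}

As noted in the commit message, the updated `examples/src/main/r/data-manipulation.R` example is now launched with `./bin/spark-submit` rather than through `sparkR`.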