authorYanbo Liang <ybliang8@gmail.com>2016-05-09 09:58:36 -0700
committerDavies Liu <davies.liu@gmail.com>2016-05-09 09:58:36 -0700
commitee3b1715620d48b8d22d086ddeef49ad7ff249d2 (patch)
tree00401e430a6d81cd2c82e6b6ba07fbbb0d87019c /examples/src
parent652bbb1bf62722b08a062c7a2bf72019f85e179e (diff)
[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader
## What changes were proposed in this pull request?

* Since Spark now supports a native csv reader, it is no longer necessary to use the third-party ```spark-csv``` package in ```examples/src/main/r/data-manipulation.R```. This change also removes all ```spark-csv``` usage in SparkR.
* Running R applications through ```sparkR``` is no longer supported as of Spark 2.0, so the example is now run via ```./bin/spark-submit```.

## How was this patch tested?

Offline test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #13005 from yanboliang/r-df-examples.
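For context, the updated usage can be sketched as follows in a pre-2.0-style SparkR session (matching the example's use of `sqlContext`); the CSV path here is a placeholder, not a file from this repository:

```r
library(SparkR)

# Initialize a Spark context and SQL context (the pre-2.0 SparkR API
# used by examples/src/main/r/data-manipulation.R).
sc <- sparkR.init(appName = "data-manipulation-example")
sqlContext <- sparkRSQL.init(sc)

# Read the CSV with Spark's native reader ("csv") instead of the
# third-party com.databricks.spark.csv source. "flights.csv" is a
# hypothetical local path.
flightsDF <- read.df(sqlContext, "flights.csv", source = "csv", header = "true")
printSchema(flightsDF)

sparkR.stop()
```

The only change from the old call is the `source` argument, which means the example no longer needs `--packages com.databricks:spark-csv_2.10:1.0.3` at submit time.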
Diffstat (limited to 'examples/src')
-rw-r--r--	examples/src/main/r/data-manipulation.R	7
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/examples/src/main/r/data-manipulation.R b/examples/src/main/r/data-manipulation.R
index 594bf49d60..58a30135aa 100644
--- a/examples/src/main/r/data-manipulation.R
+++ b/examples/src/main/r/data-manipulation.R
@@ -20,8 +20,7 @@
# The data set is made up of 227,496 rows x 14 columns.
# To run this example use
-# ./bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3
-# examples/src/main/r/data-manipulation.R <path_to_csv>
+# ./bin/spark-submit examples/src/main/r/data-manipulation.R <path_to_csv>
# Load SparkR library into your R session
library(SparkR)
@@ -29,7 +28,7 @@ library(SparkR)
args <- commandArgs(trailing = TRUE)
if (length(args) != 1) {
- print("Usage: data-manipulation.R <path-to-flights.csv")
+ print("Usage: data-manipulation.R <path-to-flights.csv>")
print("The data can be downloaded from: http://s3-us-west-2.amazonaws.com/sparkr-data/flights.csv")
q("no")
}
@@ -53,7 +52,7 @@ SFO_df <- flights_df[flights_df$dest == "SFO", ]
SFO_DF <- createDataFrame(sqlContext, SFO_df)
# Directly create a SparkDataFrame from the source data
-flightsDF <- read.df(sqlContext, flightsCsvPath, source = "com.databricks.spark.csv", header = "true")
+flightsDF <- read.df(sqlContext, flightsCsvPath, source = "csv", header = "true")
# Print the schema of this SparkDataFrame
printSchema(flightsDF)