author | felixcheung <felixcheung_m@hotmail.com> | 2015-10-30 13:51:32 -0700
committer | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2015-10-30 13:51:32 -0700
commit | bb5a2af034196620d869fc9b1a400e014e718b8c (patch)
tree | 55df31e52b9dea29ec7061e2e1e66db6b7199018 /docs
parent | 729f983e66cf65da2e8f48c463ccde2b355240c4 (diff)
[SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio
Mapping spark.driver.memory from sparkEnvir to spark-submit command-line arguments.
shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf?
sun-rui
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9290 from felixcheung/rdrivermem.
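For context, here is a minimal sketch of the usage this patch enables, mirroring the example added to the docs below. The `"/home/spark"` path and the `"2g"` value are the placeholders from that example; the named-argument spelling assumes the `sparkR.init(master, appName, sparkHome, sparkEnvir, ...)` signature of SparkR at the time.

{% highlight r %}
library(SparkR)

# spark.driver.memory must reach the driver JVM before it starts, so
# sparkR.init forwards it to the spark-submit command line rather than
# trying to set it on SparkConf after the JVM is already running.
sc <- sparkR.init(master = "local[*]",
                  appName = "SparkR",
                  sparkHome = "/home/spark",  # placeholder path
                  sparkEnvir = list(spark.driver.memory = "2g"))

# A SQLContext is still created the usual way once the driver is up.
sqlContext <- sparkRSQL.init(sc)
{% endhighlight %}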
Diffstat (limited to 'docs')
-rw-r--r-- | docs/sparkr.md | 28
1 file changed, 20 insertions(+), 8 deletions(-)
diff --git a/docs/sparkr.md b/docs/sparkr.md
index 7139d16b4a..497a276679 100644
--- a/docs/sparkr.md
+++ b/docs/sparkr.md
@@ -29,7 +29,7 @@ All of the examples on this page use sample data included in R or the Spark dist
 The entry point into SparkR is the `SparkContext` which connects your R program to a Spark
 cluster. You can create a `SparkContext` using `sparkR.init` and pass in options such as the
 application name, any spark packages depended on, etc. Further, to work with DataFrames we will need a `SQLContext`,
-which can be created from the SparkContext. If you are working from the SparkR shell, the
+which can be created from the SparkContext. If you are working from the `sparkR` shell, the
 `SQLContext` and `SparkContext` should already be created for you.

 {% highlight r %}
@@ -37,17 +37,29 @@ sc <- sparkR.init()
 sqlContext <- sparkRSQL.init(sc)
 {% endhighlight %}

+In the event you are creating the `SparkContext` instead of using the `sparkR` shell or `spark-submit`, you
+could also specify certain Spark driver properties. Normally these
+[Application properties](configuration.html#application-properties) and
+[Runtime Environment](configuration.html#runtime-environment) properties cannot be set programmatically, as the
+driver JVM process would already have been started; in this case SparkR takes care of this for you. To set
+them, pass them as you would other configuration properties in the `sparkEnvir` argument to
+`sparkR.init()`.
+
+{% highlight r %}
+sc <- sparkR.init("local[*]", "SparkR", "/home/spark", list(spark.driver.memory="2g"))
+{% endhighlight %}
+
 </div>

 ## Creating DataFrames
 With a `SQLContext`, applications can create `DataFrame`s from a local R data frame, from a [Hive table](sql-programming-guide.html#hive-tables), or from other [data sources](sql-programming-guide.html#data-sources).

 ### From local data frames
-The simplest way to create a data frame is to convert a local R data frame into a SparkR DataFrame. Specifically we can use `createDataFrame` and pass in the local R data frame to create a SparkR DataFrame. As an example, the following creates a `DataFrame` based using the `faithful` dataset from R. 
+The simplest way to create a data frame is to convert a local R data frame into a SparkR DataFrame. Specifically we can use `createDataFrame` and pass in the local R data frame to create a SparkR DataFrame. As an example, the following creates a `DataFrame` based on the `faithful` dataset from R.

 <div data-lang="r" markdown="1">
 {% highlight r %}
-df <- createDataFrame(sqlContext, faithful) 
+df <- createDataFrame(sqlContext, faithful)

 # Displays the content of the DataFrame to stdout
 head(df)
@@ -96,7 +108,7 @@ printSchema(people)
 </div>

 The data sources API can also be used to save out DataFrames into multiple file formats. For example we can save the DataFrame from the previous example
-to a Parquet file using `write.df` 
+to a Parquet file using `write.df`.

 <div data-lang="r" markdown="1">
 {% highlight r %}
@@ -139,7 +151,7 @@ Here we include some basic examples and a complete list can be found in the [API
 <div data-lang="r" markdown="1">
 {% highlight r %}
 # Create the DataFrame
-df <- createDataFrame(sqlContext, faithful) 
+df <- createDataFrame(sqlContext, faithful)

 # Get basic information about the DataFrame
 df
@@ -152,7 +164,7 @@ head(select(df, df$eruptions))
 ##2 1.800
 ##3 3.333

-# You can also pass in column name as strings 
+# You can also pass in column names as strings
 head(select(df, "eruptions"))

 # Filter the DataFrame to only retain rows with wait times shorter than 50 mins
@@ -166,7 +178,7 @@ head(filter(df, df$waiting < 50))

 </div>

-### Grouping, Aggregation 
+### Grouping, Aggregation

 SparkR data frames support a number of commonly used functions to aggregate data after grouping. For example we can compute a histogram of the
 `waiting` time in the `faithful` dataset as shown below

@@ -194,7 +206,7 @@ head(arrange(waiting_counts, desc(waiting_counts$count)))

 ### Operating on Columns

-SparkR also provides a number of functions that can directly applied to columns for data processing and during aggregation. The example below shows the use of basic arithmetic functions. 
+SparkR also provides a number of functions that can be directly applied to columns for data processing and during aggregation. The example below shows the use of basic arithmetic functions.

 <div data-lang="r" markdown="1">
 {% highlight r %}
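Putting the patched page together, a minimal end-to-end session sketch follows. It only uses calls that appear in the diff above (`sparkR.init`, `sparkRSQL.init`, `createDataFrame`, `select`, `filter`, `write.df`); the master, paths, memory size, and output path are illustrative placeholders, and `faithful` is the built-in R dataset the page uses throughout.

{% highlight r %}
library(SparkR)

# Initialize with a driver property, then create the SQLContext.
sc <- sparkR.init("local[*]", "SparkR", "/home/spark",
                  list(spark.driver.memory = "2g"))
sqlContext <- sparkRSQL.init(sc)

# Convert a local R data frame into a SparkR DataFrame and inspect it.
df <- createDataFrame(sqlContext, faithful)
head(df)

# Select a column by reference or by name, then filter on a column.
head(select(df, df$eruptions))
head(select(df, "eruptions"))
head(filter(df, df$waiting < 50))

# Save the DataFrame out through the data sources API.
write.df(df, path = "faithful.parquet", source = "parquet")
{% endhighlight %}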