author | Felix Cheung <felixcheung_m@hotmail.com> | 2016-06-17 21:36:01 -0700 |
---|---|---|
committer | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2016-06-17 21:36:01 -0700 |
commit | 8c198e246d64b5779dc3a2625d06ec958553a20b (patch) | |
tree | 8e882c1a467cb454863b08c74124a36d30120314 /licenses/LICENSE-slf4j.txt | |
parent | edb23f9e47eecfe60992dde0e037ec1985c77e1d (diff) | |
[SPARK-15159][SPARKR] SparkR SparkSession API
## What changes were proposed in this pull request?
This PR introduces the new SparkSession API for SparkR.
The new entry points are `sparkR.session.getOrCreate()` and `sparkR.session.stop()`.
"getOrCreate" is a bit unusual in R, but it is important to name this clearly.
In the SparkR implementation:
- SparkSession is the main entry point (rather than SparkContext, because SparkR supports only limited functionality through SparkContext)
- SparkSession replaces SQLContext and HiveContext (both are wrappers around SparkSession, and because of API changes, supporting all three would be much more work)
- Changes to SparkSession are mostly transparent to users due to SPARK-10903
- Full backward compatibility is expected: users should be able to initialize everything just as in Spark 1.6.1 (`sparkR.init()`), but with a deprecation warning
- Changes to the parameter list are mostly cosmetic, so users should be able to move to `sparkR.session.getOrCreate()` easily
- An advanced syntax with named parameters (aka varargs aka "...") is supported; it is closer to the Builder syntax in Scala/Python, which unfortunately does not work well in R because it would look like this: `enableHiveSupport(config(config(master(appName(builder(), "foo"), "local"), "first", "value"), "next", "value"))`
- Updating config on an existing SparkSession is supported; the behavior is the same as in Python, where config is applied to both SparkContext and SparkSession
- Some SparkSession changes are not matched in SparkR, mostly because they would be breaking API changes: the `catalog` object, `createOrReplaceTempView`
- Other SQLContext workarounds are replicated in SparkR, e.g. `tables`, `tableNames`
- The `sparkR` shell is updated to use the SparkSession entry point (`sqlContext` is removed, just like with Scala/Python)
- All tests are updated to use the SparkSession entrypoint
- A bug in `read.jdbc` is fixed
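To illustrate the builder-versus-named-parameters point, here is a minimal base-R sketch that runs without Spark. The functions `builder`, `appName`, `master`, `config`, `enableHiveSupport` and `session_settings` are hypothetical stand-ins defined here, not SparkR APIs; they only assemble a list of settings so that the nested builder form and the single call with extra options collected through `...` can be compared side by side.

```r
# Hypothetical builder functions (NOT SparkR APIs): each takes the settings
# list built so far and returns an updated copy.
builder <- function() list(conf = list())
appName <- function(b, name) { b$appName <- name; b }
master <- function(b, url) { b$master <- url; b }
config <- function(b, key, value) { b$conf[[key]] <- value; b }
enableHiveSupport <- function(b) { b$hive <- TRUE; b }

# Written as plain nested calls, the builder chain reads inside-out.
nested <- enableHiveSupport(
  config(config(master(appName(builder(), "foo"), "local"),
                "first", "value"),
         "second", "value"))

# Hypothetical named-parameter entry point: well-known options are formal
# arguments, and any other named arguments are collected from `...` as config.
session_settings <- function(master = "", appName = "SparkR",
                             enableHiveSupport = FALSE, ...) {
  list(conf = list(...), appName = appName, master = master,
       hive = enableHiveSupport)
}

flat <- session_settings(master = "local", appName = "foo",
                         enableHiveSupport = TRUE,
                         first = "value", second = "value")

# Both styles produce the same settings list.
stopifnot(identical(nested, flat))
```

The design point is that `...` lets arbitrary named config entries travel alongside the documented parameters in one flat call, instead of one nested `config()` wrapper per entry.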
TODO
- [x] Add more tests
- [ ] Separate PR - update all roxygen2 doc code examples
- [ ] Separate PR - update SparkR programming guide
## How was this patch tested?
Unit tests, manual tests.
shivaram sun-rui rxin
Author: Felix Cheung <felixcheung_m@hotmail.com>
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #13635 from felixcheung/rsparksession.
Diffstat (limited to 'licenses/LICENSE-slf4j.txt')
0 files changed, 0 insertions, 0 deletions