author | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2015-05-08 18:29:57 -0700 |
---|---|---|
committer | Shivaram Venkataraman <shivaram@cs.berkeley.edu> | 2015-05-08 18:29:57 -0700 |
commit | 0a901dd3a1eb3fd459d45b771ce4ad2cfef2a944 (patch) | |
tree | c7b2479550c1ebadca8f8b5f4caf6f63db0f57e1 /external/twitter | |
parent | b6c797b08cbd08d7aab59ad0106af0f5f41ef186 (diff) | |
[SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly.
Changes include:
1. Rename `sortDF` to `arrange`
2. Add new aliases `group_by`, `sample_frac`, and `summarize`
3. Add more user-friendly column addition (`mutate`) and renaming (`rename`)
4. Support `mean` as an alias for `avg` in Scala, and support `n_distinct` and `n` as in dplyr
With these changes we can run most of the examples described in http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html using the same syntax.
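A minimal sketch of how the renamed and aliased verbs might be used together, assuming a local SparkR session from the Spark 1.4 era. The `flights` data and its column names (`dest`, `tailnum`, `distance`, `arr_delay`) are made up for illustration and are not part of this patch; argument order follows my reading of the SparkR API and may differ in other versions.

```r
library(SparkR)

# Assumption: local-mode Spark with the 1.4-era initialization calls.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical toy data standing in for the dplyr vignette's flights table.
local_flights <- data.frame(
  dest = c("SFO", "SFO", "JFK"),
  tailnum = c("N1", "N2", "N1"),
  distance = c(2500, 2500, 700),
  arr_delay = c(10, -5, 30),
  stringsAsFactors = FALSE)
flights <- createDataFrame(sqlContext, local_flights)

# Columns are passed as explicit references (df$col) or quoted names.
delays <- select(flights, flights$dest, flights$arr_delay)
delays_by_name <- select(flights, "dest", "arr_delay")

# arrange replaces sortDF.
sorted <- arrange(flights, flights$arr_delay)

# mutate adds a derived column; rename takes newName = df$oldName.
with_hours <- mutate(flights, delay_hours = flights$arr_delay / 60)
renamed <- rename(flights, destination = flights$dest)

# group_by and summarize are aliases; n, n_distinct and mean mirror dplyr.
by_dest <- group_by(flights, flights$dest)
per_dest <- summarize(by_dest,
  count = n(flights$dest),
  planes = n_distinct(flights$tailnum),
  avg_delay = mean(flights$arr_delay))

# sample_frac(df, withReplacement, fraction)
sampled <- sample_frac(flights, FALSE, 0.5)

head(per_dest)
```

The calls deliberately match the shape of the dplyr vignette examples, except that every column is written as `flights$col` or as a string.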
The only thing missing in SparkR is automatic resolution of column names used in an expression: `select(flights, delay)` works in dplyr, but SparkR currently needs `select(flights, flights$delay)` or `select(flights, "delay")`. This is a complicated change, so I'll file a new issue for it.
cc sun-rui rxin
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #6005 from shivaram/sparkr-df-api and squashes the following commits:
5e0716a [Shivaram Venkataraman] Fix some roxygen bugs
1254953 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into sparkr-df-api
0521149 [Shivaram Venkataraman] Changes to make SparkR DataFrame dplyr friendly. Changes include 1. Rename sortDF to arrange 2. Add new aliases `group_by` and `sample_frac`, `summarize` 3. Add more user friendly column addition (mutate), rename 4. Support mean as an alias for avg in Scala and also support n_distinct, n as in dplyr
Diffstat (limited to 'external/twitter')
0 files changed, 0 insertions, 0 deletions