author Shivaram Venkataraman <shivaram@cs.berkeley.edu> 2015-05-08 18:29:57 -0700
committer Shivaram Venkataraman <shivaram@cs.berkeley.edu> 2015-05-08 18:29:57 -0700
commit 0a901dd3a1eb3fd459d45b771ce4ad2cfef2a944
tree c7b2479550c1ebadca8f8b5f4caf6f63db0f57e1 /R/pkg/NAMESPACE
parent b6c797b08cbd08d7aab59ad0106af0f5f41ef186
[SPARK-7231] [SPARKR] Changes to make SparkR DataFrame dplyr friendly.

Changes include:
1. Rename sortDF to arrange
2. Add new aliases `group_by`, `sample_frac`, and `summarize`
3. Add more user-friendly column addition (mutate) and rename
4. Support mean as an alias for avg in Scala, and also support n_distinct and n as in dplyr

Using these changes we can pretty much run the examples described in http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html with the same syntax.

The only thing missing in SparkR is auto-resolving column names when they are used in an expression, i.e. something like `select(flights, delay)` works in dplyr, but right now we need `select(flights, flights$delay)` or `select(flights, "delay")`. This is a complicated change, and I'll file a new issue for it.

cc sun-rui rxin

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #6005 from shivaram/sparkr-df-api and squashes the following commits:

5e0716a [Shivaram Venkataraman] Fix some roxygen bugs
1254953 [Shivaram Venkataraman] Merge branch 'master' of https://github.com/apache/spark into sparkr-df-api
0521149 [Shivaram Venkataraman] Changes to make SparkR DataFrame dplyr friendly.
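
As a rough illustration of the dplyr-style calls described in this commit message, the following sketch chains `select`, `mutate`, `group_by`, `summarize`, and `arrange` on a SparkR DataFrame. It is not taken from the patch itself. The `flights` data, its column names, and the local Spark setup (`sparkR.init`, `sparkRSQL.init`, `createDataFrame`) are assumptions made for the example and may need adjusting to the SparkR version in use.

```r
library(SparkR)

# Assumption: a local Spark context, initialized with the SparkR API of this era.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Hypothetical data standing in for the `flights` table of the dplyr vignette.
local_flights <- data.frame(
  carrier = c("AA", "AA", "UA", "UA"),
  delay   = c(10, 20, 5, 15),
  stringsAsFactors = FALSE
)
flights <- createDataFrame(sqlContext, local_flights)

# Columns still have to be referenced explicitly (flights$delay or "delay"),
# because bare names such as `delay` are not auto-resolved yet.
delays <- select(flights, flights$carrier, flights$delay)

# mutate: add a derived column.
with_hours <- mutate(delays, delay_hours = delays$delay / 60)

# group_by + summarize, using mean (alias for avg) and n (row count).
by_carrier <- summarize(group_by(with_hours, with_hours$carrier),
                        mean_delay = mean(with_hours$delay),
                        count = n(with_hours$delay))

# arrange replaces the former sortDF.
sorted <- arrange(by_carrier, by_carrier$mean_delay)
head(sorted)

sparkR.stop()
```

Apart from the explicit `flights$...` column references, the calls are meant to mirror their dplyr counterparts, which is the point of the renames and aliases listed above.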
Diffstat (limited to 'R/pkg/NAMESPACE')
-rw-r--r-- R/pkg/NAMESPACE | 11
1 files changed, 9 insertions, 2 deletions
diff --git a/R/pkg/NAMESPACE b/R/pkg/NAMESPACE
index 7611f479a6..819e9a24e5 100644
--- a/R/pkg/NAMESPACE
+++ b/R/pkg/NAMESPACE
@@ -9,7 +9,8 @@ export("print.jobj")
 exportClasses("DataFrame")
-exportMethods("cache",
+exportMethods("arrange",
+              "cache",
               "collect",
               "columns",
               "count",
@@ -20,6 +21,7 @@ exportMethods("cache",
               "explain",
               "filter",
               "first",
+              "group_by",
               "groupBy",
               "head",
               "insertInto",
@@ -28,12 +30,15 @@ exportMethods("cache",
               "join",
               "limit",
               "orderBy",
+              "mutate",
               "names",
               "persist",
               "printSchema",
               "registerTempTable",
+              "rename",
               "repartition",
               "sampleDF",
+              "sample_frac",
               "saveAsParquetFile",
               "saveAsTable",
               "saveDF",
@@ -42,7 +47,7 @@ exportMethods("cache",
               "selectExpr",
               "show",
               "showDF",
-              "sortDF",
+              "summarize",
               "take",
               "unionAll",
               "unpersist",
@@ -72,6 +77,8 @@ exportMethods("abs",
               "max",
               "mean",
               "min",
+              "n",
+              "n_distinct",
               "rlike",
               "sqrt",
               "startsWith",