path: root/R
Commit message | Author | Age | Files | Lines
...
* [SPARK-11755][R] SparkR should export "predict"Yanbo Liang2015-11-171-0/+4
| The bug is described at [SPARK-11755](https://issues.apache.org/jira/browse/SPARK-11755). After exporting ```predict```, help information is available from both the SparkR and base R packages, as in the following:
|
| ```Java
| > help(predict)
| Help on topic ‘predict’ was found in the following packages:
|
|   Package               Library
|   SparkR                /Users/yanboliang/data/trunk2/spark/R/lib
|   stats                 /Library/Frameworks/R.framework/Versions/3.2/Resources/library
|
| Choose one
|
| 1: Make predictions from a model {SparkR}
| 2: Model Predictions {stats}
| ```
|
| Author: Yanbo Liang <ybliang8@gmail.com>
|
| Closes #9732 from yanboliang/spark-11755.
* [SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritableSun Rui2015-11-156-5/+30
| The basic idea is as follows.
|
| The archive of the SparkR package itself, sparkr.zip, is created during the build process and is contained in the Spark binary distribution. It is not changed after the distribution is installed, because the directory it resides in ($SPARK_HOME/R/lib) may not be writable.
|
| When R source code is contained in jars or Spark packages specified with the "--jars" or "--packages" command line options, a temporary directory is created by calling Utils.createTempDir(), and the R packages built from that source code are installed there. The temporary directory is writable, does not interfere with other SparkR sessions, and is deleted when the SparkR session ends. The R binary packages installed in the temporary directory are then packed into an archive named rpkg.zip.
|
| sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes.
|
| Distribution of rpkg.zip in Standalone modes is not supported in this PR and will be addressed in another PR.
|
| Various R files are updated to accept multiple lib paths (one for the SparkR package, the other for other R packages) so that these packages can be accessed in R.
|
| Author: Sun Rui <rui.sun@intel.com>
|
| Closes #9390 from sun-rui/SPARK-10500.
* [SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop ↵zero3232015-11-152-21/+49
| when createDataFrame
|
| Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame`
|
| At this moment SparkR createDataFrame uses a nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow, especially with data.table (about 2 orders of magnitude slower than the PySpark / Pandas version on a DataFrame of size 1M rows x 2 columns).
|
| A simple improvement is to apply `dropFactors` column-wise and then reshape the output list.
|
| It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277).
|
| Author: zero323 <matthew.szymkiewicz@gmail.com>
|
| Closes #9099 from zero323/SPARK-11086.
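The column-wise idea can be sketched in base R on a local `data.frame`. This is only an illustration under stated assumptions: `drop_factor` and the sample data are hypothetical names, not the code from the PR.

```r
# Hypothetical helper: convert one whole column if it is a factor.
drop_factor <- function(col) {
  if (is.factor(col)) as.character(col) else col
}

local_df <- data.frame(
  id = 1:3,
  species = factor(c("setosa", "versicolor", "setosa"))
)

# Column-wise: one vectorised conversion per column instead of one per cell.
cols <- lapply(local_df, drop_factor)

# Reshape the list of columns into a list of rows.
rows <- lapply(seq_len(nrow(local_df)), function(i) {
  lapply(cols, function(col) col[[i]])
})
```

Each element of `rows` is a named list holding one record, with factor values already converted to character.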
* [SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Documentationfelixcheung2015-11-128-1512/+1539
| Clean out hundreds of `style: Commented code should be removed.` warnings from lintr, like these:
|
| ```
| /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:513:3: style: Commented code should be removed.
| # sc <- sparkR.init()
|   ^~~~~~~~~~~~~~~~~~~
| /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:514:3: style: Commented code should be removed.
| # sqlContext <- sparkRSQL.init(sc)
|   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:515:3: style: Commented code should be removed.
| # path <- "path/to/file.json"
|   ^~~~~~~~~~~~~~~~~~~~~~~~~~~
| ```
|
| Tried without export or rdname; neither works. Instead, added `#' noRd` to suppress .Rd file generation.
|
| Also updated `family` for DataFrame functions to use longer descriptive text instead of `dataframe_funcs`.
|
| ![image](https://cloud.githubusercontent.com/assets/8969467/10933937/17bf5b1e-8291-11e5-9777-40fc632105dc.png)
|
| This covers *most* of the 'Commented code' warnings, but I left out a few that look legitimate.
|
| Author: felixcheung <felixcheung_m@hotmail.com>
|
| Closes #9463 from felixcheung/rlintr.
* [SPARK-11420] Updating Stddev support via Imperative AggregateJihongMa2015-11-121-2/+2
| | | | | | | | switched stddev support from DeclarativeAggregate to ImperativeAggregate. Author: JihongMa <linlin200605@gmail.com> Closes #9380 from JihongMA/SPARK-11420.
* [SPARK-11468] [SPARKR] add stddev/variance agg functions for Columnfelixcheung2015-11-105-30/+297
| | | | | | | | | | Checked names, none of them should conflict with anything in base shivaram davies rxin Author: felixcheung <felixcheung_m@hotmail.com> Closes #9489 from felixcheung/rstddev.
* [ML][R] SparkR::glm summary result to compare with native RYanbo Liang2015-11-102-22/+11
| | | | | | | | Follow up #9561. Since [SPARK-11587](https://issues.apache.org/jira/browse/SPARK-11587) has been fixed, we should compare the SparkR::glm summary result with native R output rather than hard-coding it. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9590 from yanboliang/glm-r-test.
* [SPARK-10863][SPARKR] Method coltypes() (New version)Oscar D. Lara Yejas2015-11-107-18/+124
| | | | | | | | This is a follow up on PR #8984, as the corresponding branch for that PR was damaged. Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Closes #9579 from olarayej/SPARK-10863_NEW14.
* [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to ↵Yin Huai2015-11-101-1/+1
| evaluate AggregateExpression1s
|
| https://issues.apache.org/jira/browse/SPARK-9830
|
| This PR contains the following main changes.
| * Removing `AggregateExpression1`.
| * Removing `Aggregate` operator, which is used to evaluate `AggregateExpression1`.
| * Removing planner rule used to plan `Aggregate`.
| * Linking `MultipleDistinctRewriter` to analyzer.
| * Renaming `AggregateExpression2` to `AggregateExpression` and `AggregateFunction2` to `AggregateFunction`.
| * Updating places where we create aggregate expression. The way to create aggregate expressions is `AggregateExpression(aggregateFunction, mode, isDistinct)`.
| * Changing `val`s in `DeclarativeAggregate`s that touch children of this function to `lazy val`s (when we create aggregate expression in DataFrame API, children of an aggregate function can be unresolved).
|
| Author: Yin Huai <yhuai@databricks.com>
|
| Closes #9556 from yhuai/removeAgg1.
* [SPARK-11587][SPARKR] Fix the summary generic to match base RShivaram Venkataraman2015-11-094-10/+16
| | | | | | | | | The signature is summary(object, ...) as defined in https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #9582 from shivaram/summary-fix.
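For illustration, an S4 generic that keeps the base R signature `summary(object, ...)` can be declared as below. This is a sketch only: `ToyModel` is a hypothetical class and this is not the SparkR source.

```r
library(methods)

# Hypothetical model class, used only for this illustration.
setClass("ToyModel", representation(coefficients = "numeric"))

# Promote base::summary to an S4 generic; the signature stays (object, ...).
setGeneric("summary")

setMethod("summary", signature(object = "ToyModel"),
          function(object, ...) {
            list(coefficients = object@coefficients)
          })

model <- new("ToyModel", coefficients = c(intercept = 1.5, slope = 0.3))
result <- summary(model)
```

Because the generic is derived from the base function, calls such as `summary(c(1, 2, 3))` still fall through to the base R behaviour.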
* [SPARK-9865][SPARKR] Flaky SparkR test: test_sparkSQL.R: sample on a DataFramefelixcheung2015-11-091-2/+2
| Make sample test less flaky by setting the seed
|
| Tested with
| ```
| repeat {
|   if (count(sample(df, FALSE, 0.1)) == 3) {
|     break
|   }
| }
| ```
|
| Author: felixcheung <felixcheung_m@hotmail.com>
|
| Closes #9549 from felixcheung/rsample.
* [SPARK-11494][ML][R] Expose R-like summary statistics in SparkR::glm for ↵Yanbo Liang2015-11-092-11/+42
| linear regression
|
| Expose R-like summary statistics in SparkR::glm for linear regression, the output of ```summary``` like
| ```Java
| $DevianceResiduals
|  Min        Max
|  -0.9509607 0.7291832
|
| $Coefficients
|                    Estimate   Std. Error t value   Pr(>|t|)
| (Intercept)        1.6765     0.2353597  7.123139  4.456124e-11
| Sepal_Length       0.3498801  0.04630128 7.556598  4.187317e-12
| Species_versicolor -0.9833885 0.07207471 -13.64402 0
| Species_virginica  -1.00751   0.09330565 -10.79796 0
| ```
|
| Author: Yanbo Liang <ybliang8@gmail.com>
|
| Closes #9561 from yanboliang/spark-11494.
* [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bitsImran Rashid2015-11-061-4/+4
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-10116 This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod Author: Imran Rashid <irashid@cloudera.com> Closes #8314 from squito/SPARK-10116.
* [SPARK-11542] [SPARKR] fix glm with long formulaDavies Liu2015-11-052-1/+14
| | | | | | | | Because deparse() will break the long string into multiple lines, the deserialization will fail Author: Davies Liu <davies@databricks.com> Closes #9510 from davies/fix_glm.
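The line-breaking behaviour of `deparse()` can be reproduced in base R. The sketch below uses a made-up formula and shows one possible way to get a single string; the actual change in the PR may differ.

```r
# A formula long enough to exceed deparse()'s default width cutoff.
long_formula <- y ~ a_long_predictor_name + another_long_predictor_name +
  yet_another_long_predictor_name + one_more_long_predictor_name

# deparse() returns several strings, one per output line.
pieces <- deparse(long_formula)

# Collapsing the pieces gives one string that can be sent as a single value.
single <- paste(pieces, collapse = "")
```

The collapsed string still parses back to an equivalent formula with `as.formula(single)`.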
* [SPARK-11260][SPARKR] with() function supportadrian5552015-11-055-6/+51
| | | | | | | Author: adrian555 <wzhuang@us.ibm.com> Author: Adrian Zhuang <adrian555@users.noreply.github.com> Closes #9443 from adrian555/with.
* [SPARK-9492][ML][R] LogisticRegression in R should provide model statisticsYanbo Liang2015-11-041-0/+17
| | | | | | | | Like ml ```LinearRegression```, ```LogisticRegression``` should provide a training summary including feature names and their coefficients. Author: Yanbo Liang <ybliang8@gmail.com> Closes #9303 from yanboliang/spark-9492.
* [DOC] Missing link to R DataFrame API doclewuathe2015-11-031-8/+97
| | | | | | | Author: lewuathe <lewuathe@me.com> Author: Lewuathe <lewuathe@me.com> Closes #9394 from Lewuathe/missing-link-to-R-dataframe.
* [SPARK-10592] [ML] [PySpark] Deprecate weights and use coefficients instead ↵vectorijk2015-11-021-3/+3
| | | | | | | | | | in ML models Deprecated in `LogisticRegression` and `LinearRegression` Author: vectorijk <jiangkai@gmail.com> Closes #9311 from vectorijk/spark-10592.
* [SPARK-11340][SPARKR] Support setting driver properties when starting Spark ↵felixcheung2015-10-302-5/+67
| | | | | | | | | | | | | from R programmatically or from RStudio Mapping spark.driver.memory from sparkEnvir to spark-submit commandline arguments. shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf? sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9290 from felixcheung/rdrivermem.
* [SPARK-11210][SPARKR] Add window functions into SparkR [step 2].Sun Rui2015-10-304-0/+117
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #9196 from sun-rui/SPARK-11210.
* [SPARK-11409][SPARKR] Enable url link in R doc for Persistfelixcheung2015-10-291-2/+2
| | | | | | | | | | | | Quick one line doc fix link is not clickable ![image](https://cloud.githubusercontent.com/assets/8969467/10833041/4e91dd7c-7e4c-11e5-8905-713b986dbbde.png) shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9363 from felixcheung/rpersistdoc.
* [SPARK-11369][ML][R] SparkR glm should support setting standardizeYanbo Liang2015-10-281-2/+2
| | | | | | | | | | SparkR glm currently supports: ```formula, family = c("gaussian", "binomial"), data, lambda = 0, alpha = 0``` We should also support setting standardize, which is defined in the [design documentation](https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit) Author: Yanbo Liang <ybliang8@gmail.com> Closes #9331 from yanboliang/spark-11369.
* [SPARK-11209][SPARKR] Add window functions into SparkR [step 1].Sun Rui2015-10-264-0/+120
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #9193 from sun-rui/SPARK-11209.
* [SPARK-10979][SPARKR] Sparkrmerge: Add merge to DataFrame with R signatureNarine Kokhlikyan2015-10-262-8/+169
| | | | | | | | | Add merge function to DataFrame, which supports R signature. https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #9012 from NarineK/sparkrmerge.
* [SPARK-11294][SPARKR] Improve R doc for read.df, write.df, saveAsTablefelixcheung2015-10-232-19/+24
| | | | | | | | | | | | | | | Add examples for read.df, write.df; fix grouping for read.df, loadDF; fix formatting and text truncation for write.df, saveAsTable. Several text issues: ![image](https://cloud.githubusercontent.com/assets/8969467/10708590/1303a44e-79c3-11e5-854f-3a2e16854cd7.png) - text collapsed into a single paragraph - text truncated at 2 places, eg. "overwrite: Existing data is expected to be overwritten by the contents of error:" shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9261 from felixcheung/rdocreadwritedf.
* [SPARK-11244][SPARKR] sparkR.stop() should remove SQLContextForest Fang2015-10-222-0/+18
| SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise, even when SparkContext is reinitialized, `sparkRSQL.init` returns the stale copy of the object and complains:
|
| ```r
| sc <- sparkR.init("local")
| sqlContext <- sparkRSQL.init(sc)
| sparkR.stop()
| sc <- sparkR.init("local")
| sqlContext <- sparkRSQL.init(sc)
| sqlContext
| ```
| producing
| ```r
| Error in callJMethod(x, "getClass") :
|   Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
| ```
|
| I have added the check and removal only when SparkContext itself is initialized. I have also added a corresponding test for this fix. Let me know if you want me to move the test to the SQL test suite instead.
|
| p.s. I tried lint-r but ended up with lots of errors on existing code.
|
| Author: Forest Fang <forest.fang@outlook.com>
|
| Closes #9205 from saurfang/sparkR.stop.
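The stale-cache problem and its fix can be modelled in plain R with an environment standing in for the package-private state. Everything here apart from the names `.sparkRSQLsc` and `.sparkRHivesc` is hypothetical, and the sketch does not talk to Spark.

```r
# Hypothetical stand-in for the environment that caches contexts.
cache_env <- new.env()

init_sql_context <- function(sc_id) {
  # Reuse the cached context if present; without cleanup on stop,
  # this is the path that hands back a stale object.
  if (exists(".sparkRSQLsc", envir = cache_env, inherits = FALSE)) {
    return(get(".sparkRSQLsc", envir = cache_env))
  }
  ctx <- list(backing_context = sc_id)
  assign(".sparkRSQLsc", ctx, envir = cache_env)
  ctx
}

stop_session <- function() {
  # Drop cached SQL/Hive contexts so the next init builds fresh ones.
  for (name in c(".sparkRSQLsc", ".sparkRHivesc")) {
    if (exists(name, envir = cache_env, inherits = FALSE)) {
      rm(list = name, envir = cache_env)
    }
  }
}

first <- init_sql_context(1L)
stop_session()
second <- init_sql_context(2L)
```

After `stop_session()`, the second call builds a new context bound to the new backing id instead of returning the first one.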
* [SPARK-11197][SQL] run SQL on files directlyDavies Liu2015-10-211-1/+1
| This PR introduces a new feature to run SQL directly on files without creating a table, for example:
|
| ```
| select id from json.`path/to/json/files` as j
| ```
|
| Author: Davies Liu <davies@databricks.com>
|
| Closes #9173 from davies/source.
* [SPARK-11221][SPARKR] fix R doc for lit and add examplesfelixcheung2015-10-201-4/+9
| | | | | | | | | Currently the documentation for `lit` is inconsistent with doc format, references "Scala symbol" and has no example. Fixing that. shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #9187 from felixcheung/rlit.
* [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L…lewuathe2015-10-192-3/+4
| | | | | | | | | | | …2 regularization if the number of features is small Author: lewuathe <lewuathe@me.com> Author: Lewuathe <sasaki@treasure-data.com> Author: Kai Sasaki <sasaki@treasure-data.com> Author: Lewuathe <lewuathe@me.com> Closes #8884 from Lewuathe/SPARK-10668.
* [SPARK-10996] [SPARKR] Implement sampleBy() in DataFrameStatFunctions.Sun Rui2015-10-137-19/+76
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #9023 from sun-rui/SPARK-10996.
* [SPARK-10981] [SPARKR] SparkR Join improvementsMonica Liu2015-10-132-6/+34
| | | | | | | | | | I was having issues with collect() and orderBy() in Spark 1.5.0 so I used the DataFrame.R file and test_sparkSQL.R file from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi" and added corresponding test cases in the test for join() and merge() in test_sparkSQL.R file. Pull request because I filed this JIRA bug report: https://issues.apache.org/jira/browse/SPARK-10981 Author: Monica Liu <liu.monica.f@gmail.com> Closes #9029 from mfliu/master.
* [SPARK-10913] [SPARKR] attach() function supportAdrian Zhuang2015-10-134-0/+55
| | | | | | | | | Bring the change code up to date. Author: Adrian Zhuang <adrian555@users.noreply.github.com> Author: adrian555 <wzhuang@us.ibm.com> Closes #9031 from adrian555/attach2.
* [SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrameNarine Kokhlikyan2015-10-133-5/+30
| | | | | | | | | as.DataFrame is a more R-style signature. Also, I'd like to know if we could make the context, e.g. sqlContext, global, so that we do not have to specify it as an argument each time we create a DataFrame. Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8952 from NarineK/sparkrasDataFrame.
* [SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrameSun Rui2015-10-137-48/+127
| Two points in this PR:
|
| 1. The original thought was that a named R list is assumed to be a struct in SerDe. But this is problematic because some R functions implicitly generate named lists that are not intended to be a struct when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct".
|
| 2. SerDe is in the Spark Core module, and data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow because, in the maven build, the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe to allow SQLUtils in the Spark SQL module to register its functions for serialization and deserialization of StructType.
|
| Author: Sun Rui <rui.sun@intel.com>
|
| Closes #8794 from sun-rui/SPARK-10051.
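Marking a named list as a struct, as described in point 1, amounts to changing its class attribute. The sketch below is base R only; `is_struct` is a hypothetical check of the kind a serializer might use, not a SparkR function.

```r
# A named list that should be treated as a struct is marked explicitly.
record <- list(name = "Alice", age = 30L)
class(record) <- "struct"

# A named list produced incidentally keeps the plain "list" class.
plain <- list(name = "Bob", age = 25L)

# Hypothetical check that tells the two cases apart.
is_struct <- function(x) inherits(x, "struct")
```

The marked object is still a list underneath, so element access such as `record$name` keeps working.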
* [SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions.Sun Rui2015-10-095-9/+34
| 1. Add a "col" function into DataFrame.
| 2. Move the current "col" function in Column.R to functions.R, convert it to S4 function.
| 3. Add an S4 "column" function in functions.R.
| 4. Convert the "column" function in Column.R to S4 function. This is for private use.
|
| Author: Sun Rui <rui.sun@intel.com>
|
| Closes #8864 from sun-rui/SPARK-10079.
* [SPARK-10905] [SPARKR] Export freqItems() for DataFrameStatFunctionsRerngvit Yanggratoke2015-10-094-0/+53
| | | | | | | | | | | [SPARK-10905][SparkR]: Export freqItems() for DataFrameStatFunctions - Add function (together with roxygen2 doc) to DataFrame.R and generics.R - Expose the function in NAMESPACE - Add unit test for the function Author: Rerngvit Yanggratoke <rerngvit@kth.se> Closes #8962 from rerngvit/SPARK-10905.
* [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrameNarine Kokhlikyan2015-10-082-9/+49
| The sort function can be used as an alternative to arrange(... ).
| As arguments it accepts x - dataframe, decreasing - TRUE/FALSE, a list of orderings for columns and the list of columns, represented as string names
|
| for example:
|   sort(df, TRUE, "col1","col2","col3","col5") # for example, if we want to sort some of the columns in the same order
|
|   sort(df, decreasing=TRUE, "col1")
|   sort(df, decreasing=c(TRUE,FALSE), "col1","col2")
|
| Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
|
| Closes #8920 from NarineK/sparkrsort.
* [SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions.Sun Rui2015-10-076-33/+127
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #8869 from sun-rui/SPARK-10752.
* [SPARK-10904] [SPARKR] Fix to support `select(df, c("col1", "col2"))`felixcheung2015-10-032-6/+21
| | | | | | | | The fix is to coerce `c("a", "b")` into a list so that it can be serialized when calling the JVM. Author: felixcheung <felixcheung_m@hotmail.com> Closes #8961 from felixcheung/rselect.
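The coercion itself is ordinary base R. The sketch below uses a hypothetical helper, `as_column_list`, to show how both calling styles can end up as the same list; it is not the SparkR implementation.

```r
# Hypothetical normalizer: accepts either ("a", "b") or c("a", "b").
as_column_list <- function(...) {
  cols <- list(...)
  if (length(cols) == 1 && is.character(cols[[1]])) {
    # A character vector becomes a list with one element per column name.
    cols <- as.list(cols[[1]])
  }
  cols
}

from_vector <- as_column_list(c("col1", "col2"))
from_args <- as_column_list("col1", "col2")
```

Both calls yield a two-element list of column names.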
* [SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collectOscar D. Lara Yejas2015-09-304-1/+39
| | | | | | | | | | Created method as.data.frame as a synonym for collect(). Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: olarayej <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Closes #8908 from olarayej/SPARK-10807.
* [SPARK-10760] [SPARKR] SparkR glm: the documentation in examples - family ↵Narine Kokhlikyan2015-09-251-1/+2
| argument is missing
|
| Hi everyone,
|
| Since the family argument is required for the glm function, the execution of:
|
|   model <- glm(Sepal_Length ~ Sepal_Width, df)
|
| is failing. I've fixed the documentation by adding the family argument and also added summary(model), which will show the coefficients for the model.
|
| Thanks,
| Narine
|
| Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
|
| Closes #8870 from NarineK/sparkrml.
* [SPARK-9681] [ML] Support R feature interactions in RFormulaEric Liang2015-09-252-2/+10
| | | | | | | | | | | | This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`). To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side-benefit of cleaning up the double-underscores in the attributes generated for non-interaction terms. mengxr Author: Eric Liang <ekl@databricks.com> Closes #8830 from ericl/interaction-2.
* [SPARK-10050] [SPARKR] Support collecting data of MapType in DataFrame.Sun Rui2015-09-164-23/+86
| | | | | | | | | 1. Support collecting data of MapType from DataFrame. 2. Support data of MapType in createDataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8711 from sun-rui/SPARK-10050.
* Update version to 1.6.0-SNAPSHOT.Reynold Xin2015-09-151-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.
* [SPARK-6548] Adding stddev to DataFrame functionsJihongMa2015-09-121-1/+1
| | | | | | | | | | | Adding STDDEV support for DataFrame using 1-pass online /parallel algorithm to compute variance. Please review the code change. Author: JihongMa <linlin200605@gmail.com> Author: Jihong MA <linlin200605@gmail.com> Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com> Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local> Closes #6297 from JihongMA/SPARK-SQL.
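The commit does not say which one-pass algorithm is used. As an assumption, the sketch below uses Welford's online update, a common single-pass method, written in base R purely for illustration.

```r
# One-pass (online) sample variance using Welford's update.
# Assumption: this is a stand-in, not the algorithm from the PR.
online_variance <- function(x) {
  n <- 0
  running_mean <- 0
  m2 <- 0  # running sum of squared deviations from the current mean
  for (value in x) {
    n <- n + 1
    delta <- value - running_mean
    running_mean <- running_mean + delta / n
    m2 <- m2 + delta * (value - running_mean)
  }
  if (n < 2) {
    return(NA_real_)
  }
  m2 / (n - 1)
}

values <- c(2, 4, 4, 4, 5, 5, 7, 9)
stddev <- sqrt(online_variance(values))
```

Each value is visited once, and the result agrees with base R's `var()` and `sd()` up to floating-point tolerance.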
* [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame.Sun Rui2015-09-108-62/+95
| This PR:
|
| 1. Enhances reflection in RBackend by automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they will conflict with the SerDe logic that transfers a Scala Seq to the R side.
|
| 2. Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection.
|
| 3. Supports ArrayType in createDataFrame().
|
| Author: Sun Rui <rui.sun@intel.com>
|
| Closes #8458 from sun-rui/SPARK-10049.
* [MINOR] Minor style fix in SparkRShivaram Venkataraman2015-09-041-1/+1
| | | | | | | | `dev/lintr-r` passes on my machine now Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8601 from shivaram/sparkr-style-fix.
* [SPARK-8951] [SPARKR] support Unicode characters in collect()CHOIJAEHONG2015-09-033-3/+31
| | | | | | | | | Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters. I changed SerDe.scala so that Spark supports Unicode characters when it writes a string to R. Author: CHOIJAEHONG <redrock07@naver.com> Closes #7494 from CHOIJAEHONG1/SPARK-8951.
* [SPARK-9803] [SPARKR] Add subset and transform + testsfelixcheung2015-08-284-17/+85
| | | | | | | | | | | | Add subset and transform Also reorganize `[` & `[[` to subset instead of select Note: for transform, transform is very similar to mutate. Spark doesn't seem to replace existing column with the name in mutate (ie. `mutate(df, age = df$age + 2)` - returned DataFrame has 2 columns with the same name 'age'), so therefore not doing that for now in transform. Though it is clearly stated it should replace column with matching name (should I open a JIRA for mutate/transform?) Author: felixcheung <felixcheung_m@hotmail.com> Closes #8503 from felixcheung/rsubset_transform.
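For comparison, base R's `transform()` on a local `data.frame` replaces a column whose name matches instead of adding a second one. The sketch below uses made-up data and runs without Spark.

```r
people <- data.frame(name = c("a", "b"), age = c(30, 40))

# The existing "age" column is replaced; no duplicate column is created.
updated <- transform(people, age = age + 2)
```

The result still has exactly the columns `name` and `age`.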
* [SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarningsLuciano Resende2015-08-282-3/+3
| | | | | | | | This is based on davies' comment on SPARK-8952, which suggests only calling normalizePath() when the path starts with '~' Author: Luciano Resende <lresende@apache.org> Closes #8343 from lresende/SPARK-8952.
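Both ideas from this commit, normalizing only `~` paths and silencing the warning, can be combined in a small base R helper. `expand_home` is a hypothetical name and this is a sketch, not the patched SparkR code.

```r
# Hypothetical helper: normalize only paths that start with "~", and
# silence the warning normalizePath() emits for non-existent targets.
expand_home <- function(path) {
  if (substr(path, 1, 1) == "~") {
    suppressWarnings(normalizePath(path))
  } else {
    path
  }
}

remote <- expand_home("hdfs://host/data.json")
local_missing <- expand_home("~/no-such-file-for-this-example")
```

Paths that do not start with `~`, such as remote URIs, are returned unchanged.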