aboutsummaryrefslogtreecommitdiff
path: root/R/pkg
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions.Sun Rui2015-10-095-9/+34
| | | | | | | | | | | 1. Add a "col" function into DataFrame. 2. Move the current "col" function in Column.R to functions.R, convert it to S4 function. 3. Add a s4 "column" function in functions.R. 4. Convert the "column" function in Column.R to S4 function. This is for private use. Author: Sun Rui <rui.sun@intel.com> Closes #8864 from sun-rui/SPARK-10079.
* [SPARK-10905] [SPARKR] Export freqItems() for DataFrameStatFunctionsRerngvit Yanggratoke2015-10-094-0/+53
| | | | | | | | | | | [SPARK-10905][SparkR]: Export freqItems() for DataFrameStatFunctions - Add function (together with roxygen2 doc) to DataFrame.R and generics.R - Expose the function in NAMESPACE - Add unit test for the function Author: Rerngvit Yanggratoke <rerngvit@kth.se> Closes #8962 from rerngvit/SPARK-10905.
* [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrameNarine Kokhlikyan2015-10-082-9/+49
| | | | | | | | | | | | | | | the sort function can be used as an alternative to arrange(... ). As arguments it accepts x - dataframe, decreasing - TRUE/FALSE, a list of orderings for columns and the list of columns, represented as string names for example: sort(df, TRUE, "col1","col2","col3","col5") # for example, if we want to sort some of the columns in the same order sort(df, decreasing=TRUE, "col1") sort(df, decreasing=c(TRUE,FALSE), "col1","col2") Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8920 from NarineK/sparkrsort.
* [SPARK-10752] [SPARKR] Implement corr() and cov in DataFrameStatFunctions.Sun Rui2015-10-076-33/+127
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #8869 from sun-rui/SPARK-10752.
* [SPARK-10904] [SPARKR] Fix to support `select(df, c("col1", "col2"))`felixcheung2015-10-032-6/+21
| | | | | | | | The fix is to coerce `c("a", "b")` into a list such that it could be serialized to call JVM with. Author: felixcheung <felixcheung_m@hotmail.com> Closes #8961 from felixcheung/rselect.
* [SPARK-10807] [SPARKR] Added as.data.frame as a synonym for collectOscar D. Lara Yejas2015-09-304-1/+39
| | | | | | | | | | Created method as.data.frame as a synonym for collect(). Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: olarayej <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Closes #8908 from olarayej/SPARK-10807.
* [SPARK-10760] [SPARKR] SparkR glm: the documentation in examples - family ↵Narine Kokhlikyan2015-09-251-1/+2
| | | | | | | | | | | | | | | | | | | | | argument is missing Hi everyone, Since the family argument is required for the glm function, the execution of: model <- glm(Sepal_Length ~ Sepal_Width, df) is failing. I've fixed the documentation by adding the family argument and also added the summay(model) which will show the coefficients for the model. Thanks, Narine Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8870 from NarineK/sparkrml.
* [SPARK-9681] [ML] Support R feature interactions in RFormulaEric Liang2015-09-252-2/+10
| | | | | | | | | | | | This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`). To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read attribute the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side-benefit of cleaning up the double-underscores in the attributes generated for non-interaction terms. mengxr Author: Eric Liang <ekl@databricks.com> Closes #8830 from ericl/interaction-2.
* [SPARK-10050] [SPARKR] Support collecting data of MapType in DataFrame.Sun Rui2015-09-164-23/+86
| | | | | | | | | 1. Support collecting data of MapType from DataFrame. 2. Support data of MapType in createDataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8711 from sun-rui/SPARK-10050.
* Update version to 1.6.0-SNAPSHOT.Reynold Xin2015-09-151-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #8350 from rxin/1.6.
* [SPARK-6548] Adding stddev to DataFrame functionsJihongMa2015-09-121-1/+1
| | | | | | | | | | | Adding STDDEV support for DataFrame using 1-pass online /parallel algorithm to compute variance. Please review the code change. Author: JihongMa <linlin200605@gmail.com> Author: Jihong MA <linlin200605@gmail.com> Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com> Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local> Closes #6297 from JihongMA/SPARK-SQL.
* [SPARK-10049] [SPARKR] Support collecting data of ArraryType in DataFrame.Sun Rui2015-09-108-62/+95
| | | | | | | | | | | | | | this PR : 1. Enhance reflection in RBackend. Automatically matching a Java array to Scala Seq when finding methods. Util functions like seq(), listToSeq() in R side can be removed, as they will conflict with the Serde logic that transferrs a Scala seq to R side. 2. Enhance the SerDe to support transferring a Scala seq to R side. Data of ArrayType in DataFrame after collection is observed to be of Scala Seq type. 3. Support ArrayType in createDataFrame(). Author: Sun Rui <rui.sun@intel.com> Closes #8458 from sun-rui/SPARK-10049.
* [MINOR] Minor style fix in SparkRShivaram Venkataraman2015-09-041-1/+1
| | | | | | | | `dev/lintr-r` passes on my machine now Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8601 from shivaram/sparkr-style-fix.
* [SPARK-8951] [SPARKR] support Unicode characters in collect()CHOIJAEHONG2015-09-033-3/+31
| | | | | | | | | Spark gives an error message and does not show the output when a field of the result DataFrame contains characters in CJK. I changed SerDe.scala in order that Spark support Unicode characters when writes a string to R. Author: CHOIJAEHONG <redrock07@naver.com> Closes #7494 from CHOIJAEHONG1/SPARK-8951.
* [SPARK-9803] [SPARKR] Add subset and transform + testsfelixcheung2015-08-284-17/+85
| | | | | | | | | | | | Add subset and transform Also reorganize `[` & `[[` to subset instead of select Note: for transform, transform is very similar to mutate. Spark doesn't seem to replace existing column with the name in mutate (ie. `mutate(df, age = df$age + 2)` - returned DataFrame has 2 columns with the same name 'age'), so therefore not doing that for now in transform. Though it is clearly stated it should replace column with matching name (should I open a JIRA for mutate/transform?) Author: felixcheung <felixcheung_m@hotmail.com> Closes #8503 from felixcheung/rsubset_transform.
* [SPARK-8952] [SPARKR] - Wrap normalizePath calls with suppressWarningsLuciano Resende2015-08-282-3/+3
| | | | | | | | This is based on davies comment on SPARK-8952 which suggests to only call normalizePath() when path starts with '~' Author: Luciano Resende <lresende@apache.org> Closes #8343 from lresende/SPARK-8952.
* [SPARK-10328] [SPARKR] Fix generic for na.omitShivaram Venkataraman2015-08-283-5/+26
| | | | | | | | | | S3 function is at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com> Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8495 from shivaram/na-omit-fix.
* [SPARK-10219] [SPARKR] Fix varargsToEnv and add test caseShivaram Venkataraman2015-08-262-1/+8
| | | | | | | | cc sun-rui davies Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8475 from shivaram/varargs-fix.
* [MINOR] [SPARKR] Fix some validation problems in SparkRYu ISHIKAWA2015-08-263-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Getting rid of some validation problems in SparkR https://github.com/apache/spark/pull/7883 cc shivaram ``` inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous. expect_equal(class(x), "character") ^~ inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous. ^~ inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous. ^~ R/DataFrame.R:664:1: style: Trailing whitespace is superfluous. ^~~~~~~~~~~~~~ R/DataFrame.R:670:55: style: Trailing whitespace is superfluous. df <- data.frame(row.names = 1 : nrow) ^~~~~~~~~~~~~~~~ R/DataFrame.R:672:1: style: Trailing whitespace is superfluous. ^~~~~~~~~~~~~~ R/DataFrame.R:686:49: style: Trailing whitespace is superfluous. df[[names[colIndex]]] <- vec ^~~~~~~~~~~~~~~~~~ ``` Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8474 from yu-iskw/minor-fix-sparkr.
* [SPARK-10308] [SPARKR] Add %in% to the exported namespaceShivaram Venkataraman2015-08-261-3/+4
| | | | | | | | | | I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine. cc yu-iskw Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8473 from shivaram/in-namespace.
* [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for ↵felixcheung2015-08-252-1/+48
| | | | | | | | | | | | | | | | filter / select) Add support for ``` df[df$name == "Smith", c(1,2)] df[df$age %in% c(19, 30), 1:2] ``` shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #8394 from felixcheung/rsubset.
* [SPARK-10048] [SPARKR] Support arbitrary nested Java array in serde.Sun Rui2015-08-255-64/+154
| | | | | | | | | | | This PR: 1. supports transferring arbitrary nested array from JVM to R side in SerDe; 2. based on 1, collect() implemenation is improved. Now it can support collecting data of complex types from a DataFrame. Author: Sun Rui <rui.sun@intel.com> Closes #8276 from sun-rui/SPARK-10048.
* [SPARK-10214] [SPARKR] [DOCS] Improve SparkR Column, DataFrame API docsYu ISHIKAWA2015-08-253-34/+109
| | | | | | | | | | | | | | | | | | | cc: shivaram ## Summary - Add name tags to each methods in DataFrame.R and column.R - Replace `rdname column` with `rdname {each_func}`. i.e. alias method : `rdname column` => `rdname alias` ## Generated PDF File https://drive.google.com/file/d/0B9biIZIU47lLNHN2aFpnQXlSeGs/view?usp=sharing ## JIRA [[SPARK-10214] Improve SparkR Column, DataFrame API docs - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10214) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8414 from yu-iskw/SPARK-10214.
* [SPARK-10118] [SPARKR] [DOCS] Improve SparkR API docs for 1.5 releaseYu ISHIKAWA2015-08-243-226/+1594
| | | | | | | | | | | | | | | | | | | | cc: shivaram ## Summary - Modify `tdname` of expression functions. i.e. `ascii`: `rdname functions` => `rdname ascii` - Replace the dynamical function definitions to the static ones because of thir documentations. ## Generated PDF File https://drive.google.com/file/d/0B9biIZIU47lLX2t6ZjRoRnBTSEU/view?usp=sharing ## JIRA [[SPARK-10118] Improve SparkR API docs for 1.5 release - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10118) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8386 from yu-iskw/SPARK-10118.
* [SPARK-10106] [SPARKR] Add `ifelse` Column function to SparkRYu ISHIKAWA2015-08-193-1/+22
| | | | | | | | | ### JIRA [[SPARK-10106] Add `ifelse` Column function to SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10106) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8303 from yu-iskw/SPARK-10106.
* [SPARK-9856] [SPARKR] Add expression functions into SparkR whose params are ↵Yu ISHIKAWA2015-08-194-6/+648
| | | | | | | | | | | | | complicated I added lots of Column functinos into SparkR. And I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala. Since we need such APIs for R integer type. ### JIRA [[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8264 from yu-iskw/SPARK-9856-3.
* [SPARK-10075] [SPARKR] Add `when` expressino function in SparkRYu ISHIKAWA2015-08-185-0/+45
| | | | | | | | | | | | | | | - Add `when` and `otherwise` as `Column` methods - Add `When` as an expression function - Add `%otherwise%` infix as an alias of `otherwise` Since R doesn't support a feature like method chaining, `otherwise(when(condition, value), value)` style is a little annoying for me. If `%otherwise%` looks strange for shivaram, I can remove it. What do you think? ### JIRA [[SPARK-10075] Add `when` expressino function in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10075) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8266 from yu-iskw/SPARK-10075.
* [SPARKR] [MINOR] Get rid of a long line warningYu ISHIKAWA2015-08-181-1/+3
| | | | | | | | | | | | ``` R/functions.R:74:1: style: lines should not be more than 100 characters. jc <- callJStatic("org.apache.spark.sql.functions", "lit", ifelse(class(x) == "Column", xjc, x)) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8297 from yu-iskw/minor-lint-r.
* Bump SparkR version string to 1.5.0Hossein2015-08-181-1/+1
| | | | | | | | | | This patch is against master, but we need to apply it to 1.5 branch as well. cc shivaram and rxin Author: Hossein <hossein@databricks.com> Closes #8291 from falaki/SparkRVersion1.5.
* [SPARK-10007] [SPARKR] Update `NAMESPACE` file in SparkR for simple ↵Yuu ISHIKAWA2015-08-181-3/+47
| | | | | | | | | | | parameters functions ### JIRA [[SPARK-10007] Update `NAMESPACE` file in SparkR for simple parameters functions - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10007) Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8277 from yu-iskw/SPARK-10007.
* [SPARK-9871] [SPARKR] Add expression functions into SparkR which have a ↵Yu ISHIKAWA2015-08-164-0/+75
| | | | | | | | | | | | | | | | | | variable parameter ### Summary - Add `lit` function - Add `concat`, `greatest`, `least` functions I think we need to improve `collect` function in order to implement `struct` function. Since `collect` doesn't work with arguments which includes a nested `list` variable. It seems that a list against `struct` still has `jobj` classes. So it would be better to solve this problem on another issue. ### JIRA [[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8194 from yu-iskw/SPARK-9856.
* [SPARK-8844] [SPARKR] head/collect is broken in SparkR.Sun Rui2015-08-162-6/+30
| | | | | | | | | | | | | | This is a WIP patch for SPARK-8844 for collecting reviews. This bug is about reading an empty DataFrame. in readCol(), lapply(1:numRows, function(x) { does not take into consideration the case where numRows = 0. Will add unit test case. Author: Sun Rui <rui.sun@intel.com> Closes #7419 from sun-rui/SPARK-8844.
* [SPARK-9855] [SPARKR] Add expression functions into SparkR whose params are ↵Yu ISHIKAWA2015-08-125-102/+309
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | simple I added lots of expression functions for SparkR. This PR includes only functions whose params are only `(Column)` or `(Column, Column)`. And I think we need to improve how to test those functions. However, it would be better to work on another issue. ## Diff Summary - Add lots of functions in `functions.R` and their generic in `generic.R` - Add aliases for `ceiling` and `sign` - Move expression functions from `column.R` to `functions.R` - Modify `rdname` from `column` to `functions` I haven't supported `not` function, because the name has a collesion with `testthat` package. I didn't think of the way to define it. ## New Supported Functions ``` approxCountDistinct ascii base64 bin bitwiseNOT ceil (alias: ceiling) crc32 dayofmonth dayofyear explode factorial hex hour initcap isNaN last_day length log2 ltrim md5 minute month negate quarter reverse round rtrim second sha1 signum (alias: sign) size soundex to_date trim unbase64 unhex weekofyear year datediff levenshtein months_between nanvl pmod ``` ## JIRA [[SPARK-9855] Add expression functions into SparkR whose params are simple - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9855) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8123 from yu-iskw/SPARK-9855.
* [SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5Eric Liang2015-08-112-6/+6
| | | | | | | | | | This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R api doc. mengxr Author: Eric Liang <ekl@databricks.com> Closes #8085 from ericl/docs.
* [SPARK-8313] R Spark packages supportBurak Yavuz2015-08-041-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit. For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`. Author: Burak Yavuz <brkyvz@gmail.com> Closes #7139 from brkyvz/r-submit and squashes the following commits: 0de384f [Burak Yavuz] remove unused imports 2 d253708 [Burak Yavuz] removed unused imports 6603d0d [Burak Yavuz] addressed comments 4258ffe [Burak Yavuz] merged master ddfcc06 [Burak Yavuz] added zipping test 3a1be7d [Burak Yavuz] don't zip 77995df [Burak Yavuz] fix URI ac45527 [Burak Yavuz] added zipping of all libs e6bf7b0 [Burak Yavuz] add println ignores 1bc5554 [Burak Yavuz] add assumes for tests 9778e03 [Burak Yavuz] addressed comments b42b300 [Burak Yavuz] merged master ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit d867756 [Burak Yavuz] add apache header eff5ba1 [Burak Yavuz] ready for review 8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit e5b5a06 [Burak Yavuz] added doc bb751ce [Burak Yavuz] fix null bug 0226768 [Burak Yavuz] fixed issues 8810beb [Burak Yavuz] R packages support
* [SPARK-9318] [SPARK-9320] [SPARKR] Aliases for merge and summary functions ↵Hossein2015-07-315-6/+48
| | | | | | | | | | | | | | | | | | | | | | | | on DataFrames This PR adds synonyms for ```merge``` and ```summary``` in SparkR DataFrame API. cc shivaram Author: Hossein <hossein@databricks.com> Closes #7806 from falaki/SPARK-9320 and squashes the following commits: 72600f7 [Hossein] Updated docs 92a6e75 [Hossein] Fixed merge generic signature issue 4c2b051 [Hossein] Fixing naming with mllib summary 0f3a64c [Hossein] Added ... to generic for merge 30fbaf8 [Hossein] Merged master ae1a4cf [Hossein] Merge branch 'master' into SPARK-9320 e8eb86f [Hossein] Add a generic for merge fc01f2d [Hossein] Added unit test 8d92012 [Hossein] Added merge as an alias for join 5b8bedc [Hossein] Added unit test 632693d [Hossein] Added summary as an alias for describe for DataFrame
* [SPARK-9324] [SPARK-9322] [SPARK-9321] [SPARKR] Some aliases for R-like ↵Hossein2015-07-314-3/+119
| | | | | | | | | | | | | | | | | | | | | | | functions in DataFrames Adds following aliases: * unique (distinct) * rbind (unionAll): accepts many DataFrames * nrow (count) * ncol * dim * names (columns): along with the replacement function to change names Author: Hossein <hossein@databricks.com> Closes #7764 from falaki/sparkR-alias and squashes the following commits: 56016f5 [Hossein] Updated R documentation 5e4a4d0 [Hossein] Removed extra code f51cbef [Hossein] Merge branch 'master' into sparkR-alias c1b88bd [Hossein] Moved setGeneric and other comments applied d9307f8 [Hossein] Added tests b5aa988 [Hossein] Added dim, ncol, nrow, names, rbind, and unique functions to DataFrames
* [SPARK-9510] [SPARKR] Remaining SparkR style fixesShivaram Venkataraman2015-07-312-4/+6
| | | | | | | | | | With the change in this patch, I get no more warnings from `./dev/lint-r` in my machine Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #7834 from shivaram/sparkr-style-fixes and squashes the following commits: 716cd8e [Shivaram Venkataraman] Remaining SparkR style fixes
* [SPARK-9053] [SPARKR] Fix spaces around parens, infix operators etc.Yu ISHIKAWA2015-07-319-12/+21
| | | | | | | | | | | | | | | | | ### JIRA [[SPARK-9053] Fix spaces around parens, infix operators etc. - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9053) ### The Result of `lint-r` [The result of lint-r at the rivision:a4c83cb1e4b066cd60264b6572fd3e51d160d26a](https://gist.github.com/yu-iskw/d253d7f8ef351f86443d) Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #7584 from yu-iskw/SPARK-9053 and squashes the following commits: 613170f [Yu ISHIKAWA] Ignore a warning about a space before a left parentheses ede61e1 [Yu ISHIKAWA] Ignores two warnings about a space before a left parentheses. TODO: After updating `lintr`, we will remove the ignores de3e0db [Yu ISHIKAWA] Add '## nolint start' & '## nolint end' statement to ignore infix space warnings e233ea8 [Yu ISHIKAWA] [SPARK-9053][SparkR] Fix spaces around parens, infix operators etc.
* [SPARK-8742] [SPARKR] Improve SparkR error messages for DataFrame APIHossein2015-07-302-1/+8
| | | | | | | | | | | | | | | | This patch improves SparkR error message reporting, especially with DataFrame API. When there is a user error (e.g., malformed SQL query), the message of the cause is sent back through the RPC and the R client reads it and returns it back to user. cc shivaram Author: Hossein <hossein@databricks.com> Closes #7742 from falaki/SPARK-8742 and squashes the following commits: 4f643c9 [Hossein] Not logging exceptions in RBackendHandler 4a8005c [Hossein] Returning stack track of causing exception from RBackendHandler 5cf17f0 [Hossein] Adding unit test for error messages from SQLContext 2af75d5 [Hossein] Reading error message in case of failure and stoping with that message f479c99 [Hossein] Wrting exception cause message in JVM
* [SPARK-9463] [ML] Expose model coefficients with names in SparkR RFormulaEric Liang2015-07-303-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preview: ``` > summary(m) features coefficients 1 (Intercept) 1.6765001 2 Sepal_Length 0.3498801 3 Species.versicolor -0.9833885 4 Species.virginica -1.0075104 ``` Design doc from umbrella task: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit cc mengxr Author: Eric Liang <ekl@databricks.com> Closes #7771 from ericl/summary and squashes the following commits: ccd54c3 [Eric Liang] second pass a5ca93b [Eric Liang] comments 2772111 [Eric Liang] clean up 70483ef [Eric Liang] fix test 7c247d4 [Eric Liang] Merge branch 'master' into summary 3c55024 [Eric Liang] working 8c539aa [Eric Liang] first pass
* [SPARK-9248] [SPARKR] Closing curly-braces should always be on their own lineYuu ISHIKAWA2015-07-304-14/+19
| | | | | | | | | | | | | | ### JIRA [[SPARK-9248] Closing curly-braces should always be on their own line - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9248) ## The result of `dev/lint-r` [The result of `dev/lint-r` for SPARK-9248 at the revistion:6175d6cfe795fbd88e3ee713fac375038a3993a8](https://gist.github.com/yu-iskw/96cadcea4ce664c41f81) Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #7795 from yu-iskw/SPARK-9248 and squashes the following commits: c8eccd3 [Yuu ISHIKAWA] [SPARK-9248][SparkR] Closing curly-braces should always be on their own line
* [SPARK-9391] [ML] Support minus, dot, and intercept operators in SparkR RFormulaEric Liang2015-07-282-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds '.', '-', and intercept parsing to RFormula. Also splits RFormulaParser into a separate file. Umbrella design doc here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit?usp=sharing mengxr Author: Eric Liang <ekl@databricks.com> Closes #7707 from ericl/string-features-2 and squashes the following commits: 8588625 [Eric Liang] exclude complex types for . 8106ffe [Eric Liang] comments a9350bb [Eric Liang] s/var/val 9c50d4d [Eric Liang] Merge branch 'string-features' into string-features-2 581afb2 [Eric Liang] Merge branch 'master' into string-features 08ae539 [Eric Liang] Merge branch 'string-features' into string-features-2 f99131a [Eric Liang] comments cecec43 [Eric Liang] Merge branch 'string-features' into string-features-2 0bf3c26 [Eric Liang] update docs 4592df2 [Eric Liang] intercept supports 7412a2e [Eric Liang] Fri Jul 24 14:56:51 PDT 2015 3cf848e [Eric Liang] fix the parser 0556c2b [Eric Liang] Merge branch 'string-features' into string-features-2 c302a2c [Eric Liang] fix tests 9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features e713da3 [Eric Liang] comments cd231a9 [Eric Liang] Wed Jul 22 17:18:44 PDT 2015 4d79193 [Eric Liang] revert to seq + distinct 169a085 [Eric Liang] tweak functional test a230a47 [Eric Liang] Merge branch 'master' into string-features 72bd6f3 [Eric Liang] fix merge d841cec [Eric Liang] Merge branch 'master' into string-features 5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015 b01c7c5 [Eric Liang] add test 8a637db [Eric Liang] encoder wip a1d03f4 [Eric Liang] refactor into estimator
* Use vector-friendly comparison for packages argument.trestletech2015-07-282-1/+5
| | | | | | | | | | | | | | | | | | Otherwise, `sparkR.init()` with multiple `sparkPackages` results in this warning: ``` Warning message: In if (packages != "") { : the condition has length > 1 and only the first element will be used ``` Author: trestletech <jeff.allen@trestletechnology.net> Closes #7701 from trestletech/compare-packages and squashes the following commits: 72c8b36 [trestletech] Correct function name. c52db0e [trestletech] Added test for multiple packages. 3aab1a7 [trestletech] Use vector-friendly comparison for packages argument.
* [SPARK-9230] [ML] Support StringType features in RFormulaEric Liang2015-07-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds StringType feature support via OneHotEncoder. As part of this task it was necessary to change RFormula to an Estimator, so that factor levels could be determined from the training dataset. Not sure if I am using uids correctly here, would be good to get reviewer help on that. cc mengxr Umbrella design doc: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit# Author: Eric Liang <ekl@databricks.com> Closes #7574 from ericl/string-features and squashes the following commits: f99131a [Eric Liang] comments 0bf3c26 [Eric Liang] update docs c302a2c [Eric Liang] fix tests 9d1ac82 [Eric Liang] Merge remote-tracking branch 'upstream/master' into string-features e713da3 [Eric Liang] comments 4d79193 [Eric Liang] revert to seq + distinct 169a085 [Eric Liang] tweak functional test a230a47 [Eric Liang] Merge branch 'master' into string-features 72bd6f3 [Eric Liang] fix merge d841cec [Eric Liang] Merge branch 'master' into string-features 5b2c4a2 [Eric Liang] Mon Jul 20 18:45:33 PDT 2015 b01c7c5 [Eric Liang] add test 8a637db [Eric Liang] encoder wip a1d03f4 [Eric Liang] refactor into estimator
* [SPARK-9249] [SPARKR] local variable assigned but may not be usedYu ISHIKAWA2015-07-242-5/+2
| | | | | | | | | | | | [[SPARK-9249] local variable assigned but may not be used - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9249) https://gist.github.com/yu-iskw/0e5b0253c11769457ea5 Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #7640 from yu-iskw/SPARK-9249 and squashes the following commits: 7a51cab [Yu ISHIKAWA] [SPARK-9249][SparkR] local variable assigned but may not be used
* [SPARK-9243] [Documentation] null -> zero in crosstab docXiangrui Meng2015-07-231-1/+1
| | | | | | | | | | We forgot to update doc. brkyvz Author: Xiangrui Meng <meng@databricks.com> Closes #7608 from mengxr/SPARK-9243 and squashes the following commits: 0ea3236 [Xiangrui Meng] null -> zero in crosstab doc
* [SPARK-8364] [SPARKR] Add crosstab to SparkR DataFramesXiangrui Meng2015-07-224-0/+46
| | | | | | | | | | | | | | | | Add `crosstab` to SparkR DataFrames, which takes two column names and returns a local R data.frame. This is similar to `table` in R. However, `table` in SparkR is used for loading SQL tables as DataFrames. The return type is data.frame instead table for `crosstab` to be compatible with Scala/Python. I couldn't run R tests successfully on my local. Many unit tests failed. So let's try Jenkins. Author: Xiangrui Meng <meng@databricks.com> Closes #7318 from mengxr/SPARK-8364 and squashes the following commits: d75e894 [Xiangrui Meng] fix tests 53f6ddd [Xiangrui Meng] fix tests f1348d6 [Xiangrui Meng] update test 47cb088 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-8364 5621262 [Xiangrui Meng] first version without test
* [SPARK-9201] [ML] Initial integration of MLlib + SparkR using RFormulaEric Liang2015-07-205-0/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This exposes the SparkR:::glm() and SparkR:::predict() APIs. It was necessary to change RFormula to silently drop the label column if it was missing from the input dataset, which is kind of a hack but necessary to integrate with the Pipeline API. The umbrella design doc for MLlib + SparkR integration can be viewed here: https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit mengxr Author: Eric Liang <ekl@databricks.com> Closes #7483 from ericl/spark-8774 and squashes the following commits: 3dfac0c [Eric Liang] update 17ef516 [Eric Liang] more comments 1753a0f [Eric Liang] make glm generic b0f50f8 [Eric Liang] equivalence test 550d56d [Eric Liang] export methods c015697 [Eric Liang] second pass 117949a [Eric Liang] comments 5afbc67 [Eric Liang] test label columns 6b7f15f [Eric Liang] Fri Jul 17 14:20:22 PDT 2015 3a63ae5 [Eric Liang] Fri Jul 17 13:41:52 PDT 2015 ce61367 [Eric Liang] Fri Jul 17 13:41:17 PDT 2015 0299c59 [Eric Liang] Fri Jul 17 13:40:32 PDT 2015 e37603f [Eric Liang] Fri Jul 17 12:15:03 PDT 2015 d417d0c [Eric Liang] Merge remote-tracking branch 'upstream/master' into spark-8774 29a2ce7 [Eric Liang] Merge branch 'spark-8774-1' into spark-8774 d1959d2 [Eric Liang] clarify comment 2db68aa [Eric Liang] second round of comments dc3c943 [Eric Liang] address comments 5765ec6 [Eric Liang] fix style checks 1f361b0 [Eric Liang] doc d33211b [Eric Liang] r support fb0826b [Eric Liang] [SPARK-8774] Add R model formula with basic support as a transformer
* [SPARK-9052] [SPARKR] Fix comments after curly bracesYu ISHIKAWA2015-07-212-16/+30
| | | | | | | | | | | | | | | | | | [[SPARK-9052] Fix comments after curly braces - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9052) This is the full result of lintr at the rivision:011551620faa87107a787530f074af3d9be7e695. [[SPARK-9052] the result of lint-r at the revision:011551620faa87107a787530f074af3d9be7e695](https://gist.github.com/yu-iskw/e7246041b173a3f29482) This is the difference of the result between before and after. https://gist.github.com/yu-iskw/e7246041b173a3f29482/revisions Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #7440 from yu-iskw/SPARK-9052 and squashes the following commits: 015d738 [Yu ISHIKAWA] Fix the indentations and move the placement of commna 5cc30fe [Yu ISHIKAWA] Fix the indentation in a condition 4ead0e5 [Yu ISHIKAWA] [SPARK-9052][SparkR] Fix comments after curly braces