Author: adrian555 <wzhuang@us.ibm.com>
Author: Adrian Zhuang <adrian555@users.noreply.github.com>
Closes #9443 from adrian555/with.
Like ML's `LinearRegression`, `LogisticRegression` should provide a training summary including feature names and their coefficients.
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9303 from yanboliang/spark-9492.
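For context, a minimal SparkR sketch of the kind of output such a summary enables. Whether this surfaces through SparkR's glm with a binomial family, and exactly which fields the summary prints, are assumptions of this sketch rather than details confirmed by the commit:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# Small binomial (logistic) example
df <- createDataFrame(sqlContext, data.frame(label = c(0, 0, 1, 1),
                                             x1    = c(1.1, 0.9, 3.0, 3.2)))
model <- glm(label ~ x1, family = "binomial", data = df)

# A training summary should pair feature names with their coefficients
summary(model)
```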
Author: lewuathe <lewuathe@me.com>
Author: Lewuathe <lewuathe@me.com>
Closes #9394 from Lewuathe/missing-link-to-R-dataframe.
in ML models
Deprecated in `LogisticRegression` and `LinearRegression`
Author: vectorijk <jiangkai@gmail.com>
Closes #9311 from vectorijk/spark-10592.
from R programmatically or from RStudio
Mapping spark.driver.memory from sparkEnvir to spark-submit command-line arguments.
shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf?
sun-rui
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9290 from felixcheung/rdrivermem.
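A minimal sketch of the resulting usage, assuming spark.driver.memory is passed through sparkEnvir as described:

```r
library(SparkR)

# Driver properties set here are mapped to spark-submit command-line arguments
# when launching from R or RStudio, since the driver JVM cannot pick them up
# from SparkConf once it is already running.
sc <- sparkR.init(master = "local[2]",
                  sparkEnvir = list(spark.driver.memory = "2g"))
```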
Author: Sun Rui <rui.sun@intel.com>
Closes #9196 from sun-rui/SPARK-11210.
Quick one-line doc fix: the link is not clickable.
![image](https://cloud.githubusercontent.com/assets/8969467/10833041/4e91dd7c-7e4c-11e5-8905-713b986dbbde.png)
shivaram
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9363 from felixcheung/rpersistdoc.
SparkR glm currently supports:
```formula, family = c("gaussian", "binomial"), data, lambda = 0, alpha = 0```
We should also support setting `standardize`, which is defined in the [design documentation](https://docs.google.com/document/d/10NZNSEurN2EdWM31uFYsgayIPfCFHiuIu3pCWrUmP_c/edit).
Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9331 from yanboliang/spark-11369.
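A hedged sketch of the extended call; `standardize` is the new argument described above, and its placement and default value here are assumptions:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# Column names with "." are sanitized to "_", e.g. Sepal_Length
df <- createDataFrame(sqlContext, iris)

model <- glm(Sepal_Length ~ Sepal_Width, family = "gaussian", data = df,
             lambda = 0.5, alpha = 0, standardize = FALSE)
summary(model)
```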
Author: Sun Rui <rui.sun@intel.com>
Closes #9193 from sun-rui/SPARK-11209.
Add a merge function to DataFrame that supports the base R merge signature.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
Closes #9012 from NarineK/sparkrmerge.
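A short sketch of the R-style call this enables, assuming the usual base::merge arguments (by, all.x, and so on) are among those supported:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

people <- createDataFrame(sqlContext, data.frame(name = c("Ann", "Bob"),
                                                 dept_id = c(1, 2),
                                                 stringsAsFactors = FALSE))
depts  <- createDataFrame(sqlContext, data.frame(dept_id = c(1, 3),
                                                 dept = c("Eng", "Sales"),
                                                 stringsAsFactors = FALSE))

# Left outer merge on the shared key column, mirroring base::merge
joined <- merge(people, depts, by = "dept_id", all.x = TRUE)
head(joined)
```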
Add examples for read.df, write.df; fix grouping for read.df, loadDF; fix formatting and text truncation for write.df, saveAsTable.
Several text issues:
![image](https://cloud.githubusercontent.com/assets/8969467/10708590/1303a44e-79c3-11e5-854f-3a2e16854cd7.png)
- text collapsed into a single paragraph
- text truncated at 2 places, e.g. "overwrite: Existing data is expected to be overwritten by the contents of error:"
shivaram
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9261 from felixcheung/rdocreadwritedf.
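For reference, a minimal sketch of the two calls the examples cover (the JSON path is a stand-in):

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# Read a JSON data source into a DataFrame
df <- read.df(sqlContext, "examples/src/main/resources/people.json", source = "json")

# Write it back out as Parquet; mode "overwrite" replaces existing data,
# while the default "error" fails if the path already exists
write.df(df, path = "people.parquet", source = "parquet", mode = "overwrite")
```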
SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise even when SparkContext is reinitialized, `sparkRSQL.init` returns the stale copy of the object and complains:
```r
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sparkR.stop()
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sqlContext
```
producing
```r
Error in callJMethod(x, "getClass") :
Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
```
I have added the check and removal only when SparkContext itself is initialized. I have also added corresponding test for this fix. Let me know if you want me to move the test to SQL test suite instead.
P.S. I tried lint-r but ended up with a lot of errors in the existing code.
Author: Forest Fang <forest.fang@outlook.com>
Closes #9205 from saurfang/sparkR.stop.
This PR introduces a new feature to run SQL directly on files without creating a table, for example:
```
select id from json.`path/to/json/files` as j
```
Author: Davies Liu <davies@databricks.com>
Closes #9173 from davies/source.
Currently the documentation for `lit` is inconsistent with the doc format, references "Scala symbol", and has no example. Fixing that.
shivaram
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9187 from felixcheung/rlit.
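A small usage sketch of `lit`, which wraps a literal value in a Column so it can be mixed with other Column expressions:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, data.frame(name = c("Ann", "Bob"),
                                             age  = c(23, 31),
                                             stringsAsFactors = FALSE))

# lit() turns an R value into a Column
head(select(df, df$name, lit("constant"), df$age + lit(1)))
```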
…2 regularization if the number of features is small
Author: lewuathe <lewuathe@me.com>
Author: Lewuathe <sasaki@treasure-data.com>
Author: Kai Sasaki <sasaki@treasure-data.com>
Author: Lewuathe <lewuathe@me.com>
Closes #8884 from Lewuathe/SPARK-10668.
Author: Sun Rui <rui.sun@intel.com>
Closes #9023 from sun-rui/SPARK-10996.
I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R and test_sparkSQL.R files from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases for join() and merge() in test_sparkSQL.R.
This pull request accompanies the JIRA bug report I filed:
https://issues.apache.org/jira/browse/SPARK-10981
Author: Monica Liu <liu.monica.f@gmail.com>
Closes #9029 from mfliu/master.
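A brief sketch of the added join types in use (the column names are illustrative):

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

employees <- createDataFrame(sqlContext, data.frame(name = c("Ann", "Bob", "Cat"),
                                                    dept = c(1, 2, 4),
                                                    stringsAsFactors = FALSE))
depts <- createDataFrame(sqlContext, data.frame(id = c(1, 2, 3),
                                                label = c("Eng", "Ops", "Sales"),
                                                stringsAsFactors = FALSE))

# Newly accepted join types include "full", "fullouter", "left", "right", "leftsemi"
left_joined <- join(employees, depts, employees$dept == depts$id, "left")
semi_joined <- join(employees, depts, employees$dept == depts$id, "leftsemi")
head(left_joined)
```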
Bring the code changes up to date.
Author: Adrian Zhuang <adrian555@users.noreply.github.com>
Author: adrian555 <wzhuang@us.ibm.com>
Closes #9031 from adrian555/attach2.
as.DataFrame is a more R-style signature.
Also, I'd like to know if we could make the context (e.g. sqlContext) global, so that we do not have to specify it as an argument each time we create a DataFrame.
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
Closes #8952 from NarineK/sparkrasDataFrame.
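A quick sketch of the new spelling next to the existing one:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# Existing API
df1 <- createDataFrame(sqlContext, faithful)

# New, more R-like synonym
df2 <- as.DataFrame(sqlContext, faithful)
head(df2)
```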
Two points in this PR:
1. The original thought was that a named R list is assumed to be a struct in SerDe. But this is problematic, because some R functions will implicitly generate named lists that are not intended to be structs when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct".
2. SerDe is in the Spark Core module, and data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow, because in the Maven build the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe to allow SQLUtils in the Spark SQL module to register its functions for serialization and deserialization of StructType.
Author: Sun Rui <rui.sun@intel.com>
Closes #8794 from sun-rui/SPARK-10051.
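A tiny sketch of the client-side convention from point 1; setting the class attribute is exactly what the message describes, while the surrounding names are illustrative:

```r
# A named R list is no longer treated as a struct automatically.
# To have SerDe send it as a StructType value, mark it explicitly:
row <- list(name = "Ann", age = 23L)
class(row) <- "struct"   # serialized as a struct rather than a plain named list
```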
1. Add a "col" function into DataFrame.
2. Move the current "col" function in Column.R to functions.R and convert it to an S4 function.
3. Add an S4 "column" function in functions.R.
4. Convert the "column" function in Column.R to an S4 function. This is for private use.
Author: Sun Rui <rui.sun@intel.com>
Closes #8864 from sun-rui/SPARK-10079.
[SPARK-10905][SparkR]: Export freqItems() for DataFrameStatFunctions
- Add function (together with roxygen2 doc) to DataFrame.R and generics.R
- Expose the function in NAMESPACE
- Add unit test for the function
Author: Rerngvit Yanggratoke <rerngvit@kth.se>
Closes #8962 from rerngvit/SPARK-10905.
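A short usage sketch (the support threshold shown is illustrative):

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, data.frame(a = c(1, 1, 2, 1),
                                             b = c("x", "x", "y", "x"),
                                             stringsAsFactors = FALSE))

# Approximate frequent items for the given columns, returned as a local result
freqItems(df, c("a", "b"), support = 0.25)
```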
The sort function can be used as an alternative to arrange(...).
As arguments it accepts: x, a DataFrame; decreasing, TRUE/FALSE or a vector of orderings for the columns; and the columns to sort by, given as string names.
for example:
sort(df, TRUE, "col1","col2","col3","col5") # for example, if we want to sort some of the columns in the same order
sort(df, decreasing=TRUE, "col1")
sort(df, decreasing=c(TRUE,FALSE), "col1","col2")
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
Closes #8920 from NarineK/sparkrsort.
Author: Sun Rui <rui.sun@intel.com>
Closes #8869 from sun-rui/SPARK-10752.
The fix is to coerce `c("a", "b")` into a list so that it can be serialized for the call into the JVM.
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #8961 from felixcheung/rselect.
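The behavior being fixed, as a sketch:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, faithful)

# Selecting with a character vector of column names now works, because
# c("eruptions", "waiting") is coerced to a list before the JVM call
head(select(df, c("eruptions", "waiting")))
```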
Created method as.data.frame as a synonym for collect().
Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
Author: olarayej <oscar.lara.yejas@us.ibm.com>
Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
Closes #8908 from olarayej/SPARK-10807.
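The synonym in use, as a minimal sketch:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, faithful)

# Both calls bring the distributed DataFrame back as a local R data.frame
local1 <- collect(df)
local2 <- as.data.frame(df)
head(local2)
```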
argument is missing
Hi everyone,
Since the family argument is required for the glm function, the execution of:
model <- glm(Sepal_Length ~ Sepal_Width, df)
is failing.
I've fixed the documentation by adding the family argument and also added summary(model), which shows the coefficients of the model.
Thanks,
Narine
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
Closes #8870 from NarineK/sparkrml.
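The corrected example, sketched end to end:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# "." in column names becomes "_", e.g. Sepal_Length
df <- createDataFrame(sqlContext, iris)

# family is required, so it is spelled out in the documented example
model <- glm(Sepal_Length ~ Sepal_Width, data = df, family = "gaussian")
summary(model)   # shows the model's coefficients
```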
This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`).
To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.
mengxr
Author: Eric Liang <ekl@databricks.com>
Closes #8830 from ericl/interaction-2.
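A small sketch of what `:` support means on the R side; using glm as the entry point is this sketch's assumption, the formula syntax is the point:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)   # columns become Sepal_Length, Species, ...

# An interaction term between a numeric and a categorical column
model <- glm(Sepal_Length ~ Sepal_Width:Species, data = df, family = "gaussian")
summary(model)   # feature names now include readable interaction attributes
```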
1. Support collecting data of MapType from DataFrame.
2. Support data of MapType in createDataFrame.
Author: Sun Rui <rui.sun@intel.com>
Closes #8711 from sun-rui/SPARK-10050.
Author: Reynold Xin <rxin@databricks.com>
Closes #8350 from rxin/1.6.
Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change.
Author: JihongMa <linlin200605@gmail.com>
Author: Jihong MA <linlin200605@gmail.com>
Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
Closes #6297 from JihongMA/SPARK-SQL.
This PR:
1. Enhance reflection in RBackend: automatically match a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.
2. Enhance the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection.
3. Support ArrayType in createDataFrame().
Author: Sun Rui <rui.sun@intel.com>
Closes #8458 from sun-rui/SPARK-10049.
`dev/lintr-r` passes on my machine now
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #8601 from shivaram/sparkr-style-fix.
Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters.
I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R.
Author: CHOIJAEHONG <redrock07@naver.com>
Closes #7494 from CHOIJAEHONG1/SPARK-8951.
Add subset and transform
Also reorganize `[` & `[[` to subset instead of select
Note: transform is very similar to mutate. Spark doesn't seem to replace an existing column of the same name in mutate (i.e. `mutate(df, age = df$age + 2)` returns a DataFrame with two columns named 'age'), so transform does not do that for now either.
Though it is clearly stated that it should replace a column with a matching name (should I open a JIRA for mutate/transform?).
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #8503 from felixcheung/rsubset_transform.
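Both additions, sketched briefly:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, data.frame(name = c("Ann", "Bob"),
                                             age  = c(23, 31),
                                             stringsAsFactors = FALSE))

# subset: filter rows and pick columns in one call
adults <- subset(df, df$age > 25, select = c("name", "age"))

# transform: mutate-style column addition (it does not replace an existing
# column of the same name, per the caveat above)
df2 <- transform(df, ageNextYear = df$age + 1)
head(df2)
```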
This is based on davies' comment on SPARK-8952, which suggests only calling normalizePath() when the path starts with '~'.
Author: Luciano Resende <lresende@apache.org>
Closes #8343 from lresende/SPARK-8952.
The corresponding S3 function is documented at https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Author: Shivaram Venkataraman <shivaram.venkataraman@gmail.com>
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8495 from shivaram/na-omit-fix.
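The R-style call this keeps working, as a sketch; how NA values round-trip through createDataFrame is an assumption here:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, data.frame(x = c(1, NA, 3),
                                             y = c("a", "b", NA),
                                             stringsAsFactors = FALSE))

# Drop rows containing any missing value, mirroring stats::na.omit
head(na.omit(df))
```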
cc sun-rui davies
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #8475 from shivaram/varargs-fix.
Getting rid of some validation problems in SparkR
https://github.com/apache/spark/pull/7883
cc shivaram
```
inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
expect_equal(class(x), "character")
^~
inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.
^~
R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.
^~~~~~~~~~~~~~
R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
df <- data.frame(row.names = 1 : nrow)
^~~~~~~~~~~~~~~~
R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.
^~~~~~~~~~~~~~
R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
df[[names[colIndex]]] <- vec
^~~~~~~~~~~~~~~~~~
```
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8474 from yu-iskw/minor-fix-sparkr.
I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.
cc yu-iskw
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
Closes #8473 from shivaram/in-namespace.
filter / select)
Add support for
```
df[df$name == "Smith", c(1,2)]
df[df$age %in% c(19, 30), 1:2]
```
shivaram
Author: felixcheung <felixcheung_m@hotmail.com>
Closes #8394 from felixcheung/rsubset.
This PR:
1. supports transferring arbitrary nested arrays from the JVM to the R side in SerDe;
2. based on 1, the collect() implementation is improved. Now it can support collecting data of complex types from a DataFrame.
Author: Sun Rui <rui.sun@intel.com>
Closes #8276 from sun-rui/SPARK-10048.
cc: shivaram
## Summary
- Add name tags to each methods in DataFrame.R and column.R
- Replace `rdname column` with `rdname {each_func}`, e.g. for the alias method: `rdname column` => `rdname alias`
## Generated PDF File
https://drive.google.com/file/d/0B9biIZIU47lLNHN2aFpnQXlSeGs/view?usp=sharing
## JIRA
[[SPARK-10214] Improve SparkR Column, DataFrame API docs - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10214)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8414 from yu-iskw/SPARK-10214.
cc: shivaram
## Summary
- Modify the `rdname` of expression functions, e.g. `ascii`: `rdname functions` => `rdname ascii`
- Replace the dynamic function definitions with static ones, for the sake of their documentation.
## Generated PDF File
https://drive.google.com/file/d/0B9biIZIU47lLX2t6ZjRoRnBTSEU/view?usp=sharing
## JIRA
[[SPARK-10118] Improve SparkR API docs for 1.5 release - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10118)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8386 from yu-iskw/SPARK-10118.
### JIRA
[[SPARK-10106] Add `ifelse` Column function to SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10106)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8303 from yu-iskw/SPARK-10106.
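A one-expression sketch of the new Column function:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, data.frame(age = c(15, 42)))

# Column-level ifelse, analogous to base::ifelse
head(select(df, df$age, ifelse(df$age > 18, "adult", "minor")))
```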
complicated
I added lots of Column functions to SparkR. I also added `rand(seed: Int)` and `randn(seed: Int)` in Scala, since we need such APIs for the R integer type.
### JIRA
[[SPARK-9856] Add expression functions into SparkR whose params are complicated - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9856)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8264 from yu-iskw/SPARK-9856-3.
- Add `when` and `otherwise` as `Column` methods
- Add `When` as an expression function
- Add `%otherwise%` infix as an alias of `otherwise`
Since R doesn't support method chaining, the `otherwise(when(condition, value), value)` style is a little awkward. If `%otherwise%` looks strange to shivaram, I can remove it. What do you think?
### JIRA
[[SPARK-10075] Add `when` expressino function in SparkR - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10075)
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8266 from yu-iskw/SPARK-10075.
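The nested style from the description, as a sketch; the proposed `%otherwise%` infix is only mentioned in a comment since its fate depends on review:

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, data.frame(age = c(15, 42)))

# when() builds a conditional Column; otherwise() supplies the default value
labeled <- select(df, otherwise(when(df$age > 18, "adult"), "minor"))
head(labeled)
# The message also proposes an infix alias, when(...) %otherwise% "minor",
# pending reviewer feedback.
```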
```
R/functions.R:74:1: style: lines should not be more than 100 characters.
jc <- callJStatic("org.apache.spark.sql.functions", "lit", ifelse(class(x) == "Column", xjc, x))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8297 from yu-iskw/minor-lint-r.
This patch is against master, but we need to apply it to 1.5 branch as well.
cc shivaram and rxin
Author: Hossein <hossein@databricks.com>
Closes #8291 from falaki/SparkRVersion1.5.
parameters functions
### JIRA
[[SPARK-10007] Update `NAMESPACE` file in SparkR for simple parameters functions - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-10007)
Author: Yuu ISHIKAWA <yuu.ishikawa@gmail.com>
Closes #8277 from yu-iskw/SPARK-10007.