aboutsummaryrefslogtreecommitdiff
path: root/R/pkg
Commit message (Collapse)AuthorAgeFilesLines
* [SPARK-12232][SPARKR] New R API for read.table to avoid name conflictfelixcheung2016-01-194-20/+17
| | | | | | | | shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10406 from felixcheung/readtable.
* [SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR.Sun Rui2016-01-194-1/+75
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #10309 from sun-rui/SPARK-12337.
* [SPARK-12168][SPARKR] Add automated tests for conflicted function in Rfelixcheung2016-01-192-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently this is reported when loading the SparkR package in R (probably would add is.nan) ``` Loading required package: methods Attaching package: ‘SparkR’ The following objects are masked from ‘package:stats’: cov, filter, lag, na.omit, predict, sd, var The following objects are masked from ‘package:base’: colnames, colnames<-, intersect, rank, rbind, sample, subset, summary, table, transform ``` Adding this test adds an automated way to track changes to masked method. Also, the second part of this test check for those functions that would not be accessible without namespace/package prefix. Incidentally, this might point to how we would fix those inaccessible functions in base or stats. Looking for feedback for adding this test. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10171 from felixcheung/rmaskedtest.
* [SPARK-12862][SPARKR] Jenkins does not run R testsfelixcheung2016-01-171-1/+1
| | | | | | | | | | | | Slight correction: I'm leaving sparkR as-is (ie. R file not supported) and fixed only run-tests.sh as shivaram described. I also assume we are going to cover all doc changes in https://issues.apache.org/jira/browse/SPARK-12846 instead of here. rxin shivaram zjffdu Author: felixcheung <felixcheung_m@hotmail.com> Closes #10792 from felixcheung/sparkRcmd.
* [SPARK-11031][SPARKR] Method str() on a DataFrameOscar D. Lara Yejas2016-01-155-22/+140
| | | | | | | | | Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes #9613 from olarayej/SPARK-11031.
* [SPARK-12756][SQL] use hash expression in ExchangeWenchen Fan2016-01-131-1/+1
| | | | | | | | | | This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is same between shuffle and bucketed data source, which enables us to only shuffle one side when join a bucketed table and a normal one. This PR also fixes the tests that are broken by the new hash behaviour in shuffle. Author: Wenchen Fan <wenchen@databricks.com> Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
* [SPARK-12645][SPARKR] SparkR support hash functionYanbo Liang2016-01-094-1/+26
| | | | | | | | Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645.
* [SPARK-12393][SPARKR] Add read.text and write.text for SparkRYanbo Liang2016-01-065-1/+82
| | | | | | | | | Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10348 from yanboliang/spark-12393.
* [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated APIfelixcheung2016-01-045-25/+33
| | | | | | | | | | | rxin davies shivaram Took save mode from my PR #10480, and move everything to writer methods. This is related to PR #10559 - [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed) Author: felixcheung <felixcheung_m@hotmail.com> Closes #10584 from felixcheung/rremovedeprecated.
* [SPARK-12327][SPARKR] fix code for lintr warning for commented codefelixcheung2016-01-039-11/+88
| | | | | | | | shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #10408 from felixcheung/rcodecomment.
* [SPARK-11199][SPARKR] Improve R context management story and add getOrCreateHossein2015-12-291-0/+4
| | | | | | | | | | | * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context. * Adds a simple test [SPARK-11199] #comment link with JIRA Author: Hossein <hossein@databricks.com> Closes #9185 from falaki/SPARK-11199.
* [SPARK-12526][SPARKR] ifelse`, `when`, `otherwise` unable to take Column as ↵Forest Fang2015-12-293-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | value `ifelse`, `when`, `otherwise` is unable to take `Column` typed S4 object as values. For example: ```r ifelse(lit(1) == lit(1), lit(2), lit(3)) ifelse(df$mpg > 0, df$mpg, 0) ``` will both fail with ```r attempt to replicate an object of type 'environment' ``` The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempt to vectorize(i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency because `ifelse` in base R is vectorized but I cannot foresee any scenarios these functions will want to be vectorized in SparkR. For reference, added test cases which trigger failures: ```r . Error: when(), otherwise() and ifelse() with column on a DataFrame ---------- error in evaluating the argument 'x' in selecting a method for function 'collect': error in evaluating the argument 'col' in selecting a method for function 'select': attempt to replicate an object of type 'environment' Calls: when -> when -> ifelse -> ifelse 1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage")) 2: eval(code, new_test_environment) 3: eval(expr, envir, enclos) 4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126 5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label) 6: condition(object) 7: compare(actual, expected, ...) 8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1)))) Error: Test failures Execution halted ``` Author: Forest Fang <forest.fang@outlook.com> Closes #10481 from saurfang/spark-12526.
* Bump master version to 2.0.0-SNAPSHOT.Reynold Xin2015-12-191-1/+1
| | | | | | Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
* [SPARK-12310][SPARKR] Add write.json and write.parquet for SparkRYanbo Liang2015-12-164-56/+119
| | | | | | | | Add ```write.json``` and ```write.parquet``` for SparkR, and deprecated ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.
* [SPARK-12318][SPARKR] Save mode in SparkR should be error by defaultJeff Zhang2015-12-161-5/+5
| | | | | | | | shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318.
* [SPARK-12327] Disable commented code lintr temporarilyShivaram Venkataraman2015-12-141-1/+1
| | | | | | | | cc yhuai felixcheung shaneknapp Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #10300 from shivaram/comment-lintr-disable.
* [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test casesgatorsmile2015-12-112-6/+15
| | | | | | | | | | | The existing sample functions miss the parameter `seed`, however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value. This could cause SparkR unit tests failed. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull Author: gatorsmile <gatorsmile@gmail.com> Closes #10160 from gatorsmile/sampleR.
* [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input filesYanbo Liang2015-12-114-115/+137
| | | | | | | | | | | | | | | | | * ```jsonFile``` should support multiple input files, such as: ```R jsonFile(sqlContext, c(“path1”, “path2”)) # character vector as arguments jsonFile(sqlContext, “path1,path2”) ``` * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed at Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` at SparkR side. * Replace all ```jsonFile``` with ```read.json``` at test_sparkSQL.R, but still keep jsonFile test case. * If this PR is accepted, we should also make almost the same change for ```parquetFile```. cc felixcheung sun-rui shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10145 from yanboliang/spark-12146.
* [SPARK-12234][SPARKR] Fix ```subset``` function error when only set ↵Yanbo Liang2015-12-102-2/+11
| | | | | | | | | | | | ```select``` argument Fix ```subset``` function error when only set ```select``` argument. Please refer to the [JIRA](https://issues.apache.org/jira/browse/SPARK-12234) about the error and how to reproduce it. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10217 from yanboliang/spark-12234.
* [SPARK-12198][SPARKR] SparkR support read.parquet and deprecate parquetFileYanbo Liang2015-12-103-6/+22
| | | | | | | | SparkR support ```read.parquet``` and deprecate ```parquetFile```. This change is similar with #10145 for ```jsonFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10191 from yanboliang/spark-12198.
* [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.Sun Rui2015-12-0719-38/+49
| | | | | | | | | | | | | | This PR: 1. Suppress all known warnings. 2. Cleanup test cases and fix some errors in test cases. 3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext. 4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat' 5. Make sure the default Hadoop file system is local when running test cases. 6. Turn on warnings into errors. Author: Sun Rui <rui.sun@intel.com> Closes #10030 from sun-rui/SPARK-12034.
* [SPARK-12044][SPARKR] Fix usage of isnan, isNaNYanbo Liang2015-12-054-11/+31
| | | | | | | | | | | | 1, Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should has three related variable functions: ```isNaN, isNull, isNotNull```. 2, Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` at SparkR side. Because ```DataFrame.isNaN``` has been deprecated and will be removed at Spark 2.0. <del>3, Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should has two related functions: ```isnan, isnull```.<del> cc shivaram sun-rui felixcheung Author: Yanbo Liang <ybliang8@gmail.com> Closes #10037 from yanboliang/spark-12044.
* [SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be ↵Yanbo Liang2015-12-054-30/+45
| | | | | | | | | | | | | consistent with Scala/Python Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python. <del>Note: If we can not catch up with 1.6 release, it will be breaking change for 1.7 that we also need to explain in release note.<del> cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10123 from yanboliang/spark-12115.
* [SPARK-11715][SPARKR] Add R support corr for Column Aggregrationfelixcheung2015-12-054-6/+22
| | | | | | | | Need to match existing method signature Author: felixcheung <felixcheung_m@hotmail.com> Closes #9680 from felixcheung/rcorr.
* [SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in ↵Sun Rui2015-12-054-6/+105
| | | | | | | | SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #9804 from sun-rui/SPARK-11774.
* [SPARK-12104][SPARKR] collect() does not handle multiple columns with same name.Sun Rui2015-12-032-4/+10
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #10118 from sun-rui/SPARK-12104.
* [SPARK-12019][SPARKR] Support character vector for sparkR.init(), check ↵felixcheung2015-12-035-21/+79
| | | | | | | | | | | param and fix doc and add tests. Spark submit expects comma-separated list Author: felixcheung <felixcheung_m@hotmail.com> Closes #10034 from felixcheung/sparkrinitdoc.
* [SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type.Sun Rui2015-11-294-32/+47
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #9769 from sun-rui/SPARK-11781.
* [SPARK-9319][SPARKR] Add support for setting column names, typesfelixcheung2015-11-285-55/+185
| | | | | | | | | | | | | Add support for for colnames, colnames<-, coltypes<- Also added tests for names, names<- which have no test previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, crewed up the PR. Recreated it here. Was #9218 shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9654 from felixcheung/colnamescoltypes.
* [SPARK-12029][SPARKR] Improve column functions signature, param check, ↵felixcheung2015-11-282-34/+96
| | | | | | | | | | tests, fix doc and add examples shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #10019 from felixcheung/rfunctionsdoc.
* [SPARK-12025][SPARKR] Rename some window rank function names for SparkRYanbo Liang2015-11-274-41/+41
| | | | | | | | | | | | | | Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` at SparkR side. There are two reasons that we should make this change: * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645) * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0. It's better to fix this issue before 1.6 release, otherwise we will make breaking API change. cc shivaram sun-rui Author: Yanbo Liang <ybliang8@gmail.com> Closes #10016 from yanboliang/SPARK-12025.
* [SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help ↵felixcheung2015-11-204-84/+37
| | | | | | | | | | | | | | | | | information for SparkR:::summary correctly Fix use of aliases and changes uses of rdname and seealso `aliases` is the hint for `?` - it should not be linked to some other name - those should be seealso https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage on family, as multiple use of family with the same rdname is causing duplicated See Also html blocks (like http://spark.apache.org/docs/latest/api/R/count.html) Also changing some rdname for dplyr-like variant for better R user visibility in R doc, eg. rbind, summary, mutate, summarize shivaram yanboliang Author: felixcheung <felixcheung_m@hotmail.com> Closes #9750 from felixcheung/rdocaliases.
* [SPARK-11339][SPARKR] Document the list of functions in R base package that ↵felixcheung2015-11-185-5/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | are masked by functions with same name in SparkR Added tests for function that are reported as masked, to make sure the base:: or stats:: function can be called. For those we can't call, added them to SparkR programming guide. It would seem to me `table, sample, subset, filter, cov` not working are not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like as they are defined in base or stats they are missing the S3 generic, eg. ``` > methods("transform") [1] transform,ANY-method transform.data.frame [3] transform,DataFrame-method transform.default see '?methods' for accessing help and source code > methods("subset") [1] subset.data.frame subset,DataFrame-method subset.default [4] subset.matrix see '?methods' for accessing help and source code Warning message: In .S3methods(generic.function, class, parent.frame()) : function 'subset' appears not to be S3 generic; found functions that look like S3 methods ``` Any idea? More information on masking: http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm http://www.sfu.ca/~sweldon/howTo/guide4.pdf This is what the output doc looks like (minus css): ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png) Author: felixcheung <felixcheung_m@hotmail.com> Closes #9785 from felixcheung/rmasked.
* [SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example ↵Yanbo Liang2015-11-181-3/+15
| | | | | | | | | | | | | | | | | codes This PR includes: * Update SparkR:::glm, SparkR:::summary API docs. * Update SparkR machine learning user guide and example codes to show: * supporting feature interaction in R formula. * summary for gaussian GLM model. * coefficients for binomial GLM model. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9727 from yanboliang/spark-11684.
* [SPARK-11773][SPARKR] Implement collection functions in SparkR.Sun Rui2015-11-186-35/+100
| | | | | | Author: Sun Rui <rui.sun@intel.com> Closes #9764 from sun-rui/SPARK-11773.
* [SPARK-11281][SPARKR] Add tests covering the issue.zero3232015-11-181-3/+7
| | | | | | | | The goal of this PR is to add tests covering the issue to ensure that is was resolved by [SPARK-11086](https://issues.apache.org/jira/browse/SPARK-11086). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9743 from zero323/SPARK-11281-tests.
* [SPARK-11755][R] SparkR should export "predict"Yanbo Liang2015-11-171-0/+4
| | | | | | | | | | | | | | | | | | | | | The bug described at [SPARK-11755](https://issues.apache.org/jira/browse/SPARK-11755), after exporting ```predict``` we can both get the help information from the SparkR and base R package like the following: ```Java > help(predict) Help on topic ‘predict’ was found in the following packages: Package Library SparkR /Users/yanboliang/data/trunk2/spark/R/lib stats /Library/Frameworks/R.framework/Versions/3.2/Resources/library Choose one 1: Make predictions from a model {SparkR} 2: Model Predictions {stats} ``` Author: Yanbo Liang <ybliang8@gmail.com> Closes #9732 from yanboliang/spark-11755.
* [SPARK-10500][SPARKR] sparkr.zip cannot be created if /R/lib is unwritableSun Rui2015-11-154-5/+20
| | | | | | | | | | | | | | | | | The basic idea is that: The archive of the SparkR package itself, that is sparkr.zip, is created during build process and is contained in the Spark binary distribution. No change to it after the distribution is installed as the directory it resides ($SPARK_HOME/R/lib) may not be writable. When there is R source code contained in jars or Spark packages specified with "--jars" or "--packages" command line option, a temporary directory is created by calling Utils.createTempDir() where the R packages built from the R source code will be installed. The temporary directory is writable, and won't interfere with each other when there are multiple SparkR sessions, and will be deleted when this SparkR session ends. The R binary packages installed in the temporary directory then are packed into an archive named rpkg.zip. sparkr.zip and rpkg.zip are distributed to the cluster in YARN modes. The distribution of rpkg.zip in Standalone modes is not supported in this PR, and will be address in another PR. Various R files are updated to accept multiple lib paths (one is for SparkR package, the other is for other R packages) so that these package can be accessed in R. Author: Sun Rui <rui.sun@intel.com> Closes #9390 from sun-rui/SPARK-10500.
* [SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop ↵zero3232015-11-152-21/+49
| | | | | | | | | | | | | | | | when createDataFrame Use `dropFactors` column-wise instead of nested loop when `createDataFrame` from a `data.frame` At this moment SparkR createDataFrame is using nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow especially with data.table (~ 2 orders of magnitude compared to PySpark / Pandas version on a DateFrame of size 1M rows x 2 columns). A simple improvement is to apply `dropFactor `column-wise and then reshape output list. It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9099 from zero323/SPARK-11086.
* [SPARK-11263][SPARKR] lintr Throws Warnings on Commented Code in Documentationfelixcheung2015-11-128-1512/+1539
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean out hundreds of `style: Commented code should be removed.` from lintr Like these: ``` /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:513:3: style: Commented code should be removed. # sc <- sparkR.init() ^~~~~~~~~~~~~~~~~~~ /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:514:3: style: Commented code should be removed. # sqlContext <- sparkRSQL.init(sc) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /opt/spark-1.6.0-bin-hadoop2.6/R/pkg/R/DataFrame.R:515:3: style: Commented code should be removed. # path <- "path/to/file.json" ^~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` tried without export or rdname, neither work instead, added this `#' noRd` to suppress .Rd file generation also updated `family` for DataFrame functions for longer descriptive text instead of `dataframe_funcs` ![image](https://cloud.githubusercontent.com/assets/8969467/10933937/17bf5b1e-8291-11e5-9777-40fc632105dc.png) this covers *most* of 'Commented code' but I left out a few that looks legitimate. Author: felixcheung <felixcheung_m@hotmail.com> Closes #9463 from felixcheung/rlintr.
* [SPARK-11420] Updating Stddev support via Imperative AggregateJihongMa2015-11-121-2/+2
| | | | | | | | switched stddev support from DeclarativeAggregate to ImperativeAggregate. Author: JihongMa <linlin200605@gmail.com> Closes #9380 from JihongMA/SPARK-11420.
* [SPARK-11468] [SPARKR] add stddev/variance agg functions for Columnfelixcheung2015-11-105-30/+297
| | | | | | | | | | Checked names, none of them should conflict with anything in base shivaram davies rxin Author: felixcheung <felixcheung_m@hotmail.com> Closes #9489 from felixcheung/rstddev.
* [ML][R] SparkR::glm summary result to compare with native RYanbo Liang2015-11-102-22/+11
| | | | | | | | Follow up #9561. Due to [SPARK-11587](https://issues.apache.org/jira/browse/SPARK-11587) has been fixed, we should compare SparkR::glm summary result with native R output rather than hard-code one. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9590 from yanboliang/glm-r-test.
* [SPARK-10863][SPARKR] Method coltypes() (New version)Oscar D. Lara Yejas2015-11-107-18/+124
| | | | | | | | This is a follow up on PR #8984, as the corresponding branch for such PR was damaged. Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Closes #9579 from olarayej/SPARK-10863_NEW14.
* [SPARK-9830][SQL] Remove AggregateExpression1 and Aggregate Operator used to ↵Yin Huai2015-11-101-1/+1
| | | | | | | | | | | | | | | | | | | evaluate AggregateExpression1s https://issues.apache.org/jira/browse/SPARK-9830 This PR contains the following main changes. * Removing `AggregateExpression1`. * Removing `Aggregate` operator, which is used to evaluate `AggregateExpression1`. * Removing planner rule used to plan `Aggregate`. * Linking `MultipleDistinctRewriter` to analyzer. * Renaming `AggregateExpression2` to `AggregateExpression` and `AggregateFunction2` to `AggregateFunction`. * Updating places where we create aggregate expression. The way to create aggregate expressions is `AggregateExpression(aggregateFunction, mode, isDistinct)`. * Changing `val`s in `DeclarativeAggregate`s that touch children of this function to `lazy val`s (when we create aggregate expression in DataFrame API, children of an aggregate function can be unresolved). Author: Yin Huai <yhuai@databricks.com> Closes #9556 from yhuai/removeAgg1.
* [SPARK-11587][SPARKR] Fix the summary generic to match base RShivaram Venkataraman2015-11-094-10/+16
| | | | | | | | | The signature is summary(object, ...) as defined in https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #9582 from shivaram/summary-fix.
* [SPARK-9865][SPARKR] Flaky SparkR test: test_sparkSQL.R: sample on a DataFramefelixcheung2015-11-091-2/+2
| | | | | | | | | | | | | Make sample test less flaky by setting the seed Tested with ``` repeat { if (count(sample(df, FALSE, 0.1)) == 3) { break } } ``` Author: felixcheung <felixcheung_m@hotmail.com> Closes #9549 from felixcheung/rsample.
* [SPARK-11494][ML][R] Expose R-like summary statistics in SparkR::glm for ↵Yanbo Liang2015-11-092-11/+42
| | | | | | | | | | | | | | | | | | | | | | linear regression Expose R-like summary statistics in SparkR::glm for linear regression, the output of ```summary``` like ```Java $DevianceResiduals Min Max -0.9509607 0.7291832 $Coefficients Estimate Std. Error t value Pr(>|t|) (Intercept) 1.6765 0.2353597 7.123139 4.456124e-11 Sepal_Length 0.3498801 0.04630128 7.556598 4.187317e-12 Species_versicolor -0.9833885 0.07207471 -13.64402 0 Species_virginica -1.00751 0.09330565 -10.79796 0 ``` Author: Yanbo Liang <ybliang8@gmail.com> Closes #9561 from yanboliang/spark-11494.
* [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bitsImran Rashid2015-11-061-4/+4
| | | | | | | | | | | | https://issues.apache.org/jira/browse/SPARK-10116 This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod Author: Imran Rashid <irashid@cloudera.com> Closes #8314 from squito/SPARK-10116.
* [SPARK-11542] [SPARKR] fix glm with long fomularDavies Liu2015-11-052-1/+14
| | | | | | | | Because deparse() will break the long string into multiple lines, the deserialization will fail Author: Davies Liu <davies@databricks.com> Closes #9510 from davies/fix_glm.