spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-13389][SPARKR] SparkR support first/last with ignore NAs	Yanbo Liang	2016-03-10	3	-10/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? SparkR support first/last with ignore NAs cc sun-rui felixcheung shivaram ## How was this patch tested? unit tests Author: Yanbo Liang <ybliang8@gmail.com> Closes #11267 from yanboliang/spark-13389.
*	[SPARK-13327][SPARKR] Added parameter validations for colnames<-	Oscar D. Lara Yejas	2016-03-10	2	-1/+32
\| \| \| \| \| \| \|	Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Closes #11220 from olarayej/SPARK-13312-3.
*	[SPARK-13504] [SPARKR] Add approxQuantile for SparkR	Yanbo Liang	2016-02-25	4	-0/+55
\| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add ```approxQuantile``` for SparkR. ## How was this patch tested? unit tests Author: Yanbo Liang <ybliang8@gmail.com> Closes #11383 from yanboliang/spark-13504 and squashes the following commits: 4f17adb [Yanbo Liang] Add approxQuantile for SparkR
*	[SPARK-13472] [SPARKR] Fix unstable Kmeans test in R	Liang-Chi Hsieh	2016-02-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA: https://issues.apache.org/jira/browse/SPARK-13472 ## What changes were proposed in this pull request? One Kmeans test in R is unstable and sometimes fails. We should fix it. ## How was this patch tested? Unit test is modified in this PR. Author: Liang-Chi Hsieh <viirya@gmail.com> Closes #11345 from viirya/fix-kmeans-r-test and squashes the following commits: f959f61 [Liang-Chi Hsieh] Sort resulted clusters.
*	[SPARK-13011] K-means wrapper in SparkR	Xusen Yin	2016-02-23	4	-5/+109
\| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-13011 Author: Xusen Yin <yinxusen@gmail.com> Closes #11124 from yinxusen/SPARK-13011.
*	[MINOR][DOCS] Fix all typos in markdown files of `doc` and similar patterns ↵	Dongjoon Hyun	2016-02-22	2	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in other comments ## What changes were proposed in this pull request? This PR tries to fix all typos in all markdown files under `docs` module, and fixes similar typos in other comments, too. ## How was this patch tested? manual tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11300 from dongjoon-hyun/minor_fix_typos.
*	[SPARK-12799] Simplify various string output for expressions	Cheng Lian	2016-02-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR introduces several major changes:

1. Replacing `Expression.prettyString` with `Expression.sql`

   The `prettyString` method is mostly an internal, developer-facing facility for debugging purposes, and shouldn't be exposed to users.

1. Using SQL-like representation as column names for selected fields that are not named expressions (back-ticks and double quotes should be removed)

   Before, we were using `prettyString` as column names when possible, and sometimes the resulting column names can be weird. Here are several examples:

   Expression | `prettyString` | `sql` | Note
   ------------------ | -------------- | ---------- | ---------------
   `a && b` | `a && b` | `a AND b` |
   `a.getField("f")` | `a[f]` | `a.f` | `a` is a struct

1. Adding trait `NonSQLExpression` extending from `Expression` for expressions that don't have a SQL representation (e.g. Scala UDF/UDAF and Java/Scala object expressions used for encoders)

   `NonSQLExpression.sql` may return an arbitrary user-facing string representation of the expression.

Author: Cheng Lian <lian@databricks.com> Closes #10757 from liancheng/spark-12799.simplify-expression-string-methods.
*	[SPARK-13339][DOCS] Clarify commutative / associative operator requirements ↵	Sean Owen	2016-02-19	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	for reduce, fold Clarify that reduce functions need to be commutative, and fold functions do not. See https://github.com/apache/spark/pull/11091 Author: Sean Owen <sowen@cloudera.com> Closes #11217 from srowen/SPARK-13339.
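To illustrate why commutativity matters for a distributed reduce, here is a minimal sketch in plain Python. It is a simulation under stated assumptions, not Spark code: `partitioned_reduce` and `distinct_results` are hypothetical helpers, and the "partitions" are ordinary lists whose partial results are merged in every possible order to mimic partial results arriving in a nondeterministic order.

```python
from functools import reduce
from itertools import permutations


def partitioned_reduce(partitions, op, merge_order):
    """Reduce each partition locally, then merge the partial results in merge_order."""
    partials = [reduce(op, part) for part in partitions]
    return reduce(op, [partials[i] for i in merge_order])


def distinct_results(partitions, op):
    """Collect every result obtainable by merging the partial results in any order."""
    orders = permutations(range(len(partitions)))
    return {partitioned_reduce(partitions, op, order) for order in orders}


# Integer addition is associative and commutative: one result for every merge order.
add_results = distinct_results([[1, 2], [3], [4, 5]], lambda x, y: x + y)

# String concatenation is associative but NOT commutative: the result depends
# on the order in which the partial results are merged.
concat_results = distinct_results([["a", "b"], ["c"], ["d", "e"]], lambda x, y: x + y)

print(add_results)
print(sorted(concat_results))
```

With addition, every merge order gives the same value; with concatenation, each of the six merge orders yields a different string, which is the kind of order dependence the documentation change warns about.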
*	[SPARK-13264][DOC] Removed multi-byte characters in spark-env.sh.template	Sasaki Toru	2016-02-11	1	-1/+1
\| \| \| \| \| \| \| \|	In spark-env.sh.template, there are multi-byte characters; this PR removes them. Author: Sasaki Toru <sasakitoa@nttdata.co.jp> Closes #11149 from sasakitoa/remove_multibyte_in_sparkenv.
*	[SPARK-12903][SPARKR] Add covar_samp and covar_pop for SparkR	Yanbo Liang	2016-01-26	5	-2/+73
\| \| \| \| \| \| \| \| \| \| \|	Add ```covar_samp``` and ```covar_pop``` for SparkR. Should we also provide a ```cov``` alias for ```covar_samp```? There is already a ```cov``` implementation in stats.R which masks ```stats::cov```, but this may lead to a breaking API change. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10829 from yanboliang/spark-12903.
*	[SPARK-12629][SPARKR] Fixes for DataFrame saveAsTable method	Narine Kokhlikyan	2016-01-22	3	-9/+41
\| \| \| \| \| \| \| \| \| \|	I've tried to solve some of the issues mentioned in: https://issues.apache.org/jira/browse/SPARK-12629 Please let me know what you think. Thanks! Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #10580 from NarineK/sparkrSavaAsRable.
*	[SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR.	Sun Rui	2016-01-20	5	-27/+88
\| \| \| \| \| \|	Author: Sun Rui <rui.sun@intel.com> Closes #10201 from sun-rui/SPARK-12204.
*	[SPARK-12910] Fixes : R version for installing sparkR	Shubhanshu Mishra	2016-01-20	2	-2/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Testing code:
```
$ ./install-dev.sh
USING R_HOME = /usr/bin
ERROR: this R is version 2.15.1, package 'SparkR' requires R >= 3.0
```
Using the new argument:
```
$ ./install-dev.sh /content/username/SOFTWARE/R-3.2.3
USING R_HOME = /content/username/SOFTWARE/R-3.2.3/bin
* installing source package ‘SparkR’ ...
R
inst
preparing package for lazy loading
Creating a new generic function for ‘colnames’ in package ‘SparkR’
Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
Creating a new generic function for ‘cov’ in package ‘SparkR’
Creating a new generic function for ‘na.omit’ in package ‘SparkR’
Creating a new generic function for ‘filter’ in package ‘SparkR’
Creating a new generic function for ‘intersect’ in package ‘SparkR’
Creating a new generic function for ‘sample’ in package ‘SparkR’
Creating a new generic function for ‘transform’ in package ‘SparkR’
Creating a new generic function for ‘subset’ in package ‘SparkR’
Creating a new generic function for ‘summary’ in package ‘SparkR’
Creating a new generic function for ‘lag’ in package ‘SparkR’
Creating a new generic function for ‘rank’ in package ‘SparkR’
Creating a new generic function for ‘sd’ in package ‘SparkR’
Creating a new generic function for ‘var’ in package ‘SparkR’
Creating a new generic function for ‘predict’ in package ‘SparkR’
Creating a new generic function for ‘rbind’ in package ‘SparkR’
Creating a generic function for ‘lapply’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘Filter’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘alias’ from package ‘stats’ in package ‘SparkR’
Creating a generic function for ‘substr’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘%in%’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘mean’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘unique’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘nrow’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘ncol’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘head’ from package ‘utils’ in package ‘SparkR’
Creating a generic function for ‘factorial’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘atan2’ from package ‘base’ in package ‘SparkR’
Creating a generic function for ‘ifelse’ from package ‘base’ in package ‘SparkR’
help
No man pages found in package ‘SparkR’
* installing help indices
building package indices
** testing if installed package can be loaded
* DONE (SparkR)
```
Author: Shubhanshu Mishra <smishra8@illinois.edu> Closes #10836 from napsternxg/master.
*	[SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal	Herman van Hovell	2016-01-20	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. The PR changes this behavior: a decimal literal is now converted into an exact ```BigDecimal```. The behavior for scientific decimals, for example ```12.1e01```, is unchanged; these are still converted into a Double. This PR replaces the ```BigDecimal``` literal by a ```Double``` literal, because the ```BigDecimal``` is the default now. You can use the double literal by appending a 'D' to the value, for instance: ```3.141527D``` cc davies rxin Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10796 from hvanhovell/SPARK-12848.
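The exact versus non-exact distinction behind the decimal literal change can be sketched in plain Python. This is only an analogy under stated assumptions: Python's `float` stands in for a Double and `decimal.Decimal` stands in for an exact decimal type; none of this is Spark's parser code.

```python
from decimal import Decimal

# Parsing the text "12.1" as a binary double loses exactness...
as_double = float("12.1")
# ...while parsing it as a decimal keeps the literal exactly as written.
as_decimal = Decimal("12.1")

# Decimal(float) expands the binary value that is actually stored in the double.
double_expansion = Decimal(as_double)
double_is_exact = double_expansion == as_decimal

# The same effect shows up in arithmetic on decimal fractions.
float_sum_is_exact = (0.1 + 0.2) == 0.3
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2")) == Decimal("0.3")

# A literal in scientific notation is still read as a double in this sketch.
scientific = float("12.1e01")

print(double_is_exact, float_sum_is_exact, decimal_sum_is_exact, scientific)
```

The double representation of `12.1` differs from the written literal, while the decimal representation matches it digit for digit, which is the motivation the commit message gives for changing the default.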
*	[SPARK-12232][SPARKR] New R API for read.table to avoid name conflict	felixcheung	2016-01-19	4	-20/+17
\| \| \| \| \| \| \| \|	shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table` Author: felixcheung <felixcheung_m@hotmail.com> Closes #10406 from felixcheung/readtable.
*	[SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR.	Sun Rui	2016-01-19	4	-1/+75
\| \| \| \| \| \|	Author: Sun Rui <rui.sun@intel.com> Closes #10309 from sun-rui/SPARK-12337.
*	[SPARK-12168][SPARKR] Add automated tests for conflicted function in R	felixcheung	2016-01-19	2	-1/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently this is reported when loading the SparkR package in R (probably would add is.nan)
```
Loading required package: methods

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from ‘package:base’:

    colnames, colnames<-, intersect, rank, rbind, sample, subset, summary, table, transform
```
Adding this test adds an automated way to track changes to masked methods. Also, the second part of this test checks for those functions that would not be accessible without a namespace/package prefix. Incidentally, this might point to how we would fix those inaccessible functions in base or stats. Looking for feedback on adding this test. Author: felixcheung <felixcheung_m@hotmail.com> Closes #10171 from felixcheung/rmaskedtest.
*	[SPARK-12862][SPARKR] Jenkins does not run R tests	felixcheung	2016-01-17	2	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \|	Slight correction: I'm leaving sparkR as-is (ie. R file not supported) and fixed only run-tests.sh as shivaram described. I also assume we are going to cover all doc changes in https://issues.apache.org/jira/browse/SPARK-12846 instead of here. rxin shivaram zjffdu Author: felixcheung <felixcheung_m@hotmail.com> Closes #10792 from felixcheung/sparkRcmd.
*	[SPARK-11031][SPARKR] Method str() on a DataFrame	Oscar D. Lara Yejas	2016-01-15	5	-22/+140
\| \| \| \| \| \| \| \| \|	Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes #9613 from olarayej/SPARK-11031.
*	[SPARK-12756][SQL] use hash expression in Exchange	Wenchen Fan	2016-01-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and bucketed data sources, which enables us to shuffle only one side when joining a bucketed table with a normal one. This PR also fixes the tests that are broken by the new hash behaviour in shuffle. Author: Wenchen Fan <wenchen@databricks.com> Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
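The idea of sharing one hash function between bucketing and shuffling can be sketched in plain Python. This is an illustration under stated assumptions, not Spark's implementation: `bucket_id`, `partition`, and `bucketwise_join` are hypothetical helpers, and `zlib.crc32` is a stand-in hash rather than the algorithm Spark uses.

```python
import zlib

NUM_BUCKETS = 4


def bucket_id(key, num_buckets=NUM_BUCKETS):
    """Stand-in hash shared by the bucketed writer and the shuffle (an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def partition(rows, num_buckets=NUM_BUCKETS):
    """Place each (key, value) row into the bucket chosen by the shared hash."""
    buckets = [[] for _ in range(num_buckets)]
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketwise_join(left_buckets, right_buckets):
    """Join bucket i of one side only with bucket i of the other side."""
    joined = []
    for left, right in zip(left_buckets, right_buckets):
        lookup = {}
        for key, value in right:
            lookup.setdefault(key, []).append(value)
        for key, value in left:
            for other in lookup.get(key, []):
                joined.append((key, value, other))
    return joined


# The bucketed table was laid out once, at write time.
bucketed_table = partition([(1, "a"), (2, "b"), (3, "c")])
# Only the non-bucketed side has to be redistributed at query time.
shuffled_side = partition([(2, "x"), (3, "y"), (4, "z")])

result = sorted(bucketwise_join(bucketed_table, shuffled_side))
print(result)
```

Because both sides use the same hash, rows with equal keys always land in buckets with the same index, so the join never needs to look across buckets. If the two sides used different hash functions, matching keys could end up in different buckets and both sides would have to be shuffled.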
*	[SPARK-12645][SPARKR] SparkR support hash function	Yanbo Liang	2016-01-09	4	-1/+26
\| \| \| \| \| \| \| \|	Add ```hash``` function for SparkR ```DataFrame```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645.
*	[SPARK-12393][SPARKR] Add read.text and write.text for SparkR	Yanbo Liang	2016-01-06	5	-1/+82
\| \| \| \| \| \| \| \| \|	Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10348 from yanboliang/spark-12393.
*	[SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API	felixcheung	2016-01-04	5	-25/+33
\| \| \| \| \| \| \| \| \| \| \|	rxin davies shivaram Took save mode from my PR #10480, and move everything to writer methods. This is related to PR #10559 - [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed) Author: felixcheung <felixcheung_m@hotmail.com> Closes #10584 from felixcheung/rremovedeprecated.
*	[SPARK-12327][SPARKR] fix code for lintr warning for commented code	felixcheung	2016-01-03	9	-11/+88
\| \| \| \| \| \| \| \|	shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #10408 from felixcheung/rcodecomment.
*	[SPARK-11199][SPARKR] Improve R context management story and add getOrCreate	Hossein	2015-12-29	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	* Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context. * Adds a simple test [SPARK-11199] #comment link with JIRA Author: Hossein <hossein@databricks.com> Closes #9185 from falaki/SPARK-11199.
*	[SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as ↵	Forest Fang	2015-12-29	3	-7/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	value `ifelse`, `when`, `otherwise` are unable to take `Column` typed S4 objects as values. For example:
```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```
will both fail with
```r
attempt to replicate an object of type 'environment'
```
The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempting to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR. For reference, added test cases which trigger failures:
```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect': error in evaluating the argument 'col' in selecting a method for function 'select': attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse

1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```
Author: Forest Fang <forest.fang@outlook.com> Closes #10481 from saurfang/spark-12526.
*	Bump master version to 2.0.0-SNAPSHOT.	Reynold Xin	2015-12-19	1	-1/+1
\| \| \| \| \| \|	Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
*	[SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR	Yanbo Liang	2015-12-16	4	-56/+119
\| \| \| \| \| \| \| \|	Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.
*	[SPARK-12318][SPARKR] Save mode in SparkR should be error by default	Jeff Zhang	2015-12-16	1	-5/+5
\| \| \| \| \| \| \| \|	shivaram Please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318.
*	[SPARK-12327] Disable commented code lintr temporarily	Shivaram Venkataraman	2015-12-14	1	-1/+1
\| \| \| \| \| \| \| \|	cc yhuai felixcheung shaneknapp Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #10300 from shivaram/comment-lintr-disable.
*	[SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases	gatorsmile	2015-12-11	2	-6/+15
\| \| \| \| \| \| \| \| \| \| \|	The existing sample functions lack the parameter `seed`; however, the corresponding function interface in `generics` has such a parameter. Thus, although the caller can pass 'seed', the value is not used. This could cause SparkR unit tests to fail. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull Author: gatorsmile <gatorsmile@gmail.com> Closes #10160 from gatorsmile/sampleR.
*	[SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files	Yanbo Liang	2015-12-11	4	-115/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* ```jsonFile``` should support multiple input files, such as:
```R
jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```
* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
* Replace all ```jsonFile``` with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
* If this PR is accepted, we should also make almost the same change for ```parquetFile```.

cc felixcheung sun-rui shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10145 from yanboliang/spark-12146.
*	[SPARK-12234][SPARKR] Fix ```subset``` function error when only set ↵	Yanbo Liang	2015-12-10	2	-2/+11
\| \| \| \| \| \| \| \| \| \| \| \|	```select``` argument Fix ```subset``` function error when only set ```select``` argument. Please refer to the [JIRA](https://issues.apache.org/jira/browse/SPARK-12234) about the error and how to reproduce it. cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10217 from yanboliang/spark-12234.
*	[SPARK-12198][SPARKR] SparkR support read.parquet and deprecate parquetFile	Yanbo Liang	2015-12-10	3	-6/+22
\| \| \| \| \| \| \| \|	SparkR support ```read.parquet``` and deprecate ```parquetFile```. This change is similar to #10145 for ```jsonFile```. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10191 from yanboliang/spark-12198.
*	[SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases.	Sun Rui	2015-12-07	20	-39/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This PR: 1. Suppress all known warnings. 2. Cleanup test cases and fix some errors in test cases. 3. Fix errors in HiveContext related test cases. These test cases were not actually run previously due to a bug in creating TestHiveContext. 4. Support 'testthat' package version 0.11.0, which prefers that test cases be under 'tests/testthat'. 5. Make sure the default Hadoop file system is local when running test cases. 6. Turn warnings into errors. Author: Sun Rui <rui.sun@intel.com> Closes #10030 from sun-rui/SPARK-12034.
*	[SPARK-12044][SPARKR] Fix usage of isnan, isNaN	Yanbo Liang	2015-12-05	4	-11/+31
\| \| \| \| \| \| \| \| \| \| \| \|	1, Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related variable functions: ```isNaN, isNull, isNotNull```. 2, Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0. <del>3, Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.</del> cc shivaram sun-rui felixcheung Author: Yanbo Liang <ybliang8@gmail.com> Closes #10037 from yanboliang/spark-12044.
*	[SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be ↵	Yanbo Liang	2015-12-05	4	-30/+45
\| \| \| \| \| \| \| \| \| \| \| \| \|	consistent with Scala/Python Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python. <del>Note: If we cannot catch up with the 1.6 release, it will be a breaking change for 1.7 that we also need to explain in the release notes.</del> cc sun-rui felixcheung shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10123 from yanboliang/spark-12115.
*	[SPARK-11715][SPARKR] Add R support corr for Column Aggregation	felixcheung	2015-12-05	4	-6/+22
\| \| \| \| \| \| \| \|	Need to match existing method signature Author: felixcheung <felixcheung_m@hotmail.com> Closes #9680 from felixcheung/rcorr.
*	[SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in ↵	Sun Rui	2015-12-05	4	-6/+105
\| \| \| \| \| \| \| \|	SparkR. Author: Sun Rui <rui.sun@intel.com> Closes #9804 from sun-rui/SPARK-11774.
*	[SPARK-12104][SPARKR] collect() does not handle multiple columns with same name.	Sun Rui	2015-12-03	2	-4/+10
\| \| \| \| \| \|	Author: Sun Rui <rui.sun@intel.com> Closes #10118 from sun-rui/SPARK-12104.
*	[SPARK-12019][SPARKR] Support character vector for sparkR.init(), check ↵	felixcheung	2015-12-03	5	-21/+79
\| \| \| \| \| \| \| \| \| \| \|	param and fix doc and add tests. Spark submit expects comma-separated list Author: felixcheung <felixcheung_m@hotmail.com> Closes #10034 from felixcheung/sparkrinitdoc.
*	[SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type.	Sun Rui	2015-11-29	4	-32/+47
\| \| \| \| \| \|	Author: Sun Rui <rui.sun@intel.com> Closes #9769 from sun-rui/SPARK-11781.
*	[SPARK-9319][SPARKR] Add support for setting column names, types	felixcheung	2015-11-28	5	-55/+185
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for colnames, colnames<-, coltypes<-. Also added tests for names, names<-, which had no tests previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, screwed up the PR. Recreated it here. Was #9218 shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #9654 from felixcheung/colnamescoltypes.
*	[SPARK-12029][SPARKR] Improve column functions signature, param check, ↵	felixcheung	2015-11-28	2	-34/+96
\| \| \| \| \| \| \| \| \| \|	tests, fix doc and add examples shivaram sun-rui Author: felixcheung <felixcheung_m@hotmail.com> Closes #10019 from felixcheung/rfunctionsdoc.
*	[SPARK-12025][SPARKR] Rename some window rank function names for SparkR	Yanbo Liang	2015-11-27	4	-41/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` on the SparkR side. There are two reasons that we should make this change: * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645) * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0. It's better to fix this issue before the 1.6 release; otherwise we will make a breaking API change. cc shivaram sun-rui Author: Yanbo Liang <ybliang8@gmail.com> Closes #10016 from yanboliang/SPARK-12025.
*	[SPARK-11756][SPARKR] Fix use of aliases - SparkR can not output help ↵	felixcheung	2015-11-20	4	-84/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	information for SparkR:::summary correctly Fix use of aliases and change uses of rdname and seealso. `aliases` is the hint for `?`; it should not be linked to some other name, and those should be seealso instead: https://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html Clean up usage of family, as multiple uses of family with the same rdname cause duplicated See Also HTML blocks (like http://spark.apache.org/docs/latest/api/R/count.html). Also change some rdname values for dplyr-like variants for better R user visibility in the R doc, e.g. rbind, summary, mutate, summarize. shivaram yanboliang Author: felixcheung <felixcheung_m@hotmail.com> Closes #9750 from felixcheung/rdocaliases.
*	[SPARK-11339][SPARKR] Document the list of functions in R base package that ↵	felixcheung	2015-11-18	5	-5/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	are masked by functions with same name in SparkR Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can be called. For those we can't call, added them to the SparkR programming guide. It would seem to me that `table, sample, subset, filter, cov` not working is not actually expected. I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.
```
> methods("transform")
[1] transform,ANY-method       transform.data.frame
[3] transform,DataFrame-method transform.default
see '?methods' for accessing help and source code
> methods("subset")
[1] subset.data.frame       subset,DataFrame-method subset.default
[4] subset.matrix
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
```
Any idea? More information on masking: http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm http://www.sfu.ca/~sweldon/howTo/guide4.pdf This is what the output doc looks like (minus css): ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png) Author: felixcheung <felixcheung_m@hotmail.com> Closes #9785 from felixcheung/rmasked.
*	[SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example ↵	Yanbo Liang	2015-11-18	1	-3/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	codes This PR includes: * Update SparkR:::glm, SparkR:::summary API docs. * Update SparkR machine learning user guide and example codes to show: * supporting feature interaction in R formula. * summary for gaussian GLM model. * coefficients for binomial GLM model. mengxr Author: Yanbo Liang <ybliang8@gmail.com> Closes #9727 from yanboliang/spark-11684.
*	[SPARK-11773][SPARKR] Implement collection functions in SparkR.	Sun Rui	2015-11-18	6	-35/+100
\| \| \| \| \| \|	Author: Sun Rui <rui.sun@intel.com> Closes #9764 from sun-rui/SPARK-11773.
*	[SPARK-11281][SPARKR] Add tests covering the issue.	zero323	2015-11-18	1	-3/+7
\| \| \| \| \| \| \| \|	The goal of this PR is to add tests covering the issue to ensure that it was resolved by [SPARK-11086](https://issues.apache.org/jira/browse/SPARK-11086). Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9743 from zero323/SPARK-11281-tests.