path: root/R/pkg/inst/tests
Commit message | Author | Date | Files | Lines
* [SPARK-12756][SQL] use hash expression in Exchange | Wenchen Fan | 2016-01-13 | 1 | -1/+1
This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and the bucketed data source, which enables us to shuffle only one side when joining a bucketed table with a normal one. This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
Author: Wenchen Fan <wenchen@databricks.com> Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
* [SPARK-12645][SPARKR] SparkR support hash function | Yanbo Liang | 2016-01-09 | 1 | -1/+1
Add ```hash``` function for SparkR ```DataFrame```.
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10597 from yanboliang/spark-12645.
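A minimal hedged sketch of how the new column function might be used; the data file and column name are hypothetical.
```r
# Assumes an existing sqlContext and a JSON file with a "name" field (hypothetical).
df <- read.json(sqlContext, "people.json")
# hash() returns a Column holding the hash of the given column(s).
head(select(df, hash(df$name)))
```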
* [SPARK-12393][SPARKR] Add read.text and write.text for SparkR | Yanbo Liang | 2016-01-06 | 1 | -0/+21
Add ```read.text``` and ```write.text``` for SparkR. cc sun-rui felixcheung shivaram
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10348 from yanboliang/spark-12393.
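A hedged usage sketch of the two new functions; the paths are placeholders.
```r
# read.text() loads each line of a text file into a single string-column DataFrame (hypothetical path).
df <- read.text(sqlContext, "README.md")
head(df)
# write.text() writes a single string-column DataFrame back out as plain text.
write.text(df, "readme-copy.txt")
```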
* [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API | felixcheung | 2016-01-04 | 1 | -2/+2
rxin davies shivaram Took save mode from my PR #10480, and moved everything to writer methods. This is related to PR #10559.
- [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into it some more tonight. (fixed)
Author: felixcheung <felixcheung_m@hotmail.com> Closes #10584 from felixcheung/rremovedeprecated.
* [SPARK-12327][SPARKR] fix code for lintr warning for commented code | felixcheung | 2016-01-03 | 4 | -8/+14
shivaram
Author: felixcheung <felixcheung_m@hotmail.com> Closes #10408 from felixcheung/rcodecomment.
* [SPARK-11199][SPARKR] Improve R context management story and add getOrCreate | Hossein | 2015-12-29 | 1 | -0/+4
* Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
* Adds a simple test
[SPARK-11199] #comment link with JIRA
Author: Hossein <hossein@databricks.com> Closes #9185 from falaki/SPARK-11199.
* [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value | Forest Fang | 2015-12-29 | 1 | -0/+8
`ifelse`, `when`, `otherwise` are unable to take `Column` typed S4 objects as values. For example:
```r
ifelse(lit(1) == lit(1), lit(2), lit(3))
ifelse(df$mpg > 0, df$mpg, 0)
```
will both fail with
```r
attempt to replicate an object of type 'environment'
```
The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid the attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenarios where these functions would want to be vectorized in SparkR. For reference, added test cases which trigger failures:
```r
. Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
error in evaluating the argument 'x' in selecting a method for function 'collect': error in evaluating the argument 'col' in selecting a method for function 'select': attempt to replicate an object of type 'environment'
Calls: when -> when -> ifelse -> ifelse
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
6: condition(object)
7: compare(actual, expected, ...)
8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
Error: Test failures
Execution halted
```
Author: Forest Fang <forest.fang@outlook.com> Closes #10481 from saurfang/spark-12526.
* [SPARK-12310][SPARKR] Add write.json and write.parquet for SparkR | Yanbo Liang | 2015-12-16 | 1 | -45/+59
Add ```write.json``` and ```write.parquet``` for SparkR, and deprecate ```saveAsParquetFile```.
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.
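A hedged sketch of the new writer functions; the input and output paths are placeholders.
```r
# Hypothetical paths; write.json()/write.parquet() persist a DataFrame as JSON or Parquet files.
df <- read.json(sqlContext, "people.json")
write.json(df, "people-out.json")
write.parquet(df, "people-out.parquet")
```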
* [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases | gatorsmile | 2015-12-11 | 1 | -0/+4
The existing sample functions miss the parameter `seed`; however, the corresponding function interface in `generics` has such a parameter. Thus, although the function caller can call the function with the 'seed', we are not using the value. This could cause SparkR unit tests to fail. For example, I hit it in another PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
Author: gatorsmile <gatorsmile@gmail.com> Closes #10160 from gatorsmile/sampleR.
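A small hedged sketch of passing an explicit seed so sampling is reproducible across runs; the data frame is hypothetical.
```r
df <- createDataFrame(sqlContext, faithful)
# With the fix, the seed argument is actually forwarded, so repeated calls should return the same sample.
s1 <- collect(sample(df, withReplacement = FALSE, fraction = 0.1, seed = 42))
s2 <- collect(sample(df, withReplacement = FALSE, fraction = 0.1, seed = 42))
identical(s1, s2)  # expected TRUE once the seed is honored
```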
* [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files | Yanbo Liang | 2015-12-11 | 1 | -54/+66
* ```jsonFile``` should support multiple input files, such as:
```R
jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
jsonFile(sqlContext, "path1,path2")
```
* Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0. So we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
* Replace all ```jsonFile``` with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
* If this PR is accepted, we should also make almost the same change for ```parquetFile```.
cc felixcheung sun-rui shivaram
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10145 from yanboliang/spark-12146.
* [SPARK-12234][SPARKR] Fix ```subset``` function error when only set ```select``` argument | Yanbo Liang | 2015-12-10 | 1 | -0/+4
Fix ```subset``` function error when only the ```select``` argument is set. Please refer to the [JIRA](https://issues.apache.org/jira/browse/SPARK-12234) about the error and how to reproduce it. cc sun-rui felixcheung shivaram
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10217 from yanboliang/spark-12234.
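A hedged sketch of the call shape this fix covers; the DataFrame and column names are hypothetical.
```r
df <- createDataFrame(sqlContext, mtcars)
# Previously this form errored because only `select` was supplied, with no row condition.
head(subset(df, select = c("mpg", "cyl")))
# Supplying both a filter expression and `select` already worked.
head(subset(df, df$mpg > 20, select = c("mpg", "cyl")))
```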
* [SPARK-12198][SPARKR] SparkR support read.parquet and deprecate parquetFile | Yanbo Liang | 2015-12-10 | 1 | -4/+7
SparkR supports ```read.parquet``` and deprecates ```parquetFile```. This change is similar to #10145 for ```jsonFile```.
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10191 from yanboliang/spark-12198.
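A hedged sketch of the preferred reader; the path is a placeholder.
```r
# read.parquet() replaces the deprecated parquetFile() (hypothetical path).
df <- read.parquet(sqlContext, "events.parquet")
printSchema(df)
```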
* [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. | Sun Rui | 2015-12-07 | 18 | -38/+46
This PR:
1. Suppress all known warnings.
2. Clean up test cases and fix some errors in test cases.
3. Fix errors in HiveContext-related test cases. These test cases were actually not run previously due to a bug in creating TestHiveContext.
4. Support 'testthat' package version 0.11.0, which prefers that test cases be under 'tests/testthat'.
5. Make sure the default Hadoop file system is local when running test cases.
6. Turn warnings into errors.
Author: Sun Rui <rui.sun@intel.com> Closes #10030 from sun-rui/SPARK-12034.
* [SPARK-12044][SPARKR] Fix usage of isnan, isNaN | Yanbo Liang | 2015-12-05 | 1 | -1/+5
1. Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related variable functions: ```isNaN, isNull, isNotNull```.
2. Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0.
<del>3. Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.</del>
cc shivaram sun-rui felixcheung
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10037 from yanboliang/spark-12044.
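A hedged sketch of the Column-level NaN check described above; the exact exported name follows the commit description, and the data is hypothetical.
```r
df <- createDataFrame(sqlContext, data.frame(x = c(1, NaN, 3)))
# isNaN(), alongside isNull()/isNotNull(), flags NaN values per row (name as listed in the commit).
head(select(df, df$x, isNaN(df$x)))
```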
* [SPARK-12115][SPARKR] Change numPartitions() to getNumPartitions() to be consistent with Scala/Python | Yanbo Liang | 2015-12-05 | 1 | -5/+5
Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python.
<del>Note: If we can not catch up with the 1.6 release, it will be a breaking change for 1.7 that we also need to explain in the release note.</del>
cc sun-rui felixcheung shivaram
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10123 from yanboliang/spark-12115.
* [SPARK-11715][SPARKR] Add R support corr for Column Aggregation | felixcheung | 2015-12-05 | 1 | -1/+1
Need to match existing method signature.
Author: felixcheung <felixcheung_m@hotmail.com> Closes #9680 from felixcheung/rcorr.
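A hedged sketch of the corr aggregation on two numeric columns, assuming the Column form corr(col1, col2) added here; the data is hypothetical.
```r
df <- createDataFrame(sqlContext, mtcars)
# corr() computes the Pearson correlation between two columns as an aggregate.
head(agg(df, correlation = corr(df$mpg, df$wt)))
```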
* [SPARK-11774][SPARKR] Implement struct(), encode(), decode() functions in SparkR. | Sun Rui | 2015-12-05 | 1 | -6/+31
Author: Sun Rui <rui.sun@intel.com> Closes #9804 from sun-rui/SPARK-11774.
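A hedged sketch of two of the three new functions; the column names and charset are placeholders.
```r
df <- createDataFrame(sqlContext, data.frame(a = 1:3, b = letters[1:3], stringsAsFactors = FALSE))
# struct() combines columns into a single struct-typed column.
head(select(df, struct(df$a, df$b)))
# encode()/decode() convert between string and binary columns for a given charset.
head(select(df, encode(df$b, "UTF-8")))
```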
* [SPARK-12104][SPARKR] collect() does not handle multiple columns with same name. | Sun Rui | 2015-12-03 | 1 | -0/+6
Author: Sun Rui <rui.sun@intel.com> Closes #10118 from sun-rui/SPARK-12104.
* [SPARK-12019][SPARKR] Support character vector for sparkR.init(), check param and fix doc and add tests. | felixcheung | 2015-12-03 | 2 | -0/+29
Spark submit expects a comma-separated list.
Author: felixcheung <felixcheung_m@hotmail.com> Closes #10034 from felixcheung/sparkrinitdoc.
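A hedged sketch of passing character vectors where spark-submit ultimately expects a comma-separated list; the jar and package names are placeholders.
```r
# Multiple jars/packages can now be given as a character vector (hypothetical names);
# SparkR joins them into the comma-separated form that spark-submit expects.
sc <- sparkR.init(master = "local[2]",
                  sparkJars = c("dep-one.jar", "dep-two.jar"),
                  sparkPackages = c("com.example:pkg-a:1.0", "com.example:pkg-b:1.0"))
```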
* [SPARK-11781][SPARKR] SparkR has problem in inferring type of raw type. | Sun Rui | 2015-11-29 | 1 | -0/+6
Author: Sun Rui <rui.sun@intel.com> Closes #9769 from sun-rui/SPARK-11781.
* [SPARK-9319][SPARKR] Add support for setting column names, types | felixcheung | 2015-11-28 | 1 | -1/+39
Add support for colnames, colnames<-, coltypes<-. Also added tests for names, names<-, which had no tests previously. I merged with PR 8984 (coltypes). Clicked the wrong thing, screwed up the PR. Recreated it here. Was #9218. shivaram sun-rui
Author: felixcheung <felixcheung_m@hotmail.com> Closes #9654 from felixcheung/colnamescoltypes.
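A hedged sketch of the column name and type setters; the data and target types are hypothetical.
```r
df <- createDataFrame(sqlContext, data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE))
colnames(df)                       # query the current column names
colnames(df) <- c("id", "label")   # rename columns in place
# coltypes<- casts columns to the requested R types (integer -> numeric here).
coltypes(df) <- c("numeric", "character")
```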
* [SPARK-12029][SPARKR] Improve column functions signature, param check, tests, fix doc and add examples | felixcheung | 2015-11-28 | 1 | -4/+5
shivaram sun-rui
Author: felixcheung <felixcheung_m@hotmail.com> Closes #10019 from felixcheung/rfunctionsdoc.
* [SPARK-12025][SPARKR] Rename some window rank function names for SparkR | Yanbo Liang | 2015-11-27 | 1 | -2/+2
Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` on the SparkR side. There are two reasons why we should make this change:
* We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645)
* Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.
It's better to fix this issue before the 1.6 release, otherwise we will make a breaking API change. cc shivaram sun-rui
Author: Yanbo Liang <ybliang8@gmail.com> Closes #10016 from yanboliang/SPARK-12025.
* [SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR | felixcheung | 2015-11-18 | 2 | -1/+37
Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can be called. For those we can't call, added them to the SparkR programming guide. It would seem to me that `table, sample, subset, filter, cov` not working is not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.
```
> methods("transform")
[1] transform,ANY-method       transform.data.frame
[3] transform,DataFrame-method transform.default
see '?methods' for accessing help and source code
> methods("subset")
[1] subset.data.frame       subset,DataFrame-method subset.default
[4] subset.matrix
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
```
Any idea?
More information on masking:
http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
http://www.sfu.ca/~sweldon/howTo/guide4.pdf
This is what the output doc looks like (minus css):
![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)
Author: felixcheung <felixcheung_m@hotmail.com> Closes #9785 from felixcheung/rmasked.
* [SPARK-11773][SPARKR] Implement collection functions in SparkR. | Sun Rui | 2015-11-18 | 1 | -0/+10
Author: Sun Rui <rui.sun@intel.com> Closes #9764 from sun-rui/SPARK-11773.
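The commit message does not list the functions; as a hedged illustration, collection helpers of this kind (assuming array_contains() and size() are among them) operate on array-typed columns. The file and column names are hypothetical.
```r
# Hypothetical example: a JSON file whose rows contain an array<int> column named "nums".
df <- read.json(sqlContext, "arrays.json")
# Collection functions work element-wise on array columns,
# e.g. testing membership and taking the array length.
head(select(df, array_contains(df$nums, 1), size(df$nums)))
```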
* [SPARK-11281][SPARKR] Add tests covering the issue. | zero323 | 2015-11-18 | 1 | -3/+7
The goal of this PR is to add tests covering the issue to ensure that it was resolved by [SPARK-11086](https://issues.apache.org/jira/browse/SPARK-11086).
Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9743 from zero323/SPARK-11281-tests.
* [SPARK-11086][SPARKR] Use dropFactors column-wise instead of nested loop when createDataFrame | zero323 | 2015-11-15 | 1 | -0/+16
Use `dropFactors` column-wise instead of a nested loop when `createDataFrame` is called on a local `data.frame`. At this moment SparkR's createDataFrame uses a nested loop to convert factors to character when called on a local data.frame. It works but is incredibly slow, especially with data.table (~2 orders of magnitude slower compared to the PySpark / Pandas version on a DataFrame of size 1M rows x 2 columns). A simple improvement is to apply `dropFactor` column-wise and then reshape the output list. It should at least partially address [SPARK-8277](https://issues.apache.org/jira/browse/SPARK-8277).
Author: zero323 <matthew.szymkiewicz@gmail.com> Closes #9099 from zero323/SPARK-11086.
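A hedged base-R sketch of the column-wise idea: convert factor columns to character one column at a time instead of cell by cell. This illustrates the approach only, not the actual SparkR implementation; the helper name is made up.
```r
# Illustration only: convert every factor column of a local data.frame to character,
# operating per column rather than looping over individual cells.
drop_factors_columnwise <- function(local_df) {
  local_df[] <- lapply(local_df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  local_df
}

clean <- drop_factors_columnwise(data.frame(id = 1:3, grp = factor(c("a", "b", "a"))))
str(clean)  # grp is now character
```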
* [SPARK-11420] Updating Stddev support via Imperative Aggregate | JihongMa | 2015-11-12 | 1 | -2/+2
Switched stddev support from DeclarativeAggregate to ImperativeAggregate.
Author: JihongMa <linlin200605@gmail.com> Closes #9380 from JihongMA/SPARK-11420.
* [SPARK-11468] [SPARKR] add stddev/variance agg functions for Column | felixcheung | 2015-11-10 | 1 | -16/+67
Checked names, none of them should conflict with anything in base. shivaram davies rxin
Author: felixcheung <felixcheung_m@hotmail.com> Closes #9489 from felixcheung/rstddev.
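A hedged sketch of the new aggregate functions, assuming stddev() and variance() are among the added names; the data is hypothetical.
```r
df <- createDataFrame(sqlContext, mtcars)
# Standard deviation and variance aggregates over a numeric column.
head(agg(df, sd_mpg = stddev(df$mpg), var_mpg = variance(df$mpg)))
```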
* [ML][R] SparkR::glm summary result to compare with native R | Yanbo Liang | 2015-11-10 | 1 | -21/+10
Follow-up to #9561. Since [SPARK-11587](https://issues.apache.org/jira/browse/SPARK-11587) has been fixed, we should compare the SparkR::glm summary result with native R output rather than a hard-coded one. mengxr
Author: Yanbo Liang <ybliang8@gmail.com> Closes #9590 from yanboliang/glm-r-test.
* [SPARK-10863][SPARKR] Method coltypes() (New version) | Oscar D. Lara Yejas | 2015-11-10 | 1 | -1/+23
This is a follow-up on PR #8984, as the corresponding branch for that PR was damaged.
Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Closes #9579 from olarayej/SPARK-10863_NEW14.
* [SPARK-11587][SPARKR] Fix the summary generic to match base R | Shivaram Venkataraman | 2015-11-09 | 1 | -0/+6
The signature is summary(object, ...) as defined in https://stat.ethz.ch/R-manual/R-devel/library/base/html/summary.html
Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #9582 from shivaram/summary-fix.
* [SPARK-9865][SPARKR] Flaky SparkR test: test_sparkSQL.R: sample on a DataFrame | felixcheung | 2015-11-09 | 1 | -2/+2
Make sample test less flaky by setting the seed. Tested with
```
repeat {
  if (count(sample(df, FALSE, 0.1)) == 3) {
    break
  }
}
```
Author: felixcheung <felixcheung_m@hotmail.com> Closes #9549 from felixcheung/rsample.
* [SPARK-11494][ML][R] Expose R-like summary statistics in SparkR::glm for linear regression | Yanbo Liang | 2015-11-09 | 1 | -7/+24
Expose R-like summary statistics in SparkR::glm for linear regression; the output of ```summary``` looks like:
```
$DevianceResiduals
 Min        Max
 -0.9509607 0.7291832

$Coefficients
                   Estimate   Std. Error  t value    Pr(>|t|)
(Intercept)        1.6765     0.2353597   7.123139   4.456124e-11
Sepal_Length       0.3498801  0.04630128  7.556598   4.187317e-12
Species_versicolor -0.9833885 0.07207471  -13.64402  0
Species_virginica  -1.00751   0.09330565  -10.79796  0
```
Author: Yanbo Liang <ybliang8@gmail.com> Closes #9561 from yanboliang/spark-11494.
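A hedged sketch of producing a summary like the one above; the formula and dataset mirror the column names in the printed output (iris, with "." replaced by "_") but are assumptions, not taken from the commit.
```r
# Hypothetical reproduction of the output above: fit a Gaussian GLM on iris
# (SparkR replaces "." in column names, hence Sepal_Length / Species_versicolor).
df <- createDataFrame(sqlContext, iris)
model <- glm(Sepal_Width ~ Sepal_Length + Species, data = df, family = "gaussian")
summary(model)  # returns $DevianceResiduals and $Coefficients as shown above
```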
* [SPARK-10116][CORE] XORShiftRandom.hashSeed is random in high bits | Imran Rashid | 2015-11-06 | 1 | -4/+4
https://issues.apache.org/jira/browse/SPARK-10116 This is really trivial, just happened to notice it -- if `XORShiftRandom.hashSeed` is really supposed to have random bits throughout (as the comment implies), it needs to do something for the conversion to `long`. mengxr mkolod
Author: Imran Rashid <irashid@cloudera.com> Closes #8314 from squito/SPARK-10116.
* [SPARK-11542] [SPARKR] fix glm with long formula | Davies Liu | 2015-11-05 | 1 | -0/+12
Because deparse() will break a long string into multiple lines, the deserialization will fail.
Author: Davies Liu <davies@databricks.com> Closes #9510 from davies/fix_glm.
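A hedged base-R illustration of the underlying issue and the usual remedy: deparse() splits long expressions into a character vector of several strings, so the pieces have to be pasted back together before the formula is shipped to the backend. The actual fix in the PR may differ in detail.
```r
# A formula long enough that deparse() wraps it across several strings.
f <- as.formula(paste("y ~", paste(sprintf("x%02d", 1:40), collapse = " + ")))
length(deparse(f))                            # > 1: multiple lines, which broke serialization
one_line <- paste(deparse(f), collapse = "")  # collapse back into a single string
nchar(one_line)
```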
* [SPARK-11260][SPARKR] with() function support | adrian555 | 2015-11-05 | 1 | -0/+9
Author: adrian555 <wzhuang@us.ibm.com> Author: Adrian Zhuang <adrian555@users.noreply.github.com> Closes #9443 from adrian555/with.
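A hedged sketch of with() on a SparkR DataFrame: the DataFrame's columns become visible by name inside the expression. The data and column names are hypothetical.
```r
df <- createDataFrame(sqlContext, iris)
# Inside with(), column names resolve to Columns of df, so the df$ prefix can be dropped.
newCol <- with(df, Sepal_Length / Sepal_Width)
head(select(df, newCol))
```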
* [SPARK-9492][ML][R] LogisticRegression in R should provide model statistics | Yanbo Liang | 2015-11-04 | 1 | -0/+17
Like ml ```LinearRegression```, ```LogisticRegression``` should provide a training summary including feature names and their coefficients.
Author: Yanbo Liang <ybliang8@gmail.com> Closes #9303 from yanboliang/spark-9492.
* [SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio | felixcheung | 2015-10-30 | 1 | -0/+27
Mapping spark.driver.memory from sparkEnvir to spark-submit command-line arguments. shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf? sun-rui
Author: felixcheung <felixcheung_m@hotmail.com> Closes #9290 from felixcheung/rdrivermem.
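A hedged sketch of setting a driver property through sparkEnvir when launching from R or RStudio; the memory value is a placeholder.
```r
# spark.driver.memory must be set before the JVM starts, so it is passed via sparkEnvir
# at sparkR.init() time rather than on an existing context (value here is hypothetical).
sc <- sparkR.init(master = "local[2]",
                  sparkEnvir = list(spark.driver.memory = "2g"))
```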
* [SPARK-11210][SPARKR] Add window functions into SparkR [step 2]. | Sun Rui | 2015-10-30 | 1 | -0/+5
Author: Sun Rui <rui.sun@intel.com> Closes #9196 from sun-rui/SPARK-11210.
* [SPARK-11209][SPARKR] Add window functions into SparkR [step 1]. | Sun Rui | 2015-10-26 | 1 | -0/+2
Author: Sun Rui <rui.sun@intel.com> Closes #9193 from sun-rui/SPARK-11209.
* [SPARK-10979][SPARKR] Sparkrmerge: Add merge to DataFrame with R signature | Narine Kokhlikyan | 2015-10-26 | 1 | -4/+33
Add merge function to DataFrame, which supports the R signature. https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #9012 from NarineK/sparkrmerge.
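A hedged sketch of merge() with base-R-style arguments; the DataFrames and key names are hypothetical.
```r
left  <- createDataFrame(sqlContext, data.frame(id = 1:3, x = c("a", "b", "c"), stringsAsFactors = FALSE))
right <- createDataFrame(sqlContext, data.frame(id = 2:4, y = c(10, 20, 30)))
# base-R style: join on a common key, keeping all rows from the left side.
head(merge(left, right, by = "id", all.x = TRUE))
```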
* [SPARK-11244][SPARKR] sparkR.stop() should remove SQLContext | Forest Fang | 2015-10-22 | 1 | -0/+10
SparkR should remove `.sparkRSQLsc` and `.sparkRHivesc` when `sparkR.stop()` is called. Otherwise, even when the SparkContext is reinitialized, `sparkRSQL.init` returns the stale copy of the object and complains:
```r
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sparkR.stop()
sc <- sparkR.init("local")
sqlContext <- sparkRSQL.init(sc)
sqlContext
```
producing
```r
Error in callJMethod(x, "getClass") :
  Invalid jobj 1. If SparkR was restarted, Spark operations need to be re-executed.
```
I have added the check and removal only when the SparkContext itself is initialized. I have also added a corresponding test for this fix. Let me know if you want me to move the test to the SQL test suite instead. p.s. I tried lint-r but ended up with lots of errors on existing code.
Author: Forest Fang <forest.fang@outlook.com> Closes #9205 from saurfang/sparkR.stop.
* [SPARK-11197][SQL] run SQL on files directly | Davies Liu | 2015-10-21 | 1 | -1/+1
This PR introduces a new feature to run SQL directly on files without creating a table, for example:
```
select id from json.`path/to/json/files` as j
```
Author: Davies Liu <davies@databricks.com> Closes #9173 from davies/source.
* [SPARK-10668] [ML] Use WeightedLeastSquares in LinearRegression with L2 regularization if the number of features is small | lewuathe | 2015-10-19 | 1 | -1/+1
Author: lewuathe <lewuathe@me.com> Author: Lewuathe <sasaki@treasure-data.com> Author: Kai Sasaki <sasaki@treasure-data.com> Author: Lewuathe <lewuathe@me.com> Closes #8884 from Lewuathe/SPARK-10668.
* [SPARK-10996] [SPARKR] Implement sampleBy() in DataFrameStatFunctions. | Sun Rui | 2015-10-13 | 1 | -0/+10
Author: Sun Rui <rui.sun@intel.com> Closes #9023 from sun-rui/SPARK-10996.
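A hedged sketch of stratified sampling with sampleBy(); the strata column and fractions are hypothetical.
```r
df <- createDataFrame(sqlContext,
                      data.frame(key = rep(c("a", "b"), each = 50), value = rnorm(100),
                                 stringsAsFactors = FALSE))
# Sample 10% of the "a" rows and 30% of the "b" rows, with a fixed seed.
s <- sampleBy(df, col = "key", fractions = list(a = 0.1, b = 0.3), seed = 0)
count(s)
```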
* [SPARK-10981] [SPARKR] SparkR Join improvements | Monica Liu | 2015-10-13 | 1 | -2/+25
I was having issues with collect() and orderBy() in Spark 1.5.0, so I used the DataFrame.R file and test_sparkSQL.R file from the Spark 1.5.1 download. I only modified the join() function in DataFrame.R to include "full", "fullouter", "left", "right", and "leftsemi", and added corresponding test cases for join() and merge() in the test_sparkSQL.R file. Pull request because I filed this JIRA bug report: https://issues.apache.org/jira/browse/SPARK-10981
Author: Monica Liu <liu.monica.f@gmail.com> Closes #9029 from mfliu/master.
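A hedged sketch of the additional join types now accepted by join(); the DataFrames and keys are hypothetical.
```r
left  <- createDataFrame(sqlContext, data.frame(id = 1:3, x = c("a", "b", "c"), stringsAsFactors = FALSE))
right <- createDataFrame(sqlContext, data.frame(id = 2:4, y = c(10, 20, 30)))
# joinType now also accepts "full", "fullouter", "left", "right" and "leftsemi".
head(join(left, right, left$id == right$id, "fullouter"))
head(join(left, right, left$id == right$id, "leftsemi"))
```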
* [SPARK-10913] [SPARKR] attach() function support | Adrian Zhuang | 2015-10-13 | 1 | -0/+20
Bring the change code up to date.
Author: Adrian Zhuang <adrian555@users.noreply.github.com> Author: adrian555 <wzhuang@us.ibm.com> Closes #9031 from adrian555/attach2.
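A hedged sketch of attach() on a SparkR DataFrame: column names are placed on the search path so they can be used without the df$ prefix. The data is hypothetical.
```r
df <- createDataFrame(sqlContext, iris)
attach(df)
# After attach(), a bare column name such as Sepal_Length resolves to the corresponding Column of df.
head(select(df, Sepal_Length))
```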
* [SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame | Narine Kokhlikyan | 2015-10-13 | 1 | -0/+15
as.DataFrame is a more R-style-like signature. Also, I'd like to know if we could make the context, e.g. sqlContext, global, so that we do not have to specify it as an argument each time we create a dataframe.
Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com> Closes #8952 from NarineK/sparkrasDataFrame.
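A hedged sketch showing the synonym; both calls are expected to produce the same DataFrame.
```r
# as.DataFrame() is a synonym for createDataFrame() with an R-flavoured name.
df1 <- createDataFrame(sqlContext, faithful)
df2 <- as.DataFrame(sqlContext, faithful)
count(df1) == count(df2)  # expected TRUE
```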
* [SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrame | Sun Rui | 2015-10-13 | 1 | -22/+29
Two points in this PR:
1. The original thought was that a named R list is assumed to be a struct in SerDe. But this is problematic because some R functions will implicitly generate named lists that are not intended to be a struct when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct".
2. SerDe is in the Spark Core module, and data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow, as in the maven build the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe to allow SQLUtils in the Spark SQL module to register its functions for serialization and deserialization of StructType.
Author: Sun Rui <rui.sun@intel.com> Closes #8794 from sun-rui/SPARK-10051.
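A hedged sketch of point 1 above: explicitly tagging a named R list as a struct before it crosses SerDe. This is illustrative only; the surrounding SerDe plumbing is internal, and the example values are made up.
```r
# Marking a named list as a struct for SerDe, as described in the commit.
person <- list(name = "Alice", age = 30L)
class(person) <- "struct"   # without this, the named list would be sent as a plain list
str(person)
```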