spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-17445][DOCS] Reference an ASF page as the main place to find ↵	Sean Owen	2016-09-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	third-party packages ## What changes were proposed in this pull request? Point references to spark-packages.org to https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects This will be accompanied by a parallel change to the spark-website repo, and additional changes to this wiki. ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #15075 from srowen/SPARK-17445.
*	[SPARK-17317][SPARKR] Add SparkR vignette	junyangq	2016-09-13	2	-2/+870
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR tries to add a SparkR vignette, which works as a friendly guidance going through the functionality provided by SparkR. ## How was this patch tested? Manual test. Author: junyangq <qianjunyang@gmail.com> Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Author: Junyang Qian <junyangq@databricks.com> Closes #14980 from junyangq/SPARKR-vignette.
*	[SPARK-16445][MLLIB][SPARKR] Fix @return description for sparkR mlp ↵	Xin Ren	2016-09-10	2	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	summary() method ## What changes were proposed in this pull request? Fix summary() method's `return` description for spark.mlp ## How was this patch tested? Ran tests locally on my laptop. Author: Xin Ren <iamshrek@126.com> Closes #15015 from keypointt/SPARK-16445-2.
*	[SPARK-17464][SPARKR][ML] SparkR spark.als argument reg should be 0.1 by ↵	Yanbo Liang	2016-09-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	default. ## What changes were proposed in this pull request? SparkR ```spark.als``` arguments ```reg``` should be 0.1 by default, which need to be consistent with ML. ## How was this patch tested? Existing tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #15021 from yanboliang/spark-17464.
*	[SPARK-17442][SPARKR] Additional arguments in write.df are not passed to ↵	Felix Cheung	2016-09-08	2	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	data source ## What changes were proposed in this pull request? additional options were not passed down in write.df. ## How was this patch tested? unit tests falaki shivaram Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #15010 from felixcheung/testreadoptions.
*	[SPARK-17339][SPARKR][CORE] Fix some R tests and use Path.toUri in ↵	hyukjinkwon	2016-09-07	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SparkContext for Windows paths in SparkR ## What changes were proposed in this pull request? This PR fixes the Windows path issues in several APIs. Please refer https://issues.apache.org/jira/browse/SPARK-17339 for more details. ## How was this patch tested? Tests via AppVeyor CI - https://ci.appveyor.com/project/HyukjinKwon/spark/build/82-SPARK-17339-fix-r Also, manually, ![2016-09-06 3 14 38](https://cloud.githubusercontent.com/assets/6477701/18263406/b93a98be-7444-11e6-9521-b28ee65a4771.png) Author: hyukjinkwon <gurwls223@gmail.com> Closes #14960 from HyukjinKwon/SPARK-17339.
*	[SPARK-16785] R dapply doesn't return array or raw columns	Clark Fitzgerald	2016-09-06	5	-1/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fixed bug in `dapplyCollect` by changing the `compute` function of `worker.R` to explicitly handle raw (binary) vectors. cc shivaram ## How was this patch tested? Unit tests Author: Clark Fitzgerald <clarkfitzg@gmail.com> Closes #14783 from clarkfitzg/SPARK-16785.
*	[SPARK-17315][SPARKR] Kolmogorov-Smirnov test SparkR wrapper	Junyang Qian	2016-09-03	4	-2/+148
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR tries to add Kolmogorov-Smirnov Test wrapper to SparkR. This wrapper implementation only supports one sample test against normal distribution. ## How was this patch tested? R unit test. Author: Junyang Qian <junyangq@databricks.com> Closes #14881 from junyangq/SPARK-17315.
*	[SPARKR][MINOR] Fix docs for sparkR.session and count	Junyang Qian	2016-09-02	3	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR tries to add some more explanation to `sparkR.session`. It also modifies doc for `count` so when grouped in one doc, the description doesn't confuse users. ## How was this patch tested? Manual test. ![screen shot 2016-09-02 at 1 21 36 pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png) Author: Junyang Qian <junyangq@databricks.com> Closes #14942 from junyangq/fixSparkRSessionDoc.
*	[SPARK-17298][SQL] Require explicit CROSS join for cartesian products	Srinath Shankar	2016-09-03	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Require the use of CROSS join syntax in SQL (and a new crossJoin DataFrame API) to specify explicit cartesian products between relations. By cartesian product we mean a join between relations R and S where there is no join condition involving columns from both R and S. If a cartesian product is detected in the absence of an explicit CROSS join, an error must be thrown. Turning on the "spark.sql.crossJoin.enabled" configuration flag will disable this check and allow cartesian products without an explicit CROSS join. The new crossJoin DataFrame API must be used to specify explicit cross joins. The existing join(DataFrame) method will produce a INNER join that will require a subsequent join condition. That is df1.join(df2) is equivalent to select * from df1, df2. ## How was this patch tested? Added cross-join.sql to the SQLQueryTestSuite to test the check for cartesian products. Added a couple of tests to the DataFrameJoinSuite to test the crossJoin API. Modified various other test suites to explicitly specify a cross join where an INNER join or a comma-separated list was previously used. Author: Srinath Shankar <srinath@databricks.com> Closes #14866 from srinathshankar/crossjoin.
*	[SPARK-17376][SPARKR] followup - change since version	Felix Cheung	2016-09-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? change since version in doc ## How was this patch tested? manual Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14939 from felixcheung/rsparkversion2.
*	[SPARKR][DOC] regexp_extract should doc that it returns empty string when ↵	Felix Cheung	2016-09-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	match fails ## What changes were proposed in this pull request? Doc change - see https://issues.apache.org/jira/browse/SPARK-16324 ## How was this patch tested? manual check Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14934 from felixcheung/regexpextractdoc.
*	[SPARK-17376][SPARKR] Spark version should be available in R	Felix Cheung	2016-09-02	3	-6/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add sparkR.version() API. ``` > sparkR.version() [1] "2.1.0-SNAPSHOT" ``` ## How was this patch tested? manual, unit tests Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14935 from felixcheung/rsparksessionversion.
*	[SPARK-16883][SPARKR] SQL decimal type is not properly cast to number when ↵	wm624@hotmail.com	2016-09-02	3	-1/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	collecting SparkDataFrame ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) registerTempTable(createDataFrame(iris), "iris") str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y from iris limit 5"))) 'data.frame': 5 obs. of 2 variables: $ x: num 1 1 1 1 1 $ y:List of 5 ..$ : num 2 ..$ : num 2 ..$ : num 2 ..$ : num 2 ..$ : num 2 The problem is that spark returns `decimal(10, 0)` col type, instead of `decimal`. Thus, `decimal(10, 0)` is not handled correctly. It should be handled as "double". As discussed in JIRA thread, we can have two potential fixes: 1). Scala side fix to add a new case when writing the object back; However, I can't use spark.sql.types._ in Spark core due to dependency issues. I don't find a way of doing type case match; 2). SparkR side fix: Add a helper function to check special type like `"decimal(10, 0)"` and replace it with `double`, which is PRIMITIVE type. This special helper is generic for adding new types handling in the future. I open this PR to discuss pros and cons of both approaches. If we want to do Scala side fix, we need to find a way to match the case of DecimalType and StructType in Spark Core. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Manual test: > str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y from iris limit 5"))) 'data.frame': 5 obs. of 2 variables: $ x: num 1 1 1 1 1 $ y: num 2 2 2 2 2 R Unit tests Author: wm624@hotmail.com <wm624@hotmail.com> Closes #14613 from wangmiao1981/type.
*	[SPARK-17241][SPARKR][MLLIB] SparkR spark.glm should have configurable ↵	Xin Ren	2016-08-31	2	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	regularization parameter https://issues.apache.org/jira/browse/SPARK-17241 ## What changes were proposed in this pull request? Spark has configurable L2 regularization parameter for generalized linear regression. It is very important to have them in SparkR so that users can run ridge regression. ## How was this patch tested? Test manually on local laptop. Author: Xin Ren <iamshrek@126.com> Closes #14856 from keypointt/SPARK-17241.
*	[SPARKR][MINOR] Fix windowPartitionBy example	Junyang Qian	2016-08-31	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The usage in the original example is incorrect. This PR fixes it. ## How was this patch tested? Manual test. Author: Junyang Qian <junyangq@databricks.com> Closes #14903 from junyangq/SPARKR-FixWindowPartitionByDoc.
*	[SPARK-16581][SPARKR] Fix JVM API tests in SparkR	Shivaram Venkataraman	2016-08-31	1	-11/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Remove cleanup.jobj test. Use JVM wrapper API for other test cases. ## How was this patch tested? Run R unit tests with testthat 1.0 Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #14904 from shivaram/sparkr-jvm-tests-fix.
*	[SPARK-17326][SPARKR] Fix tests with HiveContext in SparkR not to be skipped ↵	hyukjinkwon	2016-08-31	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	always ## What changes were proposed in this pull request? Currently, `HiveContext` in SparkR is not being tested and always skipped. This is because the initiation of `TestHiveContext` is being failed due to trying to load non-existing data paths (test tables). This is introduced from https://github.com/apache/spark/pull/14005 This enables the tests with SparkR. ## How was this patch tested? Manually, Before (on Mac OS) ``` ... Skipped ------------------------------------------------------------------------ 1. create DataFrame from RDD (test_sparkSQL.R#200) - Hive is not build with SparkSQL, skipped 2. test HiveContext (test_sparkSQL.R#1041) - Hive is not build with SparkSQL, skipped 3. read/write ORC files (test_sparkSQL.R#1748) - Hive is not build with SparkSQL, skipped 4. enableHiveSupport on SparkSession (test_sparkSQL.R#2480) - Hive is not build with SparkSQL, skipped 5. sparkJars tag in SparkContext (test_Windows.R#21) - This test is only for Windows, skipped ... ``` After (on Mac OS) ``` ... Skipped ------------------------------------------------------------------------ 1. sparkJars tag in SparkContext (test_Windows.R#21) - This test is only for Windows, skipped ... ``` Please refer the tests below (on Windows) - Before: https://ci.appveyor.com/project/HyukjinKwon/spark/build/45-test123 - After: https://ci.appveyor.com/project/HyukjinKwon/spark/build/46-test123 Author: hyukjinkwon <gurwls223@gmail.com> Closes #14889 from HyukjinKwon/SPARK-17326.
*	[MINOR][SPARKR] Verbose build comment in WINDOWS.md rather than promoting ↵	hyukjinkwon	2016-08-31	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	default build without Hive ## What changes were proposed in this pull request? This PR fixes `WINDOWS.md` to imply referring other profiles in http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn rather than directly pointing to run `mvn -DskipTests -Psparkr package` without Hive supports. ## How was this patch tested? Manually, <img width="626" alt="2016-08-31 6 01 08" src="https://cloud.githubusercontent.com/assets/6477701/18122549/f6297b2c-6fa4-11e6-9b5e-fd4347355d87.png"> Author: hyukjinkwon <gurwls223@gmail.com> Closes #14890 from HyukjinKwon/minor-build-r.
*	[SPARK-16581][SPARKR] Make JVM backend calling functions public	Shivaram Venkataraman	2016-08-29	4	-2/+167
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This change exposes a public API in SparkR to create objects, call methods on the Spark driver JVM ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Unit tests, CRAN checks Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #14775 from shivaram/sparkr-java-api.
*	[SPARKR][MINOR] Fix LDA doc	Junyang Qian	2016-08-29	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR tries to fix the name of the `SparkDataFrame` used in the example. Also, it gives a reference url of an example data file so that users can play with. ## How was this patch tested? Manual test. Author: Junyang Qian <junyangq@databricks.com> Closes #14853 from junyangq/SPARKR-FixLDADoc.
*	[SPARKR][MINOR] Fix example of spark.naiveBayes	Junyang Qian	2016-08-26	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The original example doesn't work because the features are not categorical. This PR fixes this by changing to another dataset. ## How was this patch tested? Manual test. Author: Junyang Qian <junyangq@databricks.com> Closes #14820 from junyangq/SPARK-FixNaiveBayes.
*	[SPARKR][MINOR] Add installation message for remote master mode and improve ↵	Junyang Qian	2016-08-24	3	-39/+80
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	other messages ## What changes were proposed in this pull request? This PR gives informative message to users when they try to connect to a remote master but don't have Spark package in their local machine. As a clarification, for now, automatic installation will only happen if they start SparkR in R console (rather than from sparkr-shell) and connect to local master. In the remote master mode, local Spark package is still needed, but we will not trigger the install.spark function because the versions have to match those on the cluster, which involves more user input. Instead, we here try to provide detailed message that may help the users. Some of the other messages have also been slightly changed. ## How was this patch tested? Manual test. Author: Junyang Qian <junyangq@databricks.com> Closes #14761 from junyangq/SPARK-16579-V1.
*	[SPARKR][MINOR] Add more examples to window function docs	Junyang Qian	2016-08-24	2	-18/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR adds more examples to window function docs to make them more accessible to the users. It also fixes default value issues for `lag` and `lead`. ## How was this patch tested? Manual test, R unit test. Author: Junyang Qian <junyangq@databricks.com> Closes #14779 from junyangq/SPARKR-FixWindowFunctionDocs.
*	[MINOR][SPARKR] fix R MLlib parameter documentation	Felix Cheung	2016-08-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fixed several misplaced param tag - they should be on the spark.* method generics ## How was this patch tested? run knitr junyangq Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14792 from felixcheung/rdocmllib.
*	[SPARK-16445][MLLIB][SPARKR] Multilayer Perceptron Classifier wrapper in SparkR	Xin Ren	2016-08-24	4	-5/+157
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-16445 ## What changes were proposed in this pull request? Create Multilayer Perceptron Classifier wrapper in SparkR ## How was this patch tested? Tested manually on local machine Author: Xin Ren <iamshrek@126.com> Closes #14447 from keypointt/SPARK-16445.
*	[SPARKR][MINOR] Fix doc for show method	Junyang Qian	2016-08-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The original doc of `show` put methods for multiple classes together but the text only talks about `SparkDataFrame`. This PR tries to fix this problem. ## How was this patch tested? Manual test. Author: Junyang Qian <junyangq@databricks.com> Closes #14776 from junyangq/SPARK-FixShowDoc.
*	[SPARKR][MINOR] Remove reference link for common Windows environment variables	Junyang Qian	2016-08-23	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The PR removes reference link in the doc for environment variables for common Windows folders. The cran check gave code 503: service unavailable on the original link. ## How was this patch tested? Manual check. Author: Junyang Qian <junyangq@databricks.com> Closes #14767 from junyangq/SPARKR-RemoveLink.
*	[SPARKR][MINOR] Update R DESCRIPTION file	Felix Cheung	2016-08-22	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Update DESCRIPTION ## How was this patch tested? Run install and CRAN tests Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14764 from felixcheung/rpackagedescription.
*	[SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh	Shivaram Venkataraman	2016-08-22	2	-6/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? This change adds CRAN documentation checks to be run as a part of `R/run-tests.sh` . As this script is also used by Jenkins this means that we will get documentation checks on every PR going forward. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #14759 from shivaram/sparkr-cran-jenkins.
*	[SPARK-16508][SPARKR] doc updates and more CRAN check fixes	Felix Cheung	2016-08-22	12	-114/+119
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? replace ``` ` ``` in code doc with `\code{thing}` remove added `...` for drop(DataFrame) fix remaining CRAN check warnings ## How was this patch tested? create doc with knitr junyangq Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14734 from felixcheung/rdoccleanup.
*	[SPARKR][MINOR] Add Xiangrui and Felix to maintainers	Shivaram Venkataraman	2016-08-22	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #14758 from shivaram/sparkr-maintainers.
*	[SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation ↵	Felix Cheung	2016-08-22	2	-117/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in test ## What changes were proposed in this pull request? refactor, cleanup, reformat, fix deprecation in test ## How was this patch tested? unit tests, manual tests Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14735 from felixcheung/rmllibutil.
*	[SPARKR][MINOR] Fix Cache Folder Path in Windows	Junyang Qian	2016-08-22	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR tries to fix the scheme of local cache folder in Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`. ## How was this patch tested? Manual test in Windows 7. Author: Junyang Qian <junyangq@databricks.com> Closes #14743 from junyangq/SPARKR-FixWindowsInstall.
*	[MINOR][R] add SparkR.Rcheck/ and SparkR_*.tar.gz to R/.gitignore	Xiangrui Meng	2016-08-21	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Ignore temp files generated by `check-cran.sh`. Author: Xiangrui Meng <meng@databricks.com> Closes #14740 from mengxr/R-gitignore.
*	[SPARK-16961][FOLLOW-UP][SPARKR] More robust test case for ↵	Yanbo Liang	2016-08-21	1	-22/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spark.gaussianMixture. ## What changes were proposed in this pull request? #14551 fixed off-by-one bug in ```randomizeInPlace``` and some test failure caused by this fix. But for SparkR ```spark.gaussianMixture``` test case, the fix is inappropriate. It only changed the output result of native R which should be compared by SparkR, however, it did not change the R code in annotation which is used for reproducing the result in native R. It will confuse users who can not reproduce the same result in native R. This PR sends a more robust test case which can produce same result between SparkR and native R. ## How was this patch tested? Unit test update. Author: Yanbo Liang <ybliang8@gmail.com> Closes #14730 from yanboliang/spark-16961-followup.
*	[SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warnings.	Junyang Qian	2016-08-20	11	-267/+419
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR tries to fix all the remaining "undocumented/duplicated arguments" warnings given by CRAN-check. One left is doc for R `stats::glm` exported in SparkR. To mute that warning, we have to also provide document for all arguments of that non-SparkR function. Some previous conversation is in #14558. ## How was this patch tested? R unit test and `check-cran.sh` script (with no-test). Author: Junyang Qian <junyangq@databricks.com> Closes #14705 from junyangq/SPARK-16508-master.
*	[SPARK-16443][SPARKR] Alternating Least Squares (ALS) wrapper	Junyang Qian	2016-08-19	4	-5/+201
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add Alternating Least Squares wrapper in SparkR. Unit tests have been updated. ## How was this patch tested? SparkR unit tests. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) ![screen shot 2016-07-27 at 3 50 31 pm](https://cloud.githubusercontent.com/assets/15318264/17195347/f7a6352a-5411-11e6-8e21-61a48070192a.png) ![screen shot 2016-07-27 at 3 50 46 pm](https://cloud.githubusercontent.com/assets/15318264/17195348/f7a7d452-5411-11e6-845f-6d292283bc28.png) Author: Junyang Qian <junyangq@databricks.com> Closes #14384 from junyangq/SPARK-16443.
*	[SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace	Nick Lavers	2016-08-19	1	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	JIRA issue link: https://issues.apache.org/jira/browse/SPARK-16961 Changed one line of Utils.randomizeInPlace to allow elements to stay in place. Created a unit test that runs a Pearson's chi squared test to determine whether the output diverges significantly from a uniform distribution. Author: Nick Lavers <nick.lavers@videoamp.com> Closes #14551 from nicklavers/SPARK-16961-randomizeInPlace.
*	[SPARK-16447][ML][SPARKR] LDA wrapper in SparkR	Xusen Yin	2016-08-18	4	-2/+268
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add LDA Wrapper in SparkR with the following interfaces: - spark.lda(data, ...) - spark.posterior(object, newData, ...) - spark.perplexity(object, ...) - summary(object) - write.ml(object) - read.ml(path) ## How was this patch tested? Test with SparkR unit test. Author: Xusen Yin <yinxusen@gmail.com> Closes #14229 from yinxusen/SPARK-16447.
*	[SPARK-16446][SPARKR][ML] Gaussian Mixture Model wrapper in SparkR	Yanbo Liang	2016-08-17	4	-3/+208
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Gaussian Mixture Model wrapper in SparkR, similarly to R's ```mvnormalmixEM```. ## How was this patch tested? Unit test. Author: Yanbo Liang <ybliang8@gmail.com> Closes #14392 from yanboliang/spark-16446.
*	[SPARK-16444][SPARKR] Isotonic Regression wrapper in SparkR	wm624@hotmail.com	2016-08-17	4	-1/+156
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) Add Isotonic Regression wrapper in SparkR Wrappers in R and Scala are added. Unit tests Documentation ## How was this patch tested? Manually tested with sudo ./R/run-tests.sh (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Author: wm624@hotmail.com <wm624@hotmail.com> Closes #14182 from wangmiao1981/isoR.
*	[SPARK-16519][SPARKR] Handle SparkR RDD generics that create warnings in R ↵	Felix Cheung	2016-08-16	17	-287/+312
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CMD check ## What changes were proposed in this pull request? Rename RDD functions for now to avoid CRAN check warnings. Some RDD functions are sharing generics with DataFrame functions (hence the problem) so after the renames we need to add new generics, for now. ## How was this patch tested? unit tests Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14626 from felixcheung/rrddfunctions.
*	[MINOR][SPARKR] spark.glm weightCol should in the signature.	Yanbo Liang	2016-08-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fix the issue that ```spark.glm``` ```weightCol``` should in the signature. ## How was this patch tested? Existing tests. Author: Yanbo Liang <ybliang8@gmail.com> Closes #14641 from yanboliang/weightCol.
*	[SPARK-16508][SPARKR] Split docs for arrange and orderBy methods	Junyang Qian	2016-08-15	3	-15/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR splits arrange and orderBy methods according to their functionality (the former for sorting sparkDataFrame and the latter for windowSpec). ## How was this patch tested? ![screen shot 2016-08-06 at 6 39 19 pm](https://cloud.githubusercontent.com/assets/15318264/17459969/51eade28-5c05-11e6-8ca1-8d8a8e344bab.png) ![screen shot 2016-08-06 at 6 39 29 pm](https://cloud.githubusercontent.com/assets/15318264/17459966/51e3c246-5c05-11e6-8d35-3e905ca48676.png) ![screen shot 2016-08-06 at 6 40 02 pm](https://cloud.githubusercontent.com/assets/15318264/17459967/51e650ec-5c05-11e6-8698-0f037f5199ff.png) Author: Junyang Qian <junyangq@databricks.com> Closes #14522 from junyangq/SPARK-16508-0.
*	[SPARK-16579][SPARKR] add install.spark function	Junyang Qian	2016-08-10	7	-4/+267
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Add an install_spark function to the SparkR package. User can run `install_spark()` to install Spark to a local directory within R. Updates: Several changes have been made: - `install.spark()` - check existence of tar file in the cache folder, and download only if not found - trial priority of mirror_url look-up: user-provided -> preferred mirror site from apache website -> hardcoded backup option - use 2.0.0 - `sparkR.session()` - can install spark when not found in `SPARK_HOME` ## How was this patch tested? Manual tests, running the check-cran.sh script added in #14173. Author: Junyang Qian <junyangq@databricks.com> Closes #14258 from junyangq/SPARK-16579.
*	[SPARK-16710][SPARKR][ML] spark.glm should support weightCol	Yanbo Liang	2016-08-10	2	-4/+33
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Training GLMs on weighted dataset is very important use cases, but it is not supported by SparkR currently. Users can pass argument ```weights``` to specify the weights vector in native R. For ```spark.glm```, we can pass in the ```weightCol``` which is consistent with MLlib. ## How was this patch tested? Unit test. Author: Yanbo Liang <ybliang8@gmail.com> Closes #14346 from yanboliang/spark-16710.
*	[MINOR][SPARKR] R API documentation for "coltypes" is confusing	Xin Ren	2016-08-10	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? R API documentation for "coltypes" is confusing, found when working on another ticket. Current version http://spark.apache.org/docs/2.0.0/api/R/coltypes.html, where parameters have 2 "x" which is a duplicate, and also the example is not very clear ![current](https://cloud.githubusercontent.com/assets/3925641/17386808/effb98ce-59a2-11e6-9657-d477d258a80c.png) ![screen shot 2016-08-03 at 5 56 00 pm](https://cloud.githubusercontent.com/assets/3925641/17386884/91831096-59a3-11e6-84af-39890b3d45d8.png) ## How was this patch tested? Tested manually on local machine. And the screenshots are like below: ![screen shot 2016-08-07 at 11 29 20 pm](https://cloud.githubusercontent.com/assets/3925641/17471144/df36633c-5cf6-11e6-8238-4e32ead0e529.png) ![screen shot 2016-08-03 at 5 56 22 pm](https://cloud.githubusercontent.com/assets/3925641/17386896/9d36cb26-59a3-11e6-9619-6dae29f7ab17.png) Author: Xin Ren <iamshrek@126.com> Closes #14489 from keypointt/rExample.
*	[SPARKR][DOCS] fix broken url in doc	Felix Cheung	2016-07-25	2	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Fix broken url, also, sparkR.session.stop doc page should have it in the header, instead of saying "sparkR.stop" ![image](https://cloud.githubusercontent.com/assets/8969467/17080129/26d41308-50d9-11e6-8967-79d6c920313f.png) Data type section is in the middle of a list of gapply/gapplyCollect subsections: ![image](https://cloud.githubusercontent.com/assets/8969467/17080122/f992d00a-50d8-11e6-8f2c-fd5786213920.png) ## How was this patch tested? manual test Author: Felix Cheung <felixcheung_m@hotmail.com> Closes #14329 from felixcheung/rdoclinkfix.
*	[SPARK-10683][SPARK-16510][SPARKR] Move SparkR include jar test to ↵	Shivaram Venkataraman	2016-07-19	3	-41/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SparkSubmitSuite ## What changes were proposed in this pull request? This change moves the include jar test from R to SparkSubmitSuite and uses a dynamically compiled jar. This helps us remove the binary jar from the R package and solves both the CRAN warnings and the lack of source being available for this jar. ## How was this patch tested? SparkR unit tests, SparkSubmitSuite, check-cran.sh Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #14243 from shivaram/sparkr-jar-move.