spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-19464][CORE][YARN][TEST-HADOOP2.6] Remove support for Hadoop 2.5 and ↵	Sean Owen	2017-02-08	3	-514/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	earlier ## What changes were proposed in this pull request? - Remove support for Hadoop 2.5 and earlier - Remove reflection and code constructs only needed to support multiple versions at once - Update docs to reflect newer versions - Remove older versions' builds and profiles. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16810 from srowen/SPARK-19464.
*	[SPARK-19409][BUILD] Bump parquet version to 1.8.2	Dongjoon Hyun	2017-01-31	5	-30/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? According to the discussion on #16281 which tried to upgrade toward Apache Parquet 1.9.0, Apache Spark community prefer to upgrade to 1.8.2 instead of 1.9.0. Now, Apache Parquet 1.8.2 is released officially last week on 26 Jan. We can use 1.8.2 now. https://lists.apache.org/thread.html/af0c813f1419899289a336d96ec02b3bbeecaea23aa6ef69f435c142%3Cdev.parquet.apache.org%3E This PR only aims to bump Parquet version to 1.8.2. It didn't touch any other codes. ## How was this patch tested? Pass the existing tests and also manually by doing `./dev/test-dependencies.sh`. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #16751 from dongjoon-hyun/SPARK-19409.
*	[SPARK-18782][BUILD] Bump Hadoop 2.6 version to use Hadoop 2.6.5	Adam Roberts	2017-01-18	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	What changes were proposed in this pull request? Use Hadoop 2.6.5 for the Hadoop 2.6 profile, I see a bunch of fixes including security ones in the release notes that we should pick up How was this patch tested? Running the unit tests now with IBM's SDK for Java and let's see what happens with OpenJDK in the community builder - expecting no trouble as it is only a minor release. Author: Adam Roberts <aroberts@uk.ibm.com> Closes #16616 from a-roberts/Hadoop265Bumper.
*	[SPARK-18971][CORE] Upgrade Netty to 4.0.43.Final	Shixiong Zhu	2017-01-15	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Upgrade Netty to `4.0.43.Final` to add the fix for https://github.com/netty/netty/issues/6153 ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixiong@databricks.com> Closes #16568 from zsxwing/SPARK-18971.
*	[SPARK-18997][CORE] Recommended upgrade libthrift to 0.9.3	Sean Owen	2017-01-10	5	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Updates to libthrift 0.9.3 to address a CVE. ## How was this patch tested? Existing tests. Author: Sean Owen <sowen@cloudera.com> Closes #16530 from srowen/SPARK-18997.
*	[SPARK-18951] Upgrade com.thoughtworks.paranamer/paranamer to 2.6	Yin Huai	2016-12-21	5	-5/+5
\| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? I recently hit a bug of com.thoughtworks.paranamer/paranamer, which causes jackson fail to handle byte array defined in a case class. Then I find https://github.com/FasterXML/jackson-module-scala/issues/48, which suggests that it is caused by a bug in paranamer. Let's upgrade paranamer. Since we are using jackson 2.6.5 and jackson-module-paranamer 2.6.5 use com.thoughtworks.paranamer/paranamer 2.6, I suggests that we upgrade paranamer to 2.6. Author: Yin Huai <yhuai@databricks.com> Closes #16359 from yhuai/SPARK-18951.
*	[SPARK-18586][BUILD] netty-3.8.0.Final.jar has vulnerability CVE-2014-3488 ↵	Sean Owen	2016-12-03	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and CVE-2014-0193 ## What changes were proposed in this pull request? Force update to latest Netty 3.9.x, for dependencies like Flume, to resolve two CVEs. 3.9.2 is the first version that resolves both, and, this is the latest in the 3.9.x line. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #16102 from srowen/SPARK-18586.
*	[SPARK-18602] Set the version of org.codehaus.janino:commons-compiler to ↵	Yin Huai	2016-11-28	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	3.0.0 to match the version of org.codehaus.janino:janino ## What changes were proposed in this pull request? org.codehaus.janino:janino depends on org.codehaus.janino:commons-compiler and we have been upgraded to org.codehaus.janino:janino 3.0.0. However, seems we are still pulling in org.codehaus.janino:commons-compiler 2.7.6 because of calcite. It looks like an accident because we exclude janino from calcite (see here https://github.com/apache/spark/blob/branch-2.1/pom.xml#L1759). So, this PR upgrades org.codehaus.janino:commons-compiler to 3.0.0. ## How was this patch tested? jenkins Author: Yin Huai <yhuai@databricks.com> Closes #16025 from yhuai/janino-commons-compile.
*	[SPARK-18375][SPARK-18383][BUILD][CORE] Upgrade netty to 4.0.42.Final	Guoqiang Li	2016-11-12	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? One of the important changes for 4.0.42.Final is "Support any FileRegion implementation when using epoll transport netty/netty#5825". In 4.0.42.Final, `MessageWithHeader` can work properly when `spark.[shuffle\|rpc].io.mode` is set to epoll ## How was this patch tested? Existing tests Author: Guoqiang Li <witgo@qq.com> Closes #15830 from witgo/SPARK-18375_netty-4.0.42.
*	[SPARK-18262][BUILD][SQL] JSON.org license is now CatX	Sean Owen	2016-11-10	5	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Try excluding org.json:json from hive-exec dep as it's Cat X now. It may be the case that it's not used by the part of Hive Spark uses anyway. ## How was this patch tested? Existing tests Author: Sean Owen <sowen@cloudera.com> Closes #15798 from srowen/SPARK-18262.
*	[SPARK-17960][PYSPARK][UPGRADE TO PY4J 0.10.4]	Jagadeesan	2016-10-21	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? 1) Upgrade the Py4J version on the Java side 2) Update the py4j src zip file we bundle with Spark ## How was this patch tested? Existing doctests & unit tests pass Author: Jagadeesan <as2@us.ibm.com> Closes #15514 from jagadeesanas2/SPARK-17960.
*	[SPARK-17985][CORE] Bump commons-lang3 version to 3.5.	Takuya UESHIN	2016-10-19	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `SerializationUtils.clone()` of commons-lang3 (<3.5) has a bug that breaks thread safety, which gets stack sometimes caused by race condition of initializing hash map. See https://issues.apache.org/jira/browse/LANG-1251. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #15548 from ueshin/issues/SPARK-17985.
*	Revert "[SPARK-17985][CORE] Bump commons-lang3 version to 3.5."	Reynold Xin	2016-10-18	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit bfe7885aee2f406c1bbde08e30809a0b4bb070d2. The commit caused build failures on Hadoop 2.2 profile: ``` [error] /scratch/rxin/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:1489: value read is not a member of object org.apache.commons.io.IOUtils [error] var numBytes = IOUtils.read(gzInputStream, buf) [error] ^ [error] /scratch/rxin/spark/core/src/main/scala/org/apache/spark/util/Utils.scala:1492: value read is not a member of object org.apache.commons.io.IOUtils [error] numBytes = IOUtils.read(gzInputStream, buf) [error] ^ ```
*	[SPARK-17985][CORE] Bump commons-lang3 version to 3.5.	Takuya UESHIN	2016-10-18	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `SerializationUtils.clone()` of commons-lang3 (<3.5) has a bug that breaks thread safety, which gets stack sometimes caused by race condition of initializing hash map. See https://issues.apache.org/jira/browse/LANG-1251. ## How was this patch tested? Existing tests. Author: Takuya UESHIN <ueshin@happy-camper.st> Closes #15525 from ueshin/issues/SPARK-17985.
*	[SPARK-17808][PYSPARK] Upgraded version of Pyrolite to 4.13	Bryan Cutler	2016-10-11	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Upgraded to a newer version of Pyrolite which supports serialization of a BinaryType StructField for PySpark.SQL ## How was this patch tested? Added a unit test which fails with a raised ValueError when using the previous version of Pyrolite 4.9 and Python3 Author: Bryan Cutler <cutlerb@gmail.com> Closes #15386 from BryanCutler/pyrolite-upgrade-SPARK-17808.
*	[SPARK-17583][SQL] Remove uesless rowSeparator variable and set ↵	hyukjinkwon	2016-09-21	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	auto-expanding buffer as default for maxCharsPerColumn option in CSV ## What changes were proposed in this pull request? This PR includes the changes below: 1. Upgrade Univocity library from 2.1.1 to 2.2.1 This includes some performance improvement and also enabling auto-extending buffer in `maxCharsPerColumn` option in CSV. Please refer the [release notes](https://github.com/uniVocity/univocity-parsers/releases). 2. Remove useless `rowSeparator` variable existing in `CSVOptions` We have this unused variable in [CSVOptions.scala#L127](https://github.com/apache/spark/blob/29952ed096fd2a0a19079933ff691671d6f00835/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala#L127) but it seems possibly causing confusion that it actually does not care of `\r\n`. For example, we have an issue open about this, [SPARK-17227](https://issues.apache.org/jira/browse/SPARK-17227), describing this variable. This variable is virtually not being used because we rely on `LineRecordReader` in Hadoop which deals with only both `\n` and `\r\n`. 3. Set the default value of `maxCharsPerColumn` to auto-expending. We are setting 1000000 for the length of each column. It'd be more sensible we allow auto-expending rather than fixed length by default. To make sure, using `-1` is being described in the release note, [2.2.0](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.2.0). ## How was this patch tested? N/A Author: hyukjinkwon <gurwls223@gmail.com> Closes #15138 from HyukjinKwon/SPARK-17583.
*	[SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3	Reynold Xin	2016-09-16	1	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This patch bumps the Hadoop version in hadoop-2.7 profile from 2.7.2 to 2.7.3, which was recently released and contained a number of bug fixes. ## How was this patch tested? The change should be covered by existing tests. Author: Reynold Xin <rxin@databricks.com> Closes #15115 from rxin/SPARK-17558.
*	[SPARK-17379][BUILD] Upgrade netty-all to 4.0.41 final for bug fixes	Adam Roberts	2016-09-15	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Upgrade netty-all to latest in the 4.0.x line which is 4.0.41, mentions several bug fixes and performance improvements we may find useful, see netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html. Initially tried to use 4.1.5 but noticed it's not backwards compatible. ## How was this patch tested? Existing unit tests against branch-1.6 and branch-2.0 using IBM Java 8 on Intel, Power and Z architectures Author: Adam Roberts <aroberts@uk.ibm.com> Closes #14961 from a-roberts/netty.
*	[SPARK-17378][BUILD] Upgrade snappy-java to 1.1.2.6	Adam Roberts	2016-09-06	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Upgrades the Snappy version to 1.1.2.6 from 1.1.2.4, release notes: https://github.com/xerial/snappy-java/blob/master/Milestone.md mention "Fix a bug in SnappyInputStream when reading compressed data that happened to have the same first byte with the stream magic header (#142)" ## How was this patch tested? Existing unit tests using the latest IBM Java 8 on Intel, Power and Z architectures (little and big-endian) Author: Adam Roberts <aroberts@uk.ibm.com> Closes #14958 from a-roberts/master.
*	[SPARK-5682][CORE] Add encrypted shuffle in spark	Ferdinand Xu	2016-08-30	5	-0/+5
\| \| \| \| \| \| \| \| \|	This patch is using Apache Commons Crypto library to enable shuffle encryption support. Author: Ferdinand Xu <cheng.a.xu@intel.com> Author: kellyzly <kellyzly@126.com> Closes #8880 from winningsix/SPARK-10771.
*	[SPARK-16781][PYSPARK] java launched by PySpark as gateway may not be the ↵	Sean Owen	2016-08-24	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	same java used in the spark environment ## What changes were proposed in this pull request? Update to py4j 0.10.3 to enable JAVA_HOME support ## How was this patch tested? Pyspark tests Author: Sean Owen <sowen@cloudera.com> Closes #14748 from srowen/SPARK-16781.
*	[SPARK-16770][BUILD] Fix JLine dependency management and version (Sca…	Stefan Schulze	2016-08-03	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? As of Scala 2.11.x there is no longer a org.scala-lang:jline version aligned to the scala version itself. Scala console now uses the plain jline:jline module. Spark's dependency management did not reflect this change properly, causing Maven to pull in Jline via transitive dependency. Unfortunately Jline 2.12 contained a minor but very annoying bug rendering the shell almost useless for developers with german keyboard layout. This request contains the following chages: - Exclude transitive dependency 'jline:jline' from hive-exec module - Remove global properties 'jline.version' and 'jline.groupId' - Add both properties and dependency to 'scala-2.11' profile - Add explicit dependency on 'jline:jline' to module 'spark-repl' ## How was this patch tested? - Running mvn dependency:tree and checking for correct Jline version 2.12.1 - Running full builds with assembly and checking for jline-2.12.1.jar in 'lib' folder of generated tarball Author: Stefan Schulze <stefan.schulze@pentasys.de> Closes #14429 from stsc-pentasys/SPARK-16770.
*	[SPARK-16637] Unified containerizer	Michael Gummelt	2016-07-29	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? New config var: spark.mesos.docker.containerizer={"mesos","docker" (default)} This adds support for running docker containers via the Mesos unified containerizer: http://mesos.apache.org/documentation/latest/container-image/ The benefit is losing the dependency on `dockerd`, and all the costs which it incurs. I've also updated the supported Mesos version to 0.28.2 for support of the required protobufs. This is blocked on: https://github.com/apache/spark/pull/14167 ## How was this patch tested? - manually testing jobs submitted with both "mesos" and "docker" settings for the new config var. - spark/mesos integration test suite Author: Michael Gummelt <mgummelt@mesosphere.io> Closes #14275 from mgummelt/unified-containerizer.
*	[SPARK-16751] Upgrade derby to 10.12.1.1	Adam Roberts	2016-07-29	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Version of derby upgraded based on important security info at VersionEye. Test scope added so we don't include it in our final package anyway. NB: I think this should be backported to all previous releases as it is a security problem https://www.versioneye.com/java/org.apache.derby:derby/10.11.1.1 The CVE number is 2015-1832. I also suggest we add a SECURITY tag for JIRAs ## How was this patch tested? Existing tests with the change making sure that we see no new failures. I checked derby 10.12.x and not derby 10.11.x is downloaded to our ~/.m2 folder. I then used dev/make-distribution.sh and checked the dist/jars folder for Spark 2.0: no derby jar is present. I don't know if this would also remove it from the assembly jar in our 1.x branches. Author: Adam Roberts <aroberts@uk.ibm.com> Closes #14379 from a-roberts/patch-4.
*	[SPARK-15271][MESOS] Allow force pulling executor docker images	Philipp Hoffmann	2016-07-26	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Mesos agents by default will not pull docker images which are cached locally already. In order to run Spark executors from mutable tags like `:latest` this commit introduces a Spark setting (`spark.mesos.executor.docker.forcePullImage`). Setting this flag to true will tell the Mesos agent to force pull the docker image (default is `false` which is consistent with the previous implementation and Mesos' default behaviour). Author: Philipp Hoffmann <mail@philipphoffmann.de> Closes #14348 from philipphoffmann/force-pull-image.
*	Revert "[SPARK-15271][MESOS] Allow force pulling executor docker images"	Josh Rosen	2016-07-25	5	-5/+5
\| \| \| \|	This reverts commit 978cd5f125eb5a410bad2e60bf8385b11cf1b978.
*	[SPARK-15271][MESOS] Allow force pulling executor docker images	Philipp Hoffmann	2016-07-25	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Mesos agents by default will not pull docker images which are cached locally already. In order to run Spark executors from mutable tags like `:latest` this commit introduces a Spark setting `spark.mesos.executor.docker.forcePullImage`. Setting this flag to true will tell the Mesos agent to force pull the docker image (default is `false` which is consistent with the previous implementation and Mesos' default behaviour). ## How was this patch tested? I ran a sample application including this change on a Mesos cluster and verified the correct behaviour for both, with and without, force pulling the executor image. As expected the image is being force pulled if the flag is set. Author: Philipp Hoffmann <mail@philipphoffmann.de> Closes #13051 from philipphoffmann/force-pull-image.
*	[SPARK-16494][ML] Upgrade breeze version to 0.12	Yanbo Liang	2016-07-19	5	-10/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? breeze 0.12 has been released for more than half a year, and it brings lots of new features, performance improvement and bug fixes. One of the biggest features is ```LBFGS-B``` which is an implementation of ```LBFGS``` with box constraints and much faster for some special case. We would like to implement Huber loss function for ```LinearRegression``` ([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)) and it requires ```LBFGS-B``` as the optimization solver. So we should bump up the dependent breeze version to 0.12. For more features, improvements and bug fixes of breeze 0.12, you can refer the following link: https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c ## How was this patch tested? No new tests, should pass the existing ones. Author: Yanbo Liang <ybliang8@gmail.com> Closes #14150 from yanboliang/spark-16494.
*	[SPARK-15467][BUILD] update janino version to 3.0.0	Kazuaki Ishizaki	2016-07-10	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR updates version of Janino compiler from 2.7.8 to 3.0.0. This version fixes [an Janino issue](https://github.com/janino-compiler/janino/issues/1) that fixes [an issue](https://issues.apache.org/jira/browse/SPARK-15467), which throws Java exception, in Spark. ## How was this patch tested? Manually tested using a program in [the JIRA entry](https://issues.apache.org/jira/browse/SPARK-15467) Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com> Closes #14127 from kiszk/SPARK-15467.
*	[SPARK-15818][BUILD] Upgrade to Hadoop 2.7.2	Adam Roberts	2016-06-09	3	-45/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Updating the Hadoop version from 2.7.0 to 2.7.2 if we use the Hadoop-2.7 build profile ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) Existing tests (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) I'd like us to use Hadoop 2.7.2 owing to the Hadoop release notes stating Hadoop 2.7.0 is not ready for production use https://hadoop.apache.org/docs/r2.7.0/ states "Apache Hadoop 2.7.0 is a minor release in the 2.x.y release line, building upon the previous stable release 2.6.0. This release is not yet ready for production use. Production users should use 2.7.1 release and beyond." Hadoop 2.7.1 release notes: "Apache Hadoop 2.7.1 is a minor release in the 2.x.y release line, building upon the previous release 2.7.0. This is the next stable release after Apache Hadoop 2.6.x." And then Hadoop 2.7.2 release notes: "Apache Hadoop 2.7.2 is a minor release in the 2.x.y release line, building upon the previous stable release 2.7.1." I've tested this is OK with Intel hardware and IBM Java 8 so let's test it with OpenJDK, ideally this will be pushed to branch-2.0 and master. Author: Adam Roberts <aroberts@uk.ibm.com> Closes #13556 from a-roberts/patch-2.
*	Revert "[SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers ↵	Shixiong Zhu	2016-05-31	5	-30/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	option work ## What changes were proposed in this pull request? This reverts commit c24b6b679c3efa053f7de19be73eb36dc70d9930. Sent a PR to run Jenkins tests due to the revert conflicts of `dev/deps/spark-deps-hadoop*`. ## How was this patch tested? Jenkins unit tests, integration tests, manual tests) Author: Shixiong Zhu <shixiong@databricks.com> Closes #13417 from zsxwing/revert-SPARK-11753.
*	[SPARK-9876][SQL] Update Parquet to 1.8.1.	Ryan Blue	2016-05-27	5	-30/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This includes minimal changes to get Spark using the current release of Parquet, 1.8.1. ## How was this patch tested? This uses the existing Parquet tests. Author: Ryan Blue <blue@apache.org> Closes #13280 from rdblue/SPARK-9876-update-parquet.
*	[SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15	Villu Ruusmann	2016-05-26	5	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? See https://issues.apache.org/jira/browse/SPARK-15523 This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes. ## How was this patch tested? 1. Executed `mvn clean package` in `mllib` directory 2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory. Author: Villu Ruusmann <villu.ruusmann@gmail.com> Closes #13297 from vruusmann/update-jpmml.
*	[SPARK-15525][SQL][BUILD] Upgrade ANTLR4 SBT plugin	Herman van Hovell	2016-05-25	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The ANTLR4 SBT plugin has been moved from its own repo to one on bintray. The version was also changed from `0.7.10` to `0.7.11`. The latter actually broke our build (ihji has fixed this by also adding `0.7.10` and others to the bin-tray repo). This PR upgrades the SBT-ANTLR4 plugin and ANTLR4 to their most recent versions (`0.7.11`/`4.5.3`). I have also removed a few obsolete build configurations. ## How was this patch tested? Manually running SBT/Maven builds. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #13299 from hvanhovell/SPARK-15525.
*	[SPARK-15493][SQL] default QuoteEscapingEnabled flag to true when writing CSV	Jurriaan Pruis	2016-05-25	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Default QuoteEscapingEnabled flag to true when writing CSV and add an escapeQuotes option to be able to change this. See https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247 This change is needed to be able to write RFC 4180 compatible CSV files (https://tools.ietf.org/html/rfc4180#section-2) https://issues.apache.org/jira/browse/SPARK-15493 ## How was this patch tested? Added a test that verifies the output is quoted correctly. Author: Jurriaan Pruis <email@jurriaanpruis.nl> Closes #13267 from jurriaan/quote-escaping.
*	[SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option work	Liang-Chi Hsieh	2016-05-24	5	-25/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Jackson suppprts `allowNonNumericNumbers` option to parse non-standard non-numeric numbers such as "NaN", "Infinity", "INF". Currently used Jackson version (2.5.3) doesn't support it all. This patch upgrades the library and make the two ignored tests in `JsonParsingOptionsSuite` passed. ## How was this patch tested? `JsonParsingOptionsSuite`. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Author: Liang-Chi Hsieh <viirya@appier.com> Closes #9759 from viirya/fix-json-nonnumric.
*	[SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update ↵	Sean Owen	2016-05-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	org.apache.httpcomponents.httpclient, commons-io ## What changes were proposed in this pull request? This is sort of a hot-fix for https://github.com/apache/spark/pull/13117, but, the problem is limited to Hadoop 2.2. The change is to manage `commons-io` to 2.4 for all Hadoop builds, which is only a net change for Hadoop 2.2, which was using 2.1. ## How was this patch tested? Jenkins tests -- normal PR builder, then the `[test-hadoop2.2] [test-maven]` if successful. Author: Sean Owen <sowen@cloudera.com> Closes #13132 from srowen/SPARK-12972.3.
*	[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient	Sean Owen	2016-05-15	5	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Retry of https://github.com/apache/spark/pull/13049) - update to httpclient 4.5 / httpcore 4.4 - remove some defunct exclusions - manage httpmime version to match - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used) ## How was this patch tested? Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...` Author: Sean Owen <sowen@cloudera.com> Closes #13117 from srowen/SPARK-12972.2.
*	Revert "[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient"	Sean Owen	2016-05-13	5	-10/+10
\| \| \| \|	This reverts commit c74a6c3f2363f065a4915fdadec5eff665fa02e7.
*	[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient	Sean Owen	2016-05-13	5	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? - update httpcore/httpclient to latest - centralize version management - remove excludes that are no longer relevant according to SBT/Maven dep graphs - also manage httpmime to match httpclient ## How was this patch tested? Jenkins tests, plus review of dependency graphs from SBT/Maven, and review of test-dependencies.sh output Author: Sean Owen <sowen@cloudera.com> Closes #13049 from srowen/SPARK-12972.
*	[SPARK-15061][PYSPARK] Upgrade to Py4J 0.10.1	Holden Karau	2016-05-13	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This upgrades to Py4J 0.10.1 which reduces syscal overhead in Java gateway ( see https://github.com/bartdag/py4j/issues/201 ). Related https://issues.apache.org/jira/browse/SPARK-6728 . ## How was this patch tested? Existing doctests & unit tests pass Author: Holden Karau <holden@us.ibm.com> Closes #13064 from holdenk/SPARK-15061-upgrade-to-py4j-0.10.1.
*	[SPARK-14897][SQL] upgrade to jetty 9.2.16	bomeng	2016-05-12	5	-10/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Since Jetty 8 is EOL (end of life) and has critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using latest 9.2 since 9.3 requires Java 8+. `javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs corresponding version. ## How was this patch tested? Manual test and current test cases should cover it. Author: bomeng <bmeng@us.ibm.com> Closes #12916 from bomeng/SPARK-14897.
*	[SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0	hyukjinkwon	2016-05-05	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15148 Mainly it improves the performance roughtly about 30%-40% according to the [release note](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.1.0). For the details of the purpose is described in the JIRA. This PR upgrades Univocity library from 2.0.2 to 2.1.0. ## How was this patch tested? Existing tests should cover this. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12923 from HyukjinKwon/SPARK-15148.
*	[SPARK-12154] Upgrade to Jersey 2	mcheah	2016-05-05	5	-60/+85
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things. ## How was this patch tested? I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications. Author: mcheah <mcheah@palantir.com> Closes #12715 from mccheah/feature/upgrade-jersey.
*	[SPARK-15123] upgrade org.json4s to 3.2.11 version	Lining Sun	2016-05-05	5	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? We had the issue when using snowplow in our Spark applications. Snowplow requires json4s version 3.2.11 while Spark still use a few years old version 3.2.10. The change is to upgrade json4s jar to 3.2.11. ## How was this patch tested? We built Spark jar and successfully ran our applications in local and cluster modes. Author: Lining Sun <lining@gmail.com> Closes #12901 from liningalex/master.
*	[SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserver	Davies Liu	2016-04-29	5	-31/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR copy the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements. ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes #12764 from davies/thrift_server.
*	[SPARK-14787][SQL] Upgrade Joda-Time library from 2.9 to 2.9.3	hyukjinkwon	2016-04-21	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14787 The possible problems are described in the JIRA above. Please refer this if you are wondering the purpose of this PR. This PR upgrades Joda-Time library from 2.9 to 2.9.3. ## How was this patch tested? `sbt scalastyle` and Jenkins tests in this PR. closes #11847 Author: hyukjinkwon <gurwls223@gmail.com> Closes #12552 from HyukjinKwon/SPARK-14787.
*	[SPARK-11416][BUILD] Update to Chill 0.8.0 & Kryo 3.0.3	Josh Rosen	2016-04-08	5	-30/+25
\| \| \| \| \| \| \| \|	This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / https://github.com/twitter/chill/issues/252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises. Author: Josh Rosen <joshrosen@databricks.com> Closes #12076 from JoshRosen/kryo3.
*	[SPARK-14103][SQL] Parse unescaped quotes in CSV data source.	hyukjinkwon	2016-04-08	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR resolves the problem during parsing unescaped quotes in input data. For example, currently the data below: ``` "a"b,ccc,ddd e,f,g ``` produces a data below: - Before ```bash ["a"b,ccc,ddd[\n]e,f,g] <- as a value. ``` - After ```bash ["a"b], [ccc], [ddd] [e], [f], [g] ``` This PR bumps up the Univocity parser's version. This was fixed in `2.0.2`, https://github.com/uniVocity/univocity-parsers/issues/60. ## How was this patch tested? Unit tests in `CSVSuite` and `sbt/sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12226 from HyukjinKwon/SPARK-14103-quote.
*	[SPARK-13579][BUILD] Stop building the main Spark assembly.	Marcelo Vanzin	2016-04-04	5	-15/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.