path: root/dev
Commit message Author Age Files Lines
* [SPARK-9876][SQL] Update Parquet to 1.8.1.Ryan Blue2016-05-275-30/+25
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This includes minimal changes to get Spark using the current release of Parquet, 1.8.1. ## How was this patch tested? This uses the existing Parquet tests. Author: Ryan Blue <blue@apache.org> Closes #13280 from rdblue/SPARK-9876-update-parquet.
* [SPARK-15523][ML][MLLIB] Update JPMML to 1.2.15Villu Ruusmann2016-05-265-15/+10
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? See https://issues.apache.org/jira/browse/SPARK-15523 This PR replaces PR #13293. It's isolated to a new branch, and contains some more squashed changes. ## How was this patch tested? 1. Executed `mvn clean package` in `mllib` directory 2. Executed `dev/test-dependencies.sh --replace-manifest` in the root directory. Author: Villu Ruusmann <villu.ruusmann@gmail.com> Closes #13297 from vruusmann/update-jpmml.
* [SPARK-15525][SQL][BUILD] Upgrade ANTLR4 SBT pluginHerman van Hovell2016-05-255-5/+5
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? The ANTLR4 SBT plugin has been moved from its own repo to one on bintray. The version was also changed from `0.7.10` to `0.7.11`. The latter actually broke our build (ihji has fixed this by also adding `0.7.10` and others to the bintray repo). This PR upgrades the SBT-ANTLR4 plugin and ANTLR4 to their most recent versions (`0.7.11`/`4.5.3`). I have also removed a few obsolete build configurations. ## How was this patch tested? Manually running SBT/Maven builds. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #13299 from hvanhovell/SPARK-15525.
* [SPARK-15493][SQL] default QuoteEscapingEnabled flag to true when writing CSVJurriaan Pruis2016-05-255-5/+5
| | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Default QuoteEscapingEnabled flag to true when writing CSV and add an escapeQuotes option to be able to change this. See https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247 This change is needed to be able to write RFC 4180 compatible CSV files (https://tools.ietf.org/html/rfc4180#section-2) https://issues.apache.org/jira/browse/SPARK-15493 ## How was this patch tested? Added a test that verifies the output is quoted correctly. Author: Jurriaan Pruis <email@jurriaanpruis.nl> Closes #13267 from jurriaan/quote-escaping.
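To illustrate the RFC 4180 quoting rule that this default targets, here is a minimal sketch. It uses Python's standard `csv` module rather than Spark or univocity, and the sample row is made up for the illustration.

```python
import csv
import io


def write_row(row):
    """Serialize one row, doubling embedded quotes as RFC 4180 section 2 describes."""
    buf = io.StringIO()
    # doublequote=True escapes a quote inside a field by writing it twice;
    # QUOTE_MINIMAL only wraps fields that actually need quoting.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, doublequote=True)
    writer.writerow(row)
    return buf.getvalue()


line = write_row(['a "quoted" value', "b"])
print(line, end="")
```

The first field is wrapped in quotes and its inner quotes are doubled, while the second field is written bare. This is the shape of output the escaping default is meant to produce.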
* [SPARK-11753][SQL][TEST-HADOOP2.2] Make allowNonNumericNumbers option workLiang-Chi Hsieh2016-05-245-25/+30
| | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Jackson supports the `allowNonNumericNumbers` option to parse non-standard non-numeric numbers such as "NaN", "Infinity", "INF". The currently used Jackson version (2.5.3) doesn't support all of them. This patch upgrades the library and makes the two ignored tests in `JsonParsingOptionsSuite` pass. ## How was this patch tested? `JsonParsingOptionsSuite`. Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Author: Liang-Chi Hsieh <viirya@appier.com> Closes #9759 from viirya/fix-json-nonnumric.
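The difference between lenient and strict handling of non-numeric numbers can be sketched with Python's standard `json` module. This is an analogy, not Jackson's behavior: Python accepts `NaN`, `Infinity` and `-Infinity` by default but not `INF`, and the function names below are made up for the illustration.

```python
import json
import math


def parse_lenient(text):
    """Accept the non-standard literals NaN, Infinity and -Infinity."""
    return json.loads(text)


def _reject(token):
    # Called by json.loads for each NaN / Infinity / -Infinity literal.
    raise ValueError(f"non-numeric number not allowed: {token}")


def parse_strict(text):
    """Reject non-standard literals, as a strict JSON parser would."""
    return json.loads(text, parse_constant=_reject)


lenient = parse_lenient('{"price": NaN, "limit": Infinity}')

try:
    parse_strict('{"price": NaN}')
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

Here the lenient parser returns float values for both fields, and the strict parser raises on the first non-standard literal.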
* [SPARK-15424][SPARK-15437][SPARK-14807][SQL] Revert Create a ↵Reynold Xin2016-05-202-13/+1
| | | | | | | | | | | | | | hivecontext-compatibility module ## What changes were proposed in this pull request? I initially asked to create a hivecontext-compatibility module to put the HiveContext there. But we are so close to Spark 2.0 release and there is only a single class in it. It seems overkill to have an entire package, which makes it more inconvenient, for a single class. ## How was this patch tested? Tests were moved. Author: Reynold Xin <rxin@databricks.com> Closes #13207 from rxin/SPARK-15424.
* [SPARK-15078] [SQL] Add all TPCDS 1.4 benchmark queries for SparkSQLSameer Agarwal2016-05-201-0/+1
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Now that SparkSQL supports all TPC-DS queries, this patch adds all 99 benchmark queries inside SparkSQL. ## How was this patch tested? Benchmark only Author: Sameer Agarwal <sameer@databricks.com> Closes #13188 from sameeragarwal/tpcds-all.
* [SPARK-14615][ML] Use the new ML Vector and Matrix in the ML pipeline based ↵DB Tsai2016-05-171-0/+1
| | | | | | | | | | | | | | | | | | algorithms ## What changes were proposed in this pull request? Once SPARK-14487 and SPARK-14549 are merged, we will migrate to use the new vector and matrix type in the new ml pipeline based apis. ## How was this patch tested? Unit tests Author: DB Tsai <dbt@netflix.com> Author: Liang-Chi Hsieh <simonh@tw.ibm.com> Author: Xiangrui Meng <meng@databricks.com> Closes #12627 from dbtsai/SPARK-14615-NewML.
* [SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into ↵Sean Owen2016-05-171-6/+13
| | | | | | | | | | | | | | | | | | spark-tags ## What changes were proposed in this pull request? (See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was apparently the problem last time.) Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags` ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #13074 from srowen/SPARK-15290.
* [SPARK-12972][CORE][TEST-MAVEN][TEST-HADOOP2.2] Update ↵Sean Owen2016-05-161-1/+1
| | | | | | | | | | | | | | | | org.apache.httpcomponents.httpclient, commons-io ## What changes were proposed in this pull request? This is sort of a hot-fix for https://github.com/apache/spark/pull/13117, but, the problem is limited to Hadoop 2.2. The change is to manage `commons-io` to 2.4 for all Hadoop builds, which is only a net change for Hadoop 2.2, which was using 2.1. ## How was this patch tested? Jenkins tests -- normal PR builder, then the `[test-hadoop2.2] [test-maven]` if successful. Author: Sean Owen <sowen@cloudera.com> Closes #13132 from srowen/SPARK-12972.3.
* [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclientSean Owen2016-05-155-10/+10
| | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? (Retry of https://github.com/apache/spark/pull/13049) - update to httpclient 4.5 / httpcore 4.4 - remove some defunct exclusions - manage httpmime version to match - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used) ## How was this patch tested? Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...` Author: Sean Owen <sowen@cloudera.com> Closes #13117 from srowen/SPARK-12972.2.
* Revert "[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient"Sean Owen2016-05-135-10/+10
| | | | This reverts commit c74a6c3f2363f065a4915fdadec5eff665fa02e7.
* [SPARK-12972][CORE] Update org.apache.httpcomponents.httpclientSean Owen2016-05-135-10/+10
| | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? - update httpcore/httpclient to latest - centralize version management - remove excludes that are no longer relevant according to SBT/Maven dep graphs - also manage httpmime to match httpclient ## How was this patch tested? Jenkins tests, plus review of dependency graphs from SBT/Maven, and review of test-dependencies.sh output Author: Sean Owen <sowen@cloudera.com> Closes #13049 from srowen/SPARK-12972.
* [SPARK-15061][PYSPARK] Upgrade to Py4J 0.10.1Holden Karau2016-05-135-5/+5
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This upgrades to Py4J 0.10.1 which reduces syscall overhead in the Java gateway ( see https://github.com/bartdag/py4j/issues/201 ). Related https://issues.apache.org/jira/browse/SPARK-6728 . ## How was this patch tested? Existing doctests & unit tests pass Author: Holden Karau <holden@us.ibm.com> Closes #13064 from holdenk/SPARK-15061-upgrade-to-py4j-0.10.1.
* [SPARK-14897][SQL] upgrade to jetty 9.2.16bomeng2016-05-125-10/+10
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Since Jetty 8 is EOL (end of life) and has a critical security issue [http://www.securityweek.com/critical-vulnerability-found-jetty-web-server], I think upgrading to 9 is necessary. I am using the latest 9.2 since 9.3 requires Java 8+. `javax.servlet` and `derby` were also upgraded since Jetty 9.2 needs the corresponding versions. ## How was this patch tested? Manual test and current test cases should cover it. Author: bomeng <bmeng@us.ibm.com> Closes #12916 from bomeng/SPARK-14897.
* [SPARK-15171][SQL] Deprecate registerTempTable and add dataset.createTempViewSean Zhong2016-05-121-2/+2
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Deprecates registerTempTable and adds dataset.createTempView, dataset.createOrReplaceTempView. ## How was this patch tested? Unit tests. Author: Sean Zhong <seanzhong@databricks.com> Closes #12945 from clockfly/spark-15171.
* [SPARK-15072][SQL][PYSPARK] FollowUp: Remove SparkSession.withHiveSupport in ↵Sandeep Singh2016-05-111-3/+5
| | | | | | | | | | | | | | | PySpark ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/12851 Remove `SparkSession.withHiveSupport` in PySpark and instead use `SparkSession.builder.enableHiveSupport` ## How was this patch tested? Existing tests. Author: Sandeep Singh <sandeep@techaddict.me> Closes #13063 from techaddict/SPARK-15072-followup.
* [SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifactcody koeninger2016-05-113-6/+6
| | | | | | | | | | | | ## What changes were proposed in this pull request? Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions ## How was this patch tested? Unit tests Author: cody koeninger <cody@koeninger.org> Closes #12946 from koeninger/SPARK-15085.
* [SPARK-15148][SQL] Upgrade Univocity library from 2.0.2 to 2.1.0hyukjinkwon2016-05-055-5/+5
| | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-15148 Mainly it improves performance by roughly 30%-40% according to the [release note](https://github.com/uniVocity/univocity-parsers/releases/tag/v2.1.0). The details of the purpose are described in the JIRA. This PR upgrades Univocity library from 2.0.2 to 2.1.0. ## How was this patch tested? Existing tests should cover this. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12923 from HyukjinKwon/SPARK-15148.
* [SPARK-12154] Upgrade to Jersey 2mcheah2016-05-055-60/+85
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things. ## How was this patch tested? I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications. Author: mcheah <mcheah@palantir.com> Closes #12715 from mccheah/feature/upgrade-jersey.
* [SPARK-15123] upgrade org.json4s to 3.2.11 versionLining Sun2016-05-055-15/+15
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? We hit an issue when using snowplow in our Spark applications. Snowplow requires json4s version 3.2.11 while Spark still uses version 3.2.10, which is a few years old. The change is to upgrade the json4s jar to 3.2.11. ## How was this patch tested? We built the Spark jar and successfully ran our applications in local and cluster modes. Author: Lining Sun <lining@gmail.com> Closes #12901 from liningalex/master.
* [SPARK-15053][BUILD] Fix Java Lint errors on Hive-Thriftserver moduleDongjoon Hyun2016-05-031-0/+6
| | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This issue fixes or hides 181 Java linter errors introduced by SPARK-14987 which copied hive service code from Hive. We had better clean up these errors before releasing Spark 2.0. - Fix UnusedImports (15 lines), RedundantModifier (14 lines), SeparatorWrap (9 lines), MethodParamPad (6 lines), FileTabCharacter (5 lines), ArrayTypeStyle (3 lines), ModifierOrder (3 lines), RedundantImport (1 line), CommentsIndentation (1 line), UpperEll (1 line), FallThrough (1 line), OneStatementPerLine (1 line), NewlineAtEndOfFile (1 line) errors. - Ignore `LineLength` errors under `hive/service/*` (118 lines). - Ignore `MethodName` error in `PasswdAuthenticationProvider.java` (1 line). - Ignore `NoFinalizer` error in `ThreadWithGarbageCleanup.java` (1 line). ## How was this patch tested? After passing Jenkins building, run `dev/lint-java` manually. ```bash $ dev/lint-java Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12831 from dongjoon-hyun/SPARK-15053.
* [SPARK-14988][PYTHON] SparkSession catalog and conf APIAndrew Or2016-04-291-0/+3
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? The `catalog` and `conf` APIs were exposed in `SparkSession` in #12713 and #12669. This patch adds those to the python API. ## How was this patch tested? Python tests. Author: Andrew Or <andrew@databricks.com> Closes #12765 from andrewor14/python-spark-session-more.
* [SPARK-14987][SQL] inline hive-service (cli) into sql/hive-thriftserverDavies Liu2016-04-295-31/+0
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR copies the thrift-server from hive-service-1.2 (including TCLIService.thrift and generated Java source code) into sql/hive-thriftserver, so we can do further cleanup and improvements. ## How was this patch tested? Existing tests. Author: Davies Liu <davies@databricks.com> Closes #12764 from davies/thrift_server.
* Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in ↵Yin Huai2016-04-281-15/+6
| | | | | | spark-mllib-local" This reverts commit dae538a4d7c36191c1feb02ba87ffc624ab960dc.
* [SPARK-14613][ML] Add @Since into the matrix and vector classes in ↵Pravin Gadakh2016-04-281-6/+15
| | | | | | | | | | | | | | | | spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in spark-mllib-local. ## How was this patch tested? Scala-style checks passed. Author: Pravin Gadakh <prgadakh@in.ibm.com> Closes #12416 from pravingadakh/SPARK-14613.
* [SPARK-14867][BUILD] Remove `--force` option in `build/mvn`Dongjoon Hyun2016-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Currently, `build/mvn` provides a convenient option, `--force`, in order to use the recommended version of maven without changing the PATH environment variable. However, there were two problems. - `dev/lint-java` does not use the newly installed maven. ```bash $ ./build/mvn --force clean $ ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn ``` - It's not easy to always type the `--force` option. If the `--force` option is used once, we had better prefer the installed maven recommended by Spark. This PR makes `build/mvn` check the existence of maven installed by `--force` option first. According to the comments, this PR aims to do the following: - Detect the maven version from `pom.xml`. - Install maven if it is missing or too old. - Remove `--force` option. ## How was this patch tested? Manual. ```bash $ ./build/mvn --force clean $ ./dev/lint-java Using `mvn` from path: /Users/dongjoon/spark/build/apache-maven-3.3.9/bin/mvn ... $ rm -rf ./build/apache-maven-3.3.9/ $ ./dev/lint-java Using `mvn` from path: /usr/local/bin/mvn ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12631 from dongjoon-hyun/SPARK-14867.
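The "detect the version from `pom.xml`, install if missing or old" logic can be sketched as follows. This is a Python illustration, not the actual shell code in `build/mvn`; the `<maven.version>` property name and the sample POM fragment are assumptions, and all function names are hypothetical.

```python
import re


def required_maven_version(pom_text):
    """Read the Maven version from a <maven.version> property (assumed name)."""
    match = re.search(r"<maven\.version>([^<]+)</maven\.version>", pom_text)
    if match is None:
        raise ValueError("no <maven.version> property found")
    return match.group(1).strip()


def version_tuple(version):
    """Turn '3.3.9' into (3, 3, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_install(installed, required):
    """True when no Maven is installed or the installed one is older."""
    if installed is None:
        return True
    return version_tuple(installed) < version_tuple(required)


# Hypothetical POM fragment used only for this sketch.
pom = "<properties><maven.version>3.3.9</maven.version></properties>"
required = required_maven_version(pom)
```

Comparing integer tuples rather than raw strings avoids ordering `3.3.10` before `3.3.9`.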
* [MINOR][BUILD] Enable RAT checking on `LZ4BlockInputStream.java`.Dongjoon Hyun2016-04-271-1/+0
| | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? Since `LZ4BlockInputStream.java` is not licensed to Apache Software Foundation (ASF), the Apache License header of that file has not been monitored until now. This PR aims to enable RAT checking on `LZ4BlockInputStream.java` by removing it from `dev/.rat-excludes`. This will prevent accidental removal of the Apache License header from that file. ## How was this patch tested? Pass the Jenkins tests (Specifically, RAT check stage). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12677 from dongjoon-hyun/minor_rat_exclusion_file.
* [SPARK-14721][SQL] Remove HiveContext (part 2)Andrew Or2016-04-251-5/+3
| | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This removes the class `HiveContext` itself along with all code usages associated with it. The bulk of the work was already done in #12485. This is mainly just code cleanup and actually removing the class. Note: A couple of things will break after this patch. These will be fixed separately. - the python HiveContext - all the documentation / comments referencing HiveContext - there will be no more HiveContext in the REPL (fixed by #12589) ## How was this patch tested? No change in functionality. Author: Andrew Or <andrew@databricks.com> Closes #12585 from andrewor14/delete-hive-context.
* [SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix ↵Dongjoon Hyun2016-04-241-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | lint-java errors ## What changes were proposed in this pull request? Spark uses the `NewLineAtEofChecker` rule in Scala via ScalaStyle, and most Java code also complies with the rule. This PR aims to enforce the same rule `NewlineAtEndOfFile` by CheckStyle explicitly. Also, this fixes lint-java errors since SPARK-14465. The items are as follows. - Adds a new line at the end of the files (19 files) - Fixes 25 lint-java errors (12 RedundantModifier, 6 **ArrayTypeStyle**, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder) ## How was this patch tested? After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins does not run lint-java.) ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12632 from dongjoon-hyun/SPARK-14868.
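The newline-at-end-of-file rule is simple enough to sketch directly. The following Python snippet is an illustration of the idea, not Checkstyle's or ScalaStyle's implementation; treating empty files as compliant is a simplifying assumption, and the file names are made up.

```python
def missing_final_newline(content):
    """Return True when non-empty file content does not end with a newline."""
    return len(content) > 0 and not content.endswith(b"\n")


def files_to_fix(files):
    """Given a mapping of path -> bytes, list the paths that violate the rule."""
    return sorted(path for path, data in files.items() if missing_final_newline(data))


# In-memory stand-ins for source files.
sample = {
    "Good.java": b"class Good {}\n",
    "Bad.java": b"class Bad {}",
}
violations = files_to_fix(sample)
```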
* [SPARK-14807] Create a compatibility moduleYin Huai2016-04-222-2/+14
| | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR creates a compatibility module in sql (called `hive-1-x-compatibility`), which will host HiveContext in Spark 2.0 (moving HiveContext to here will be done separately). This module is not included in assembly because only users who still want to access HiveContext need it. ## How was this patch tested? I manually tested `sbt/sbt -Phive package` and `mvn -Phive package -DskipTests`. Author: Yin Huai <yhuai@databricks.com> Closes #12580 from yhuai/compatibility.
* [SPARK-14787][SQL] Upgrade Joda-Time library from 2.9 to 2.9.3hyukjinkwon2016-04-215-5/+5
| | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-14787 The possible problems are described in the JIRA above. Please refer to it if you are wondering about the purpose of this PR. This PR upgrades Joda-Time library from 2.9 to 2.9.3. ## How was this patch tested? `sbt scalastyle` and Jenkins tests in this PR. closes #11847 Author: hyukjinkwon <gurwls223@gmail.com> Closes #12552 from HyukjinKwon/SPARK-14787.
* [SPARK-13904][SCHEDULER] Add support for pluggable cluster managerHemant Bhanawat2016-04-161-0/+1
| | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This commit adds support for a pluggable cluster manager. It also allows a cluster manager to clean up tasks without taking the parent process down. To plug in a new external cluster manager, the ExternalClusterManager trait should be implemented. It returns the task scheduler and backend scheduler that will be used by SparkContext to schedule tasks. An external cluster manager is registered using the java.util.ServiceLoader mechanism (This mechanism is also being used to register data sources like parquet, json, jdbc etc.). This allows auto-loading implementations of the ExternalClusterManager interface. Currently, when a driver fails, executors exit using system.exit. This does not bode well for cluster managers that would like to reuse the parent process of an executor. Hence, 1. Moving system.exit to a function that can be overridden in subclasses of CoarseGrainedExecutorBackend. 2. Added functionality of killing all the running tasks in an executor. ## How was this patch tested? ExternalClusterManagerSuite.scala was added to test this patch. Author: Hemant Bhanawat <hemant@snappydata.io> Closes #11723 from hbhanawat/pluggableScheduler.
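The plug-in lookup described above can be sketched in Python. This is only an analogy: a plain list stands in for what `java.util.ServiceLoader` would discover, and the method names, the `dummy://` URL scheme and the error on multiple matches are all hypothetical choices for the illustration, not Spark's API.

```python
class ExternalClusterManager:
    """Hypothetical stand-in for the trait named in the commit message."""

    def can_create(self, master_url):
        raise NotImplementedError

    def create_task_scheduler(self, master_url):
        raise NotImplementedError


class DummyClusterManager(ExternalClusterManager):
    """Toy implementation that claims master URLs starting with 'dummy://'."""

    def can_create(self, master_url):
        return master_url.startswith("dummy://")

    def create_task_scheduler(self, master_url):
        return f"task scheduler for {master_url}"


def find_cluster_manager(master_url, registered):
    """Return the one registered manager that accepts master_url, or None."""
    matches = [m for m in registered if m.can_create(master_url)]
    if len(matches) > 1:
        raise ValueError(f"multiple cluster managers accept {master_url}")
    return matches[0] if matches else None


# A plain list stands in for the implementations a service loader would find.
registered = [DummyClusterManager()]
manager = find_cluster_manager("dummy://host:7077", registered)
scheduler = manager.create_task_scheduler("dummy://host:7077")
unknown = find_cluster_manager("other://host", registered)
```

The caller never names a concrete implementation; it asks each registered manager whether it can handle the master URL and uses whichever one answers yes.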
* [SPARK-14462][ML][MLLIB] Add the mllib-local build to maven pomDB Tsai2016-04-111-1/+13
| | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need to set up the build first. This PR will create a new jar called mllib-local with minimal dependencies. The previous PR was failing the build because of `spark-core:test` dependency, and that was reverted. In this PR, `FunSuite` with `// scalastyle:ignore funsuite` in mllib-local test was used, similar to sketch. Thanks. ## How was this patch tested? Unit tests mengxr tedyu holdenk Author: DB Tsai <dbt@netflix.com> Closes #12298 from dbtsai/dbtsai-mllib-local-build-fix.
* Revert "[SPARK-14462][ML][MLLIB] add the mllib-local build to maven pom"Xiangrui Meng2016-04-091-13/+1
| | | | This reverts commit 1598d11bb0248384872cf88bc2b16f3b238046ad.
* [SPARK-14462][ML][MLLIB] add the mllib-local build to maven pomDB Tsai2016-04-091-1/+13
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? In order to separate the linear algebra, and vector matrix classes into a standalone jar, we need to set up the build first. This PR will create a new jar called mllib-local with minimal dependencies. The test scope will still depend on spark-core and spark-core-test in order to use the common utilities, but the runtime will avoid any platform dependency. A couple of platform-independent classes will be moved to this package to demonstrate how this works. ## How was this patch tested? Unit tests Author: DB Tsai <dbt@netflix.com> Closes #12241 from dbtsai/dbtsai-mllib-local-build.
* [SPARK-11416][BUILD] Update to Chill 0.8.0 & Kryo 3.0.3Josh Rosen2016-04-085-30/+25
| | | | | | | | This patch upgrades Chill to 0.8.0 and Kryo to 3.0.3. While we'll likely need to bump these dependencies again before Spark 2.0 (due to SPARK-14221 / https://github.com/twitter/chill/issues/252), I wanted to get the bulk of the Kryo 2 -> Kryo 3 migration done now in order to figure out whether there are any unexpected surprises. Author: Josh Rosen <joshrosen@databricks.com> Closes #12076 from JoshRosen/kryo3.
* [SPARK-14103][SQL] Parse unescaped quotes in CSV data source.hyukjinkwon2016-04-085-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR resolves the problem during parsing unescaped quotes in input data. For example, currently the data below: ``` "a"b,ccc,ddd e,f,g ``` produces a data below: - **Before** ```bash ["a"b,ccc,ddd[\n]e,f,g] <- as a value. ``` - **After** ```bash ["a"b], [ccc], [ddd] [e], [f], [g] ``` This PR bumps up the Univocity parser's version. This was fixed in `2.0.2`, https://github.com/uniVocity/univocity-parsers/issues/60. ## How was this patch tested? Unit tests in `CSVSuite` and `sbt/sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12226 from HyukjinKwon/SPARK-14103-quote.
* [SPARK-13579][BUILD] Stop building the main Spark assembly.Marcelo Vanzin2016-04-048-32/+30
| | | | | | | | | | | | | | | | | | | | This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packages in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.
* [SPARK-13825][CORE] Upgrade to Scala 2.11.8Jacek Laskowski2016-04-015-20/+20
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Upgrade to 2.11.8 (from the current 2.11.7) ## How was this patch tested? A manual build Author: Jacek Laskowski <jacek@japila.pl> Closes #11681 from jaceklaskowski/SPARK-13825-scala-2_11_8.
* [SPARK-14277][CORE] Upgrade Snappy Java to 1.1.2.4Sital Kedia2016-03-315-5/+5
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Upgrade snappy to 1.1.2.4 to improve snappy read/write performance. ## How was this patch tested? Tested by running a job on the cluster and saw 7.5% cpu savings after this change. Author: Sital Kedia <skedia@fb.com> Closes #12096 from sitalkedia/snappyRelease.
* [SPARK-14211][SQL] Remove ANTLR3 based parserHerman van Hovell2016-03-315-5/+15
| | | | | | | | | | | | | | | | ### What changes were proposed in this pull request? This PR removes the ANTLR3 based parser, and moves the new ANTLR4 based parser into the `org.apache.spark.sql.catalyst.parser package`. ### How was this patch tested? Existing unit tests. cc rxin andrewor14 yhuai Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #12071 from hvanhovell/SPARK-14211.
* [SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4Herman van Hovell2016-03-285-0/+5
| | | | | | | | | | | | | | | | | | | | | | ### What changes were proposed in this pull request? The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4. This parser is based on [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality are currently missing; the plan is to add this in follow-up PRs. This PR is a work in progress, and work needs to be done in the following areas: - [x] Error handling should be improved. - [x] Documentation should be improved. - [x] Multi-Insert needs to be tested. - [ ] Naming and package locations. ### How was this patch tested? Catalyst and SQL unit tests. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #11557 from hvanhovell/ngParser.
* [SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to SparkShixiong Zhu2016-03-253-1/+36
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? This PR moves flume back to Spark as per the discussion in the dev mail-list. ## How was this patch tested? Existing Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11895 from zsxwing/move-flume-back.
* [SPARK-13887][PYTHON][TRIVIAL][BUILD] Make lint-python script fail fastHolden Karau2016-03-251-37/+6
| | | | | | | | | | | | | | ## What changes were proposed in this pull request? Change the lint-python script to stop on the first error rather than accumulating errors, so it's clearer why we failed (requested by rxin). Also while in the file, remove the commented out code. ## How was this patch tested? Manually ran lint-python script with & without pep8 errors locally and verified expected results. Author: Holden Karau <holden@us.ibm.com> Closes #11898 from holdenk/SPARK-13887-pylint-fast-fail.
* [SPARK-14074][SPARKR] Specify commit sha1 ID when using install_github to ↵Sun Rui2016-03-231-1/+1
| | | | | | | | | | | | | | | install intr package. ## What changes were proposed in this pull request? In dev/lint-r.R, `install_github` makes our builds depend on an unstable source. This may cause unexpected test failures and then break the build. This PR adds a specified commit sha1 ID to `install_github` to get a stable source. ## How was this patch tested? dev/lint-r Author: Sun Rui <rui.sun@intel.com> Closes #11913 from sun-rui/SPARK-14074.
* [SPARK-14011][CORE][SQL] Enable `LineLength` Java checkstyle ruleDongjoon Hyun2016-03-212-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ## What changes were proposed in this pull request? [Spark Coding Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) has a 100-character limit on lines, but it has been disabled for Java since 11/09/15. This PR enables the **LineLength** checkstyle rule again. To help with that, this also introduces **RedundantImport** and **RedundantModifier**. The following is the diff on `checkstyle.xml`. ```xml - <!-- TODO: 11/09/15 disabled - the lengths are currently > 100 in many places --> - <!-- <module name="LineLength"> <property name="max" value="100"/> <property name="ignorePattern" value="^package.*|^import.*|a href|href|http://|https://|ftp://"/> </module> - --> <module name="NoLineWrap"/> <module name="EmptyBlock"> <property name="option" value="TEXT"/> -167,5 +164,7 </module> <module name="CommentsIndentation"/> <module name="UnusedImports"/> + <module name="RedundantImport"/> + <module name="RedundantModifier"/> ``` ## How was this patch tested? Currently, `lint-java` is disabled in Jenkins. It needs a manual test. After passing the Jenkins tests, `dev/lint-java` should pass locally. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11831 from dongjoon-hyun/SPARK-14011.
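What the re-enabled `LineLength` rule checks can be sketched in Python. The limit and the ignore pattern are copied from the `checkstyle.xml` diff above; the function name, the sample lines and the choice to match the pattern anywhere in the line are assumptions of this illustration, and Checkstyle's own handling (for example of tabs) may differ.

```python
import re

MAX_LENGTH = 100
# Pattern copied from the checkstyle.xml diff in the commit message.
IGNORE = re.compile(r"^package.*|^import.*|a href|href|http://|https://|ftp://")


def long_lines(source):
    """Return 1-based numbers of over-long lines that IGNORE does not match."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > MAX_LENGTH and not IGNORE.search(line):
            hits.append(number)
    return hits


# Made-up sample: a long import, a long statement and a short statement.
sample = "\n".join([
    "import " + "a." * 60 + "B;",
    "int x = " + "1 + " * 30 + "1;",
    "int y = 2;",
])
reported = long_lines(sample)
```

Only the long statement is reported: the import line exceeds the limit but matches the ignore pattern, and the last line is short.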
* [SPARK-13948] MiMa check should catch if the visibility changes to privateJosh Rosen2016-03-161-11/+7
| | | | | | | | | | MiMa excludes are currently generated using both the current Spark version's classes and Spark 1.2.0's classes, but this doesn't make sense: we should only be ignoring classes which were `private` in the previous Spark version, not classes which became private in the current version. This patch updates `dev/mima` to only generate excludes with respect to the previous artifacts that MiMa checks against. It also updates `MimaBuild` so that `excludeClass` only applies directly to the class being excluded and not to its companion object (since a class and its companion object can have different accessibility). Author: Josh Rosen <joshrosen@databricks.com> Closes #11774 from JoshRosen/SPARK-13948.
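The effect of the change in exclude generation can be sketched with plain sets. This is a Python illustration of the reasoning in the message, not the code in `dev/mima` or `MimaBuild`; the class names and the helper function are made up.

```python
def classes_to_check(all_previous, excludes):
    """Classes from the previous release that the compatibility check still covers."""
    return sorted(set(all_previous) - set(excludes))


# Hypothetical class sets for the illustration.
previous_private = {"org.example.Internal"}
current_private = {"org.example.Internal", "org.example.WasPublic"}
all_previous = {"org.example.Internal", "org.example.WasPublic", "org.example.Api"}

# Old behavior: excludes built from both versions hide the class that
# became private, so its visibility change is never reported.
old_style = classes_to_check(all_previous, previous_private | current_private)

# New behavior: only classes that were private in the previous version
# are excluded, so the class that became private is still checked.
new_style = classes_to_check(all_previous, previous_private)
```

With the old excludes, `org.example.WasPublic` drops out of the checked set. With the new excludes it stays in, which is what lets the check catch a visibility change to private.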
* [SPARK-13576][BUILD] Don't create assembly for examples.Marcelo Vanzin2016-03-151-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of the goal to stop creating assemblies in Spark, this change modifies the mvn and sbt builds to not create an assembly for examples. Instead, dependencies are copied to the build directory (under target/scala-xx/jars), and in the final archive, into the "examples/jars" directory. To avoid having to deal too much with Windows batch files, I made examples run through the launcher library; the spark-submit launcher now has a special mode to run examples, which adds all the necessary jars to the spark-submit command line, and replaces the bash and batch scripts that were used to run examples. The scripts are now just a thin wrapper around spark-submit; another advantage is that now all spark-submit options are supported. There are a few glitches; in the mvn build, a lot of duplicated dependencies get copied, because they are promoted to "compile" scope due to extra dependencies in the examples module (such as HBase). In the sbt build, all dependencies are copied, because there doesn't seem to be an easy way to filter things. I plan to clean some of this up when the rest of the tasks are finished. When the main assembly is replaced with jars, we can remove duplicate jars from the examples directory during packaging. Tested by running SparkPi in: maven build, sbt build, dist created by make-distribution.sh. Finally: note that running the "assembly" target in sbt doesn't build the examples anymore. You need to run "package" for that. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11452 from vanzin/SPARK-13576.
* [SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, ↵Shixiong Zhu2016-03-143-89/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | streaming-zeromq, streaming-akka, streaming-twitter to Spark packages ## What changes were proposed in this pull request? Currently there are a few sub-projects, each for integrating with different external sources for Streaming. Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages - streaming-flume - streaming-akka - streaming-mqtt - streaming-zeromq - streaming-twitter They are just some ancillary packages and considering the overhead of maintenance, running tests and PR failures, it's better to maintain them out of Spark. In addition, these projects can have their different release cycles and we can release them faster. I have already copied these projects to https://github.com/spark-packages ## How was this patch tested? Jenkins tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11672 from zsxwing/remove-external-pkg.