spark - Mirror of Apache Spark

	Commit message	Author	Age	Files	Lines
...
*	[SPARK-14134][CORE] Change the package name used for shading classes.	Marcelo Vanzin	2016-04-06	1	-2/+5
	The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs. For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11941 from vanzin/SPARK-14134.
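A dash is not a legal character in a Java identifier, so a dash-containing package such as `org.spark-project.guava` (used here only as an illustration of the problem) cannot be written in Java source, even though the JVM may tolerate it in compiled class names. The message above does not say exactly why the mocking test failed, so treat this as a plausible explanation rather than the confirmed cause. The Python sketch below checks dotted package names against a simplified, ASCII-only approximation of the Java identifier rules; it does not reject reserved words or handle Unicode letters.

```python
import re

# Simplified, ASCII-only approximation of a Java identifier:
# a letter, '_' or '$' first, then letters, digits, '_' or '$'.
# (Real Java also allows Unicode letters and forbids reserved words.)
JAVA_IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_valid_java_package(name):
    """Return True if every dot-separated segment looks like a Java identifier."""
    return all(JAVA_IDENTIFIER.fullmatch(segment) for segment in name.split("."))


for candidate in ["org.spark-project.guava", "org.spark_project.guava"]:
    print(candidate, is_valid_java_package(candidate))
# prints:
# org.spark-project.guava False
# org.spark_project.guava True
```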
*	[SPARK-13579][BUILD] Stop building the main Spark assembly.	Marcelo Vanzin	2016-04-04	1	-13/+31
	This change modifies the "assembly/" module to just copy needed dependencies to its build directory, and modifies the packaging script to pick those up (and remove duplicate jars packaged in the examples module). I also made some minor adjustments to dependencies to remove some test jars from the final packaging, and remove jars that conflict with each other when packaged separately (e.g. servlet api). Also note that this change restores Guava in applications' classpaths, even though it's still shaded inside Spark. This is now needed for the Hadoop libraries that are packaged with Spark, which now are not processed by the shade plugin. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11796 from vanzin/SPARK-13579.
*	[SPARK-13825][CORE] Upgrade to Scala 2.11.8	Jacek Laskowski	2016-04-01	1	-3/+3
	## What changes were proposed in this pull request? Upgrade to 2.11.8 (from the current 2.11.7) ## How was this patch tested? A manual build Author: Jacek Laskowski <jacek@japila.pl> Closes #11681 from jaceklaskowski/SPARK-13825-scala-2_11_8.
*	[SPARK-14277][CORE] Upgrade Snappy Java to 1.1.2.4	Sital Kedia	2016-03-31	1	-1/+1
	## What changes were proposed in this pull request? Upgrade Snappy to 1.1.2.4 to improve Snappy read/write performance. ## How was this patch tested? Tested by running a job on the cluster and saw 7.5% CPU savings after this change. Author: Sital Kedia <skedia@fb.com> Closes #12096 from sitalkedia/snappyRelease.
*	[SPARK-14281][TESTS] Fix java8-tests and simplify their build	Josh Rosen	2016-03-31	1	-18/+13
	This patch fixes a compilation / build break in Spark's `java8-tests` and refactors their POM to simplify the build. See individual commit messages for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes #12073 from JoshRosen/fix-java8-tests.
*	[SPARK-13710][SHELL][WINDOWS] Fix jline dependency on Windows	Michel Lemay	2016-03-31	1	-0/+4
	## What changes were proposed in this pull request? Exclude jline from curator-recipes since it conflicts with Scala 2.11 when running spark-shell. Should not affect Scala 2.10 since jline is built in there. ## How was this patch tested? Ran spark-shell manually. Author: Michel Lemay <mlemay@gmail.com> Closes #12043 from michellemay/spark-13710-fix-jline-on-windows.
*	[SPARK-14211][SQL] Remove ANTLR3 based parser	Herman van Hovell	2016-03-31	1	-6/+0
	### What changes were proposed in this pull request? This PR removes the ANTLR3-based parser, and moves the new ANTLR4-based parser into the `org.apache.spark.sql.catalyst.parser` package. ### How was this patch tested? Existing unit tests. cc rxin andrewor14 yhuai Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #12071 from hvanhovell/SPARK-14211.
*	[SPARK-13713][SQL][TEST-MAVEN] Add Antlr4 maven plugin.	Yin Huai	2016-03-28	1	-0/+5
	It seems that https://github.com/apache/spark/commit/600c0b69cab4767e8e5a6f4284777d8b9d4bd40e is missing the ANTLR4 Maven plugin. This PR adds it. Author: Yin Huai <yhuai@databricks.com> Closes #12010 from yhuai/mavenAntlr4.
*	[SPARK-13713][SQL] Migrate parser from ANTLR3 to ANTLR4	Herman van Hovell	2016-03-28	1	-0/+6
	### What changes were proposed in this pull request? The current ANTLR3 parser is quite complex to maintain and suffers from code blow-ups. This PR introduces a new parser that is based on ANTLR4. This parser is based on [Presto's SQL parser](https://github.com/facebook/presto/blob/master/presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4). The current implementation can parse and create Catalyst and SQL plans. Large parts of the HiveQl DDL and some of the DML functionality are currently missing; the plan is to add them in follow-up PRs. This PR is a work in progress, and work needs to be done in the following areas: - [x] Error handling should be improved. - [x] Documentation should be improved. - [x] Multi-Insert needs to be tested. - [ ] Naming and package locations. ### How was this patch tested? Catalyst and SQL unit tests. Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #11557 from hvanhovell/ngParser.
*	[SPARK-14073][STREAMING][TEST-MAVEN] Move flume back to Spark	Shixiong Zhu	2016-03-25	1	-0/+48
	## What changes were proposed in this pull request? This PR moves flume back to Spark as per the discussion in the dev mail-list. ## How was this patch tested? Existing Jenkins tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #11895 from zsxwing/move-flume-back.
*	[SPARK-13576][BUILD] Don't create assembly for examples.	Marcelo Vanzin	2016-03-15	1	-0/+3
	As part of the goal to stop creating assemblies in Spark, this change modifies the mvn and sbt builds to not create an assembly for examples. Instead, dependencies are copied to the build directory (under target/scala-xx/jars), and in the final archive, into the "examples/jars" directory. To avoid having to deal too much with Windows batch files, I made examples run through the launcher library; the spark-submit launcher now has a special mode to run examples, which adds all the necessary jars to the spark-submit command line, and replaces the bash and batch scripts that were used to run examples. The scripts are now just a thin wrapper around spark-submit; another advantage is that now all spark-submit options are supported. There are a few glitches; in the mvn build, a lot of duplicated dependencies get copied, because they are promoted to "compile" scope due to extra dependencies in the examples module (such as HBase). In the sbt build, all dependencies are copied, because there doesn't seem to be an easy way to filter things. I plan to clean some of this up when the rest of the tasks are finished. When the main assembly is replaced with jars, we can remove duplicate jars from the examples directory during packaging. Tested by running SparkPi in: maven build, sbt build, dist created by make-distribution.sh. Finally: note that running the "assembly" target in sbt doesn't build the examples anymore. You need to run "package" for that. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11452 from vanzin/SPARK-13576.
*	[SPARK-13843][STREAMING] Remove streaming-flume, streaming-mqtt, streaming-zeromq, streaming-akka, streaming-twitter to Spark packages	Shixiong Zhu	2016-03-14	1	-86/+0
	## What changes were proposed in this pull request? Currently there are a few sub-projects, each for integrating with different external sources for Streaming. Now that we have better ability to include external libraries (spark packages) and with Spark 2.0 coming up, we can move the following projects out of Spark to https://github.com/spark-packages - streaming-flume - streaming-akka - streaming-mqtt - streaming-zeromq - streaming-twitter They are just some ancillary packages and considering the overhead of maintenance, running tests and PR failures, it's better to maintain them out of Spark. In addition, these projects can have their own release cycles and we can release them faster. I have already copied these projects to https://github.com/spark-packages ## How was this patch tested? Jenkins tests Author: Shixiong Zhu <shixiong@databricks.com> Closes #11672 from zsxwing/remove-external-pkg.
*	[SPARK-13663][CORE] Upgrade Snappy Java to 1.1.2.1	Sean Owen	2016-03-10	1	-1/+1
	## What changes were proposed in this pull request? Update snappy to 1.1.2.1 to pull in a single fix -- the OOM fix we already worked around. Supersedes https://github.com/apache/spark/pull/11524 ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #11631 from srowen/SPARK-13663.
*	[SPARK-13595][BUILD] Move docker, extras modules into external	Sean Owen	2016-03-09	1	-5/+5
	## What changes were proposed in this pull request? Move `docker` dirs out of top level into `external/`; move `extras/*` into `external/` ## How was this patch tested? This is tested with Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #11523 from srowen/SPARK-13595.
*	[SPARK-13715][MLLIB] Remove last usages of jblas in tests	Sean Owen	2016-03-08	1	-1/+0
	## What changes were proposed in this pull request? Remove last usage of jblas, in tests ## How was this patch tested? Jenkins tests -- the same ones that are being modified. Author: Sean Owen <sowen@cloudera.com> Closes #11560 from srowen/SPARK-13715.
*	[HOT-FIX][BUILD] Use the new location of `checkstyle-suppressions.xml`	Dongjoon Hyun	2016-03-08	1	-1/+1
	## What changes were proposed in this pull request? This PR fixes `dev/lint-java` and `mvn checkstyle:check` failures due to the recent file location change. The following is the error message on current master. ``` Checkstyle checks failed at following occurrences: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.17:check (default-cli) on project spark-parent_2.11: Failed during checkstyle configuration: cannot initialize module SuppressionFilter - Cannot set property 'file' to 'checkstyle-suppressions.xml' in module SuppressionFilter: InvocationTargetException: Unable to find: checkstyle-suppressions.xml -> [Help 1] ``` ## How was this patch tested? Manual. The following commands should run correctly. ``` ./dev/lint-java mvn checkstyle:check ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #11567 from dongjoon-hyun/hotfix_checkstyle_suppression.
*	[SPARK-13596][BUILD] Move misc top-level build files into appropriate subdirs	Sean Owen	2016-03-07	1	-2/+2
	## What changes were proposed in this pull request? Move many top-level files into dev/ or other appropriate directories. In particular, put `make-distribution.sh` in `dev` and update docs accordingly. Remove deprecated `sbt/sbt`. I was (so far) unable to figure out how to move `tox.ini`. `scalastyle-config.xml` should be movable but edits to the project `.sbt` files didn't work; config file location is updatable for compile but not test scope. ## How was this patch tested? `./dev/run-tests` to verify RAT and checkstyle work. Jenkins tests for the rest. Author: Sean Owen <sowen@cloudera.com> Closes #11522 from srowen/SPARK-13596.
*	[SPARK-13599][BUILD] remove transitive groovy dependencies from Hive	Steve Loughran	2016-03-03	1	-0/+21
	## What changes were proposed in this pull request? Modifies the dependency declarations of all the Hive artifacts to explicitly exclude the groovy-all JAR. This stops the groovy classes and everything else in that uber-JAR from getting into the spark-assembly JAR. ## How was this patch tested? 1. Pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver` 2. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs 3. A Maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt` 4. This text file was examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive` 5. Patch applied 6. Repeated step 1: clean build of project with `-Pyarn,hive,hive-thriftserver` set 7. Examined created spark-assembly, verified no org.codehaus packages 8. Verified that the Maven dependency tree no longer references groovy. Note also that the size of the assembly JAR was 181628646 bytes before this patch and 166318515 after, about 15 MB smaller. That's a good metric of things being excluded. Author: Steve Loughran <stevel@hortonworks.com> Closes #11449 from steveloughran/fixes/SPARK-13599-groovy-dependency.
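The test plan above inspects the expanded assembly for `org.codehaus` packages before and after the patch. Because a JAR is a ZIP archive, that inspection can be scripted; the Python sketch below counts archive entries under a path prefix with the standard `zipfile` module. The helper names and the in-memory demo archive (with placeholder entries) are hypothetical, so the example runs without a Spark build. The quoted sizes work out to 181628646 - 166318515 = 15310131 bytes, roughly 15.3 MB or about 8.4% of the original assembly.

```python
import io
import zipfile


def count_entries_with_prefix(jar, prefix):
    """Count entries in a JAR/ZIP (path or file-like object) whose name starts with prefix."""
    with zipfile.ZipFile(jar) as archive:
        return sum(1 for name in archive.namelist() if name.startswith(prefix))


def make_demo_jar():
    """Build a tiny in-memory archive standing in for an assembly JAR (placeholder entries)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("org/apache/spark/SparkContext.class", b"")
        archive.writestr("org/codehaus/groovy/A.class", b"")
        archive.writestr("org/codehaus/groovy/b/B.class", b"")
    buffer.seek(0)
    return buffer


groovy_entries = count_entries_with_prefix(make_demo_jar(), "org/codehaus/groovy/")
print(groovy_entries)  # 2

saved_bytes = 181628646 - 166318515  # assembly sizes reported in the commit message
print(saved_bytes, round(saved_bytes / 1e6, 1))  # 15310131 15.3
```

Running the same count against a real assembly before and after the exclusion should show the prefix disappearing, mirroring the manual check described in the commit.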
*	[SPARK-13548][BUILD] Move tags and unsafe modules into common	Reynold Xin	2016-03-01	1	-2/+2
	## What changes were proposed in this pull request? This patch moves tags and unsafe modules into common directory to remove 2 top level non-user-facing directories. ## How was this patch tested? Jenkins should suffice. Author: Reynold Xin <rxin@databricks.com> Closes #11426 from rxin/SPARK-13548.
*	[SPARK-13529][BUILD] Move network/* modules into common/network-*	Reynold Xin	2016-02-28	1	-3/+3
	## What changes were proposed in this pull request? As the title says, this moves the three modules currently in network/ into common/network-*. This removes one top level, non-user-facing folder. ## How was this patch tested? Compilation and existing tests. We should run both SBT and Maven. Author: Reynold Xin <rxin@databricks.com> Closes #11409 from rxin/SPARK-13529.
*	[SPARK-7483][MLLIB] Upgrade Chill to 0.7.2 to support Kryo with FPGrowth	mark800	2016-02-27	1	-1/+1
	It registers more Scala classes, including ListBuffer to support Kryo with FPGrowth. See https://github.com/twitter/chill/releases for Chill's change log. Author: mark800 <yky800@126.com> Closes #11041 from mark800/master.
*	[SPARK-13324][CORE][BUILD] Update plugin, test, example dependencies for 2.x	Sean Owen	2016-02-17	1	-17/+17
	Phase 1: update plugin versions, test dependencies, some example and third-party versions Author: Sean Owen <sowen@cloudera.com> Closes #11206 from srowen/SPARK-13324.
*	[SPARK-13189] Cleanup build references to Scala 2.10	Luciano Resende	2016-02-09	1	-1/+1
	Author: Luciano Resende <lresende@apache.org> Closes #11092 from lresende/SPARK-13189.
*	[SPARK-6363][BUILD] Make Scala 2.11 the default Scala version	Josh Rosen	2016-01-30	1	-4/+4
	This patch changes Spark's build to make Scala 2.11 the default Scala version. To be clear, this does not mean that Spark will stop supporting Scala 2.10: users will still be able to compile Spark for Scala 2.10 by following the instructions on the "Building Spark" page; however, it does mean that Scala 2.11 will be the default Scala version used by our CI builds (including pull request builds). The Scala 2.11 compiler is faster than 2.10, so I think we'll be able to look forward to a slight speedup in our CI builds (it looks like it's about 2X faster for the Maven compile-only builds, for instance). After this patch is merged, I'll update Jenkins to add new compile-only jobs to ensure that Scala 2.10 compilation doesn't break. Author: Josh Rosen <joshrosen@databricks.com> Closes #10608 from JoshRosen/SPARK-6363.
*	[SPARK-12933][SQL] Initial implementation of Count-Min sketch	Cheng Lian	2016-01-23	1	-0/+1
	This PR adds an initial implementation of a Count-Min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1]. As required by the [design doc][2], spark-sketch should have no external dependencies. Two classes, `Murmur3_x86_32` and `Platform`, are copied to spark-sketch from spark-unsafe for hashing facilities. They'll also be used in the upcoming Bloom filter implementation. The following features will be added in future follow-up PRs: - Serialization support - DataFrame API integration [1]: https://github.com/addthis/stream-lib/blob/aac6b4d23a8686b000f80baa447e0922ecac3bcb/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java [2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf Author: Cheng Lian <lian@databricks.com> Closes #10851 from liancheng/count-min-sketch.
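A Count-Min sketch keeps `depth` rows of `width` counters. Each row hashes an item to one counter, `add` increments one counter per row, and `estimate` returns the minimum across rows, which can over-count because of hash collisions but never under-counts. The Python sketch below illustrates the data structure only: it is not the spark-sketch or stream-lib code, and it uses salted BLAKE2b hashing from `hashlib` where spark-sketch uses `Murmur3_x86_32`. In the standard analysis, choosing `width` around e/ε and `depth` around ln(1/δ) bounds the over-count by ε times the total count with probability at least 1 - δ.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` integer counters."""

    def __init__(self, depth, width):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, item, row):
        # Salt the hash with the row number so each row acts like an independent hash function.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(item, row)] += count

    def estimate(self, item):
        # Collisions only ever add to a counter, so the smallest counter is the tightest estimate.
        return min(self.table[row][self._bucket(item, row)] for row in range(self.depth))


sketch = CountMinSketch(depth=4, width=256)
for word in ["spark"] * 5 + ["hive"] * 2 + ["flume"]:
    sketch.add(word)
print(sketch.estimate("spark"))  # never less than 5
```

With `width=1` every item collides, so every estimate equals the total count; this is the degenerate case the error bound above guards against.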
*	[SPARK-7997][CORE] Remove Akka from Spark Core and Streaming	Shixiong Zhu	2016-01-22	1	-0/+5
	- Remove Akka dependency from core. Note: the streaming-akka project still uses Akka. - Remove HttpFileServer - Remove Akka configs from SparkConf and SSLOptions - Rename `spark.akka.frameSize` to `spark.rpc.message.maxSize`. I think it's still worth keeping this config because the choice between `DirectTaskResult` and `IndirectTaskResult` depends on it. - Update comments and docs Author: Shixiong Zhu <shixiong@databricks.com> Closes #10854 from zsxwing/remove-akka.
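The point that the choice between `DirectTaskResult` and `IndirectTaskResult` depends on `spark.rpc.message.maxSize` can be pictured as a size threshold: a serialized task result small enough to fit in one RPC message travels inline, while a larger one is kept in the executor's block manager and only a reference is sent to the driver. The Python function below is a hypothetical, simplified sketch of that decision. Its name, the 128 MiB default (assumed from the documented default of `spark.rpc.message.maxSize`), and the plain less-than comparison are assumptions; Spark's real executor logic applies further limits (such as `spark.driver.maxResultSize`) that are not modeled here.

```python
MIB = 1024 * 1024


def choose_result_kind(serialized_size_bytes, max_message_size_mib=128):
    """Hypothetical helper: decide how a task result would be returned.

    Results below the RPC message limit are sent inline ("direct"); anything
    larger is kept on the executor and only a reference is sent ("indirect").
    """
    if serialized_size_bytes < max_message_size_mib * MIB:
        return "direct"
    return "indirect"


print(choose_result_kind(4 * 1024))   # direct
print(choose_result_kind(512 * MIB))  # indirect
```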
*	[SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project	Shixiong Zhu	2016-01-20	1	-0/+1
	Includes the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove "StreamingContext.actorStream" and "JavaStreamingContext.actorStream" 3. Update the ActorWordCount example and add the JavaActorWordCount example 4. Make "streaming-zeromq" depend on "streaming-akka" and update the code accordingly Author: Shixiong Zhu <shixiong@databricks.com> Closes #10744 from zsxwing/streaming-akka-2.
*	[SPARK-12842][TEST-HADOOP2.7] Add Hadoop 2.7 build profile	Josh Rosen	2016-01-15	1	-0/+10
	This patch adds a Hadoop 2.7 build profile in order to let us automate tests against that version. /cc rxin srowen Author: Josh Rosen <joshrosen@databricks.com> Closes #10775 from JoshRosen/add-hadoop-2.7-profile.
*	[SPARK-12269][STREAMING][KINESIS] Update aws-java-sdk version	BrianLondon	2016-01-11	1	-3/+3
	The current Spark Streaming Kinesis connector references quite an old version (1.9.40) of the AWS Java SDK (1.10.40 is current). Numerous AWS features, including Kinesis Firehose, are unavailable in 1.9. Those two versions of the AWS SDK in turn require conflicting versions of Jackson (2.4.4 and 2.5.3 respectively), such that one cannot include the current AWS SDK in a project that also uses the Spark Streaming Kinesis ASL. Author: BrianLondon <brian@seatgeek.com> Closes #10256 from BrianLondon/master.
*	[SPARK-12734][HOTFIX][TEST-MAVEN] Fix bug in Netty exclusions	Josh Rosen	2016-01-11	1	-43/+7
	This is a hotfix for a build bug introduced by the Netty exclusion changes in #10672. We can't exclude `io.netty:netty` because Akka depends on it. There's not a direct conflict between `io.netty:netty` and `io.netty:netty-all`, because the former puts classes in the `org.jboss.netty` namespace while the latter uses the `io.netty` namespace. However, there still is a conflict between `org.jboss.netty:netty` and `io.netty:netty`, so we need to continue to exclude the JBoss version of that artifact. While the diff here looks somewhat large, note that this is only a revert of some of the changes from #10672. You can see the net changes in pom.xml at https://github.com/apache/spark/compare/3119206b7188c23055621dfeaf6874f21c711a82...5211ab8#diff-600376dffeb79835ede4a0b285078036 Author: Josh Rosen <joshrosen@databricks.com> Closes #10693 from JoshRosen/netty-hotfix.
*	[SPARK-12734][BUILD] Fix Netty exclusion and use Maven Enforcer to prevent future bugs	Josh Rosen	2016-01-10	1	-1/+56
	Netty classes are published under multiple artifacts with different names, so our build needs to exclude the `io.netty:netty` and `org.jboss.netty:netty` versions of the Netty artifact. However, our existing exclusions were incomplete, leading to situations where duplicate Netty classes would wind up on the classpath and cause compile errors (or worse). This patch fixes the exclusion issue by adding more exclusions and uses Maven Enforcer's [banned dependencies](https://maven.apache.org/enforcer/enforcer-rules/bannedDependencies.html) rule to prevent these classes from accidentally being reintroduced. I also updated `dev/test-dependencies.sh` to run `mvn validate` so that the enforcer rules can run as part of pull request builds. /cc rxin srowen pwendell. I'd like to backport at least the exclusion portion of this fix to `branch-1.5` in order to fix the documentation publishing job, which fails nondeterministically due to incompatible versions of Netty classes taking precedence on the compile-time classpath. Author: Josh Rosen <rosenville@gmail.com> Author: Josh Rosen <joshrosen@databricks.com> Closes #10672 from JoshRosen/enforce-netty-exclusions.
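Maven Enforcer's `bannedDependencies` rule fails the build when a banned artifact appears among the resolved dependencies. The Python sketch below mimics only that idea on a list of resolved `groupId:artifactId:version` coordinates; it is not how the Enforcer plugin is implemented, it ignores the plugin's wildcard and version-range syntax, and the coordinates and `1.0` versions in the example are placeholders. Matching on the exact `groupId:artifactId` pair matters here, because `io.netty:netty-all` must not be confused with `io.netty:netty`. Note also that the follow-up hotfix listed above had to stop banning `io.netty:netty`, since Akka depends on it.

```python
def find_banned(resolved, banned):
    """Return the resolved coordinates whose groupId:artifactId is in `banned`.

    `resolved` holds strings like "groupId:artifactId:version"; `banned` holds
    "groupId:artifactId" pairs that must not appear on the classpath.
    """
    hits = []
    for coordinate in resolved:
        group_id, artifact_id = coordinate.split(":")[:2]
        if f"{group_id}:{artifact_id}" in banned:
            hits.append(coordinate)
    return hits


resolved = [
    "io.netty:netty-all:1.0",  # placeholder versions
    "io.netty:netty:1.0",
    "org.jboss.netty:netty:1.0",
]
print(find_banned(resolved, {"io.netty:netty", "org.jboss.netty:netty"}))
# ['io.netty:netty:1.0', 'org.jboss.netty:netty:1.0']
```

A non-empty result would correspond to the enforcer failing `mvn validate`.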
*	[SPARK-4628][BUILD] Remove all non-Maven-Central repositories from build	Josh Rosen	2016-01-08	1	-87/+0
	This patch removes all non-Maven-central repositories from Spark's build, thereby avoiding any risk of future build-breaks due to us accidentally depending on an artifact which is not present in an immutable public Maven repository. I tested this by running ``` build/mvn \ -Phive \ -Phive-thriftserver \ -Pkinesis-asl \ -Pspark-ganglia-lgpl \ -Pyarn \ dependency:go-offline ``` inside of a fresh Ubuntu Docker container with no Ivy or Maven caches (I did a similar test for SBT). Author: Josh Rosen <joshrosen@databricks.com> Closes #10659 from JoshRosen/SPARK-4628.
*	[SPARK-4819] Remove Guava's "Optional" from public API	Sean Owen	2016-01-08	1	-11/+0
	Replace Guava `Optional` with (an API clone of) Java 8 `java.util.Optional` (edit: and a clone of Guava `Optional`) See also https://github.com/apache/spark/pull/10512 Author: Sean Owen <sowen@cloudera.com> Closes #10513 from srowen/SPARK-4819.
*	[SPARK-12573][SPARK-12574][SQL] Move SQL Parser from Hive to Catalyst	Herman van Hovell	2016-01-06	1	-0/+6
	This PR moves a major part of the new SQL parser to Catalyst. This is a prelude to using this parser for all of our SQL parsing. The following key changes have been made: The ANTLR Parser & Supporting classes have been moved to the Catalyst project. They are now part of the `org.apache.spark.sql.catalyst.parser` package. These classes contained quite a bit of code that was originally from the Hive project; I have added acknowledgements wherever this applied. All Hive dependencies have been factored out. I have also taken this chance to clean up the `ASTNode` class, and to improve the error handling. The HiveQl object that provides the functionality to convert an AST into a LogicalPlan has been refactored into three different classes, one for every SQL sub-project: - `CatalystQl`: This implements Query and Expression parsing functionality. - `SparkQl`: This is a subclass of `CatalystQl` and provides SQL/Core-only functionality such as Explain and Describe. - `HiveQl`: This is a subclass of `SparkQl` and adds Hive-only functionality to the parser such as Analyze, Drop, Views, CTAS & Transforms. This class still depends on Hive. cc rxin Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10583 from hvanhovell/SPARK-12575.
*	[SPARK-12453][STREAMING] Remove explicit dependency on aws-java-sdk	BrianLondon	2016-01-05	1	-1/+0
	Successfully ran the Kinesis demo on a live, AWS-hosted Kinesis stream against master and 1.6 branches. For reasons I don't entirely understand it required a manual merge to 1.5 which I did as shown here: https://github.com/BrianLondon/spark/commit/075c22e89bc99d5e99be21f40e0d72154a1e23a2 The demo ran successfully on the 1.5 branch as well. According to `mvn dependency:tree` it is still pulling a fairly old version of the aws-java-sdk (1.9.37), but this appears to have fixed the Kinesis regression in 1.5.2. Author: BrianLondon <brian@seatgeek.com> Closes #10492 from BrianLondon/remove-only.
*	[SPARK-12362][SQL][WIP] Inline Hive Parser	Herman van Hovell	2016-01-01	1	-0/+5
	This PR inlines the Hive SQL parser in Spark SQL. The previous (merged) incarnation of this PR passed all tests, but had and still has problems with the build. These problems are caused by the fact that, for some reason, in some cases the ANTLR-generated code is not included in the compilation phase. This PR is a WIP and should not be merged until we have sorted out the build issues. Author: Herman van Hovell <hvanhovell@questtec.nl> Author: Nong Li <nong@databricks.com> Author: Nong Li <nongli@gmail.com> Closes #10525 from hvanhovell/SPARK-12362.
*	[SPARK-10359] Enumerate dependencies in a file and diff against it for new pull requests	Josh Rosen	2015-12-30	1	-0/+17
	This patch adds a new build check which enumerates Spark's resolved runtime classpath and saves it to a file, then diffs against that file to detect whether pull requests have introduced dependency changes. The aim of this check is to make it simpler to reason about whether pull requests that modify the build have introduced new dependencies or changed transitive dependencies in a way that affects the final classpath. This supplants the checks added in SPARK-4123 / #5093, which are currently disabled due to bugs. This patch is based on pwendell's work in #8531. Closes #8531. Author: Josh Rosen <joshrosen@databricks.com> Author: Patrick Wendell <patrick@databricks.com> Closes #10461 from JoshRosen/SPARK-10359.
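The check described above boils down to two steps: flatten the resolved classpath into a sorted list of JAR names that can be committed to a file, then diff a fresh listing against it. A minimal Python sketch under those assumptions follows; the function names and JAR names are hypothetical, it assumes a Unix-style `:`-separated classpath, and it is not the script the patch adds.

```python
import posixpath


def classpath_to_manifest(classpath):
    """Turn a ':'-separated classpath into a sorted, de-duplicated list of JAR file names."""
    return sorted({posixpath.basename(entry) for entry in classpath.split(":") if entry})


def diff_manifests(committed, current):
    """Return (added, removed) JAR names between a committed manifest and a fresh one."""
    committed_set, current_set = set(committed), set(current)
    return sorted(current_set - committed_set), sorted(committed_set - current_set)


committed = classpath_to_manifest("/repo/a/core-lib-1.0.jar:/repo/b/example-lib-1.0.jar")
current = classpath_to_manifest("/repo/a/core-lib-1.0.jar:/repo/b/example-lib-1.1.jar")
print(diff_manifests(committed, current))
# (['example-lib-1.1.jar'], ['example-lib-1.0.jar'])
```

Keeping only base names makes the committed file independent of where the local Maven or Ivy cache lives, so a version bump of a transitive dependency shows up as one added and one removed line.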
*	Revert "[SPARK-12362][SQL][WIP] Inline Hive Parser"	Reynold Xin	2015-12-30	1	-5/+0
	This reverts commit b600bccf41a7b1958e33d8301a19214e6517e388 due to non-deterministic build breaks.
*	[SPARK-12362][SQL][WIP] Inline Hive Parser	Nong Li	2015-12-29	1	-0/+5
	This is a WIP. The PR has been taken over from nongli (see https://github.com/apache/spark/pull/10420). I have removed some additional dead code, and fixed a few issues which were caused by the fact that the inlined Hive parser is newer than the Hive parser we currently use in Spark. I am submitting this PR in order to get some feedback and testing done. There is quite a bit of work to do: - [ ] Get it to pass the Jenkins build/test. - [ ] Acknowledge the Hive project for using their parser. - [ ] Refactorings between HiveQl and the Java classes. - [ ] Create our own ASTNode and integrate the current implicit extensions. - [ ] Move remaining `SemanticAnalyzer` and `ParseUtils` functionality to `HiveQl`. - [ ] Remove Hive dependencies from the parser. This will require some edits in the grammar files. - [ ] Introduce our own context which needs to contain a `TokenRewriteStream`. - [ ] Add `useSQL11ReservedKeywordsForIdentifier` and `allowQuotedId` to the catalyst or sql configuration. - [ ] Remove `HiveConf` from grammar files & HiveQl, and pass in our own configuration. - [ ] Move the parser into sql/core. cc nongli rxin Author: Herman van Hovell <hvanhovell@questtec.nl> Author: Nong Li <nong@databricks.com> Author: Nong Li <nongli@gmail.com> Closes #10509 from hvanhovell/SPARK-12362.
*	[SPARK-11807] Remove support for Hadoop < 2.2	Reynold Xin	2015-12-21	1	-13/+0
	i.e. Hadoop 1 and Hadoop 2.0 Author: Reynold Xin <rxin@databricks.com> Closes #10404 from rxin/SPARK-11807.
*	[SPARK-11808] Remove Bagel.	Reynold Xin	2015-12-19	1	-2/+1
	Author: Reynold Xin <rxin@databricks.com> Closes #10395 from rxin/SPARK-11808.
*	Bump master version to 2.0.0-SNAPSHOT.	Reynold Xin	2015-12-19	1	-1/+1
	Author: Reynold Xin <rxin@databricks.com> Closes #10387 from rxin/version-bump.
*	[SPARK-11796] Fix httpclient and httpcore dependency issues related to docker-client	Mark Grover	2015-12-09	1	-0/+28
	This commit fixes dependency issues which prevented the Docker-based JDBC integration tests from running in the Maven build. Author: Mark Grover <mgrover@cloudera.com> Closes #9876 from markgrover/master_docker.
*	[SPARK-11652][CORE] Remote code execution with InvokerTransformer	Sean Owen	2015-12-08	1	-1/+1
	Fix commons-collection group ID to commons-collections for version 3.x Patches earlier PR at https://github.com/apache/spark/pull/9731 Author: Sean Owen <sowen@cloudera.com> Closes #10198 from srowen/SPARK-11652.2.
*	[SPARK-12112][BUILD] Upgrade to SBT 0.13.9	Josh Rosen	2015-12-05	1	-1/+1
	We should upgrade to SBT 0.13.9, since this is a requirement in order to use SBT's new Maven-style resolution features (which will be done in a separate patch, because it's blocked by some binary compatibility issues in the POM reader plugin). I also upgraded Scalastyle to version 0.8.0, which was necessary in order to fix a Scala 2.10.5 compatibility issue (see https://github.com/scalastyle/scalastyle/issues/156). The newer Scalastyle is slightly stricter about whitespace surrounding tokens, so I fixed the new style violations. Author: Josh Rosen <joshrosen@databricks.com> Closes #10112 from JoshRosen/upgrade-to-sbt-0.13.9.
*	[SPARK-6990][BUILD] Add Java linting script; fix minor warnings	Dmitry Erastov	2015-12-04	1	-0/+24
	This replaces https://github.com/apache/spark/pull/9696 Invoke Checkstyle and print any errors to the console, failing the step. Use Google's style rules modified according to https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide Some important checks are disabled (see TODOs in `checkstyle.xml`) due to multiple violations being present in the codebase. Suggest fixing those TODOs in separate PR(s). More on Checkstyle can be found on the [official website](http://checkstyle.sourceforge.net/). Sample output (from [build 46345](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46345/consoleFull)) (duplicated because I ran the build twice with different profiles): > Checkstyle checks failed at following occurrences: [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause. > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions. > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:[217,7] (coding) MissingSwitchDefault: switch without "default" clause. > [ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:[198,10] (modifier) ModifierOrder: 'protected' modifier out of order with the JLS suggestions. > [error] running /home/jenkins/workspace/SparkPullRequestBuilder2/dev/lint-java ; received return code 1 Also fix some of the minor violations that didn't require sweeping changes. Apologies for the previous botched PRs - I finally figured out the issue. cr: JoshRosen, pwendell > I state that the contribution is my original work, and I license the work to the project under the project's open source license. Author: Dmitry Erastov <derastov@gmail.com> Closes #9867 from dskrvk/master.
*	[SPARK-4424] Remove spark.driver.allowMultipleContexts override in tests	Josh Rosen	2015-11-23	1	-2/+0
	This patch removes `spark.driver.allowMultipleContexts=true` from our test configuration. The multiple SparkContexts check was originally disabled because certain test suites in SQL needed to create multiple contexts. As far as I know, this configuration change is no longer necessary, so we should remove it in order to make it easier to find test cleanup bugs. Author: Josh Rosen <joshrosen@databricks.com> Closes #9865 from JoshRosen/SPARK-4424.
*	[SPARK-11652][CORE] Remote code execution with InvokerTransformer	Sean Owen	2015-11-18	1	-0/+7
	Update to Commons Collections 3.2.2 to avoid any potential remote code execution vulnerability Author: Sean Owen <sowen@cloudera.com> Closes #9731 from srowen/SPARK-11652.
*	[SPARK-11583] [CORE] MapStatus Using RoaringBitmap More Properly	Kent Yao	2015-11-17	1	-1/+1
	This PR upgrades RoaringBitmap to 0.5.10 to optimize the memory layout, which will be much smaller when most blocks are empty. This PR is based on #9661 (with conflicts fixed); see all of the comments at https://github.com/apache/spark/pull/9661. Author: Kent Yao <yaooqinn@hotmail.com> Author: Davies Liu <davies@databricks.com> Author: Charles Allen <charles@allen-net.com> Closes #9746 from davies/roaring_mapstatus.
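The idea of tracking empty blocks in a bitmap can be sketched as follows: for each map task, record which reduce partitions received no output, plus a single average size for the non-empty ones, instead of storing every block size. When most blocks are empty, the set of empty block ids tends to compress well, which is where a compressed bitmap such as RoaringBitmap helps. The Python class below is a hypothetical, simplified model of such a status; a plain `set` stands in for RoaringBitmap, and it is not Spark's `MapStatus` code.

```python
class CompressedMapStatusModel:
    """Hypothetical, simplified model of a compressed map-output status."""

    def __init__(self, block_sizes):
        self.num_blocks = len(block_sizes)
        # Stand-in for a compressed bitmap: reduce ids whose block is empty.
        self.empty_blocks = {reduce_id for reduce_id, size in enumerate(block_sizes) if size == 0}
        non_empty = [size for size in block_sizes if size > 0]
        # Individual sizes are discarded; only their integer average is kept.
        self.avg_size = sum(non_empty) // len(non_empty) if non_empty else 0

    def size_for_block(self, reduce_id):
        """Report 0 for known-empty blocks and the average size for all others."""
        return 0 if reduce_id in self.empty_blocks else self.avg_size


status = CompressedMapStatusModel([0, 100, 0, 300, 0])
print([status.size_for_block(i) for i in range(status.num_blocks)])  # [0, 200, 0, 200, 0]
```

The trade-off is accuracy for memory: reducers learn exactly which blocks they can skip, but the size of each non-empty block is only approximated.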
*	Revert "[SPARK-11271][SPARK-11016][CORE] Use Spark BitSet instead of RoaringBitmap to reduce memory usage"	Davies Liu	2015-11-16	1	-0/+5
	This reverts commit e209fa271ae57dc8849f8b1241bf1ea7d6d3d62c.