spark - Mirror of Apache Spark

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SPARK-17229][SQL] PostgresDialect shouldn't widen float and short types ↵	Josh Rosen	2016-08-25	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	during reads ## What changes were proposed in this pull request? When reading float4 and smallint columns from PostgreSQL, Spark's `PostgresDialect` widens these types to Decimal and Integer rather than using the narrower Float and Short types. According to https://www.postgresql.org/docs/7.1/static/datatype.html#DATATYPE-TABLE, Postgres maps the `smallint` type to a signed two-byte integer and the `real` / `float4` types to single precision floating point numbers. This patch fixes this by adding more special-cases to `getCatalystType`, similar to what was done for the Derby JDBC dialect. I also fixed a similar problem in the write path which causes Spark to create integer columns in Postgres for what should have been ShortType columns. ## How was this patch tested? New test cases in `PostgresIntegrationSuite` (which I ran manually because Jenkins can't run it right now). Author: Josh Rosen <joshrosen@databricks.com> Closes #14796 from JoshRosen/postgres-jdbc-type-fixes.
*	[SPARK-17023][BUILD] Upgrade to Kafka 0.10.0.1 release	Luciano Resende	2016-08-13	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Update Kafka streaming connector to use Kafka 0.10.0.1 release ## How was this patch tested? Tested via Spark unit and integration tests Author: Luciano Resende <lresende@apache.org> Closes #14606 from lresende/kafka-upgrade.
*	[SPARK-16950] [PYSPARK] fromOffsets parameter support in ↵	Mariusz Strzelecki	2016-08-09	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KafkaUtils.createDirectStream for python3 ## What changes were proposed in this pull request? Ability to use KafkaUtils.createDirectStream with starting offsets in python 3 by using java.lang.Number instead of Long during param mapping in scala helper. This allows py4j to pass Integer or Long to the map and resolves ClassCastException problems. ## How was this patch tested? unit tests jerryshao - could you please look at this PR? Author: Mariusz Strzelecki <mariusz.strzelecki@allegrogroup.com> Closes #14540 from szczeles/kafka_pyspark.
*	[SPARK-16779][TRIVIAL] Avoid using postfix operators where they do not add ↵	Holden Karau	2016-08-08	2	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	much and remove whitelisting ## What changes were proposed in this pull request? Avoid using postfix operation for command execution in SQLQuerySuite where it wasn't whitelisted and audit existing whitelistings removing postfix operators from most places. Some notable places where postfix operation remains is in the XML parsing & time units (seconds, millis, etc.) where it arguably can improve readability. ## How was this patch tested? Existing tests. Author: Holden Karau <holden@us.ibm.com> Closes #14407 from holdenk/SPARK-16779.
*	[SPARK-13238][CORE] Add ganglia dmax parameter	Ekasit Kijsipongse	2016-08-05	1	-0/+5
\| \| \| \| \| \| \| \|	The current ganglia reporter doesn't set metric expiration time (dmax). The metrics of all finished applications are indefinitely left displayed in ganglia web. The dmax parameter allows user to set the lifetime of the metrics. The default value is 0 for compatibility with previous versions. Author: Ekasit Kijsipongse <ekasitk@gmail.com> Closes #11127 from ekasitk/ganglia-dmax.
*	[SPARK-16625][SQL] General data types to be mapped to Oracle	Yuming Wang	2016-08-05	1	-1/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Spark will convert BooleanType to BIT(1), LongType to BIGINT, ByteType to BYTE when saving DataFrame to Oracle, but Oracle does not support BIT, BIGINT and BYTE types. This PR is convert following _Spark Types_ to _Oracle types_ refer to [Oracle Developer's Guide](https://docs.oracle.com/cd/E19501-01/819-3659/gcmaz/) Spark Type \| Oracle ----\|---- BooleanType \| NUMBER(1) IntegerType \| NUMBER(10) LongType \| NUMBER(19) FloatType \| NUMBER(19, 4) DoubleType \| NUMBER(19, 4) ByteType \| NUMBER(3) ShortType \| NUMBER(5) ## How was this patch tested? Add new tests in [JDBCSuite.scala](https://github.com/wangyum/spark/commit/22b0c2a4228cb8b5098ad741ddf4d1904e745ff6#diff-dc4b58851b084b274df6fe6b189db84d) and [OracleDialect.scala](https://github.com/wangyum/spark/commit/22b0c2a4228cb8b5098ad741ddf4d1904e745ff6#diff-5e0cadf526662f9281aa26315b3750ad) Author: Yuming Wang <wgyumg@gmail.com> Closes #14377 from wangyum/SPARK-16625.
*	[SPARK-16776][STREAMING] Replace deprecated API in KafkaTestUtils for 0.10.0.	hyukjinkwon	2016-08-01	1	-8/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR replaces the old Kafka API to 0.10.0 ones in `KafkaTestUtils`. The change include: - `Producer` to `KafkaProducer` - Change configurations to equalvant ones. (I referred [here](http://kafka.apache.org/documentation.html#producerconfigs) for 0.10.0 and [here](http://kafka.apache.org/082/documentation.html#producerconfigs ) for old, 0.8.2). This PR will remove the build warning as below: ```scala [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:71: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead. [WARNING] private var producer: Producer[String, String] = _ [WARNING] ^ [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:181: class Producer in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.KafkaProducer instead. [WARNING] producer = new Producer[String, String](new ProducerConfig(producerConfiguration)) [WARNING] ^ [WARNING] .../spark/streaming/kafka010/KafkaTestUtils.scala:181: class ProducerConfig in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerConfig instead. [WARNING] producer = new Producer[String, String](new ProducerConfig(producerConfiguration)) [WARNING] ^ [WARNING] .../spark/external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala:182: class KeyedMessage in package producer is deprecated: This class has been deprecated and will be removed in a future release. Please use org.apache.kafka.clients.producer.ProducerRecord instead. [WARNING] producer.send(messages.map { new KeyedMessage[String, String](topic, _ ) }: _*) [WARNING] ^ [WARNING] four warnings found [WARNING] warning: [options] bootstrap class path not set in conjunction with -source 1.7 [WARNING] 1 warning ``` ## How was this patch tested? Existing tests that use `KafkaTestUtils` should cover this. Author: hyukjinkwon <gurwls223@gmail.com> Closes #14416 from HyukjinKwon/SPARK-16776.
*	[TEST][STREAMING] Fix flaky Kafka rate controlling test	Tathagata Das	2016-07-26	1	-5/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The current test is incorrect, because - The expected number of messages does not take into account that the topic has 2 partitions, and rate is set per partition. - Also in some cases, the test ran out of data in Kafka while waiting for the right amount of data per batch. The PR - Reduces the number of partitions to 1 - Adds more data to Kafka - Runs with 0.5 second so that batches are created slowly ## How was this patch tested? Ran many times locally, going to run it many times in Jenkins (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #14361 from tdas/kafka-rate-test-fix.
*	[SPARK-16535][BUILD] In pom.xml, remove groupId which is redundant ↵	Xin Ren	2016-07-19	11	-11/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	definition and inherited from the parent https://issues.apache.org/jira/browse/SPARK-16535 ## What changes were proposed in this pull request? When I scan through the pom.xml of sub projects, I found this warning as below and attached screenshot ``` Definition of groupId is redundant, because it's inherited from the parent ``` ![screen shot 2016-07-13 at 3 13 11 pm](https://cloud.githubusercontent.com/assets/3925641/16823121/744f893e-4916-11e6-8a52-042f83b9db4e.png) I've tried to remove some of the lines with groupId definition, and the build on my local machine is still ok. ``` <groupId>org.apache.spark</groupId> ``` As I just find now `<maven.version>3.3.9</maven.version>` is being used in Spark 2.x, and Maven-3 supports versionless parent elements: Maven 3 will remove the need to specify the parent version in sub modules. THIS is great (in Maven 3.1). ref: http://stackoverflow.com/questions/3157240/maven-3-worth-it/3166762#3166762 ## How was this patch tested? I've tested by re-building the project, and build succeeded. Author: Xin Ren <iamshrek@126.com> Closes #14189 from keypointt/SPARK-16535.
*	[SPARK-16477] Bump master version to 2.1.0-SNAPSHOT	Reynold Xin	2016-07-11	12	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? After SPARK-16476 (committed earlier today as #14128), we can finally bump the version number. ## How was this patch tested? N/A Author: Reynold Xin <rxin@databricks.com> Closes #14130 from rxin/SPARK-16477.
*	[SPARK-13569][STREAMING][KAFKA] pattern based topic subscription	cody koeninger	2016-07-08	3	-9/+258
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Allow for kafka topic subscriptions based on a regex pattern. ## How was this patch tested? Unit tests, manual tests Author: cody koeninger <cody@koeninger.org> Closes #14026 from koeninger/SPARK-13569.
*	[SPARK-16212][STREAMING][KAFKA] apply test tweaks from 0-10 to 0-8 as well	cody koeninger	2016-07-06	2	-25/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Bring the kafka-0-8 subproject up to date with some test modifications from development on 0-10. Main changes are - eliminating waits on concurrent queue in favor of an assert on received results, - atomics instead of volatile (although this probably doesn't matter) - increasing uniqueness of topic names ## How was this patch tested? Unit tests Author: cody koeninger <cody@koeninger.org> Closes #14073 from koeninger/kafka-0-8-test-direct-cleanup.
*	[SPARK-16212][STREAMING][KAFKA] use random port for embedded kafka	cody koeninger	2016-07-05	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Testing for 0.10 uncovered an issue with a fixed port number being used in KafkaTestUtils. This is making a roughly equivalent fix for the 0.8 connector ## How was this patch tested? Unit tests, manual tests Author: cody koeninger <cody@koeninger.org> Closes #14018 from koeninger/kafka-0-8-test-port.
*	[SPARK-12177][STREAMING][KAFKA] limit api surface area	cody koeninger	2016-07-01	13	-193/+222
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This is an alternative to the refactoring proposed by https://github.com/apache/spark/pull/13996 ## How was this patch tested? unit tests also tested under scala 2.10 via mvn -Dscala-2.10 Author: cody koeninger <cody@koeninger.org> Closes #13998 from koeninger/kafka-0-10-refactor.
*	[SPARK-16212][STREAMING][KAFKA] code cleanup from review feedback	cody koeninger	2016-06-30	3	-14/+12
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? code cleanup in kafka-0-8 to match suggested changes for kafka-0-10 branch ## How was this patch tested? unit tests Author: cody koeninger <cody@koeninger.org> Closes #13908 from koeninger/kafka-0-8-cleanup.
*	[SPARK-12177][TEST] Removed test to avoid compilation issue in scala 2.10	Tathagata Das	2016-06-30	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? The commented lines failed scala 2.10 build. This is because of change in behavior of case classes between 2.10 and 2.11. In scala 2.10, if companion object of a case class has explicitly defined apply(), then the implicit apply method is not generated. In scala 2.11 it is generated. Hence, the lines compile fine in 2.11 but not in 2.10. This simply comments the tests to fix broken build. Correct solution is pending. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #13992 from tdas/SPARK-12177.
*	[SPARK-12177][STREAMING][KAFKA] Update KafkaDStreams to new Kafka 0.10 ↵	cody koeninger	2016-06-29	20	-0/+3351
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consumer API ## What changes were proposed in this pull request? New Kafka consumer api for the released 0.10 version of Kafka ## How was this patch tested? Unit tests, manual tests Author: cody koeninger <cody@koeninger.org> Closes #11863 from koeninger/kafka-0.9.
*	[SPARK-15086][CORE][STREAMING] Deprecate old Java accumulator API	Sean Owen	2016-06-12	2	-67/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? - Deprecate old Java accumulator API; should use Scala now - Update Java tests and examples - Don't bother testing old accumulator API in Java 8 (too) - (fix a misspelling too) ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #13606 from srowen/SPARK-15086.
*	[MINOR] Fix Typos 'an -> a'	Zheng RuiFeng	2016-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? `an -> a` Use cmds like `find . -name '*.R' \| xargs -i sh -c "grep -in ' an [^aeiou]' {} && echo {}"` to generate candidates, and review them one by one. ## How was this patch tested? manual tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes #13515 from zhengruifeng/an_a.
*	[SPARK-15451][BUILD] Use jdk7's rt.jar when available.	Marcelo Vanzin	2016-05-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This helps with preventing jdk8-specific calls being checked in, because PR builders are running the compiler with the wrong settings. If the JAVA_7_HOME env variable is set, assume it points at a jdk7 and use its rt.jar when invoking javac. For zinc, just run it with jdk7, and disable it when building jdk8-specific code. A big note for sbt usage: adding the bootstrap options forces sbt to fork the compiler, and that disables incremental compilation. That means that it's really not convenient to use for normal development, but should be ok for automated builds. Tested with JAVA_HOME=jdk8 and JAVA_7_HOME=jdk7: - mvn + zinc - mvn sans zinc - sbt Verified that in all cases, jdk8-specific library calls fail to compile. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #13272 from vanzin/SPARK-15451.
*	[SPARK-15633][MINOR] Make package name for Java tests consistent	Reynold Xin	2016-05-27	4	-8/+15
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This is a simple patch that makes package names for Java 8 test suites consistent. I moved everything to test.org.apache.spark to we can test package private APIs properly. Also added "java8" as the package name so we can easily run all the tests related to Java 8. ## How was this patch tested? This is a test only change. Author: Reynold Xin <rxin@databricks.com> Closes #13364 from rxin/SPARK-15633.
*	[SPARK-15508][STREAMING][TESTS] Fix flaky test: ↵	Shixiong Zhu	2016-05-24	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	JavaKafkaStreamSuite.testKafkaStream ## What changes were proposed in this pull request? `JavaKafkaStreamSuite.testKafkaStream` assumes when `sent.size == result.size`, the contents of `sent` and `result` should be same. However, that's not true. The content of `result` may not be the final content. This PR modified the test to always retry the assertions even if the contents of `sent` and `result` are not same. Here is the failure in Jenkins: http://spark-tests.appspot.com/tests/org.apache.spark.streaming.kafka.JavaKafkaStreamSuite/testKafkaStream ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #13281 from zsxwing/flaky-kafka-test.
*	[SPARK-15290][BUILD] Move annotations, like @Since / @DeveloperApi, into ↵	Sean Owen	2016-05-17	6	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spark-tags ## What changes were proposed in this pull request? (See https://github.com/apache/spark/pull/12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.) Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags` ## How was this patch tested? Jenkins tests. Author: Sean Owen <sowen@cloudera.com> Closes #13074 from srowen/SPARK-15290.
*	[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient	Sean Owen	2016-05-15	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? (Retry of https://github.com/apache/spark/pull/13049) - update to httpclient 4.5 / httpcore 4.4 - remove some defunct exclusions - manage httpmime version to match - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used) ## How was this patch tested? Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...` Author: Sean Owen <sowen@cloudera.com> Closes #13117 from srowen/SPARK-12972.2.
*	Revert "[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient"	Sean Owen	2016-05-13	1	-0/+2
\| \| \| \|	This reverts commit c74a6c3f2363f065a4915fdadec5eff665fa02e7.
*	[SPARK-12972][CORE] Update org.apache.httpcomponents.httpclient	Sean Owen	2016-05-13	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? - update httpcore/httpclient to latest - centralize version management - remove excludes that are no longer relevant according to SBT/Maven dep graphs - also manage httpmime to match httpclient ## How was this patch tested? Jenkins tests, plus review of dependency graphs from SBT/Maven, and review of test-dependencies.sh output Author: Sean Owen <sowen@cloudera.com> Closes #13049 from srowen/SPARK-12972.
*	[SPARK-14421] Upgrades protobuf dependency to 2.6.1 for the new version of ↵	Brian O'Neill	2016-05-12	1	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KCL, and… ## What changes were proposed in this pull request? When running with Kinesis Consumer Library (KCL), against a stream that contains aggregated data, the KCL needs access to protobuf to de-aggregate the records. Without this patch, that results in the following error message: ``` Caused by: java.lang.ClassNotFoundException: com.google.protobuf.ProtocolStringList ``` This PR upgrades the protobuf dependency within the kinesis-asl-assembly, and relocates that package (as not to conflict with Spark's use of 2.5.0), which fixes the above CNFE. ## How was this patch tested? Used kinesis word count example against a stream containing aggregated data. See: SPARK-14421 Author: Brian O'Neill <bone@alumni.brown.edu> Closes #13054 from boneill42/protobuf-relocation-for-kcl.
*	[SPARK-15085][STREAMING][KAFKA] Rename streaming-kafka artifact	cody koeninger	2016-05-11	23	-6/+6
\| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Renaming the streaming-kafka artifact to include kafka version, in anticipation of needing a different artifact for later kafka versions ## How was this patch tested? Unit tests Author: cody koeninger <cody@koeninger.org> Closes #12946 from koeninger/SPARK-15085.
*	[SPARK-14936][BUILD][TESTS] FlumePollingStreamSuite is slow	Xin Ren	2016-05-10	2	-8/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://issues.apache.org/jira/browse/SPARK-14936 ## What changes were proposed in this pull request? FlumePollingStreamSuite contains two tests which run for a minute each. This seems excessively slow and we should speed it up if possible. In this PR, instead of creating `StreamingContext` directly from `conf`, here an underlying `SparkContext` is created before all and it is used to create each`StreamingContext`. Running time is reduced by avoiding multiple `SparkContext` creations and destroys. ## How was this patch tested? Tested on my local machine running `testOnly *.FlumePollingStreamSuite` Author: Xin Ren <iamshrek@126.com> Closes #12845 from keypointt/SPARK-14936.
*	[SPARK-6005][TESTS] Fix flaky test: ↵	Shixiong Zhu	2016-05-10	1	-6/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery ## What changes were proposed in this pull request? Because this test extracts data from `DStream.generatedRDDs` before stopping, it may get data before checkpointing. Then after recovering from the checkpoint, `recoveredOffsetRanges` may contain something not in `offsetRangesBeforeStop`, which will fail the test. Adding `Thread.sleep(1000)` before `ssc.stop()` will reproduce this failure. This PR just moves the logic of `offsetRangesBeforeStop` (also renamed to `offsetRangesAfterStop`) after `ssc.stop()` to fix the flaky test. ## How was this patch tested? Jenkins unit tests. Author: Shixiong Zhu <shixiong@databricks.com> Closes #12903 from zsxwing/SPARK-6005.
*	[SPARK-14642][SQL] import org.apache.spark.sql.expressions._ breaks udf ↵	Subhobrata Dey	2016-05-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	under functions ## What changes were proposed in this pull request? PR fixes the import issue which breaks udf functions. The following code snippet throws an error ``` scala> import org.apache.spark.sql.functions._ import org.apache.spark.sql.functions._ scala> import org.apache.spark.sql.expressions._ import org.apache.spark.sql.expressions._ scala> udf((v: String) => v.stripSuffix("-abc")) <console>:30: error: No TypeTag available for String udf((v: String) => v.stripSuffix("-abc")) ``` This PR resolves the issue. ## How was this patch tested? patch tested with unit tests. (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Author: Subhobrata Dey <sbcd90@gmail.com> Closes #12458 from sbcd90/udfFuncBreak.
*	[SPARK-14738][BUILD] Separate docker integration tests from main build	Luciano Resende	2016-05-06	3	-10/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Create a maven profile for executing the docker integration tests using maven Remove docker integration tests from main sbt build Update documentation on how to run docker integration tests from sbt ## How was this patch tested? Manual test of the docker integration tests as in : mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11 compile test ## Other comments Note that the the DB2 Docker Tests are still disabled as there is a kernel version issue on the AMPLab Jenkins slaves and we would need to get them on the right level before enabling those tests. They do run ok locally with the updates from PR #12348 Author: Luciano Resende <lresende@apache.org> Closes #12508 from lresende/docker.
*	[SPARK-14589][SQL] Enhance DB2 JDBC Dialect docker tests	Luciano Resende	2016-05-05	1	-56/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Enhance the DB2 JDBC Dialect docker tests as they seemed to have had some issues on previous merge causing some tests to fail. ## How was this patch tested? By running the integration tests locally. Author: Luciano Resende <lresende@apache.org> Closes #12348 from lresende/SPARK-14589.
*	[SPARK-12154] Upgrade to Jersey 2	mcheah	2016-05-05	2	-14/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Replace com.sun.jersey with org.glassfish.jersey. Changes to the Spark Web UI code were required to compile. The changes were relatively standard Jersey migration things. ## How was this patch tested? I did a manual test for the standalone web APIs. Although I didn't test the functionality of the security filter itself, the code that changed non-trivially is how we actually register the filter. I attached a debugger to the Spark master and verified that the SecurityFilter code is indeed invoked upon hitting /api/v1/applications. Author: mcheah <mcheah@palantir.com> Closes #12715 from mccheah/feature/upgrade-jersey.
*	Revert "[SPARK-14613][ML] Add @Since into the matrix and vector classes in ↵	Yin Huai	2016-04-28	6	-7/+6
\| \| \| \| \| \|	spark-mllib-local" This reverts commit dae538a4d7c36191c1feb02ba87ffc624ab960dc.
*	[SPARK-14613][ML] Add @Since into the matrix and vector classes in ↵	Pravin Gadakh	2016-04-28	6	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spark-mllib-local ## What changes were proposed in this pull request? This PR adds `since` tag into the matrix and vector classes in spark-mllib-local. ## How was this patch tested? Scala-style checks passed. Author: Pravin Gadakh <prgadakh@in.ibm.com> Closes #12416 from pravingadakh/SPARK-14613.
*	[SPARK-14868][BUILD] Enable NewLineAtEofChecker in checkstyle and fix ↵	Dongjoon Hyun	2016-04-24	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lint-java errors ## What changes were proposed in this pull request? Spark uses `NewLineAtEofChecker` rule in Scala by ScalaStyle. And, most Java code also comply with the rule. This PR aims to enforce the same rule `NewlineAtEndOfFile` by CheckStyle explicitly. Also, this fixes lint-java errors since SPARK-14465. The followings are the items. - Adds a new line at the end of the files (19 files) - Fixes 25 lint-java errors (12 RedundantModifier, 6 ArrayTypeStyle, 2 LineLength, 2 UnusedImports, 2 RegexpSingleline, 1 ModifierOrder) ## How was this patch tested? After the Jenkins test succeeds, `dev/lint-java` should pass. (Currently, Jenkins dose not run lint-java.) ```bash $ dev/lint-java Using `mvn` from path: /usr/local/bin/mvn Checkstyle checks passed. ``` Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12632 from dongjoon-hyun/SPARK-14868.
*	[SPARK-8393][STREAMING] JavaStreamingContext#awaitTermination() throws ↵	Sean Owen	2016-04-21	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	non-declared InterruptedException ## What changes were proposed in this pull request? `JavaStreamingContext.awaitTermination` methods should be declared as `throws[InterruptedException]` so that this exception can be handled in Java code. Note this is not just a doc change, but an API change, since now (in Java) the method has a checked exception to handle. All await-like methods in Java APIs behave this way, so seems worthwhile for 2.0. ## How was this patch tested? Jenkins tests Author: Sean Owen <sowen@cloudera.com> Closes #12418 from srowen/SPARK-8393.
*	[HOTFIX] Ignore all Docker integration tests	Josh Rosen	2016-04-20	3	-0/+9
\| \| \| \| \| \| \| \|	The Docker integration tests are failing very often (https://spark-tests.appspot.com/failed-tests) so I think we should disable these suites for now until we have time to improve them. Author: Josh Rosen <joshrosen@databricks.com> Closes #12549 from JoshRosen/ignore-all-docker-tests.
*	[SPARK-14504][SQL] Enable Oracle docker tests	Luciano Resende	2016-04-18	2	-10/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Enable Oracle docker tests ## How was this patch tested? Existing tests Author: Luciano Resende <lresende@apache.org> Closes #12270 from lresende/oracle.
*	[MINOR][SQL] Remove extra anonymous closure within functional transformations	hyukjinkwon	2016-04-14	5	-18/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR removes extra anonymous closure within functional transformations. For example, ```scala .map(item => { ... }) ``` which can be just simply as below: ```scala .map { item => ... } ``` ## How was this patch tested? Related unit tests and `sbt scalastyle`. Author: hyukjinkwon <gurwls223@gmail.com> Closes #12382 from HyukjinKwon/minor-extra-closers.
*	[SPARK-14508][BUILD] Add a new ScalaStyle Rule `OmitBracesInCase`	Dongjoon Hyun	2016-04-12	1	-15/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide) and [Scala Style Guide](http://docs.scala-lang.org/style/control-structures.html#curlybraces), we had better enforce the following rule. ``` case: Always omit braces in case clauses. ``` This PR makes a new ScalaStyle rule, 'OmitBracesInCase', and enforces it to the code. ## How was this patch tested? Pass the Jenkins tests (including Scala style checking) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12280 from dongjoon-hyun/SPARK-14508.
*	[SPARK-10521][SQL] Utilize Docker for test DB2 JDBC Dialect support	Luciano Resende	2016-04-11	6	-2/+211
\| \| \| \| \| \| \| \|	Add integration tests based on docker to test DB2 JDBC dialect support Author: Luciano Resende <lresende@apache.org> Closes #9893 from lresende/SPARK-10521.
*	[SPARK-14465][BUILD] Checkstyle should check all Java files	Dongjoon Hyun	2016-04-09	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Currently, `checkstyle` is configured to check the files under `src/main/java`. However, Spark has Java files in `src/main/scala`, too. This PR fixes the following configuration in `pom.xml` and the unchecked-so-far violations on those files. ```xml -<sourceDirectory>${basedir}/src/main/java</sourceDirectory> +<sourceDirectories>${basedir}/src/main/java,${basedir}/src/main/scala</sourceDirectories> ``` ## How was this patch tested? After passing the Jenkins build and manually `dev/lint-java`. (Note that Jenkins does not run `lint-java`) Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12242 from dongjoon-hyun/SPARK-14465.
*	[SPARK-14134][CORE] Change the package name used for shading classes.	Marcelo Vanzin	2016-04-06	6	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current package name uses a dash, which is a little weird but seemed to work. That is, until a new test tried to mock a class that references one of those shaded types, and then things started failing. Most changes are just noise to fix the logging configs. For reference, SPARK-8815 also raised this issue, although at the time it did not cause any issues in Spark, so it was not addressed. Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #11941 from vanzin/SPARK-14134.
*	[SPARK-14444][BUILD] Add a new scalastyle `NoScalaDoc` to prevent ↵	Dongjoon Hyun	2016-04-06	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ScalaDoc-style multiline comments ## What changes were proposed in this pull request? According to the [Spark Code Style Guide](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Indentation), this PR adds a new scalastyle rule to prevent the followings. ``` /** In Spark, we don't use the ScalaDoc style so this * is not correct. */ ``` ## How was this patch tested? Pass the Jenkins tests (including `lint-scala`). Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12221 from dongjoon-hyun/SPARK-14444.
*	[SPARK-14359] Unit tests for java 8 lambda syntax with typed aggregates	Eric Liang	2016-04-05	2	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? Adds unit tests for java 8 lambda syntax with typed aggregates as a follow-up to #12168 ## How was this patch tested? Unit tests. Author: Eric Liang <ekl@databricks.com> Closes #12181 from ericl/sc-2794-2.
*	[SPARK-14355][BUILD] Fix typos in Exception/Testcase/Comments and static ↵	Dongjoon Hyun	2016-04-03	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	analysis results ## What changes were proposed in this pull request? This PR contains the following 5 types of maintenance fix over 59 files (+94 lines, -93 lines). - Fix typos(exception/log strings, testcase name, comments) in 44 lines. - Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011) - Use diamond operators in 40 lines. (New codes after SPARK-13702) - Fix redundant semicolon in 5 lines. - Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in CSVInferSchemaSuite.scala. ## How was this patch tested? Manual and pass the Jenkins tests. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12139 from dongjoon-hyun/SPARK-14355.
*	[MINOR][DOCS] Use multi-line JavaDoc comments in Scala code.	Dongjoon Hyun	2016-04-02	2	-14/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	## What changes were proposed in this pull request? This PR aims to fix all Scala-Style multiline comments into Java-Style multiline comments in Scala codes. (All comment-only changes over 77 files: +786 lines, −747 lines) ## How was this patch tested? Manual. Author: Dongjoon Hyun <dongjoon@apache.org> Closes #12130 from dongjoon-hyun/use_multiine_javadoc_comments.
*	[SPARK-14281][TESTS] Fix java8-tests and simplify their build	Josh Rosen	2016-03-31	5	-83/+29
\| \| \| \| \| \| \| \|	This patch fixes a compilation / build break in Spark's `java8-tests` and refactors their POM to simplify the build. See individual commit messages for more details. Author: Josh Rosen <joshrosen@databricks.com> Closes #12073 from JoshRosen/fix-java8-tests.