path: root/pom.xml
Commit message (Author, Date; Files, Lines changed)

* [SPARK-8937] [TEST] A setting `spark.unsafe.exceptionOnMemoryLeak` is missing in ScalaTest config (Kousuke Saruta, 2015-07-09; 1 file, -0/+1)

`spark.unsafe.exceptionOnMemoryLeak` is present in the Surefire config:

```
<!-- Surefire runs all Java tests -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.18.1</version>
  <!-- Note config is repeated in scalatest config -->
  ...
  <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
  </systemProperties>
  ...
```

but is absent in the ScalaTest config.

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>

Closes #7308 from sarutak/add-setting-for-memory-leak and squashes the following commits: 95644e7 [Kousuke Saruta] Added a setting for memory leak
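
A minimal sketch of the ScalaTest-side fix, assuming the scalatest-maven-plugin mirrors the `systemProperties` block quoted above (the surrounding plugin configuration is illustrative, not the exact change):

```xml
<plugin>
  <groupId>org.scalatest</groupId>
  <artifactId>scalatest-maven-plugin</artifactId>
  <configuration>
    <systemProperties>
      <!-- Mirror the Surefire setting so Scala tests also fail on unsafe memory leaks -->
      <spark.unsafe.exceptionOnMemoryLeak>true</spark.unsafe.exceptionOnMemoryLeak>
    </systemProperties>
  </configuration>
</plugin>
```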

* [SPARK-6123] [SPARK-6775] [SPARK-6776] [SQL] Refactors Parquet read path for interoperability and backwards-compatibility (Cheng Lian, 2015-07-08; 1 file, -0/+33)

This PR is a follow-up of #6617 and is part of [SPARK-6774] [2], which aims to ensure interoperability and backwards-compatibility for Spark SQL Parquet support. This one fixes the read path. Now Spark SQL is expected to be able to read legacy Parquet data files generated by most (if not all) common libraries/tools like parquet-thrift, parquet-avro, and parquet-hive. However, we still need to refactor the write path to write standard Parquet LISTs and MAPs ([SPARK-8848] [4]).

### Major changes

1. `CatalystConverter` class hierarchy refactoring

   - Replaces the `CatalystConverter` trait with a much simpler `ParentContainerUpdater`. Now, instead of extending the original `CatalystConverter` trait, every converter class accepts an updater which is responsible for propagating the converted value to some parent container: for example, appending array elements to a parent array buffer, appending key-value pairs to a parent mutable map, or setting a converted value to some specific field of a parent row. The root converter doesn't have a parent and thus uses a `NoopUpdater`. This simplifies the design since converters don't need to care about the details of their parent converters anymore.
   - Unifies `CatalystRootConverter`, `CatalystGroupConverter` and `CatalystPrimitiveRowConverter` into `CatalystRowConverter`. Specifically, all row objects are now represented by `SpecificMutableRow` during conversion.
   - Refactors `CatalystArrayConverter`, and removes `CatalystArrayContainsNullConverter` and `CatalystNativeArrayConverter`. `CatalystNativeArrayConverter` was probably designed with the intention of avoiding boxing costs, but the way it uses Scala generics actually doesn't achieve this goal. The new `CatalystArrayConverter` handles both nullable and non-nullable array elements in a consistent way.
   - Implements backwards-compatibility rules in `CatalystArrayConverter`. When Parquet records are being converted, the schema of the Parquet files should have already been verified, so we only need to care about the structure rather than the field names in the Parquet schema. Since all map objects represented in legacy systems have the same structure as the standard one (see [backwards-compatibility rules for MAP] [1]), we only need to deal with LIST (namely array) in `CatalystArrayConverter`.

2. Requested columns handling

   When specifying requested columns in `RowReadSupport`, we used to use a Parquet `MessageType` converted from a Catalyst `StructType` containing all requested columns. This is not preferable when taking compatibility and interoperability into consideration, because the actual Parquet file may have a different physical structure from the converted schema. In this PR, the schema for requested columns is constructed using the following method:

   - For a column that exists in the target Parquet file, we extract the column type by name from the full file schema, and construct a single-field `MessageType` for that column.
   - For a column that doesn't exist in the target Parquet file, we create a single-field `StructType` and convert it to a `MessageType` using `CatalystSchemaConverter`.
   - Unions all single-field `MessageType`s into a full schema containing all requested fields.

   With this change, we also fix [SPARK-6123] [3] by validating the global schema against each individual Parquet part-file.

### Testing

This PR also adds compatibility tests for parquet-avro, parquet-thrift, and parquet-hive. Please refer to `README.md` under `sql/core/src/test` for more information about these tests. To avoid build-time code generation and adding extra complexity to the build system, the Java code generated from the testing Thrift schema and Avro IDL is also checked in.

[1]: https://github.com/apache/incubator-parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules-1
[2]: https://issues.apache.org/jira/browse/SPARK-6774
[3]: https://issues.apache.org/jira/browse/SPARK-6123
[4]: https://issues.apache.org/jira/browse/SPARK-8848

Author: Cheng Lian <lian@databricks.com>

Closes #7231 from liancheng/spark-6776 and squashes the following commits: 360fe18 [Cheng Lian] Adds ParquetHiveCompatibilitySuite c6fbc06 [Cheng Lian] Removes WIP file committed by mistake b8c1295 [Cheng Lian] Excludes the whole parquet package from MiMa 598c3e8 [Cheng Lian] Adds extra Maven repo for hadoop-lzo, which is a transitive dependency of parquet-thrift 926af87 [Cheng Lian] Simplifies Parquet compatibility test suites 7946ee1 [Cheng Lian] Fixes Scala styling issues 3d7ab36 [Cheng Lian] Fixes .rat-excludes a8f13bb [Cheng Lian] Using Parquet writer API to do compatibility tests f2208cd [Cheng Lian] Adds README.md for Thrift/Avro code generation 1d390aa [Cheng Lian] Adds parquet-thrift compatibility test 440f7b3 [Cheng Lian] Adds generated files to .rat-excludes 13b9121 [Cheng Lian] Adds ParquetAvroCompatibilitySuite 06cfe9d [Cheng Lian] Adds comments about TimestampType handling a099d3e [Cheng Lian] More comments 0cc1b37 [Cheng Lian] Fixes MiMa checks 884d3e6 [Cheng Lian] Fixes styling issue and reverts unnecessary changes 802cbd7 [Cheng Lian] Fixes bugs related to schema merging and empty requested columns 38fe1e7 [Cheng Lian] Adds explicit return type 7fb21f1 [Cheng Lian] Reverts an unnecessary debugging change 1781dff [Cheng Lian] Adds test case for SPARK-8811 6437d4b [Cheng Lian] Assembles requested schema from Parquet file schema bcac49f [Cheng Lian] Removes the 16-byte restriction of decimals a74fb2c [Cheng Lian] More comments 0525346 [Cheng Lian] Removes old Parquet record converters 03c3bd9 [Cheng Lian] Refactors Parquet read path to implement backwards-compatibility rules

* [SPARK-6731] [CORE] Addendum: Upgrade Apache commons-math3 to 3.4.1 (Sean Owen, 2015-07-07; 1 file, -3/+0)

(This finishes the job by removing the version overridden by Hadoop profiles.) See discussion at https://github.com/apache/spark/pull/6994#issuecomment-119113167

Author: Sean Owen <sowen@cloudera.com>

Closes #7261 from srowen/SPARK-6731.2 and squashes the following commits: 5a3f59e [Sean Owen] Finish updating Commons Math3 to 3.4.1 from 3.1.1

* [HOTFIX] Rename release-profile to release when publishing releases (Patrick Wendell, 2015-07-06; 1 file, -1/+1)

We named it 'release-profile' because that is the Maven convention. However, it turns out this special name causes several other things to kick in when we are creating releases that are not desirable. For instance, it triggers the javadoc plugin to run, which actually fails in our current build set-up. The fix is to rename this to a different profile so that its use has no collateral damage.

* [SPARK-8819] Fix build for maven 3.3.x (Andrew Or, 2015-07-06; 1 file, -0/+24)

This is a workaround for MSHADE-148, which leads to an infinite loop when building Spark with maven 3.3.x. This was originally caused by #6441, which added a bunch of test dependencies on the spark-core test module. Recently, it was revealed by #7193.

This patch adds a `-Prelease` profile. If present, it will set `createDependencyReducedPom` to true. The consequences are:

- If you are releasing Spark with this profile, you are fine as long as you use maven 3.2.x or before.
- If you are releasing Spark without this profile, you will run into SPARK-8781.
- If you are not releasing Spark but you are using this profile, you may run into SPARK-8819.
- If you are not releasing Spark and you did not include this profile, you are fine.

This is all documented in `pom.xml` and tested locally with both versions of maven.

Author: Andrew Or <andrew@databricks.com>

Closes #7219 from andrewor14/fix-maven-build and squashes the following commits: 1d37e87 [Andrew Or] Merge branch 'master' of github.com:apache/spark into fix-maven-build 3574ae4 [Andrew Or] Review comments f39199c [Andrew Or] Create a -Prelease profile that flags `createDependencyReducedPom`
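
A sketch of what such a profile could look like, assuming the flag is threaded through to the maven-shade-plugin via a property (the property name and surrounding configuration are illustrative, not the exact change):

```xml
<!-- Activated with -Prelease when publishing; see the trade-offs listed above -->
<profile>
  <id>release</id>
  <properties>
    <create.dependency.reduced.pom>true</create.dependency.reduced.pom>
  </properties>
</profile>

<!-- ...and in the maven-shade-plugin configuration: -->
<configuration>
  <!-- Defaults to false elsewhere, so maven 3.3.x dev builds avoid MSHADE-148 -->
  <createDependencyReducedPom>${create.dependency.reduced.pom}</createDependencyReducedPom>
</configuration>
```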

* [SPARK-8781] Fix variables in published pom.xml not being resolved (Andrew Or, 2015-07-02; 1 file, -2/+0)

The issue is summarized in the JIRA and is caused by this commit: 984ad60147c933f2d5a2040c87ae687c14eb1724. This patch reverts that commit and fixes the maven build in a different way. We limit the dependencies of `KinesisReceiverSuite` to avoid having to deal with the complexities in how maven deals with transitive test dependencies.

Author: Andrew Or <andrew@databricks.com>

Closes #7193 from andrewor14/fix-kinesis-pom and squashes the following commits: ca3d5d4 [Andrew Or] Limit kinesis test dependencies f24e09c [Andrew Or] Revert "[BUILD] Fix Maven build for Kinesis"

* [SPARK-8378] [STREAMING] Add the Python API for Flume (zsxwing, 2015-07-01; 1 file, -0/+1)

Author: zsxwing <zsxwing@gmail.com>

Closes #6830 from zsxwing/flume-python and squashes the following commits: 78dfdac [zsxwing] Fix the compile error in the test code f1bf3c0 [zsxwing] Address TD's comments 0449723 [zsxwing] Add sbt goal streaming-flume-assembly/assembly e93736b [zsxwing] Fix the test case for determine_modules_to_test 9d5821e [zsxwing] Fix pyspark_core dependencies f9ee681 [zsxwing] Merge branch 'master' into flume-python 7a55837 [zsxwing] Add streaming_flume_assembly to run-tests.py b96b0de [zsxwing] Merge branch 'master' into flume-python ce85e83 [zsxwing] Fix incompatible issues for Python 3 01cbb3d [zsxwing] Add import sys 152364c [zsxwing] Fix the issue that StringIO doesn't work in Python 3 14ba0ff [zsxwing] Add flume-assembly for sbt building b8d5551 [zsxwing] Merge branch 'master' into flume-python 4762c34 [zsxwing] Fix the doc 0336579 [zsxwing] Refactor Flume unit tests and also add tests for Python API 9f33873 [zsxwing] Add the Python API for Flume

* [SPARK-8709] Exclude hadoop-client's mockito-all dependency (Josh Rosen, 2015-06-29; 1 file, -0/+8)

This patch excludes `hadoop-client`'s dependency on `mockito-all`. As of #7061, Spark depends on `mockito-core` instead of `mockito-all`, so the dependency from Hadoop was leading to test compilation failures for some of the Hadoop 2 SBT builds.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7090 from JoshRosen/SPARK-8709 and squashes the following commits: e190122 [Josh Rosen] [SPARK-8709] Exclude hadoop-client's mockito-all dependency.
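
A hedged sketch of the exclusion; the version property is illustrative, but the coordinates are the standard ones for both artifacts:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- Spark depends on mockito-core (#7061); keep Hadoop's mockito-all
         off the test classpath to avoid compilation failures -->
    <exclusion>
      <groupId>org.mockito</groupId>
      <artifactId>mockito-all</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```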

* [SPARK-7845] [BUILD] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1 (Cheng Lian, 2015-06-28; 1 file, -1/+1)

PR #5694 reverted PR #6384 while refactoring `dev/run-tests` to `dev/run-tests.py`. Also, PR #6384 didn't bump the Hadoop 1 version defined in the POM.

Author: Cheng Lian <lian@databricks.com>

Closes #7062 from liancheng/spark-7845 and squashes the following commits: c088b72 [Cheng Lian] Bumping default Hadoop version used in profile hadoop-1 to 1.2.1

* [SPARK-8649] [BUILD] MapR repository is not defined properly (Thomas Szymanski, 2015-06-28; 1 file, -1/+1)

The previous committer on this part was pwendell. The previous URL gives a 404; the new one seems to be OK. This patch is added under the Apache License 2.0.

The JIRA link: https://issues.apache.org/jira/browse/SPARK-8649

Author: Thomas Szymanski <develop@tszymanski.com>

Closes #7054 from tszym/SPARK-8649 and squashes the following commits: bfda9c4 [Thomas Szymanski] [SPARK-8649] [BUILD] Mapr repository is not defined properly

* [SPARK-8683] [BUILD] Depend on mockito-core instead of mockito-all (Josh Rosen, 2015-06-27; 1 file, -1/+1)

Spark's tests currently depend on `mockito-all`, which bundles Hamcrest and Objenesis classes. Instead, it should depend on `mockito-core`, which declares those libraries as Maven dependencies. This is necessary in order to fix a dependency conflict that leads to a NoSuchMethodError when using certain Hamcrest matchers. See https://github.com/mockito/mockito/wiki/Declaring-mockito-dependency for more details.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #7061 from JoshRosen/mockito-core-instead-of-all and squashes the following commits: 70eccbe [Josh Rosen] Depend on mockito-core instead of mockito-all.
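
A sketch of the dependency swap (the version property is an assumption; the coordinates are Mockito's published ones):

```xml
<!-- mockito-core declares Hamcrest and Objenesis as regular Maven dependencies
     instead of bundling their classes, avoiding NoSuchMethodError conflicts -->
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-core</artifactId>
  <version>${mockito.version}</version>
  <scope>test</scope>
</dependency>
```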

* [SPARK-8307] [SQL] improve timestamp from parquet (Davies Liu, 2015-06-22; 1 file, -1/+0)

This PR changes the code to convert a Julian day to a Unix timestamp directly (without going through Calendar and Timestamp). cc adrian-wang rxin

Author: Davies Liu <davies@databricks.com>

Closes #6759 from davies/improve_ts and squashes the following commits: 849e301 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts b0e4cad [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 8e2d56f [Davies Liu] address comments 634b9f5 [Davies Liu] fix mima 4891efb [Davies Liu] address comment bfc437c [Davies Liu] fix build ae5979c [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 602b969 [Davies Liu] remove jodd 2f2e48c [Davies Liu] fix test 8ace611 [Davies Liu] fix mima 212143b [Davies Liu] fix mina c834108 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts a3171b8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts 5233974 [Davies Liu] fix scala style 361fd62 [Davies Liu] address comments ea196d4 [Davies Liu] improve timestamp from parquet

* [SPARK-8289] Specify stack size for consistency with Java tests - resolves test failures (Adam Roberts, 2015-06-11; 1 file, -1/+1)

This change is a simple one and specifies a stack size of 4096k instead of the vendor default for Java tests (the defaults vary between Java vendors). This remedies test failures observed with JavaALSSuite with IBM and Oracle Java owing to a lower default size in comparison to the size with OpenJDK. 4096k is a suitable default where the tests pass with each Java vendor tested. The alternative is to reduce the number of iterations in the test (no observed failures with 5 iterations instead of 15). -Xss works with Oracle's HotSpot VM, IBM's J9 VM and OpenJDK (IcedTea). I have ensured this does not have any negative implications for other tests.

Author: Adam Roberts <aroberts@uk.ibm.com>
Author: a-roberts <aroberts@uk.ibm.com>

Closes #6727 from a-roberts/IncJavaStackSize and squashes the following commits: ab40aea [Adam Roberts] Specify stack size for SBT builds 5032d8d [a-roberts] Update pom.xml
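
A sketch of where the flag lands in the POM, assuming it is passed via the test plugins' `argLine` (the surrounding configuration is illustrative):

```xml
<!-- In the surefire/scalatest plugin configuration: pin the thread stack size
     instead of relying on the JVM vendor's default -->
<configuration>
  <argLine>-Xss4096k</argLine>
</configuration>
```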

* [SPARK-8101] [CORE] Upgrade netty to avoid memory leak, per netty #3837 issues (Sean Owen, 2015-06-09; 1 file, -1/+1)

Update to Netty 4.0.28-Final.

Author: Sean Owen <sowen@cloudera.com>

Closes #6701 from srowen/SPARK-8101 and squashes the following commits: f3b6369 [Sean Owen] Update to Netty 4.0.28-Final

* [SPARK-8126] [BUILD] Use custom temp directory during build. (Marcelo Vanzin, 2015-06-08; 1 file, -1/+23)

Even with all the efforts to clean up the temp directories created by unit tests, Spark leaves a lot of garbage in /tmp after a test run. This change overrides java.io.tmpdir to place those files under the build directory instead. After an sbt full unit test run, I was left with > 400 MB of temp files. Since they're now under the build dir, it's much easier to clean them up.

Also make a slight change to a unit test to make it not pollute the source directory with test data.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6674 from vanzin/SPARK-8126 and squashes the following commits: 0f8ad41 [Marcelo Vanzin] Make sure tmp dir exists when tests run. 643e916 [Marcelo Vanzin] [MINOR] [BUILD] Use custom temp directory during build.
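
A minimal sketch of the override, assuming it is set as a test system property (exact placement is illustrative; per the squashed commits, the directory also has to exist before tests run):

```xml
<!-- Redirect test temp files under target/ so a build clean removes them;
     create the directory first (e.g. with maven-antrun-plugin) -->
<systemProperties>
  <java.io.tmpdir>${project.build.directory}/tmp</java.io.tmpdir>
</systemProperties>
```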

* [SPARK-7733] [CORE] [BUILD] Update build, code to use Java 7 for 1.5.0+ (Sean Owen, 2015-06-07; 1 file, -1/+1)

Update the build to use Java 7, and remove some comments and special-case support for Java 6.

Author: Sean Owen <sowen@cloudera.com>

Closes #6265 from srowen/SPARK-7733 and squashes the following commits: 59bda4e [Sean Owen] Update build to use Java 7, and remove some comments and special-case support for Java 6
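
A sketch of the compiler settings this implies, assuming the Java level is held in a property (names are illustrative):

```xml
<properties>
  <java.version>1.7</java.version>
</properties>
<!-- ... -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Java 6 is no longer supported as of Spark 1.5.0 -->
    <source>${java.version}</source>
    <target>${java.version}</target>
  </configuration>
</plugin>
```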

* [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x (Konstantin Shaposhnikov, 2015-06-07; 1 file, -2/+4)

Both akka 2.3.x and hadoop-2.x use protobuf 2.5, so only the hadoop-1 build needs the custom 2.3.4-spark akka version that shades protobuf-2.5.

This change also updates the akka version (for hadoop-2.x profiles only) to the latest 2.3.11, as akka-zeromq_2.11 is not available for akka 2.3.4. This partially fixes SPARK-7042 (for hadoop-2.x builds).

Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>

Closes #6492 from kostya-sh/SPARK-7042 and squashes the following commits: dc195b0 [Konstantin Shaposhnikov] [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x
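
A sketch of how the artifact coordinates might be switched per profile, assuming the group and version are factored into properties (property names are an assumption; Spark's actual POM may wire this differently):

```xml
<!-- Defaults for hadoop-2.x builds: stock Akka artifacts -->
<properties>
  <akka.group>com.typesafe.akka</akka.group>
  <akka.version>2.3.11</akka.version>
</properties>

<!-- The hadoop-1 profile keeps the protobuf-shading fork -->
<profile>
  <id>hadoop-1</id>
  <properties>
    <akka.group>org.spark-project.akka</akka.group>
    <akka.version>2.3.4-spark</akka.version>
  </properties>
</profile>
```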
* Revert "[MINOR] [BUILD] Use custom temp directory during build."Andrew Or2015-06-051-3/+1
| | | | This reverts commit b16b5434ff44c42e4b3a337f9af147669ba44896.

* [MINOR] [BUILD] Use custom temp directory during build. (Marcelo Vanzin, 2015-06-05; 1 file, -1/+3)

Even with all the efforts to clean up the temp directories created by unit tests, Spark leaves a lot of garbage in /tmp after a test run. This change overrides java.io.tmpdir to place those files under the build directory instead. After an sbt full unit test run, I was left with > 400 MB of temp files. Since they're now under the build dir, it's much easier to clean them up.

Also make a slight change to a unit test to make it not pollute the source directory with test data.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #6653 from vanzin/unit-test-tmp and squashes the following commits: 31e2dd5 [Marcelo Vanzin] Fix tests that depend on each other. aa92944 [Marcelo Vanzin] [minor] [build] Use custom temp directory during build.

* [SPARK-8106] [SQL] Set derby.system.durability=test to speed up Hive compatibility tests (Josh Rosen, 2015-06-04; 1 file, -0/+2)

Derby has a `derby.system.durability` configuration property that can be used to disable I/O synchronization calls for writes. This sacrifices durability but can result in large performance gains, which is appropriate for tests. We should enable this in our test system properties in order to speed up the Hive compatibility tests. I saw 2-3x speedups locally with this change.

See https://db.apache.org/derby/docs/10.8/ref/rrefproperdurability.html for more documentation of this property.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #6651 from JoshRosen/hive-compat-suite-speedup and squashes the following commits: b7a08a2 [Josh Rosen] Set derby.system.durability=test in our unit tests.
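
A sketch of the setting as a test system property (placement within the test plugins' configuration is illustrative):

```xml
<!-- Skip Derby's write syncing in tests; durability is irrelevant there,
     and the Hive compatibility suite runs 2-3x faster without it -->
<systemProperties>
  <derby.system.durability>test</derby.system.durability>
</systemProperties>
```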

* [SPARK-7743] [SQL] Parquet 1.7 (Thomas Omans, 2015-06-04; 1 file, -3/+3)

Resolves [SPARK-7743](https://issues.apache.org/jira/browse/SPARK-7743).

Trivial changes of versions and package names, as well as a small fix in `ParquetTableOperations.scala`:

```diff
- val readContext = getReadSupport(configuration).init(
+ val readContext = ParquetInputFormat.getReadSupportInstance(configuration).init(
```

since `ParquetInputFormat.getReadSupport` was made package-private in the latest release.

Thanks -- Thomas Omans

Author: Thomas Omans <tomans@cj.com>

Closes #6597 from eggsby/SPARK-7743 and squashes the following commits: 2df0d1b [Thomas Omans] [SPARK-7743] [SQL] Upgrading parquet version to 1.7.0
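
For reference, a sketch of the corresponding coordinate change in the POM: Parquet moved from the `com.twitter` groupId to `org.apache.parquet` as of 1.7.0 (the artifact shown is one example; the exact dependency list here is illustrative):

```xml
<dependency>
  <!-- Formerly com.twitter:parquet-hadoop -->
  <groupId>org.apache.parquet</groupId>
  <artifactId>parquet-hadoop</artifactId>
  <version>1.7.0</version>
</dependency>
```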

* [SPARK-7956] [SQL] Use Janino to compile SQL expressions into bytecode (Davies Liu, 2015-06-04; 1 file, -10/+0)

In order to reduce the overhead of codegen, this PR switches to Janino to compile SQL expressions into bytecode. After this, the time used to compile a SQL expression drops from 100ms to 5ms, which is necessary for turning on codegen for general workloads as well as tests. cc rxin

Author: Davies Liu <davies@databricks.com>

Closes #6479 from davies/janino and squashes the following commits: cc689f5 [Davies Liu] remove globalLock 262d848 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino eec3a33 [Davies Liu] address comments from Josh f37c8c3 [Davies Liu] fix DecimalType and cast to String 202298b [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino a21e968 [Davies Liu] fix style 0ed3dc6 [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino 551a851 [Davies Liu] fix tests c3bdffa [Davies Liu] remove print 6089ce5 [Davies Liu] change logging level 7e46ac3 [Davies Liu] fix style d8f0f6c [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino da4926a [Davies Liu] fix tests 03660f3 [Davies Liu] WIP: use Janino to compile Java source f2629cd [Davies Liu] Merge branch 'master' of github.com:apache/spark into janino f7d66cf [Davies Liu] use template based string for codegen

* [BUILD] Fix Maven build for Kinesis (Andrew Or, 2015-06-03; 1 file, -0/+2)

A necessary dependency that is transitively referenced is not provided, causing compilation failures in builds that enable the kinesis-asl profile.

* [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0 (Patrick Wendell, 2015-06-03; 1 file, -1/+13)

Author: Patrick Wendell <patrick@databricks.com>

Closes #6328 from pwendell/spark-1.5-update and squashes the following commits: 2f42d02 [Patrick Wendell] A few more excludes 4bebcf0 [Patrick Wendell] Update to RC4 61aaf46 [Patrick Wendell] Using new release candidate 55f1610 [Patrick Wendell] Another exclude 04b4f04 [Patrick Wendell] More issues with transient 1.4 changes 36f549b [Patrick Wendell] [SPARK-7801] [BUILD] Updating versions to SPARK 1.5.0

* [SPARK-7850] [BUILD] Hive 0.12.0 profile in POM should be removed (Cheolsoo Park, 2015-05-27; 1 file, -16/+0)

I grep'ed hive-0.12.0 in the source code and removed all the profiles and doc references.

Author: Cheolsoo Park <cheolsoop@netflix.com>

Closes #6393 from piaozhexiu/SPARK-7850 and squashes the following commits: fb429ce [Cheolsoo Park] Remove hive-0.13.1 profile 82bf09a [Cheolsoo Park] Remove hive 0.12.0 shim code f3722da [Cheolsoo Park] Remove hive-0.12.0 profile and references from POM and build docs
* Revert "[SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x"Patrick Wendell2015-05-261-4/+2
| | | | This reverts commit 43aa819c041f6e8301ad1b8f82eb68e14254f636.

* [SPARK-7042] [BUILD] use the standard akka artifacts with hadoop-2.x (Konstantin Shaposhnikov, 2015-05-26; 1 file, -2/+4)

Both akka 2.3.x and hadoop-2.x use protobuf 2.5, so only the hadoop-1 build needs the custom 2.3.4-spark akka version that shades protobuf-2.5. This partially fixes SPARK-7042 (for hadoop-2.x builds).

Author: Konstantin Shaposhnikov <Konstantin.Shaposhnikov@sc.com>

Closes #6341 from kostya-sh/SPARK-7042 and squashes the following commits: 7eb8c60 [Konstantin Shaposhnikov] [SPARK-7042][BUILD] use the standard akka artifacts with hadoop-2.x

* [SPARK-7726] Fix Scaladoc false errors (Iulian Dragos, 2015-05-19; 1 file, -2/+2)

Visibility rules for static members are different in Scala and Java, and this case requires an explicit static import. Even though these are Java files, they are run through scaladoc, which enforces Scala rules.

Also reverted the commit that reverts the upgrade to 2.11.6.

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #6260 from dragos/issue/scaladoc-false-error and squashes the following commits: f2e998e [Iulian Dragos] Revert "[HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6"" 0bad052 [Iulian Dragos] Fix scaladoc faux-error.

* [HOTFIX] Revert "[SPARK-7092] Update spark scala version to 2.11.6" (Patrick Wendell, 2015-05-19; 1 file, -2/+2)

This reverts commit a11c8683c76c67f45749a1b50a0912a731fd2487. For more information see: https://issues.apache.org/jira/browse/SPARK-7726

* [SPARK-7063] when lz4 compression is used, it causes core dump (Jihong MA, 2015-05-18; 1 file, -1/+1)

This fix solves an issue found in lz4 1.2.0, which caused a core dump in Spark Core with the IBM JDK. That issue is fixed in the lz4 1.3.0 release.

Author: Jihong MA <linlin200605@gmail.com>

Closes #6226 from JihongMA/SPARK-7063-1 and squashes the following commits: 0cca781 [Jihong MA] SPARK-7063 4559ed5 [Jihong MA] SPARK-7063 daa520f [Jihong MA] SPARK-7063 upgrade lz4 jars 71738ee [Jihong MA] Merge remote-tracking branch 'upstream/master' dfaa971 [Jihong MA] SPARK-7265 minor fix of the content ace454d [Jihong MA] SPARK-7265 take out PySpark on YARN limitation 9ea0832 [Jihong MA] Merge remote-tracking branch 'upstream/master' d5bf3f5 [Jihong MA] Merge remote-tracking branch 'upstream/master' 7b842e6 [Jihong MA] Merge remote-tracking branch 'upstream/master' 9c84695 [Jihong MA] SPARK-7265 address review comment a399aa6 [Jihong MA] SPARK-7265 Improving documentation for Spark SQL Hive support
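
A sketch of the bump, using the coordinates the lz4-java library publishes under (how the version is wired into Spark's POM is illustrative):

```xml
<dependency>
  <groupId>net.jpountz.lz4</groupId>
  <artifactId>lz4</artifactId>
  <!-- 1.2.0 could core-dump under the IBM JDK; fixed in 1.3.0 -->
  <version>1.3.0</version>
</dependency>
```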

* [SPARK-6514] [SPARK-5960] [SPARK-6656] [SPARK-7679] [STREAMING] [KINESIS] Updates to the Kinesis API (Tathagata Das, 2015-05-17; 1 file, -2/+2)

- SPARK-6514 - Use correct region
- SPARK-5960 - Allow AWS Credentials to be directly passed
- SPARK-6656 - Specify kinesis application name explicitly
- SPARK-7679 - Upgrade to latest KCL and AWS SDK

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #6147 from tdas/kinesis-api-update and squashes the following commits: f23ea77 [Tathagata Das] Updated versions and updated APIs 373b201 [Tathagata Das] Updated Kinesis API

* [SPARK-7669] Builds against Hadoop 2.6+ get inconsistent curator dependencies (Steve Loughran, 2015-05-17; 1 file, -2/+24)

This adds a new profile, `hadoop-2.6`, copying over the hadoop-2.4 properties, updating ZK to 3.4.6 and making the curator version a configurable option. That keeps the curator-recipes JAR in sync with the one used in Hadoop.

There's one more option to consider: making the full curator-client version explicit with its own dependency version. This will pin down the version from the hadoop and hive imports.

Author: Steve Loughran <stevel@hortonworks.com>

Closes #6191 from steveloughran/stevel/SPARK-7669-hadoop-2.6 and squashes the following commits: e3e281a [Steve Loughran] SPARK-7669 declare the version of curator-client and curator-framework JARs 2901ea9 [Steve Loughran] SPARK-7669 Builds against Hadoop 2.6+ get inconsistent curator dependencies
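
A sketch of what the profile might look like, assuming it mirrors the hadoop-2.4 properties and exposes Curator's version as a property (versions other than ZK 3.4.6, which the commit states, are illustrative):

```xml
<profile>
  <id>hadoop-2.6</id>
  <properties>
    <hadoop.version>2.6.0</hadoop.version>
    <!-- ZK updated to match what Hadoop 2.6 ships -->
    <zookeeper.version>3.4.6</zookeeper.version>
    <!-- Overridable, so curator-recipes stays in sync with the Hadoop in use -->
    <curator.version>2.6.0</curator.version>
  </properties>
</profile>
```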

* [BUILD] update jblas dependency version to 1.2.4 (Matthew Brandyberry, 2015-05-16; 1 file, -1/+1)

jblas 1.2.4 includes native library support for PPC64LE.

Author: Matthew Brandyberry <mbrandy@us.ibm.com>

Closes #6199 from mtbrandy/jblas-1.2.4 and squashes the following commits: 9df9301 [Matthew Brandyberry] [BUILD] update jblas dependency version to 1.2.4

* [SPARK-7677] [STREAMING] Add Kafka modules to the 2.11 build. (Iulian Dragos, 2015-05-15; 1 file, -4/+2)

This is somewhat related to [SPARK-6154](https://issues.apache.org/jira/browse/SPARK-6154), though it only touches Kafka, not the jline dependency for the thriftserver.

I tested this locally on 2.11 (./run-tests) and everything looked good (I had to disable MiMa, because `MimaBuild` hardcodes 2.10 for the previous version -- that's another PR).

Author: Iulian Dragos <jaguarul@gmail.com>

Closes #6149 from dragos/issue/spark-2.11-kafka and squashes the following commits: aa15d99 [Iulian Dragos] Add Kafka modules to the 2.11 build.

* [SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions (FavioVazquez, 2015-05-14; 1 file, -18/+15)

Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards-compatibility reasons (see the POM sketch after this entry).

Changes proposed by vanzin resulting from the previous pull request https://github.com/apache/spark/pull/5783, which did not fix the problem correctly. Please let me know if this is the correct way of doing this; the comments from vanzin are in the pull request mentioned.

Author: FavioVazquez <favio.vazquezp@gmail.com>

Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits: 11670e5 [FavioVazquez] - Added missing instance of -Phadoop-2.2 in create-release.sh 379f50d [FavioVazquez] - Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md - Reconstructed docs to not ask users to rely on default behavior 3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies 31bdafa [FavioVazquez] - Added missing instances in -Phadoop-1 in create-release.sh, run-tests and in the building-spark documentation cbb93e8 [FavioVazquez] - Added comment related to SPARK-3710 about hadoop-yarn-server-tests in Hadoop 2.2 that fails to pull some needed dependencies 83dc332 [FavioVazquez] - Cleaned up the main POM concerning the yarn profile - Erased hadoop-2.2 profile from yarn/pom.xml and its content was integrated into yarn/pom.xml 93f7624 [FavioVazquez] - Deleted unnecessary comments and <activation> tag on the YARN profile in the main POM 668d126 [FavioVazquez] - Moved <dependencies> <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM - Erased unnecessary hadoop-2.2 profile from the YARN POM fda6a51 [FavioVazquez] - Updated hadoop1 releases in create-release.sh due to changes in the default hadoop version set - Erased unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh - Prettify comment in yarn/pom.xml 0470587 [FavioVazquez] - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh - Updated how the releases are made in the create-release.sh no that the default hadoop version is the 2.2.0 - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in scalastyle - Erased unnecessary instance of -Phadoop-2.2 -Dhadoop.version=2.2.0 in run-tests - Better example given in the hadoop-third-party-distributions.md now that the default hadoop version is 2.2.0 a650779 [FavioVazquez] - Default value of avro.mapred.classifier has been set to hadoop2 in pom.xml - Cleaned up hadoop-2.3 and 2.4 profiles due to change in the default set in avro.mapred.classifier in pom.xml 199f40b [FavioVazquez] - Erased unnecessary CDH5-specific note in docs/building-spark.md - Remove example of instance -Phadoop-2.2 -Dhadoop.version=2.2.0 in docs/building-spark.md - Enabled hadoop-2.2 profile when the Hadoop version is 2.2.0, which is now the default. Added comment in the yarn/pom.xml to specify that.
88a8b88 [FavioVazquez] - Simplified Hadoop profiles due to new setting of global properties in the pom.xml file - Added comment to specify that the hadoop-2.2 profile is now the default hadoop profile in the pom.xml file - Erased hadoop-2.2 from related hadoop profiles now that is a no-op in the make-distribution.sh file 70b8344 [FavioVazquez] - Fixed typo in the make-distribution.sh file and added hadoop-1 in the Related profiles 287fa2f [FavioVazquez] - Updated documentation about specifying the hadoop version in building-spark. Now is clear that Spark will build against Hadoop 2.2.0 by default. - Added Cloudera CDH 5.3.3 without MapReduce example in the building-spark doc. 1354292 [FavioVazquez] - Fixed hadoop-1 version to match jenkins build profile in hadoop1.0 tests and documentation 6b4bfaf [FavioVazquez] - Cleanup in hadoop-2.x profiles since they contained mostly redundant stuff. 7e9955d [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons 660decc [FavioVazquez] - Updated Hadoop dependencies due to inconsistency in the versions. Now the global properties are the ones used by the hadoop-2.2 profile, and the profile was set to empty but kept for backwards compatibility reasons ec91ce3 [FavioVazquez] - Updated protobuf-java version of com.google.protobuf dependancy to fix blocking error when connecting to HDFS via the Hadoop Cloudera HDFS CDH5 (fix for 2.5.0-cdh5.3.3 version)
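
A sketch of the resulting arrangement, where the global properties carry the old hadoop-2.2 defaults and the profile itself becomes an empty marker kept for backwards compatibility (the property set is abridged and partly assumed):

```xml
<!-- Global defaults now match the old hadoop-2.2 profile -->
<properties>
  <hadoop.version>2.2.0</hadoop.version>
  <protobuf.version>2.5.0</protobuf.version>
  <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
</properties>

<!-- Kept as a no-op so -Phadoop-2.2 still works -->
<profile>
  <id>hadoop-2.2</id>
</profile>
```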

* [SPARK-7081] Faster sort-based shuffle path using binary processing cache-aware sort (Josh Rosen, 2015-05-13; 1 file, -1/+13)

This patch introduces a new shuffle manager that enhances the existing sort-based shuffle with a new cache-friendly sort algorithm that operates directly on binary data. The goals of this patch are to lower memory usage and Java object overheads during shuffle and to speed up sorting. It also lays groundwork for follow-up patches that will enable end-to-end processing of serialized records.

The new shuffle manager, `UnsafeShuffleManager`, can be enabled by setting `spark.shuffle.manager=tungsten-sort` in SparkConf. The new shuffle manager uses directly-managed memory to implement several performance optimizations for certain types of shuffles. In cases where the new performance optimizations cannot be applied, the new shuffle manager delegates to SortShuffleManager to handle those shuffles. UnsafeShuffleManager's optimizations will apply when _all_ of the following conditions hold:

- The shuffle dependency specifies no aggregation or output ordering.
- The shuffle serializer supports relocation of serialized values (this is currently supported by KryoSerializer and Spark SQL's custom serializers).
- The shuffle produces fewer than 16777216 output partitions.
- No individual record is larger than 128 MB when serialized.

In addition, extra spill-merging optimizations are automatically applied when the shuffle compression codec supports concatenation of serialized streams. This is currently supported by Spark's LZF serializer.

At a high level, UnsafeShuffleManager's design is similar to Spark's existing SortShuffleManager. In sort-based shuffle, incoming records are sorted according to their target partition ids, then written to a single map output file. Reducers fetch contiguous regions of this file in order to read their portion of the map output. In cases where the map output data is too large to fit in memory, sorted subsets of the output can be spilled to disk and those on-disk files are merged to produce the final output file. UnsafeShuffleManager optimizes this process in several ways:

- Its sort operates on serialized binary data rather than Java objects, which reduces memory consumption and GC overheads. This optimization requires the record serializer to have certain properties to allow serialized records to be re-ordered without requiring deserialization. See SPARK-4550, where this optimization was first proposed and implemented, for more details.
- It uses a specialized cache-efficient sorter (UnsafeShuffleExternalSorter) that sorts arrays of compressed record pointers and partition ids. By using only 8 bytes of space per record in the sorting array, this fits more of the array into cache.
- The spill merging procedure operates on blocks of serialized records that belong to the same partition and does not need to deserialize records during the merge.
- When the spill compression codec supports concatenation of compressed data, the spill merge simply concatenates the serialized and compressed spill partitions to produce the final output partition. This allows efficient data copying methods, like NIO's `transferTo`, to be used and avoids the need to allocate decompression or copying buffers during the merge.

The shuffle read path is unchanged.

This patch is similar to [SPARK-4550](http://issues.apache.org/jira/browse/SPARK-4550) / #4450 but uses a slightly different implementation. The `unsafe`-based implementation featured in this patch lays the groundwork for followup patches that will enable sorting to operate on serialized data pages that will be prepared by Spark SQL's new `unsafe` operators (such as the new aggregation operator introduced in #5725).

### Future work

There are several tasks that build upon this patch, which will be left to future work:

- [SPARK-7271](https://issues.apache.org/jira/browse/SPARK-7271) Redesign / extend the shuffle interfaces to accept binary data as input. The goal here is to let us bypass serialization steps in cases where the sort input is produced by an operator that operates directly on binary data.
- Extension / redesign of the `Serializer` API. We can add new methods which allow serializers to determine the size requirements for serializing objects and for serializing objects directly to a specified memory address (similar to how `UnsafeRowConverter` works in Spark SQL).

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5868 from JoshRosen/unsafe-sort and squashes the following commits: ef0a86e [Josh Rosen] Fix scalastyle errors 7610f2f [Josh Rosen] Add tests for proper cleanup of shuffle data. d494ffe [Josh Rosen] Fix deserialization of JavaSerializer instances. 52a9981 [Josh Rosen] Fix some bugs in the address packing code. 51812a7 [Josh Rosen] Change shuffle manager sort name to tungsten-sort 4023fa4 [Josh Rosen] Add @Private annotation to some Java classes. de40b9d [Josh Rosen] More comments to try to explain metrics code df07699 [Josh Rosen] Attempt to clarify confusing metrics update code 5e189c6 [Josh Rosen] Track time spend closing / flushing files; split TimeTrackingOutputStream into separate file. d5779c6 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort c2ce78e [Josh Rosen] Fix a missed usage of MAX_PARTITION_ID e3b8855 [Josh Rosen] Cleanup in UnsafeShuffleWriter 4a2c785 [Josh Rosen] rename 'sort buffer' to 'pointer array' 6276168 [Josh Rosen] Remove ability to disable spilling in UnsafeShuffleExternalSorter. 57312c9 [Josh Rosen] Clarify fileBufferSize units 2d4e4f4 [Josh Rosen] Address some minor comments in UnsafeShuffleExternalSorter. fdcac08 [Josh Rosen] Guard against overflow when expanding sort buffer. 85da63f [Josh Rosen] Cleanup in UnsafeShuffleSorterIterator. 0ad34da [Josh Rosen] Fix off-by-one in nextInt() call 56781a1 [Josh Rosen] Rename UnsafeShuffleSorter to UnsafeShuffleInMemorySorter e995d1a [Josh Rosen] Introduce MAX_SHUFFLE_OUTPUT_PARTITIONS. e58a6b4 [Josh Rosen] Add more tests for PackedRecordPointer encoding. 4f0b770 [Josh Rosen] Attempt to implement proper shuffle write metrics. d4e6d89 [Josh Rosen] Update to bit shifting constants 69d5899 [Josh Rosen] Remove some unnecessary override vals 8531286 [Josh Rosen] Add tests that automatically trigger spills. 7c953f9 [Josh Rosen] Add test that covers UnsafeShuffleSortDataFormat.swap(). e1855e5 [Josh Rosen] Fix a handful of misc. IntelliJ inspections 39434f9 [Josh Rosen] Avoid integer multiplication overflow in getMemoryUsage (thanks FindBugs!) 1e3ad52 [Josh Rosen] Delete unused ByteBufferOutputStream class. ea4f85f [Josh Rosen] Roll back an unnecessary change in Spillable.
ae538dc [Josh Rosen] Document UnsafeShuffleManager. ec6d626 [Josh Rosen] Add notes on maximum # of supported shuffle partitions. 0d4d199 [Josh Rosen] Bump up shuffle.memoryFraction to make tests pass. b3b1924 [Josh Rosen] Properly implement close() and flush() in DummySerializerInstance. 1ef56c7 [Josh Rosen] Revise compression codec support in merger; test cross product of configurations. b57c17f [Josh Rosen] Disable some overly-verbose logs that rendered DEBUG useless. f780fb1 [Josh Rosen] Add test demonstrating which compression codecs support concatenation. 4a01c45 [Josh Rosen] Remove unnecessary log message 27b18b0 [Josh Rosen] That for inserting records AT the max record size. fcd9a3c [Josh Rosen] Add notes + tests for maximum record / page sizes. 9d1ee7c [Josh Rosen] Fix MiMa excludes for ShuffleWriter change fd4bb9e [Josh Rosen] Use own ByteBufferOutputStream rather than Kryo's 67d25ba [Josh Rosen] Update Exchange operator's copying logic to account for new shuffle manager 8f5061a [Josh Rosen] Strengthen assertion to check partitioning 01afc74 [Josh Rosen] Actually read data in UnsafeShuffleWriterSuite 1929a74 [Josh Rosen] Update to reflect upstream ShuffleBlockManager -> ShuffleBlockResolver rename. e8718dd [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort 9b7ebed [Josh Rosen] More defensive programming RE: cleaning up spill files and memory after errors 7cd013b [Josh Rosen] Begin refactoring to enable proper tests for spilling. 722849b [Josh Rosen] Add workaround for transferTo() bug in merging code; refactor tests. 9883e30 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort b95e642 [Josh Rosen] Refactor and document logic that decides when to spill. 1ce1300 [Josh Rosen] More minor cleanup 5e8cf75 [Josh Rosen] More minor cleanup e67f1ea [Josh Rosen] Remove upper type bound in ShuffleWriter interface. cfe0ec4 [Josh Rosen] Address a number of minor review comments: 8a6fe52 [Josh Rosen] Rename UnsafeShuffleSpillWriter to UnsafeShuffleExternalSorter 11feeb6 [Josh Rosen] Update TODOs related to shuffle write metrics. b674412 [Josh Rosen] Merge remote-tracking branch 'origin/master' into unsafe-sort aaea17b [Josh Rosen] Add comments to UnsafeShuffleSpillWriter. 4f70141 [Josh Rosen] Fix merging; now passes UnsafeShuffleSuite tests. 133c8c9 [Josh Rosen] WIP towards testing UnsafeShuffleWriter. f480fb2 [Josh Rosen] WIP in mega-refactoring towards shuffle-specific sort. 57f1ec0 [Josh Rosen] WIP towards packed record pointers for use in optimized shuffle sort. 69232fd [Josh Rosen] Enable compressible address encoding for off-heap mode. 7ee918e [Josh Rosen] Re-order imports in tests 3aeaff7 [Josh Rosen] More refactoring and cleanup; begin cleaning iterator interfaces 3490512 [Josh Rosen] Misc. cleanup f156a8f [Josh Rosen] Hacky metrics integration; refactor some interfaces. 2776aca [Josh Rosen] First passing test for ExternalSorter. 5e100b2 [Josh Rosen] Super-messy WIP on external sort 595923a [Josh Rosen] Remove some unused variables. 8958584 [Josh Rosen] Fix bug in calculating free space in current page. f17fa8f [Josh Rosen] Add missing newline c2fca17 [Josh Rosen] Small refactoring of SerializerPropertiesSuite to enable test re-use: b8a09fe [Josh Rosen] Back out accidental log4j.properties change bfc12d3 [Josh Rosen] Add tests for serializer relocation property. 240864c [Josh Rosen] Remove PrefixComputer and require prefix to be specified as part of insert() 1433b42 [Josh Rosen] Store record length as int instead of long. 
026b497 [Josh Rosen] Re-use a buffer in UnsafeShuffleWriter 0748458 [Josh Rosen] Port UnsafeShuffleWriter to Java. 87e721b [Josh Rosen] Renaming and comments d3cc310 [Josh Rosen] Flag that SparkSqlSerializer2 supports relocation e2d96ca [Josh Rosen] Expand serializer API and use new function to help control when new UnsafeShuffle path is used. e267cee [Josh Rosen] Fix compilation of UnsafeSorterSuite 9c6cf58 [Josh Rosen] Refactor to use DiskBlockObjectWriter. 253f13e [Josh Rosen] More cleanup 8e3ec20 [Josh Rosen] Begin code cleanup. 4d2f5e1 [Josh Rosen] WIP 3db12de [Josh Rosen] Minor simplification and sanity checks in UnsafeSorter 767d3ca [Josh Rosen] Fix invalid range in UnsafeSorter. e900152 [Josh Rosen] Add test for empty iterator in UnsafeSorter 57a4ea0 [Josh Rosen] Make initialSize configurable in UnsafeSorter abf7bfe [Josh Rosen] Add basic test case. 81d52c5 [Josh Rosen] WIP on UnsafeSorter

* [SPARK-2018] [CORE] Upgrade LZF library to fix endian serialization problem (Tim Ellison, 2015-05-12; 1 file, -1/+1)

Pick up a newer version of the dependency with the fix for SPARK-2018. The update involved patching the ning/compress LZF library to handle big-endian systems correctly. Credit goes to gireeshpunathil for diagnosing the problem, and cowtowncoder for fixing it. Spark tests run clean for me.

Author: Tim Ellison <t.p.ellison@gmail.com>

Closes #6077 from tellison/UpgradeLZF and squashes the following commits: ad8d4ef [Tim Ellison] [SPARK-2018] [CORE] Upgrade LZF library to fix endian serialization problem

* [SPARK-3454] separate json endpoints for data in the UI (Imran Rashid, 2015-05-08; 1 file, -0/+12)

Exposes data available in the UI as json over http. Key points:

- new endpoints, handled independently of existing XyzPage classes. Root entrypoint is `JsonRootResource`
- Uses jersey + jackson for routing & converting POJOs into json
- tests against known results in `HistoryServerSuite`
- also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.

Author: Imran Rashid <irashid@cloudera.com>

Closes #5940 from squito/SPARK-3454_better_test_files and squashes the following commits: 1a72ed6 [Imran Rashid] rats 85fdb3e [Imran Rashid] Merge branch 'no_php' into SPARK-3454 1fc65b0 [Imran Rashid] Revert "Revert "[SPARK-3454] separate json endpoints for data in the UI"" 1276900 [Imran Rashid] get rid of giant event file, replace w/ smaller one; check both shuffle read & shuffle write 4e12013 [Imran Rashid] just use test case name for expectation file name 863ef64 [Imran Rashid] rename json files to avoid strange file names and not look like php
* Revert "[SPARK-3454] separate json endpoints for data in the UI"Reynold Xin2015-05-051-12/+0
| | | | | | This reverts commit d49735800db27239c11478aac4b0f2ec9df91a3f. The commit broke Spark on Windows.

* [SPARK-3454] separate json endpoints for data in the UI (Imran Rashid, 2015-05-05; 1 file, -0/+12)

Exposes data available in the UI as json over http. Key points:

- new endpoints, handled independently of existing XyzPage classes. Root entrypoint is `JsonRootResource`
- Uses jersey + jackson for routing & converting POJOs into json
- tests against known results in `HistoryServerSuite`
- also fixes some minor issues w/ the UI -- synchronizing on access to `StorageListener` & `StorageStatusListener`, and fixing some inconsistencies w/ the way we handle retained jobs & stages.

Author: Imran Rashid <irashid@cloudera.com>

Closes #4435 from squito/SPARK-3454 and squashes the following commits: da1e35f [Imran Rashid] typos etc. 5e78b4f [Imran Rashid] fix rendering problems 5ae02ad [Imran Rashid] Merge branch 'master' into SPARK-3454 f016182 [Imran Rashid] change all constructors json-pojo class constructors to be private[spark] to protect us from mima-false-positives if we add fields 3347b72 [Imran Rashid] mark EnumUtil as @Private ec140a2 [Imran Rashid] create @Private cc1febf [Imran Rashid] add docs on the metrics-as-json api cbaf287 [Imran Rashid] Merge branch 'master' into SPARK-3454 56db31e [Imran Rashid] update tests for mulit-attempt 7f3bc4e [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt" 67008b4 [Imran Rashid] rats 9e51400 [Imran Rashid] style c9bae1c [Imran Rashid] handle multiple attempts per app b87cd63 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt 188762c [Imran Rashid] multi-attempt 2af11e5 [Imran Rashid] Merge branch 'master' into SPARK-3454 befff0c [Imran Rashid] review feedback 14ac3ed [Imran Rashid] jersey-core needs to be explicit; move version & scope to parent pom.xml f90680e [Imran Rashid] Merge branch 'master' into SPARK-3454 dc8a7fe [Imran Rashid] style, fix errant comments acb7ef6 [Imran Rashid] fix indentation 7bf1811 [Imran Rashid] move MetricHelper so mima doesnt think its exposed; comments 9d889d6 [Imran Rashid] undo some unnecessary changes f48a7b0 [Imran Rashid] docs 52bbae8 [Imran Rashid] StorageListener & StorageStatusListener needs to synchronize internally to be thread-safe 31c79ce [Imran Rashid] asm no longer needed for SPARK_PREPEND_CLASSES b2f8b91 [Imran Rashid] @DeveloperApi 2e19be2 [Imran Rashid] lazily convert ApplicationInfo to avoid memory overhead ba3d9d2 [Imran Rashid] upper case enums 39ac29c [Imran Rashid] move EnumUtil d2bde77 [Imran Rashid] update error handling & scoping 4a234d3 [Imran Rashid] avoid jersey-media-json-jackson b/c of potential version conflicts a157a2f [Imran Rashid] style 7bd4d15 [Imran Rashid] delete security test, since it doesnt do anything a325563 [Imran Rashid] style a9c5cf1 [Imran Rashid] undo changes superceeded by master 0c6f968 [Imran Rashid] update deps 1ed0d07 [Imran Rashid] Merge branch 'master' into SPARK-3454 4c92af6 [Imran Rashid] style f2e63ad [Imran Rashid] Merge branch 'master' into SPARK-3454 c22b11f [Imran Rashid] fix compile error 9ea682c [Imran Rashid] go back to good ol' java enums cf86175 [Imran Rashid] style d493b38 [Imran Rashid] Merge branch 'master' into SPARK-3454 f05ae89 [Imran Rashid] add in ExecutorSummaryInfo for MiMa :( 101a698 [Imran Rashid] style d2ef58d [Imran Rashid] revert changes that had HistoryServer refresh the application listing
more often b136e39b [Imran Rashid] Revert "add sbt-revolved plugin, to make it easier to start & stop http servers in sbt" e031719 [Imran Rashid] fixes from review 1f53a66 [Imran Rashid] style b4a7863 [Imran Rashid] fix compile error 2c8b7ee [Imran Rashid] rats 1578a4a [Imran Rashid] doc 674f8dc [Imran Rashid] more explicit about total numbers of jobs & stages vs. number retained 9922be0 [Imran Rashid] Merge branch 'master' into stage_distributions f5a5196 [Imran Rashid] undo removal of renderJson from MasterPage, since there is no substitute yet db61211 [Imran Rashid] get JobProgressListener directly from UI fdfc181 [Imran Rashid] stage/taskList 63eb4a6 [Imran Rashid] tests for taskSummary ad27de8 [Imran Rashid] error handling on quantile values b2efcaf [Imran Rashid] cleanup, combine stage-related paths into one resource aaba896 [Imran Rashid] wire up task summary a4b1397 [Imran Rashid] stage metric distributions e48ba32 [Imran Rashid] rename eaf3bbb [Imran Rashid] style 25cd894 [Imran Rashid] if only given day, assume GMT 51eaedb [Imran Rashid] more visibility fixes 9f28b7e [Imran Rashid] ack, more cleanup 99764e1 [Imran Rashid] Merge branch 'SPARK-3454_w_jersey' into SPARK-3454 a61a43c [Imran Rashid] oops, remove accidental checkin a066055 [Imran Rashid] set visibility on a lot of classes 1f361c8 [Imran Rashid] update rat-excludes 0be5120 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey 2382bef [Imran Rashid] switch to using new "enum" fef6605 [Imran Rashid] some utils for working w/ new "enum" format dbfc7bf [Imran Rashid] style b86bcb0 [Imran Rashid] update test to look at one stage attempt 5f9df24 [Imran Rashid] style 7fd156a [Imran Rashid] refactor jsonDiff to avoid code duplication 73f1378 [Imran Rashid] test json; also add test cases for cleaned stages & jobs 97d411f [Imran Rashid] json endpoint for one job 0c96147 [Imran Rashid] better error msgs for bad stageId vs bad attemptId dddbd29 [Imran Rashid] stages have attempt; jobs are sorted; resource for all attempts for one stage 190c17a [Imran Rashid] StagePage should distinguish no task data, from unknown stage 84cd497 [Imran Rashid] AllJobsPage should still report correct completed & failed job count, even if some have been cleaned, to make it consistent w/ AllStagesPage 36e4062 [Imran Rashid] SparkUI needs to know about startTime, so it can list its own applicationInfo b4c75ed [Imran Rashid] fix merge conflicts; need to widen visibility in a few cases e91750a [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey 56d2fc7 [Imran Rashid] jersey needs asm for SPARK_PREPEND_CLASSES to work f7df095 [Imran Rashid] add test for accumulables, and discover that I need update after all 9c0c125 [Imran Rashid] add accumulableInfo 00e9cc5 [Imran Rashid] more style 3377e61 [Imran Rashid] scaladoc d05f7a9 [Imran Rashid] dont use case classes for status api POJOs, since they have binary compatibility issues 654cecf [Imran Rashid] move all the status api POJOs to one file b86e2b0 [Imran Rashid] style 18a8c45 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey 5598f19 [Imran Rashid] delete some unnecessary code, more to go 56edce0 [Imran Rashid] style 017c755 [Imran Rashid] add in metrics now available 1b78cb7 [Imran Rashid] fix some import ordering 0dc3ea7 [Imran Rashid] if app isnt found, reload apps from FS before giving up c7d884f [Imran Rashid] fix merge conflicts 0c12b50 [Imran Rashid] Merge branch 'master' into SPARK-3454_w_jersey b6a96a8 [Imran Rashid] compare json by AST, not string cd37845 [Imran Rashid] 
switch to using java.util.Dates for times a4ab5aa [Imran Rashid] add in explicit dependency on jersey 1.9 -- maven wasn't happy before this 4fdc39f [Imran Rashid] refactor case insensitive enum parsing cba1ef6 [Imran Rashid] add security (maybe?) for metrics json f0264a7 [Imran Rashid] switch to using jersey for metrics json bceb3a9 [Imran Rashid] set http response code on error, some testing e0356b6 [Imran Rashid] put new test expectation files in rat excludes (is this OK?) b252e7a [Imran Rashid] small cleanup of accidental changes d1a8c92 [Imran Rashid] add sbt-revolved plugin, to make it easier to start & stop http servers in sbt 4b398d0 [Imran Rashid] expose UI data as json in new endpoints

* [MINOR] [BUILD] Declare ivy dependency in root pom. (Marcelo Vanzin, 2015-05-05; 1 file, -0/+5)

Without this, any dependency that pulls in ivy transitively may override the version and potentially cause issues. On my machine, the hive tests were pulling in an old version of ivy and subsequently failing with a "NoSuchMethodError".

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5893 from vanzin/ivy-dep-fix and squashes the following commits: ea2112d [Marcelo Vanzin] [minor] [build] Declare ivy dependency in root pom.
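
A sketch of such a pin in the root POM (the version property name is illustrative; the coordinates are Ivy's published ones):

```xml
<!-- Declare Ivy explicitly so transitive pulls cannot substitute an older,
     incompatible version (seen as NoSuchMethodError in the Hive tests) -->
<dependency>
  <groupId>org.apache.ivy</groupId>
  <artifactId>ivy</artifactId>
  <version>${ivy.version}</version>
</dependency>
```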

* [SPARK-7302] [DOCS] SPARK building documentation still mentions building for YARN 0.23 (Sean Owen, 2015-05-03; 1 file, -14/+0)

Remove references to Hadoop 0.23. CC tgravescs -- is this what you had in mind? Basically all refs to 0.23?

We don't support YARN 0.23, but also don't support Hadoop 0.23 anymore AFAICT. There are no builds or releases for it. In fact, on a related note, refs to CDH3 (Hadoop 0.20.2) should be removed as this certainly isn't supported either.

Author: Sean Owen <sowen@cloudera.com>

Closes #5863 from srowen/SPARK-7302 and squashes the following commits: 42f5d1e [Sean Owen] Remove CDH3 (Hadoop 0.20.2) refs too dad02e3 [Sean Owen] Remove references to Hadoop 0.23

* [SPARK-2691] [MESOS] Support for Mesos DockerInfo (Chris Heller, 2015-05-01; 1 file, -1/+1)

This patch adds partial support for running spark on mesos inside of a docker container. Only fine-grained mode is presently supported, and there is no checking done to ensure that the version of libmesos is recent enough to have a DockerInfo structure in the protobuf (other than pinning a mesos version in the pom.xml).

Author: Chris Heller <hellertime@gmail.com>

Closes #3074 from hellertime/SPARK-2691 and squashes the following commits: d504af6 [Chris Heller] Assist type inference f64885d [Chris Heller] Fix errant line length 17c41c0 [Chris Heller] Base Dockerfile on mesosphere/mesos image 8aebda4 [Chris Heller] Simplfy Docker image docs 1ae7f4f [Chris Heller] Style points 974bd56 [Chris Heller] Convert map to flatMap 5d8bdf7 [Chris Heller] Factor out the DockerInfo construction. 7b75a3d [Chris Heller] Align to styleguide 80108e7 [Chris Heller] Bend to the will of RAT ba77056 [Chris Heller] Explicit RAT exclude abda5e5 [Chris Heller] Wildcard .rat-excludes 2f2873c [Chris Heller] Exclude spark-mesos from RAT a589a5b [Chris Heller] Add example Dockerfile b6825ce [Chris Heller] Remove use of EasyMock eae1b86 [Chris Heller] Move properties under 'spark.mesos.' c184d00 [Chris Heller] Use map on Option to be consistent with non-coarse code fb9501a [Chris Heller] Bumped mesos version to current release fa11879 [Chris Heller] Add listenerBus to EasyMock 882151e [Chris Heller] Changes to scala style b22d42d [Chris Heller] Exclude template from RAT db536cf [Chris Heller] Remove unneeded mocks dea1bd5 [Chris Heller] Force default protocol 7dac042 [Chris Heller] Add test for DockerInfo 5456c0c [Chris Heller] Adjust syntax style 521c194 [Chris Heller] Adjust version info 6e38f70 [Chris Heller] Document Mesos Docker properties 29572ab [Chris Heller] Support all DockerInfo fields b8c0dea [Chris Heller] Support for mesos DockerInfo in coarse-mode. 482a9fd [Chris Heller] Support for mesos DockerInfo in fine-grained mode.
* [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use managed memory for aggregationsJosh Rosen2015-04-291-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds managed-memory-based aggregation to Spark SQL / DataFrames. Instead of working with Java objects, this new aggregation path uses `sun.misc.Unsafe` to manipulate raw memory. This reduces the memory footprint for aggregations, resulting in fewer spills, OutOfMemoryErrors, and garbage collection pauses. As a result, this allows for higher memory utilization. It can also result in better cache locality since objects will be stored closer together in memory. This feature can be enabled by setting `spark.sql.unsafe.enabled=true`. For now, this feature is only supported when codegen is enabled and only supports aggregations for which the grouping columns are primitive numeric types or strings and aggregated values are numeric. ### Managing memory with sun.misc.Unsafe This patch supports both on- and off-heap managed memory. - In on-heap mode, memory addresses are identified by the combination of a base Object and an offset within that object. - In off-heap mode, memory is addressed directly with 64-bit long addresses. To support both modes, functions that manipulate memory accept both `baseObject` and `baseOffset` fields. In off-heap mode, we simply pass `null` as `baseObject`. We allocate memory in large chunks, so memory fragmentation and allocation speed are not significant bottlenecks. By default, we use on-heap mode. To enable off-heap mode, set `spark.unsafe.offHeap=true`. To track allocated memory, this patch extends `SparkEnv` with an `ExecutorMemoryManager` and supplies each `TaskContext` with a `TaskMemoryManager`. These classes work together to track allocations and detect memory leaks. ### Compact tuple format This patch introduces `UnsafeRow`, a compact row layout. In this format, each tuple has three parts: a null bit set, fixed-length values, and variable-length values: ![image](https://cloud.githubusercontent.com/assets/50748/7328538/2fdb65ce-ea8b-11e4-9743-6c0f02bb7d1f.png) - Rows are always 8-byte word aligned (so their sizes will always be a multiple of 8 bytes) - The bit set is used for null tracking: - Position _i_ is set if and only if field _i_ is null - The bit set is aligned to an 8-byte word boundary. - Every field appears as an 8-byte word in the fixed-length values part: - If a field is null, we zero out the values. - If a field is variable-length, the word stores a relative offset (w.r.t. the base of the tuple) that points to the beginning of the field's data in the variable-length part. - Each variable-length data type can have its own encoding: - For strings, the first word stores the length of the string and is followed by UTF-8 encoded bytes. If necessary, the end of the string is padded with empty bytes in order to ensure word-alignment. For example, a tuple that consists of 3 fields of type (int, string, string), with values (null, “data”, “bricks”), would look like this: ![image](https://cloud.githubusercontent.com/assets/50748/7328526/1e21959c-ea8b-11e4-9a28-a4350fe4a7b5.png) This format allows us to compare tuples for equality by directly comparing their raw bytes. This also enables fast hashing of tuples. ### Hash map for performing aggregations This patch introduces `UnsafeFixedWidthAggregationMap`, a hash map for performing aggregations where the aggregation result columns are fixed-width.
This map's keys and values are `Row` objects. `UnsafeFixedWidthAggregationMap` is implemented on top of `BytesToBytesMap`, an append-only map which supports byte-array keys and values. `BytesToBytesMap` stores pointers to key and value tuples. For each record with a new key, we copy the key, create the aggregation value buffer for that key, and store both in a buffer. The hash table then simply stores pointers to the key and value. For each record with an existing key, we simply run the aggregation function to update the values in place. This map is implemented using open hashing with triangular sequence probing. Each entry stores two words in a long array: the first word stores the address of the key and the second word stores the relative offset from the key tuple to the value tuple, as well as the key's 32-bit hashcode. By storing the full hashcode, we reduce the number of equality checks that need to be performed to handle position collisions (since the chance of hashcode collision is much lower than position collision). `UnsafeFixedWidthAggregationMap` allows regular Spark SQL `Row` objects to be used when probing the map. Internally, it encodes these rows into `UnsafeRow` format using `UnsafeRowConverter`. This conversion has a small overhead that can be eliminated in the future once we use UnsafeRows in other operators. Author: Josh Rosen <joshrosen@databricks.com> Closes #5725 from JoshRosen/unsafe and squashes the following commits: eeee512 [Josh Rosen] Add converters for Null, Boolean, Byte, and Short columns. 81f34f8 [Josh Rosen] Follow 'place children last' convention for GeneratedAggregate 1bc36cc [Josh Rosen] Refactor UnsafeRowConverter to avoid unnecessary boxing. 017b2dc [Josh Rosen] Remove BytesToBytesMap.finalize() 50e9671 [Josh Rosen] Throw memory leak warning even in case of error; add warning about code duplication 70a39e4 [Josh Rosen] Split MemoryManager into ExecutorMemoryManager and TaskMemoryManager: 6e4b192 [Josh Rosen] Remove an unused method from ByteArrayMethods. de5e001 [Josh Rosen] Fix debug vs. trace in logging message. a19e066 [Josh Rosen] Rename unsafe Java test suites to match Scala test naming convention. 78a5b84 [Josh Rosen] Add logging to MemoryManager ce3c565 [Josh Rosen] More comments, formatting, and code cleanup. 529e571 [Josh Rosen] Measure timeSpentResizing in nanoseconds instead of milliseconds. 3ca84b2 [Josh Rosen] Only zero the used portion of groupingKeyConversionScratchSpace 162caf7 [Josh Rosen] Fix test compilation b45f070 [Josh Rosen] Don't redundantly store the offset from key to value, since we can compute this from the key size. a8e4a3f [Josh Rosen] Introduce MemoryManager interface; add to SparkEnv. 0925847 [Josh Rosen] Disable MiMa checks for new unsafe module cde4132 [Josh Rosen] Add missing pom.xml 9c19fc0 [Josh Rosen] Add configuration options for heap vs. offheap 6ffdaa1 [Josh Rosen] Null handling improvements in UnsafeRow. 31eaabc [Josh Rosen] Lots of TODO and doc cleanup.
a95291e [Josh Rosen] Cleanups to string handling code afe8dca [Josh Rosen] Some Javadoc cleanup f3dcbfe [Josh Rosen] More mod replacement 854201a [Josh Rosen] Import and comment cleanup 06e929d [Josh Rosen] More warning cleanup ef6b3d3 [Josh Rosen] Fix a bunch of FindBugs and IntelliJ inspections 29a7575 [Josh Rosen] Remove debug logging 49aed30 [Josh Rosen] More long -> int conversion. b26f1d3 [Josh Rosen] Fix bug in murmur hash implementation. 765243d [Josh Rosen] Enable optional performance metrics for hash map. 23a440a [Josh Rosen] Bump up default hash map size 628f936 [Josh Rosen] Use ints instead of longs for indexing. 92d5a06 [Josh Rosen] Address a number of minor code review comments. 1f4b716 [Josh Rosen] Merge Unsafe code into the regular GeneratedAggregate, guarded by a configuration flag; integrate planner support and re-enable all tests. d85eeff [Josh Rosen] Add basic sanity test for UnsafeFixedWidthAggregationMap bade966 [Josh Rosen] Comment update (bumping to refresh GitHub cache...) b3eaccd [Josh Rosen] Extract aggregation map into its own class. d2bb986 [Josh Rosen] Update to implement new Row methods added upstream 58ac393 [Josh Rosen] Use UNSAFE allocator in GeneratedAggregate (TODO: make this configurable) 7df6008 [Josh Rosen] Optimizations related to zeroing out memory: c1b3813 [Josh Rosen] Fix bug in UnsafeMemoryAllocator.free(): 738fa33 [Josh Rosen] Add feature flag to guard UnsafeGeneratedAggregate c55bf66 [Josh Rosen] Free buffer once iterator has been fully consumed. 62ab054 [Josh Rosen] Optimize for fact that get() is only called on String columns. c7f0b56 [Josh Rosen] Reuse UnsafeRow pointer in UnsafeRowConverter ae39694 [Josh Rosen] Add finalizer as "cleanup method of last resort" c754ae1 [Josh Rosen] Now that the store*() contract has been strengthened, we can remove an extra lookup f764d13 [Josh Rosen] Simplify address + length calculation in Location. 079f1bf [Josh Rosen] Some clarification of the BytesToBytesMap.lookup() / set() contract. 1a483c5 [Josh Rosen] First version that passes some aggregation tests: fc4c3a8 [Josh Rosen] Sketch how the converters will be used in UnsafeGeneratedAggregate 53ba9b7 [Josh Rosen] Start prototyping Java Row -> UnsafeRow converters 1ff814d [Josh Rosen] Add reminder to free memory on iterator completion 8a8f9df [Josh Rosen] Add skeleton for GeneratedAggregate integration. 5d55cef [Josh Rosen] Add skeleton for Row implementation. f03e9c1 [Josh Rosen] Play around with Unsafe implementations of more string methods. ab68e08 [Josh Rosen] Begin merging the UTF8String implementations. 480a74a [Josh Rosen] Initial import of code from Databricks unsafe utils repo.
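On the build side, the pom.xml portion of this patch is small: registering the new unsafe module so the rest of the build can see it. A hedged sketch of that wiring (the module name is taken from the commit's own notes; the exact diff may differ):

```
<!-- Root pom: add the new managed-memory module to the reactor -->
<modules>
  ...
  <module>unsafe</module>
</modules>
```

Downstream modules would then pull it in as an ordinary intra-project dependency.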
* [SPARK-7168] [BUILD] Update plugin versions in Maven build and centralize ↵Sean Owen2015-04-281-12/+32
| | | | | | | | | | | | | versions Update Maven build plugin versions and centralize plugin version management Author: Sean Owen <sowen@cloudera.com> Closes #5720 from srowen/SPARK-7168 and squashes the following commits: 98a8947 [Sean Owen] Make install, deploy plugin versions explicit 4ecf3b2 [Sean Owen] Update Maven build plugin versions and centralize plugin version management
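Centralizing here means declaring each plugin's version once under `pluginManagement` in the parent pom so that every child inherits the same version. A sketch of the pattern (the plugin shown and its version are illustrative, picked to match the "install, deploy" note above):

```
<build>
  <pluginManagement>
    <plugins>
      <!-- Children reference this plugin without a <version> and inherit this one -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-install-plugin</artifactId>
        <version>2.5.2</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```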
* [SPARK-7092] Update spark scala version to 2.11.6Prashant Sharma2015-04-251-2/+2
| | | | | | | | Author: Prashant Sharma <prashant.s@imaginea.com> Closes #5662 from ScrapCodes/SPARK-7092/scala-update-2.11.6 and squashes the following commits: 58cf4f9 [Prashant Sharma] [SPARK-7092] Update spark scala version to 2.11.6
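The diff is a two-line property bump. Assuming Spark's usual property layout (a sketch, not the literal diff), it amounts to:

```
<profile>
  <id>scala-2.11</id>
  <properties>
    <!-- 2.11.6 comes from the commit title; the surrounding profile is assumed -->
    <scala.version>2.11.6</scala.version>
    <scala.binary.version>2.11</scala.binary.version>
  </properties>
</profile>
```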
* [SPARK-6122] [CORE] Upgrade tachyon-client version to 0.6.3Calvin Jia2015-04-241-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a reopening of #4867. A short summary of the issues resolved from the previous PR: 1. HTTPClient version mismatch: Selenium (used for UI tests) requires version 4.3.x, and Tachyon included 4.2.5 through a transitive dependency of its shaded thrift jar. To address this, Tachyon 0.6.3 will promote the transitive dependencies of the shaded jar so they can be excluded in Spark. 2. Jackson-Mapper-ASL version mismatch: In lower versions of hadoop-client (i.e. 1.0.4), version 1.0.1 is included. The parquet library used in Spark SQL requires version 1.8+. It's unclear to me why upgrading tachyon-client would cause this dependency to break. The solution was to exclude jackson-mapper-asl from hadoop-client. It seems that the dependency management in spark-parent will not work on transitive dependencies; one way to make sure jackson-mapper-asl is included with the correct version is to add it as a top-level dependency. The best solution would be to exclude the dependency in the modules which require a higher version, but that did not fix the unit tests. Any suggestions on the best way to solve this would be appreciated! Author: Calvin Jia <jia.calvin@gmail.com> Closes #5354 from calvinjia/upgrade_tachyon_0.6.3 and squashes the following commits: 0eefe4d [Calvin Jia] Handle httpclient version in maven dependency management. Remove httpclient version setting from profiles. 7c00dfa [Calvin Jia] Set httpclient version to 4.3.2 for selenium. Specify version of httpclient for sql/hive (previously 4.2.5 transitive dependency of libthrift). 9263097 [Calvin Jia] Merge master to test latest changes dbfc1bd [Calvin Jia] Use Tachyon 0.6.4 for cleaner dependencies. e2ff80a [Calvin Jia] Exclude the jetty and curator promoted dependencies from tachyon-client. a3a29da [Calvin Jia] Update tachyon-client exclusions. 0ae6c97 [Calvin Jia] Change tachyon version to 0.6.3 a204df9 [Calvin Jia] Update make distribution tachyon version. a93c94f [Calvin Jia] Exclude jackson-mapper-asl from hadoop client since it has a lower version than spark's expected version. a8a923c [Calvin Jia] Exclude httpcomponents from Tachyon 910fabd [Calvin Jia] Update to master eed9230 [Calvin Jia] Update tachyon version to 0.6.1. 11907b3 [Calvin Jia] Use TachyonURI for tachyon paths instead of strings. 71bf441 [Calvin Jia] Upgrade Tachyon client version to 0.6.0.
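The jackson fix described above combines an exclusion on hadoop-client with a top-level declaration of the newer artifact, roughly like this (the version properties are placeholders, not the literal diff):

```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- hadoop-client 1.0.4 ships jackson-mapper-asl 1.0.1; parquet needs 1.8+ -->
    <exclusion>
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-mapper-asl</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- Re-declare at top level so the newer version wins everywhere -->
<dependency>
  <groupId>org.codehaus.jackson</groupId>
  <artifactId>jackson-mapper-asl</artifactId>
  <version>${codehaus.jackson.version}</version>
</dependency>
```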
* SPARK-6861 [BUILD] Scalastyle config prevents building Maven child modules aloneSean Owen2015-04-151-3/+2
| | | | | | | | | | | | Move scalastyle-config.xml to dev/ (SBT config still doesn't work) to fix running mvn targets from subdirs; make scalastyle a verify stage target again in Maven; output results in target not project root; update to scalastyle 0.7.0 Author: Sean Owen <sowen@cloudera.com> Closes #5471 from srowen/SPARK-6861 and squashes the following commits: acac637 [Sean Owen] Oops, add back execution but leave it at the default verify phase 35a4fd2 [Sean Owen] Revert change to scalastyle-config.xml location, but return scalastyle Maven check to verify phase instead of package to get it farther out of the way, since the Maven invocation is optional c4fb42c [Sean Owen] Move scalastyle-config.xml to dev/ (SBT config still doesn't work) to fix running mvn targets from subdirs; make scalastyle a verify stage target again in Maven; output results in target not project root; update to scalastyle 0.7.0
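In pom terms, the change binds the style check to the verify phase and redirects its report into target/. A sketch (the `outputFile` option is part of scalastyle-maven-plugin; the exact configuration here is illustrative):

```
<plugin>
  <groupId>org.scalastyle</groupId>
  <artifactId>scalastyle-maven-plugin</artifactId>
  <version>0.7.0</version>
  <configuration>
    <!-- Write results under target/ rather than the project root -->
    <outputFile>${basedir}/target/scalastyle-output.xml</outputFile>
  </configuration>
  <executions>
    <execution>
      <!-- verify runs after package, so plain `mvn package` skips the check -->
      <phase>verify</phase>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```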
* [SPARK-6905] Upgrade to snappy-java 1.1.1.7Josh Rosen2015-04-141-1/+1
| | | | | | | | | | We should upgrade our snappy-java dependency to 1.1.1.7 in order to include a fix for a bug that results in worse compression in SnappyOutputStream (see https://github.com/xerial/snappy-java/issues/100). Author: Josh Rosen <joshrosen@databricks.com> Closes #5512 from JoshRosen/snappy-1.1.1.7 and squashes the following commits: f1ac0f8 [Josh Rosen] Upgrade to snappy-java 1.1.1.7.
* [SPARK-6731] Bump version of apache commons-math3Punyashloka Biswal2015-04-141-1/+1
| | | | | | | | | | | Version 3.1.1 is two years old and the newer version includes approximate percentile statistics (among other things). Author: Punyashloka Biswal <punya.biswal@gmail.com> Closes #5380 from punya/patch-1 and squashes the following commits: 226622b [Punyashloka Biswal] Bump version of apache commons-math3