author: Michael Armbrust <michael@databricks.com> 2015-05-07 19:36:24 -0700
committer: Yin Huai <yhuai@databricks.com> 2015-05-07 19:36:24 -0700
commit: cd1d4110cfffb413ab585cf1cc8f1264243cb393
tree: 1cc87432cdf30f96b12756babfc7242b2573bea4 /dev
parent: 22ab70e06ede65ca865073fe36c859042a920aa3
[SPARK-6908] [SQL] Use isolated Hive client
This PR switches Spark SQL's Hive support to use the isolated Hive client interface introduced by #5851, instead of directly interacting with the client. By using this isolated client, we can now allow users to dynamically configure the version of Hive that they are connecting to by setting `spark.sql.hive.metastore.version`, without the need to recompile. This also greatly reduces the surface area for our interaction with the Hive libraries, hopefully making it easier to support other versions in the future.

Jars for the desired Hive version can be configured using `spark.sql.hive.metastore.jars`, which accepts the following options:
 - a colon-separated list of jar files or directories for Hive and Hadoop.
 - `builtin` - attempt to discover the jars that were used to load Spark SQL and use those. This option is only valid when using the execution version of Hive.
 - `maven` - download the correct version of Hive on demand from Maven.

By default, `builtin` is used for Hive 13.

This PR also removes the test step for building against Hive 12, as this will no longer be required to talk to Hive 12 metastores. However, the full removal of the Shim is deferred until a later PR.

Remaining TODOs:
 - Remove the Hive Shims and inline code for Hive 13.
 - Several HiveCompatibility tests are not yet passing.
  - `nullformatCTAS` - As detailed below, we are now handling CTAS parsing ourselves instead of hacking into the Hive semantic analyzer. However, we currently only handle the common cases and not things like CTAS where the null format is specified.
  - `combine1` now leaks state about compression somehow, breaking all subsequent tests. As such, we currently add it to the blacklist.
  - `part_inherit_tbl_props` and `part_inherit_tbl_props_with_star` do not work anymore. We are correctly propagating the information
  - "load_dyn_part14.*" - These tests pass when run on their own, but fail when run with all other tests. It seems our `RESET` mechanism may not be as robust as it used to be?

Other required changes:
 - `CreateTableAsSelect` no longer carries parts of the HiveQL AST with it through the query execution pipeline. Instead, we parse CTAS during the HiveQL conversion and construct a `HiveTable`. The full parsing here is not yet complete as detailed above in the remaining TODOs. Since the operator is Hive specific, it is moved to the hive package.
 - `Command` is simplified to be a trait that simply acts as a marker for a LogicalPlan that should be eagerly evaluated.

Author: Michael Armbrust <michael@databricks.com>

Closes #5876 from marmbrus/useIsolatedClient and squashes the following commits:

258d000 [Michael Armbrust] really really correct path handling
e56fd4a [Michael Armbrust] getAbsolutePath
5a259f5 [Michael Armbrust] fix typos
81bb366 [Michael Armbrust] comments from vanzin
5f3945e [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
4b5cd41 [Michael Armbrust] yin's comments
f5de7de [Michael Armbrust] cleanup
11e9c72 [Michael Armbrust] better coverage in versions suite
7e8f010 [Michael Armbrust] better error messages and jar handling
e7b3941 [Michael Armbrust] more permisive checking for function registration
da91ba7 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
5fe5894 [Michael Armbrust] fix serialization suite
81711c4 [Michael Armbrust] Initial support for running without maven
1d8ae44 [Michael Armbrust] fix final tests?
1c50813 [Michael Armbrust] more comments
a3bee70 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into useIsolatedClient
a6f5df1 [Michael Armbrust] style
ab07f7e [Michael Armbrust] WIP
4d8bf02 [Michael Armbrust] Remove hive 12 compilation
8843a25 [Michael Armbrust] [SPARK-6908] [SQL] Use isolated Hive client
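
As a rough illustration of the two settings named in the commit message, the sketch below builds `--conf` flags for `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars`. The helper name `hive_metastore_conf`, the example version `0.12.0`, and the use of `bin/spark-shell` with `--conf` are assumptions made for illustration and are not part of this commit. The script only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of this commit): prints the --conf flags
# for a given metastore version and jar source.
#   $1 - Hive metastore version, e.g. 0.12.0 (example value, an assumption)
#   $2 - jar source: `builtin`, `maven`, or a colon-separated list of
#        jar files or directories, as described in the commit message
hive_metastore_conf() {
  printf '%s %s %s %s\n' \
    --conf "spark.sql.hive.metastore.version=$1" \
    --conf "spark.sql.hive.metastore.jars=$2"
}

# Print (rather than run) a launch command that would ask Spark SQL to
# talk to a Hive 0.12.0 metastore, fetching matching jars from Maven.
echo "bin/spark-shell $(hive_metastore_conf 0.12.0 maven)"
```

Per the commit message, `builtin` is only valid when the metastore version matches the execution version of Hive, so a mismatched version would need `maven` or an explicit list of jars.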
Diffstat (limited to 'dev')
-rwxr-xr-x dev/run-tests 23
1 files changed, 0 insertions, 23 deletions
diff --git a/dev/run-tests b/dev/run-tests
index 05c63bce4d..ef587a1a59 100755
--- a/dev/run-tests
+++ b/dev/run-tests
@@ -142,29 +142,6 @@ CURRENT_BLOCK=$BLOCK_BUILD
{
HIVE_BUILD_ARGS="$SBT_MAVEN_PROFILES_ARGS -Phive -Phive-thriftserver"
- HIVE_12_BUILD_ARGS="$HIVE_BUILD_ARGS -Phive-0.12.0"
-
- # First build with Hive 0.12.0 to ensure patches do not break the Hive 0.12.0 build
- echo "[info] Compile with Hive 0.12.0"
- [ -d "lib_managed" ] && rm -rf lib_managed
- echo "[info] Building Spark with these arguments: $HIVE_12_BUILD_ARGS"
-
- if [ "${AMPLAB_JENKINS_BUILD_TOOL}" == "maven" ]; then
- build/mvn $HIVE_12_BUILD_ARGS clean package -DskipTests
- else
- # NOTE: echo "q" is needed because sbt on encountering a build file with failure
- # (either resolution or compilation) prompts the user for input either q, r, etc
- # to quit or retry. This echo is there to make it not block.
- # NOTE: Do not quote $BUILD_MVN_PROFILE_ARGS or else it will be interpreted as a
- # single argument!
- # QUESTION: Why doesn't 'yes "q"' work?
- # QUESTION: Why doesn't 'grep -v -e "^\[info\] Resolving"' work?
- echo -e "q\n" \
- | build/sbt $HIVE_12_BUILD_ARGS clean hive/compile hive-thriftserver/compile \
- | grep -v -e "info.*Resolving" -e "warn.*Merging" -e "info.*Including"
- fi
-
- # Then build with default Hive version (0.13.1) because tests are based on this version
echo "[info] Compile with Hive 0.13.1"
[ -d "lib_managed" ] && rm -rf lib_managed
echo "[info] Building Spark with these arguments: $HIVE_BUILD_ARGS"