aboutsummaryrefslogtreecommitdiff
path: root/docs/building-spark.md
diff options
context:
space:
mode:
authorZhan Zhang <zhazhan@gmail.com>2014-10-24 11:03:17 -0700
committerMichael Armbrust <michael@databricks.com>2014-10-24 11:03:17 -0700
commit7c89a8f0c81ecf91dba34c1f44393f45845d438c (patch)
tree523149c7dda49cb193680bed53cc1a112c213fcc /docs/building-spark.md
parent0e886610eedd8ea24761cdcefa25ccedeca72dc8 (diff)
downloadspark-7c89a8f0c81ecf91dba34c1f44393f45845d438c.tar.gz
spark-7c89a8f0c81ecf91dba34c1f44393f45845d438c.tar.bz2
spark-7c89a8f0c81ecf91dba34c1f44393f45845d438c.zip
[SPARK-2706][SQL] Enable Spark to support Hive 0.13
Given that a lot of users are trying to use hive 0.13 in spark, and the incompatibility between hive-0.12 and hive-0.13 on the API level I want to propose following approach, which has no or minimum impact on existing hive-0.12 support, but be able to jumpstart the development of hive-0.13 and future version support. Approach: Introduce “hive-version” property, and manipulate pom.xml files to support different hive version at compiling time through shim layer, e.g., hive-0.12.0 and hive-0.13.1. More specifically, 1. For each different hive version, there is a very light layer of shim code to handle API differences, sitting in sql/hive/hive-version, e.g., sql/hive/v0.12.0 or sql/hive/v0.13.1 2. Add a new profile hive-default active by default, which picks up all existing configuration and hive-0.12.0 shim (v0.12.0) if no hive.version is specified. 3. If user specifies different version (currently only 0.13.1 by -Dhive.version = 0.13.1), hive-versions profile will be activated, which pick up hive-version specific shim layer and configuration, mainly the hive jars and hive-version shim, e.g., v0.13.1. 4. With this approach, nothing is changed with current hive-0.12 support. No change by default: sbt/sbt -Phive For example: sbt/sbt -Phive -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly To enable hive-0.13: sbt/sbt -Dhive.version=0.13.1 For example: sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 assembly Note that in hive-0.13, hive-thriftserver is not enabled, which should be fixed by other Jira, and we don’t need -Phive with -Dhive.version in building (probably we should use -Phive -Dhive.version=xxx instead after thrift server is also supported in hive-0.13.1). Author: Zhan Zhang <zhazhan@gmail.com> Author: zhzhan <zhazhan@gmail.com> Author: Patrick Wendell <pwendell@gmail.com> Closes #2241 from zhzhan/spark-2706 and squashes the following commits: 3ece905 [Zhan Zhang] minor fix 410b668 [Zhan Zhang] solve review comments cbb4691 [Zhan Zhang] change run-test for new options 0d4d2ed [Zhan Zhang] rebase 497b0f4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 8fad1cf [Zhan Zhang] change the pom file and make hive-0.13.1 as the default ab028d1 [Zhan Zhang] rebase 4a2e36d [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 4cb1b93 [zhzhan] Merge pull request #1 from pwendell/pr-2241 b0478c0 [Patrick Wendell] Changes to simplify the build of SPARK-2706 2b50502 [Zhan Zhang] rebase a72c0d4 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark cb22863 [Zhan Zhang] correct the typo 20f6cf7 [Zhan Zhang] solve compatability issue f7912a9 [Zhan Zhang] rebase and solve review feedback 301eb4a [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 10c3565 [Zhan Zhang] address review comments 6bc9204 [Zhan Zhang] rebase and remove temparory repo d3aa3f2 [Zhan Zhang] Merge branch 'master' into spark-2706 cedcc6f [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 3ced0d7 [Zhan Zhang] rebase d9b981d [Zhan Zhang] rebase and fix error due to rollback adf4924 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 3dd50e8 [Zhan Zhang] solve conflicts and remove unnecessary implicts d10bf00 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark dc7bdb3 [Zhan Zhang] solve conflicts 7e0cc36 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark d7c3e1e [Zhan Zhang] Merge branch 'master' into spark-2706 68deb11 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark d48bd18 [Zhan Zhang] address review comments 3ee3b2b [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 57ea52e [Zhan Zhang] Merge branch 'master' into spark-2706 2b0d513 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 9412d24 [Zhan Zhang] address review comments f4af934 [Zhan Zhang] rebase 1ccd7cc [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 128b60b [Zhan Zhang] ignore 0.12.0 test cases for the time being af9feb9 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 5f5619f [Zhan Zhang] restructure the directory and different hive version support 05d3683 [Zhan Zhang] solve conflicts e4c1982 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark 94b4fdc [Zhan Zhang] Spark-2706: hive-0.13.1 support on spark 87ebf3b [Zhan Zhang] Merge branch 'master' into spark-2706 921e914 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark f896b2a [Zhan Zhang] Merge branch 'master' into spark-2706 789ea21 [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark cb53a2c [Zhan Zhang] Merge branch 'master' of https://github.com/apache/spark f6a8a40 [Zhan Zhang] revert ba14f28 [Zhan Zhang] test dbedff3 [Zhan Zhang] Merge remote-tracking branch 'upstream/master' 70964fe [Zhan Zhang] revert fe0f379 [Zhan Zhang] Merge branch 'master' of https://github.com/zhzhan/spark 70ffd93 [Zhan Zhang] revert 42585ec [Zhan Zhang] test 7d5fce2 [Zhan Zhang] test
Diffstat (limited to 'docs/building-spark.md')
-rw-r--r--docs/building-spark.md26
1 files changed, 17 insertions, 9 deletions
diff --git a/docs/building-spark.md b/docs/building-spark.md
index b2940ee402..11fd56c145 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -97,12 +97,20 @@ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
mvn -Pyarn-alpha -Phadoop-2.3 -Dhadoop.version=2.3.0 -Dyarn.version=0.23.7 -DskipTests clean package
{% endhighlight %}
+<!--- TODO: Update this when Hive 0.13 JDBC is added -->
+
# Building With Hive and JDBC Support
To enable Hive integration for Spark SQL along with its JDBC server and CLI,
-add the `-Phive` profile to your existing build options.
+add the `-Phive` profile to your existing build options. By default Spark
+will build with Hive 0.13.1 bindings. You can also build for Hive 0.12.0 using
+the `-Phive-0.12.0` profile. NOTE: currently the JDBC server is only
+supported for Hive 0.12.0.
{% highlight bash %}
-# Apache Hadoop 2.4.X with Hive support
+# Apache Hadoop 2.4.X with Hive 13 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
+
+# Apache Hadoop 2.4.X with Hive 12 support
+mvn -Pyarn -Phive-0.12.0 -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package
{% endhighlight %}
# Spark Tests in Maven
@@ -111,8 +119,8 @@ Tests are run by default via the [ScalaTest Maven plugin](http://www.scalatest.o
Some of the tests require Spark to be packaged first, so always run `mvn package` with `-DskipTests` the first time. The following is an example of a correct (build, test) sequence:
- mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive clean package
- mvn -Pyarn -Phadoop-2.3 -Phive test
+ mvn -Pyarn -Phadoop-2.3 -DskipTests -Phive -Phive-0.12.0 clean package
+ mvn -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 test
The ScalaTest plugin also supports running only a specific test suite as follows:
@@ -175,16 +183,16 @@ can be set to control the SBT build. For example:
Some of the tests require Spark to be packaged first, so always run `sbt/sbt assembly` the first time. The following is an example of a correct (build, test) sequence:
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive assembly
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive test
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 assembly
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 test
To run only a specific test suite as follows:
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive "test-only org.apache.spark.repl.ReplSuite"
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 "test-only org.apache.spark.repl.ReplSuite"
To run test suites of a specific sub project as follows:
- sbt/sbt -Pyarn -Phadoop-2.3 -Phive core/test
+ sbt/sbt -Pyarn -Phadoop-2.3 -Phive -Phive-0.12.0 core/test
# Speeding up Compilation with Zinc
@@ -192,4 +200,4 @@ To run test suites of a specific sub project as follows:
compiler. When run locally as a background process, it speeds up builds of Scala-based projects
like Spark. Developers who regularly recompile Spark with Maven will be the most interested in
Zinc. The project site gives instructions for building and running `zinc`; OS X users can
-install it using `brew install zinc`. \ No newline at end of file
+install it using `brew install zinc`.