author    Yin Huai <yhuai@databricks.com>    2015-04-25 13:43:39 -0700
committer Reynold Xin <rxin@databricks.com>  2015-04-25 13:43:39 -0700
commit    aa6966ff34dacc83c3ca675b5109b05e35015469 (patch)
tree      573bd29c0a0c1c50150e9de8c720bf7c5560f38d /sql/README.md
parent    a7160c4e3aae22600d05e257d0b4d2428754b8ea (diff)
[SQL] Update SQL readme to include instructions on generating golden answer files based on Hive 0.13.1.
Author: Yin Huai <yhuai@databricks.com>
Closes #5702 from yhuai/howToGenerateGoldenFiles and squashes the following commits:
9c4a7f8 [Yin Huai] Update readme to include instructions on generating golden answer files based on Hive 0.13.1.
Diffstat (limited to 'sql/README.md')
 sql/README.md | 23
 1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/sql/README.md b/sql/README.md
index 237620e3fa..46aec7cef7 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -12,7 +12,10 @@ Spark SQL is broken up into four subprojects:
 Other dependencies for developers
 ---------------------------------
-In order to create new hive test cases , you will need to set several environmental variables.
+In order to create new hive test cases (i.e. a test suite based on `HiveComparisonTest`),
+you will need to setup your development environment based on the following instructions.
+
+If you are working with Hive 0.12.0, you will need to set several environmental variables as follows.
 
 ```
 export HIVE_HOME="<path to>/hive/build/dist"
@@ -20,6 +23,24 @@ export HIVE_DEV_HOME="<path to>/hive/"
 export HADOOP_HOME="<path to>/hadoop-1.0.4"
 ```
 
+If you are working with Hive 0.13.1, the following steps are needed:
+
+1. Download Hive's [0.13.1](https://hive.apache.org/downloads.html) and set `HIVE_HOME` with `export HIVE_HOME="<path to hive>"`. Please do not set `HIVE_DEV_HOME` (See [SPARK-4119](https://issues.apache.org/jira/browse/SPARK-4119)).
+2. Set `HADOOP_HOME` with `export HADOOP_HOME="<path to hadoop>"`.
+3. Download all Hive 0.13.1a jars (Hive jars actually used by Spark) from [here](http://mvnrepository.com/artifact/org.spark-project.hive) and replace the corresponding original 0.13.1 jars in `$HIVE_HOME/lib`.
+4. Download the [Kryo 2.21 jar](http://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.21) (Note: the 2.22 jar does not work) and the [Javolution 5.5.1 jar](http://mvnrepository.com/artifact/javolution/javolution/5.5.1) to `$HIVE_HOME/lib`.
+5. This step is optional. But, when generating golden answer files, if a Hive query fails and you find that Hive tries to talk to HDFS or you find weird runtime NPEs, set the following in your test suite:
+
+```
+val testTempDir = Utils.createTempDir()
+// We have to use kryo to let Hive correctly serialize some plans.
+sql("set hive.plan.serialization.format=kryo")
+// Explicitly set fs to local fs.
+sql(s"set fs.default.name=file://$testTempDir/")
+// Ask Hive to run jobs in-process as a single map and reduce task.
+sql("set mapred.job.tracker=local")
+```
+
 Using the console
 =================
 An interactive scala console can be invoked by running `build/sbt hive/console`.
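The Hive 0.13.1 environment setup described in steps 1-4 of the added README text can be sketched as a small shell fragment. This is only an illustration: the concrete paths below are placeholder assumptions, not values taken from the commit.

```shell
# Sketch of the Hive 0.13.1 setup from steps 1-4 (paths are placeholders).
export HIVE_HOME="$HOME/hive-0.13.1"      # step 1: unpacked Hive 0.13.1 release
export HADOOP_HOME="$HOME/hadoop-1.0.4"   # step 2: local Hadoop installation
unset HIVE_DEV_HOME                       # step 1 caveat: must NOT be set (SPARK-4119)

# Steps 3-4: the replaced 0.13.1a jars, kryo-2.21.jar, and
# javolution-5.5.1.jar all belong under $HIVE_HOME/lib.
echo "HIVE_HOME=$HIVE_HOME"
[ -z "${HIVE_DEV_HOME:-}" ] && echo "HIVE_DEV_HOME unset: ok"
```

Sourcing a fragment like this before running a `HiveComparisonTest`-based suite keeps the environment consistent with the instructions above.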