diff options
Diffstat (limited to 'docs/configuration.md')
-rw-r--r-- | docs/configuration.md | 15 |
1 files changed, 15 insertions, 0 deletions
diff --git a/docs/configuration.md b/docs/configuration.md index 682384d424..c276e8e90d 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -1674,3 +1674,18 @@ Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can config To specify a different configuration directory other than the default "SPARK_HOME/conf", you can set SPARK_CONF_DIR. Spark will use the the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. + +# Inheriting Hadoop Cluster Configuration + +If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that +should be included on Spark's classpath: + +* `hdfs-site.xml`, which provides default behaviors for the HDFS client. +* `core-site.xml`, which sets the default filesystem name. + +The location of these configuration files varies across CDH and HDP versions, but +a common location is inside of `/etc/hadoop/conf`. Some tools, such as Cloudera Manager, create +configurations on-the-fly, but offer a mechanisms to download copies of them. + +To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` +to a location containing the configuration files. |