---
layout: global
displayTitle: Using Spark's "Hadoop Free" Build
title: Using Spark's "Hadoop Free" Build
---

Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages "Hadoop free" builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify `SPARK_DIST_CLASSPATH` to include Hadoop's package jars. The most convenient way to do this is to add an entry in `conf/spark-env.sh`.

This page describes how to connect Spark to Hadoop for different types of distributions.

# Apache Hadoop
For Apache distributions, you can use Hadoop's `classpath` command. For instance:

{% highlight bash %}
### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
# ('--config' is an option of the 'hadoop' launcher, so it goes before the 'classpath' subcommand)
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)

{% endhighlight %}
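
If `hadoop` might be missing on some machines, a guarded variant of the entries above can make the failure visible instead of silently exporting an empty classpath. The following is only a sketch under stated assumptions: the helper name `set_hadoop_classpath` and the `HADOOP_BIN` variable are hypothetical names introduced for this example and are not part of Spark or Hadoop.

```bash
### in conf/spark-env.sh (sketch) ###

# set_hadoop_classpath is a hypothetical helper, not a Spark or Hadoop command.
# Argument 1: name or path of the 'hadoop' binary (defaults to 'hadoop' on PATH).
set_hadoop_classpath() {
  local hadoop_bin="${1:-hadoop}"

  # Warn and leave SPARK_DIST_CLASSPATH untouched if the binary cannot be found.
  if ! command -v "$hadoop_bin" >/dev/null 2>&1; then
    echo "WARNING: '$hadoop_bin' not found; SPARK_DIST_CLASSPATH not set" >&2
    return 1
  fi

  # Ask Hadoop for its classpath and export it for Spark.
  SPARK_DIST_CLASSPATH="$("$hadoop_bin" classpath)" || return 1
  export SPARK_DIST_CLASSPATH
}

# HADOOP_BIN is a hypothetical override variable for this sketch.
set_hadoop_classpath "${HADOOP_BIN:-hadoop}" || true
```

After sourcing the file, `echo "$SPARK_DIST_CLASSPATH"` shows what Spark will receive; an empty result means the `hadoop` lookup failed.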