author    jerryshao <sshao@hortonworks.com>    2017-01-11 09:24:02 -0600
committer Tom Graves <tgraves@yahoo-inc.com>   2017-01-11 09:24:02 -0600
commit    4239a1081ad96a503fbf9277e42b97422bb8af3e (patch)
tree      7096161bfbf11404ae2cfc13189214a92a5fa833 /docs/running-on-yarn.md
parent    a6155135690433988aa0cbf22f260f52a235e9f5 (diff)
[SPARK-19021][YARN] Generalize HDFSCredentialProvider to support non-HDFS secure filesystems
Currently Spark can only get the token renewal interval from secure HDFS (hdfs://); if Spark runs against other secure filesystems such as webHDFS (webhdfs://), WASB (wasb://), or ADLS, it ignores those tokens and cannot get renewal intervals from them, which makes Spark unable to work with these secure clusters. So instead of checking only the HDFS token, we should generalize the provider to support different DelegationTokenIdentifiers.

## How was this patch tested?

Manually verified in a secure cluster.

Author: jerryshao <sshao@hortonworks.com>

Closes #16432 from jerryshao/SPARK-19021.
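In effect, the change asks each configured filesystem for its own delegation tokens through the common `FileSystem` API instead of special-casing `hdfs://`. Below is a minimal sketch of that idea, not the patch itself; the URIs and renewer principal are illustrative assumptions.

```
import java.net.URI

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.Credentials

// Sketch: fetch delegation tokens from every configured secure filesystem
// (hdfs://, webhdfs://, wasb://, ...) via the generic FileSystem API.
val hadoopConf = new Configuration()
val creds = new Credentials()
val renewer = "yarn" // assumption: the cluster's token renewer principal
Seq("hdfs://nn1.example.org:8020", "webhdfs://nn2.example.org:50070").foreach { u =>
  FileSystem.get(new URI(u), hadoopConf).addDelegationTokens(renewer, creds)
}
```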
Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r-- docs/running-on-yarn.md | 12
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index a0729757b7..f7513454c7 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -479,12 +479,12 @@ Hadoop services issue *hadoop tokens* to grant access to the services and data.
Clients must first acquire tokens for the services they will access and pass them along with their
application as it is launched in the YARN cluster.
-For a Spark application to interact with HDFS, HBase and Hive, it must acquire the relevant tokens
+For a Spark application to interact with any of the Hadoop filesystems (for example hdfs, webhdfs, etc.), HBase, and Hive, it must acquire the relevant tokens
using the Kerberos credentials of the user launching the application
—that is, the principal whose identity will become that of the launched Spark application.
This is normally done at launch time: in a secure cluster Spark will automatically obtain a
-token for the cluster's HDFS filesystem, and potentially for HBase and Hive.
+token for the cluster's default Hadoop filesystem, and potentially for HBase and Hive.
An HBase token will be obtained if HBase is on the classpath, the HBase configuration declares
the application is secure (i.e. `hbase-site.xml` sets `hbase.security.authentication` to `kerberos`),
@@ -494,12 +494,12 @@ Similarly, a Hive token will be obtained if Hive is on the classpath, its config
includes a URI of the metadata store in `hive.metastore.uris`, and
`spark.yarn.security.credentials.hive.enabled` is not set to `false`.
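The Hive provider can therefore be switched off explicitly when its tokens are not wanted; a minimal sketch, assuming the property is set programmatically rather than in `spark-defaults.conf`:

```
import org.apache.spark.SparkConf

// Opt out of fetching a Hive delegation token even when Hive is on the
// classpath (property name taken from the paragraph above).
val conf = new SparkConf()
  .set("spark.yarn.security.credentials.hive.enabled", "false")
```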
-If an application needs to interact with other secure HDFS clusters, then
+If an application needs to interact with other secure Hadoop filesystems, then
the tokens needed to access these clusters must be explicitly requested at
launch time. This is done by listing them in the `spark.yarn.access.namenodes` property.
```
-spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/
+spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,webhdfs://frankfurt.example.org:50070/
```
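The same setting can also be supplied on the application's `SparkConf`; a minimal sketch reusing the example URIs above (they are illustrative, not defaults):

```
import org.apache.spark.SparkConf

// Request tokens for two additional secure filesystems at launch time.
// List every remote filesystem the job reads from or writes to.
val conf = new SparkConf()
  .set("spark.yarn.access.namenodes",
    "hdfs://ireland.example.org:8020/,webhdfs://frankfurt.example.org:50070/")
```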
Spark supports integrating with other security-aware services through the Java Services mechanism (see
@@ -558,8 +558,8 @@ For Spark applications, the Oozie workflow must be set up for Oozie to request a
the application needs, including:
- The YARN resource manager.
-- The local HDFS filesystem.
-- Any remote HDFS filesystems used as a source or destination of I/O.
+- The local Hadoop filesystem.
+- Any remote Hadoop filesystems used as a source or destination of I/O.
- Hive —if used.
- HBase —if used.
- The YARN timeline server, if the application interacts with this.