path: root/docs/running-on-yarn.md
author    jerryshao <sshao@hortonworks.com>  2016-08-10 15:39:30 -0700
committer Marcelo Vanzin <vanzin@cloudera.com>  2016-08-10 15:39:30 -0700
commit    ab648c0004cfb20d53554ab333dd2d198cb94ffa (patch)
tree      74fa18e0a21caedaca6eda3557d60c9bd3af07b0 /docs/running-on-yarn.md
parent    bd2c12fb4994785d5becce541aee9ba73fef1c4c (diff)
[SPARK-14743][YARN] Add a configurable credential manager for Spark running on YARN
## What changes were proposed in this pull request?

Add a configurable token manager for Spark running on YARN.

### Current Problems

1. The set of supported token providers is hard-coded; currently only HDFS, HBase, and Hive are supported, and it is impossible for a user to add a new token provider without code changes.
2. The same problem exists in the periodic token renewer and updater.

### Changes In This Proposal

To address the problems mentioned above and make the current code cleaner and easier to understand, this proposal makes three main changes:

1. Abstract a `ServiceTokenProvider` interface (as well as a `ServiceTokenRenewable` interface) for token providers. Each service that wants to communicate with Spark via tokens needs to implement this interface.
2. Provide a `ConfigurableTokenManager` to manage all registered token providers, as well as the token renewer and updater. This class also offers an API for other modules to obtain tokens, get the renewal interval, and so on.
3. Implement three built-in token providers, `HDFSTokenProvider`, `HiveTokenProvider`, and `HBaseTokenProvider`, to keep the same semantics as supported today. Whether these built-in token providers are loaded is controlled by the configuration `spark.yarn.security.tokens.${service}.enabled`; by default all of them are loaded.

### Behavior Changes

For the end user there is no behavior change: we still use the same configuration `spark.yarn.security.tokens.${service}.enabled` to decide which token providers are enabled (e.g. hbase or hive). A user-implemented token provider (assume its name is "test") needs two configurations to be picked up:

1. `spark.yarn.security.tokens.test.enabled` set to `true`
2. `spark.yarn.security.tokens.test.class` set to the fully qualified class name

So we keep the same semantics as the current code while adding only one new configuration.

### Current Status

- [x] Token provider interface and management framework.
- [x] Implement built-in token providers (HDFS, HBase, Hive).
- [x] Unit test coverage.
- [x] Integration test on a secure cluster.

## How was this patch tested?

Unit tests and an integration test. Please suggest and review; any comment is greatly appreciated.

Author: jerryshao <sshao@hortonworks.com>

Closes #14065 from jerryshao/SPARK-16342.
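To make the plug-in story concrete, below is a minimal sketch of a user-supplied provider for a service named `test`, matching the two-configuration example above. The trait shape (a service name, a `credentialsRequired` check, and an `obtainCredentials` call) follows the description in this message; the exact signatures and the example package/class names are assumptions, not taken verbatim from the patch.

```scala
package com.example.security

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials

import org.apache.spark.SparkConf
import org.apache.spark.deploy.yarn.security.ServiceCredentialProvider

// Hypothetical provider for a service named "test"; matches the
// spark.yarn.security.tokens.test.* configuration example above.
class TestServiceCredentialProvider extends ServiceCredentialProvider {

  // Fills in the ${service} part of the enable/disable configuration key.
  override def serviceName: String = "test"

  // Only attempt to fetch credentials on a Kerberos-secured cluster.
  override def credentialsRequired(hadoopConf: Configuration): Boolean =
    hadoopConf.get("hadoop.security.authentication") == "kerberos"

  // Contact the external service, add any obtained token to `creds`, and
  // return the next renewal time in milliseconds (or None if not renewable).
  override def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long] = {
    // ... fetch a delegation token from the service and creds.addToken(...) ...
    None
  }
}
```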
Diffstat (limited to 'docs/running-on-yarn.md')
-rw-r--r--  docs/running-on-yarn.md | 22
1 file changed, 14 insertions, 8 deletions
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index befd3eaee9..cd18808681 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -461,15 +461,14 @@ To use a custom metrics.properties for the application master and executors, upd
</td>
</tr>
<tr>
- <td><code>spark.yarn.security.tokens.${service}.enabled</code></td>
+ <td><code>spark.yarn.security.credentials.${service}.enabled</code></td>
<td><code>true</code></td>
<td>
- Controls whether to retrieve delegation tokens for non-HDFS services when security is enabled.
- By default, delegation tokens for all supported services are retrieved when those services are
+ Controls whether to obtain credentials for services when security is enabled.
+ By default, credentials for all supported services are retrieved when those services are
configured, but it's possible to disable that behavior if it somehow conflicts with the
- application being run.
- <p/>
- Currently supported services are: <code>hive</code>, <code>hbase</code>
+ application being run. For further details please see
+ <a href="running-on-yarn.html#running-in-a-secure-cluster">Running in a Secure Cluster</a>.
</td>
</tr>
<tr>
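As a usage note for the renamed property in the hunk above: disabling one of the built-in providers, for example Hive, would look like this in `spark-defaults.conf` (an illustrative snippet in the property style this page already uses):

```
spark.yarn.security.credentials.hive.enabled  false
```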
@@ -525,11 +524,11 @@ token for the cluster's HDFS filesystem, and potentially for HBase and Hive.
An HBase token will be obtained if HBase is on the classpath, the HBase configuration declares
the application is secure (i.e. `hbase-site.xml` sets `hbase.security.authentication` to `kerberos`),
-and `spark.yarn.security.tokens.hbase.enabled` is not set to `false`.
+and `spark.yarn.security.credentials.hbase.enabled` is not set to `false`.
Similarly, a Hive token will be obtained if Hive is on the classpath, its configuration
includes a URI of the metadata store in `hive.metastore.uris`, and
-`spark.yarn.security.tokens.hive.enabled` is not set to `false`.
+`spark.yarn.security.credentials.hive.enabled` is not set to `false`.
If an application needs to interact with other secure HDFS clusters, then
the tokens needed to access these clusters must be explicitly requested at
@@ -539,6 +538,13 @@ launch time. This is done by listing them in the `spark.yarn.access.namenodes` p
spark.yarn.access.namenodes hdfs://ireland.example.org:8020/,hdfs://frankfurt.example.org:8020/
```
+Spark supports integrating with other security-aware services through the Java Services mechanism (see
+`java.util.ServiceLoader`). To do that, implementations of `org.apache.spark.deploy.yarn.security.ServiceCredentialProvider`
+should be available to Spark by listing their names in the corresponding file in the jar's
+`META-INF/services` directory. These plug-ins can be disabled by setting
+`spark.yarn.security.credentials.{service}.enabled` to `false`, where `{service}` is the name of the
+credential provider.
+
## Configuring the External Shuffle Service
To start the Spark Shuffle Service on each `NodeManager` in your YARN cluster, follow these
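Finally, a sketch of how the plug-in from the earlier example would be wired up through `java.util.ServiceLoader`, per the paragraph added in the hunk above. The jar would carry a provider-configuration file (the class name is the hypothetical one used in the Scala sketch):

```
# File inside the plug-in jar:
#   META-INF/services/org.apache.spark.deploy.yarn.security.ServiceCredentialProvider
com.example.security.TestServiceCredentialProvider
```

At runtime Spark would discover the class by name; the `spark.yarn.security.credentials.test.enabled` switch shown earlier then controls whether the provider is actually consulted.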