author Mark Grover <mark@apache.org> 2016-11-28 08:59:47 -0800
committer Marcelo Vanzin <vanzin@cloudera.com> 2016-11-28 08:59:47 -0800
commit 237c3b9642a1a7c5e7884824b21877590d5d0b3b
tree feee7cb02e41a0e904940b653ff7c6d083f02bb3 /dev
parent d31ff9b7caf4eba66724947b68f517072e6a011c
[SPARK-18535][UI][YARN] Redact sensitive information from Spark logs and UI
## What changes were proposed in this pull request?

This patch adds a new property called `spark.secret.redactionPattern` that allows users to specify a Scala regex to decide which Spark configuration properties and environment variables in driver and executor environments contain sensitive information. When this regex matches the property or environment variable name, its value is redacted from the environment UI and from various logs, such as YARN logs and event logs.

This change uses this property to redact information from event logs and YARN logs. It also updates the UI code to adhere to this property instead of hardcoding the logic that decides which properties are sensitive.

Here's an image of the UI post-redaction:
![image](https://cloud.githubusercontent.com/assets/1709451/20506215/4cc30654-b007-11e6-8aee-4cde253fba2f.png)

Here's the text in the YARN logs, post-redaction:
``HADOOP_CREDSTORE_PASSWORD -> *********(redacted)``

Here's the text in the event logs, post-redaction:
``...,"spark.executorEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)","spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD":"*********(redacted)",...``

## How was this patch tested?

1. Unit tests were added to ensure that redaction works.
2. A YARN job was run that read data from S3, with confidential information (the Hadoop credential provider password) provided in the environment variables of the driver and executors. Afterwards, the logs were grepped to make sure that the secret password did not appear anywhere. It was also verified that the job read the data from S3 correctly, which confirms that the sensitive information still reached the places that need it to read the data.
3. The event logs were checked to make sure the secret password did not appear.
4. The UI environment tab was checked to make sure no secret information was displayed.

Author: Mark Grover <mark@apache.org>

Closes #15971 from markgrover/master_redaction.
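
The name-based redaction described in the commit message can be illustrated with a small sketch. This is not the patch's Scala code: it is a standalone Python approximation, and the pattern value, the `redact` helper, and the sample values are all hypothetical. It also assumes that a match anywhere in the name is enough to trigger redaction. Only the replacement text `*********(redacted)` is taken from the log samples above.

```python
import re

# Replacement text copied from the log samples in the commit message.
REDACTION_REPLACEMENT_TEXT = "*********(redacted)"

# Hypothetical pattern a user might set for spark.secret.redactionPattern.
# Python's regex syntax is close to, but not identical to, Scala/Java regex.
EXAMPLE_PATTERN = re.compile(r"(?i)secret|password")


def redact(pattern, kvs):
    """Return (name, value) pairs, masking the value when the NAME matches.

    Assumption: a match anywhere in the name counts (search, not full match).
    """
    return [
        (name, REDACTION_REPLACEMENT_TEXT if pattern.search(name) else value)
        for name, value in kvs
    ]


# Made-up properties; the first name mirrors the one shown in the event log.
env = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "hunter2"),
    ("spark.app.name", "s3-reader"),
]

redacted = redact(EXAMPLE_PATTERN, env)
for name, value in redacted:
    print(f"{name} -> {value}")
```

Only the displayed or logged copy is masked in this sketch; the original `env` list is left untouched, which corresponds to the requirement that the job can still use the real credential to read from S3.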
Diffstat (limited to 'dev')
0 files changed, 0 insertions, 0 deletions