author    Dan Osipov <daniil.osipov@shazam.com>  2014-09-16 13:40:16 -0700
committer Patrick Wendell <pwendell@gmail.com>  2014-09-16 13:40:16 -0700
commit    b20171267d610715d5b0a86b474c903e9bc3a1a3 (patch)
tree      cc5b31cb62a4764412a4aa8569fa05a9875b49f3 /docs/ec2-scripts.md
parent    ec1adecbb72d291d7ef122fb0505bae53116e0e6 (diff)
[SPARK-787] Add S3 configuration parameters to the EC2 deploy scripts
When deploying to AWS, additional configuration is required to read S3 files. EMR creates it automatically; there is no reason the Spark EC2 script shouldn't do the same. This PR requires a corresponding PR to mesos/spark-ec2 to be merged first, since that repository is cloned while the machines are set up: https://github.com/mesos/spark-ec2/pull/58

Author: Dan Osipov <daniil.osipov@shazam.com>

Closes #1120 from danosipov/s3_credentials and squashes the following commits:

758da8b [Dan Osipov] Modify documentation to include the new parameter
71fab14 [Dan Osipov] Use a parameter --copy-aws-credentials to enable S3 credential deployment
7e0da26 [Dan Osipov] Get AWS credentials out of boto connection instance
39bdf30 [Dan Osipov] Add S3 configuration parameters to the EC2 deploy scripts
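The "Get AWS credentials out of boto connection instance" step can be illustrated with a minimal sketch. This is not the actual spark_ec2.py code: `boto.ec2.connect_to_region` and the credential properties are standard boto 2 API, but the `deploy_credentials` helper and the returned dictionary are hypothetical.

```python
# Sketch only: shows how credentials could be read from a boto EC2 connection,
# as the commit message describes. Helper name and return shape are hypothetical.
import boto.ec2

def deploy_credentials(region, copy_aws_credentials):
    conn = boto.ec2.connect_to_region(region)
    env = {}
    if copy_aws_credentials:
        # boto exposes the resolved credentials on the connection object
        env["AWS_ACCESS_KEY_ID"] = conn.aws_access_key_id
        env["AWS_SECRET_ACCESS_KEY"] = conn.aws_secret_access_key
    return env
```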
Diffstat (limited to 'docs/ec2-scripts.md')
-rw-r--r--  docs/ec2-scripts.md  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/docs/ec2-scripts.md b/docs/ec2-scripts.md
index f5ac6d894e..b2ca6a9b48 100644
--- a/docs/ec2-scripts.md
+++ b/docs/ec2-scripts.md
@@ -156,6 +156,6 @@ If you have a patch or suggestion for one of these limitations, feel free to
# Accessing Data in S3
-Spark's file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form `s3n://<bucket>/path`. You will also need to set your Amazon security credentials, either by setting the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` before your program or through `SparkContext.hadoopConfiguration`. Full instructions on S3 access using the Hadoop input libraries can be found on the [Hadoop S3 page](http://wiki.apache.org/hadoop/AmazonS3).
+Spark's file interface allows it to process data in Amazon S3 using the same URI formats that are supported for Hadoop. You can specify a path in S3 as input through a URI of the form `s3n://<bucket>/path`. To provide AWS credentials for S3 access, launch the Spark cluster with the option `--copy-aws-credentials`. Full instructions on S3 access using the Hadoop input libraries can be found on the [Hadoop S3 page](http://wiki.apache.org/hadoop/AmazonS3).
In addition to using a single input file, you can also use a directory of files as input by simply giving the path to the directory.
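As a quick illustration of the documented behavior (not part of the diff above), here is a minimal PySpark sketch that reads both a single S3 object and a directory prefix via the `s3n://` scheme, assuming the cluster was launched with `--copy-aws-credentials` so the Hadoop S3 credentials are already in place. The bucket and path names are placeholders.

```python
# Minimal sketch, not part of the commit: reading S3 data on a cluster launched
# with --copy-aws-credentials. Bucket and path names below are placeholders.
from pyspark import SparkContext

sc = SparkContext(appName="S3Example")

# Single file as input, using the s3n:// URI form described in the docs.
lines = sc.textFile("s3n://my-bucket/path/to/file.txt")
print(lines.count())

# A directory prefix also works: every file under it becomes part of the input.
all_lines = sc.textFile("s3n://my-bucket/path/to/dir/")
print(all_lines.count())
```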