From 49e98500a9b1f93ab3224c4358dbc56f1e37ff35 Mon Sep 17 00:00:00 2001
From: Andy Konwinski
Date: Wed, 5 Sep 2012 13:24:09 -0700
Subject: Updated base README to point to documentation site instead of wiki,
 updated docs/README.md to describe use of Jekyll, and renamed things to make
 them more consistent with the lower-case-with-hyphens convention.
---
 docs/ec2-scripts.md | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)
 create mode 100644 docs/ec2-scripts.md
(limited to 'docs/ec2-scripts.md')

diff --git a/docs/ec2-scripts.md b/docs/ec2-scripts.md
new file mode 100644
index 0000000000..35d28c47d0
--- /dev/null
+++ b/docs/ec2-scripts.md
@@ -0,0 +1,146 @@
---
layout: global
title: Using the Spark EC2 Scripts
---
The `spark-ec2` script, located in Spark's `ec2` directory, allows you
to launch, manage and shut down Spark clusters on Amazon EC2. It builds
on the [Mesos EC2 script](https://github.com/mesos/mesos/wiki/EC2-Scripts)
in Apache Mesos.

`spark-ec2` is designed to manage multiple named clusters. You can
launch a new cluster (telling the script its size and giving it a name),
shut down an existing cluster, or log into a cluster. Each cluster is
identified by placing its machines into EC2 security groups whose names
are derived from the name of the cluster. For example, a cluster named
`test` will contain a master node in a security group called
`test-master`, and a number of slave nodes in a security group called
`test-slaves`. The `spark-ec2` script will create these security groups
for you based on the cluster name you request. You can also use them to
identify machines belonging to each cluster in the EC2 Console or
ElasticFox.

This guide describes how to get set up to run clusters, how to launch
clusters, how to run jobs on them, and how to shut them down.

Before You Start
================

- Create an Amazon EC2 key pair for yourself. This can be done by
  logging into your Amazon Web Services account through the [AWS
  console](http://aws.amazon.com/console/), clicking Key Pairs on the
  left sidebar, and creating and downloading a key. Make sure that you
  set the permissions for the private key file to `600` (i.e. only you
  can read and write it) so that `ssh` will work.
- Whenever you want to use the `spark-ec2` script, set the environment
  variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` to your
  Amazon EC2 access key ID and secret access key. These can be
  obtained from the [AWS homepage](http://aws.amazon.com/) by clicking
  Account \> Security Credentials \> Access Credentials.

Launching a Cluster
===================

- Go into the `ec2` directory in the release of Spark you downloaded.
- Run
  `./spark-ec2 -k <keypair> -i <key-file> -s <num-slaves> launch <cluster-name>`,
  where `<keypair>` is the name of your EC2 key pair (the name you gave
  it when you created it), `<key-file>` is the private key file for your
  key pair, `<num-slaves>` is the number of slave nodes to launch (try
  1 at first), and `<cluster-name>` is the name to give to your
  cluster. (A complete example invocation is sketched at the end of this
  section.)
- After everything launches, check that Mesos is up and sees all the
  slaves by going to the Mesos Web UI link printed at the end of the
  script (`http://<master-hostname>:8080`).

You can also run `./spark-ec2 --help` to see more usage options. The
following options are worth pointing out:

- `--instance-type=<INSTANCE_TYPE>` can be used to specify an EC2
  instance type to use. For now, the script only supports 64-bit instance
  types, and the default type is `m1.large` (which has 2 cores and 7.5 GB
  RAM). Refer to the Amazon pages about [EC2 instance
  types](http://aws.amazon.com/ec2/instance-types) and [EC2
  pricing](http://aws.amazon.com/ec2/#pricing) for information about other
  instance types.
- `--zone=<EC2_ZONE>` can be used to specify an EC2 availability zone
  to launch instances in. Sometimes, you will get an error because there
  is not enough capacity in one zone, and you should try to launch in
  another. This happens mostly with the `m1.large` instance type;
  extra-large instances (both `m1.xlarge` and `c1.xlarge`) tend to be more
  available.
- `--ebs-vol-size=GB` will attach an EBS volume with a given amount
  of space to each node so that you can have a persistent HDFS cluster
  on your nodes across cluster restarts (see below).
- If one of your launches fails, e.g. because you do not have the right
  permissions on your private key file, you can run `launch` with the
  `--resume` option to restart the setup process on an existing cluster.
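Putting these steps together, here is a minimal sketch of a launch session.
The key pair name `my-key-pair`, the key file path, the availability zone
`us-east-1b`, and the cluster name `test` are illustrative placeholders, not
values the script supplies; substitute your own.

```bash
# Set your AWS credentials (Account > Security Credentials > Access Credentials)
export AWS_ACCESS_KEY_ID="<your access key ID>"
export AWS_SECRET_ACCESS_KEY="<your secret access key>"

# ssh will refuse a private key that other users can read
chmod 600 ~/my-key-pair.pem

cd ec2

# Launch a cluster named "test" with one slave of the default instance type
./spark-ec2 -k my-key-pair -i ~/my-key-pair.pem -s 1 launch test

# The same launch with an explicit instance type, availability zone, and a
# 50 GB EBS volume per node for persistent HDFS
./spark-ec2 -k my-key-pair -i ~/my-key-pair.pem -s 1 \
  --instance-type=m1.xlarge --zone=us-east-1b --ebs-vol-size=50 launch test
```

If a launch fails partway through (for example because of key file
permissions), re-running the same command with `--resume` picks up the
existing instances, as noted above.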
Running Jobs
============

- Go into the `ec2` directory in the release of Spark you downloaded.
- Run `./spark-ec2 -k <keypair> -i <key-file> login <cluster-name>` to
  SSH into the cluster, where `<keypair>` and `<key-file>` are as
  above. (This is just for convenience; you could also use
  the EC2 console.)
- To deploy code or data within your cluster, you can log in and use the
  provided script `~/mesos-ec2/copy-dir`, which, given a directory path,
  RSYNCs it to the same location on all the slaves (see the sketch after
  this list).
- If your job needs to access large datasets, the fastest way to do
  that is to load them from Amazon S3 or an Amazon EBS device into an
  instance of the Hadoop Distributed File System (HDFS) on your nodes.
  The `spark-ec2` script already sets up an HDFS instance for you. It's
  installed in `/root/ephemeral-hdfs`, and can be accessed using the
  `bin/hadoop` script in that directory. Note that the data in this
  HDFS goes away when you stop and restart a machine.
- There is also a *persistent HDFS* instance in
  `/root/persistent-hdfs` that will keep data across cluster restarts.
  Typically each node has relatively little persistent storage space
  (about 3 GB), but you can use the `--ebs-vol-size` option to
  `spark-ec2` to attach a persistent EBS volume to each node for
  storing the persistent HDFS.
- Finally, if you get errors while running your jobs, look at the slave
  logs for that job using the Mesos web UI (`http://<master-hostname>:8080`).
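As a sketch of a typical session, reusing the hypothetical key pair and
cluster name from the launch example above (the `/root/my-job` directory and
the file name are likewise illustrative):

```bash
# From the ec2 directory on your own machine: SSH into the master
./spark-ec2 -k my-key-pair -i ~/my-key-pair.pem login test

# On the master: copy a code or data directory to the same path on every slave
~/mesos-ec2/copy-dir /root/my-job

# Ephemeral HDFS: fast, but its contents are lost when machines are stopped
/root/ephemeral-hdfs/bin/hadoop fs -put /root/my-job/input.txt /input.txt
/root/ephemeral-hdfs/bin/hadoop fs -ls /

# Persistent HDFS: survives stop/start; about 3 GB per node unless you
# launched with --ebs-vol-size
/root/persistent-hdfs/bin/hadoop fs -ls /
```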
Terminating a Cluster
=====================

***Note that there is no way to recover data on EC2 nodes after shutting
them down! Make sure you have copied everything important off the nodes
before stopping them.***

- Go into the `ec2` directory in the release of Spark you downloaded.
- Run `./spark-ec2 destroy <cluster-name>`.

Pausing and Restarting Clusters
===============================

The `spark-ec2` script also supports pausing a cluster. In this case,
the VMs are stopped but not terminated, so they
***lose all data on ephemeral disks*** but keep the data in their
root partitions and their `persistent-hdfs`. Stopped machines will not
cost you any EC2 instance-hours, but ***will*** continue to cost money
for EBS storage.

- To stop one of your clusters, go into the `ec2` directory and run
  `./spark-ec2 stop <cluster-name>`.
- To restart it later, run
  `./spark-ec2 -i <key-file> start <cluster-name>`.
- To ultimately destroy the cluster and stop consuming EBS space, run
  `./spark-ec2 destroy <cluster-name>` as described in the previous
  section.

Limitations
===========

- `spark-ec2` currently only launches machines in the US-East region of EC2.
  It should not be hard to make it launch VMs in other regions, but you will
  need to create your own AMIs in them.
- Support for "cluster compute" nodes is limited -- there's no way to specify a
  locality group. However, you can launch slave nodes in your
  `<cluster-name>-slaves` group manually and then use `spark-ec2 launch
  --resume` to start a cluster with them.
- Support for spot instances is limited.

If you have a patch or suggestion for one of these limitations, feel free to
[[contribute|Contributing to Spark]] it!
--
cgit v1.2.3