author Nicholas Chammas <nicholas.chammas@gmail.com> 2015-02-09 09:44:53 +0000
committer Sean Owen <sowen@cloudera.com> 2015-02-09 09:44:53 +0000
commit 4dfe180fc893bee1146161f8b2a6efd4d6d2bb8c
tree daa67660e48ef4fdd4a6742657ff2c374759934a
parent 855d12ac0a9cdade4cd2cc64c4e7209478be6690
[SPARK-5473] [EC2] Expose SSH failures after status checks pass
If there is some fatal problem with launching a cluster, `spark-ec2` just hangs without giving the user useful feedback on what the problem is.

This PR exposes the output of the SSH calls to the user if the SSH test fails during cluster launch for any reason but the instance status checks are all green. It also removes the growing trail of dots while waiting in favor of a fixed 3 dots.

For example:

```
$ ./ec2/spark-ec2 -k key -i /incorrect/path/identity.pem --instance-type m3.medium --slaves 1 --zone us-east-1c launch "spark-test"
Setting up security groups...
Searching for existing cluster spark-test...
Spark AMI: ami-35b1885c
Launching instances...
Launched 1 slaves in us-east-1c, regid = r-7dadd096
Launched master in us-east-1c, regid = r-fcadd017
Waiting for cluster to enter 'ssh-ready' state...
Warning: SSH connection error. (This could be temporary.)
Host: 127.0.0.1
SSH return code: 255
SSH output: Warning: Identity file /incorrect/path/identity.pem not accessible: No such file or directory.
Warning: Permanently added '127.0.0.1' (RSA) to the list of known hosts.
Permission denied (publickey).
```

This should give users enough information when some unrecoverable error occurs during launch so they can know to abort the launch. This will help avoid situations like the ones reported [here on Stack Overflow](http://stackoverflow.com/q/28002443/) and [here on the user list](http://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3C1422323829398-21381.postn3.nabble.com%3E), where the users couldn't tell what the problem was because it was being hidden by `spark-ec2`.

This is a usability improvement that should be backported to 1.2.

Resolves [SPARK-5473](https://issues.apache.org/jira/browse/SPARK-5473).

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #4262 from nchammas/expose-ssh-failure and squashes the following commits:

8bda6ed [Nicholas Chammas] default to print SSH output
2b92534 [Nicholas Chammas] show SSH output after status check pass
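The behavior described above comes down to running a command with stderr folded into stdout and showing the captured text only when the command fails. The following is a minimal Python 3 sketch of that pattern, not the code from this commit (which targets Python 2). The helper name `run_and_report` and the stand-in child process are hypothetical, chosen so the example runs without SSH or EC2.

```python
import subprocess
import sys


def run_and_report(cmd, print_output=True):
    """Run cmd; on failure, optionally print its combined stdout/stderr.

    Hypothetical helper for illustration; not part of spark_ec2.py.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout to keep output order
    )
    # communicate() waits for exit; [0] is the merged output as bytes
    output = proc.communicate()[0].decode("utf-8", errors="replace")

    if proc.returncode != 0 and print_output:
        print("Warning: command failed. (This could be temporary.)")
        print("Return code: {r}".format(r=proc.returncode))
        print("Output: {o}".format(o=output.strip()))

    return proc.returncode == 0, output


# A child process that writes to stderr and exits non-zero stands in
# for a failing ssh call (assumption: any failing command behaves alike).
failing = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('Permission denied (publickey).\\n'); sys.exit(255)",
]
ok, captured = run_and_report(failing)
```

Returning the captured text alongside the boolean is a choice made for this sketch so that callers can inspect the output; the function in the diff returns only the boolean.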
Diffstat (limited to 'ec2')
-rwxr-xr-x ec2/spark_ec2.py 36
1 files changed, 24 insertions, 12 deletions
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index 725b1e47e0..87b2112fe4 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -34,6 +34,7 @@ import subprocess
 import sys
 import tarfile
 import tempfile
+import textwrap
 import time
 import urllib2
 import warnings
@@ -681,21 +682,32 @@ def setup_spark_cluster(master, opts):
         print "Ganglia started at http://%s:5080/ganglia" % master
 
 
-def is_ssh_available(host, opts):
+def is_ssh_available(host, opts, print_ssh_output=True):
     """
     Check if SSH is available on a host.
     """
-    try:
-        with open(os.devnull, 'w') as devnull:
-            ret = subprocess.check_call(
-                ssh_command(opts) + ['-t', '-t', '-o', 'ConnectTimeout=3',
-                                     '%s@%s' % (opts.user, host), stringify_command('true')],
-                stdout=devnull,
-                stderr=devnull
-            )
-        return ret == 0
-    except subprocess.CalledProcessError as e:
-        return False
+    s = subprocess.Popen(
+        ssh_command(opts) + ['-t', '-t', '-o', 'ConnectTimeout=3',
+                             '%s@%s' % (opts.user, host), stringify_command('true')],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT  # we pipe stderr through stdout to preserve output order
+    )
+    cmd_output = s.communicate()[0]  # [1] is stderr, which we redirected to stdout
+
+    if s.returncode != 0 and print_ssh_output:
+        # extra leading newline is for spacing in wait_for_cluster_state()
+        print textwrap.dedent("""\n
+            Warning: SSH connection error. (This could be temporary.)
+            Host: {h}
+            SSH return code: {r}
+            SSH output: {o}
+        """).format(
+            h=host,
+            r=s.returncode,
+            o=cmd_output.strip()
+        )
+
+    return s.returncode == 0
 
 
 def is_cluster_ssh_available(cluster_instances, opts):
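
The warning text in the second hunk is built by calling `textwrap.dedent` on a triple-quoted string that begins with an escaped `\n`. As a rough illustration of how that string ends up laid out, here is a small Python 3 sketch with made-up values for the host, return code, and output. It only mirrors the string handling and does not call SSH.

```python
import textwrap

# The escaped "\n" plus the real line break after it yield two leading
# newlines; dedent() then strips the common indentation from the rest.
template = textwrap.dedent("""\n
    Warning: SSH connection error. (This could be temporary.)
    Host: {h}
    SSH return code: {r}
    SSH output: {o}
""")

# Illustrative values only (assumptions, not real launch output).
message = template.format(
    h="127.0.0.1",
    r=255,
    o="Permission denied (publickey).",
)
lines = message.split("\n")
print(message)
```

The two empty leading lines are what the in-code comment calls spacing for `wait_for_cluster_state()`: they separate the warning from whatever status text was printed just before it.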