Taming the EC2 API

I've been spending some time lately familiarizing myself with EC2, setting up some MySQL servers & clusters here and there, and doing some really basic configuration testing. One thing you'll run into quickly is that the AWS Management Console web interface gets unwieldy for working with your instances. There ends up being lots of scrolling, lots of staring, and lots of sighs. Since I'm using SSH to connect to my instances anyway, I want a reasonable way to find information about them on the Unix command line.

Amazon has an official set of tools [http://aws.amazon.com/developertools/351] that give you this information, at least theoretically. It is a gigantic distribution of shell scripts and Java madness that, if you are very patient, will eventually give you some information about your instances, in a format that is very difficult to work with.

$ time ./bin/ec2-describe-instances i-83c5d4e0
Unable to find a $JAVA_HOME at "/usr", continuing with system-provided Java...
RESERVATION     r-7db4731c      801025846226
INSTANCE        i-83c5d4e0      ami-31814f58                    stopped skysql-ec2      0               m1.small        2011-12-09T20:41:39+0000        us-east-1c    aki-805ea7e9                    monitoring-disabled             10.0.0.164      vpc-cd4fafa5    subnet-c44fafac ebs                                  paravirtual      xen             sg-134b547f     default
BLOCKDEVICE     /dev/sda1       vol-19ec6174    2011-12-10T01:30:32.000Z
TAG     instance        i-83c5d4e0      Name    ndb32-02

real    0m7.693s
user    0m10.119s
sys     0m0.451s

OK, it takes about 7.7 seconds to get data about a single instance, and it comes back as four lines of loosely aligned columns. If I ask for information about all of my instances, I have no idea how I would successfully grep through that to work with any of it programmatically. I went looking for a different solution, preferably one that would be faster, more flexible, and easier to use.
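To make the difficulty concrete, here is a rough sketch of what pulling just the instance ID and Name tag out of that output might look like. It assumes whitespace-separated fields, the instance ID in the second field of INSTANCE lines, and TAG lines shaped like the one shown above. The name names_from_ec2_tools is made up for this example and is not part of Amazon's tools.

```shell
#!/bin/sh
# Hypothetical helper: join INSTANCE lines with their "Name" TAG lines.
# Assumptions: whitespace-separated fields, instance ID in field 2 of
# INSTANCE lines, and TAG lines of the form
#   TAG instance <id> <key> <value>
# A Name containing spaces would be cut off at the first space.
names_from_ec2_tools() {
    awk '
        $1 == "INSTANCE" { ids[$2] = 1 }
        $1 == "TAG" && $2 == "instance" && $4 == "Name" { name[$3] = $5 }
        END { for (id in ids) print id, name[id] }
    '
}

# Usage (not run here):
#   ./bin/ec2-describe-instances | names_from_ec2_tools
```

Even this small join has to carry state across lines, and anything past the first few INSTANCE columns depends on counting fields that may be empty.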

I found a great script called, simply, aws, written by Timothy Kay [http://timkay.com/aws/].

$ du -hsc ec2-api-tools*
 14M    ec2-api-tools-1.5.0.1-2011.11.30
 11M    ec2-api-tools.zip
 26M    total

$ ls -sk aws 
 76 aws

Ahem. I'll take a 76K Perl script over a 14M mess any day. Let's see how it performs.

$ time aws din i-83c5d4e0
+------------+--------------+----------------------+------------------------------------------+------------+--------------+--------------------------+---------------------------------------------+--------------+----------------+-----------------+--------------+------------------+-----------------+---------------------------------------------+-------------------------------------------------------------------------------------------------+--------------+----------------+----------------+------------------------------------------------------------------------------------------------------------------------------------+--------------------+--------+------+----------+
| instanceId |   imageId    |    instanceState     |                  reason                  |  keyName   | instanceType |        launchTime        |                  placement                  |   kernelId   |   monitoring   |    subnetId     |    vpcId     | privateIpAddress | sourceDestCheck |                  groupSet                   |                                           stateReason                                           | architecture | rootDeviceType | rootDeviceName |                                                         blockDeviceMapping                                                         | virtualizationType | tagSet | key  |  value   |

| i-83c5d4e0 | ami-31814f58 | code=80 name=stopped | User initiated (2011-12-10 01:29:51 GMT) | skysql-ec2 | m1.small     | 2011-12-09T20:41:39.000Z | availabilityZone=us-east-1c tenancy=default | aki-805ea7e9 | state=disabled | subnet-c44fafac | vpc-cd4fafa5 | 10.0.0.164       | true            | item= groupId=sg-134b547f groupName=default | code=Client.UserInitiatedShutdown message=Client.UserInitiatedShutdown: User initiated shutdown | i386         | ebs            | /dev/sda1      | item= deviceName=/dev/sda1 ebs= volumeId=vol-19ec6174 status=attached attachTime=2011-12-10T01:30:32.000Z deleteOnTermination=true | paravirtual        |        |      |          |
|            |              |                      |                                          |            |              |                          |                                             |              |                |                 |              |                  |                 |                                             |                                                                                                 |              |                |                |                                                                                                                                    |                    |        | Name | ndb32-02 |


real    0m1.546s
user    0m0.123s
sys    0m0.035s

Well, the output format isn't exactly any more appealing than what you get from the Amazon tool, but it sure gives it to you a lot faster! A little poking around showed me that the aws tool allows you to forgo the pretty-printing and get the actual XML that the tool receives from the AWS API.

$ aws --xml din i-83c5d4e0
<?xml version="1.0" encoding="UTF-8"?>
<DescribeInstancesResponse xmlns="http://ec2.amazonaws.com/doc/2011-11-01/">
    <requestId>4e1bf76d-ad02-439b-b255-108e09713251</requestId>
    <reservationSet>
        <item>
            <reservationId>r-7db4731c</reservationId>
            <ownerId>801025846226</ownerId>
            <groupSet/>
            <instancesSet>
                <item>
                    <instanceId>i-83c5d4e0</instanceId>
                    <imageId>ami-31814f58</imageId>
                    <instanceState>
                        <code>80</code>
                        <name>stopped</name>
                    </instanceState>
                    <privateDnsName/>
                    <dnsName/>
                    <reason>User initiated (2011-12-10 01:29:51 GMT)</reason>
                    <keyName>skysql-ec2</keyName>
                    <amiLaunchIndex>0</amiLaunchIndex>
                    <productCodes/>
                    <instanceType>m1.small</instanceType>
                    <launchTime>2011-12-09T20:41:39.000Z</launchTime>
                    <placement>
                        <availabilityZone>us-east-1c</availabilityZone>
                        <groupName/>
                        <tenancy>default</tenancy>
                    </placement>
                    <kernelId>aki-805ea7e9</kernelId>
                    <monitoring>
                        <state>disabled</state>
                    </monitoring>
                    <subnetId>subnet-c44fafac</subnetId>
                    <vpcId>vpc-cd4fafa5</vpcId>
                    <privateIpAddress>10.0.0.164</privateIpAddress>
                    <sourceDestCheck>true</sourceDestCheck>
                    <groupSet>
                        <item>
                            <groupId>sg-134b547f</groupId>
                            <groupName>default</groupName>
                        </item>
                    </groupSet>
                    <stateReason>
                        <code>Client.UserInitiatedShutdown</code>
                        <message>Client.UserInitiatedShutdown: User initiated shutdown</message>
                    </stateReason>
                    <architecture>i386</architecture>
                    <rootDeviceType>ebs</rootDeviceType>
                    <rootDeviceName>/dev/sda1</rootDeviceName>
                    <blockDeviceMapping>
                        <item>
                            <deviceName>/dev/sda1</deviceName>
                            <ebs>
                                <volumeId>vol-19ec6174</volumeId>
                                <status>attached</status>
                                <attachTime>2011-12-10T01:30:32.000Z</attachTime>
                                <deleteOnTermination>true</deleteOnTermination>
                            </ebs>
                        </item>
                    </blockDeviceMapping>
                    <virtualizationType>paravirtual</virtualizationType>
                    <clientToken/>
                    <tagSet>
                        <item>
                            <key>Name</key>
                            <value>ndb32-02</value>
                        </item>
                    </tagSet>
                    <hypervisor>xen</hypervisor>
                </item>
            </instancesSet>
            <requesterId>058890971305</requesterId>
        </item>
    </reservationSet>
</DescribeInstancesResponse>

Sweet, sweet data! Hold on, though, I can't use grep to get at that. I'm going to have to remember how to interact with XML documents; I decided I had better see if I could dig up any XPath knowledge.

The next question was what tool I wanted to use to execute XPath expressions. I was not very keen on having to write an entire Perl or Python script to read the XML, build it into some DOM, and then loop several times over crusty data structures to get the data I wanted. I wanted to be able to do some more generalized things that are very easily accomplished in XPath, such as getting a list of instances based on a prefix of their Name, getting a list of "stopped" instances, getting a list of instances with public IP addresses, et cetera.

I figured there must be some command-line tool that would let me execute arbitrary XPath against an XML file. After poking around a while, I found XMLStarlet [http://xmlstar.sourceforge.net/]. Installing this on my MacBook Pro using Homebrew [http://mxcl.github.com/homebrew/] was very easy and I was off to the races.

After grappling for a very annoying amount of time with XML namespaces, I eventually figured I'd just strip the namespace declaration out so that I didn't have to deal with it. (If you leave the namespace in, you have to give it an alias and then specify that alias before every tag in your XPath expressions. No, thanks.)

$ cat strip_xmlns 
sed 's/ xmlns="[^>]*"//'
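For reference, here is roughly what both approaches look like side by side. strip_xmlns is written as a shell function wrapping the same sed expression, and ids_with_namespace shows the alias style using XMLStarlet's -N option. The ec2 prefix and both function names are arbitrary choices for this sketch, and the namespace URI is the one from the response above; it would have to match whatever API version the response actually declares.

```shell
#!/bin/sh
# Same sed expression as the strip_xmlns script, wrapped in a function.
strip_xmlns() {
    sed 's/ xmlns="[^>]*"//'
}

# The alternative: keep the namespace, bind it to a prefix with -N, and
# use that prefix on every element in the XPath. Reads XML on stdin.
# The prefix "ec2" is arbitrary; the URI is copied from the sample response.
ids_with_namespace() {
    xml sel -N ec2="http://ec2.amazonaws.com/doc/2011-11-01/" -T -t \
        -m '//ec2:instancesSet/ec2:item' -v 'ec2:instanceId' -n
}

# Usage (not run here):
#   aws --xml din | ids_with_namespace
#   aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item' -v instanceId -n
```

Stripping the declaration keeps the expressions short, at the cost of throwing away information that a stricter consumer of the XML might want.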

The xmlstarlet/xmlstar/xml tool works by specifying a template that includes some expression to match and some expressions to generate output. The tool does a lot, so some of the options can appear to be a bit verbose at first. Here's a very basic use of the tool to get just a list of instance IDs:

$ aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item' -v 'instanceId' -n
i-d1dbceb2
i-afdacfcc
i-cbc2d7a8
i-99bfaafa
i-1d40547e
i-f7c5d494
i-83c5d4e0
i-77c4d514
i-75c4d516
i-47feee24
i-707d9512

You can see the XSLT that the tool is applying internally by using the -C option:

$ aws --xml din | strip_xmlns | xml sel -C -t -m '//instancesSet/item' -v 'instanceId' -n
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="//instancesSet/item">
<xsl:value-of select="instanceId"/>
<xsl:value-of select="'&#10;'"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

OK, so, there's a tool that will let me execute some XPath and get back information about my instances. That's nice. Instead of trying to parse some formatted output, I should be able to select the XML elements I want for a particular task.

Say I want the instance IDs of all instances that are stopped:

$ aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item[instanceState/name="stopped"]' -v 'instanceId' -n

Or maybe I want all instances with Names that start with the string "ndb32":

$ aws --xml din | strip_xmlns | xml sel -T -t -m '//instancesSet/item[starts-with(tagSet/item[key="Name"]/value, "ndb32")]' -v 'instanceId' -n
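Since only the predicate changes between these queries, the pipeline can be wrapped in a small function that takes the predicate as its argument. This is a sketch, not something from the aws tool itself: ec2_ids is a made-up name, the sed call is the same expression strip_xmlns uses, and ipAddress is assumed to be the element that carries an instance's public IP (it is absent from the stopped VPC instance shown earlier).

```shell
#!/bin/sh
# ec2_ids PREDICATE: print the IDs of instances matching an XPath predicate.
# Hypothetical wrapper; assumes "aws" and "xml" are on the PATH.
ec2_ids() {
    aws --xml din | sed 's/ xmlns="[^>]*"//' |
        xml sel -T -t -m "//instancesSet/item$1" -v instanceId -n
}

# Usage (not run here):
#   ec2_ids '[instanceState/name="stopped"]'
#   ec2_ids '[normalize-space(ipAddress)]'   # has a non-empty public IP (assumed element name)
#   ec2_ids '[instanceType="m1.small"][instanceState/name="running"]'
```

The last example relies on the fact that XPath predicates can be chained, with each one narrowing the set matched by the one before it.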

Instead of having to write several loops in Perl or Python, I'm able to write a very straightforward expression that matches just the nodes I want. Instead of writing that XPath every time, of course, I'll put a few of the more popular ones into a script (I saved it as ec2-ls), along with some flexibility to provide arbitrary filtering. (I call this WHERE in the script because that's the first thing my DBMS-addled brain came up with!)

#!/bin/bash
while getopts "p:s:w:" OPTION
do
    case $OPTION in
        p)
            WHERE="[starts-with(tagSet/item[key='Name']/value, '$OPTARG')]"
            ;;
        s)
            WHERE="[instanceState/name = '$OPTARG']"
            ;;
        w)
            WHERE="$OPTARG"
            ;;
    esac
done

MATCHEXPR="/DescribeInstancesResponse/reservationSet/item/instancesSet/item$WHERE"

# The Name expression is quoted so the shell never treats [...] as a glob.
aws --xml din | strip_xmlns | xml sel -T -t -m "$MATCHEXPR" \
    -o "instanceId    " -v instanceId -n \
    -o "instanceName    " -v "tagSet/item[key='Name']/value" -n \
    -o "privateIp    " -v privateIpAddress -n \
    -o "ipAddress    " -v ipAddress -n \
    -o "instanceState    " -v instanceState/name -n -n

$ ec2-ls -p ndb
$ ec2-ls -s stopped
$ ec2-ls -w "[instanceType='m1.small']"
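One limitation of the option handling above is that each option assigns WHERE outright, so if you pass both -p and -s, only the last one takes effect. Because XPath predicates can be chained, appending instead of assigning is enough to combine them. The sketch below isolates that idea in a function; build_where is a hypothetical name and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: same options as ec2-ls, but every option appends a
# predicate to the result instead of replacing the previous one.
build_where() {
    local where="" opt
    local OPTIND=1          # restart option parsing on every call
    while getopts "p:s:w:" opt; do
        case $opt in
            p) where="${where}[starts-with(tagSet/item[key='Name']/value, '$OPTARG')]" ;;
            s) where="${where}[instanceState/name = '$OPTARG']" ;;
            w) where="${where}${OPTARG}" ;;
        esac
    done
    printf '%s\n' "$where"
}

# Usage:
#   WHERE=$(build_where -p ndb -s stopped)
#   MATCHEXPR="/DescribeInstancesResponse/reservationSet/item/instancesSet/item$WHERE"
```

With this, -p ndb -s stopped would select only the stopped instances whose Name starts with "ndb".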

My script returns several items that may or may not be of interest to others. Further extension to the script could easily make the list of items returned a bit more useful. From that basically reasonable if limited script, I vastly overreached my bash skills and turned it into this monstrosity:

#!/bin/bash
OUTPUT="instanceId;instanceName:tagSet/item[key='Name']/value;privateIp:privateIpAddress;ipAddress;instanceState:instanceState/name"
DELIM=" " #there might be a <tab> in there!
declare -a XMLARGS

push()            # Push item on stack.
{
        if [ -z "$1" ]    # Nothing to push?
        then
          return
        fi
        XMLARGS[${#XMLARGS[*]}]="$1"
        return
}


while getopts "p:s:w:o:d:D" OPTION
do
        case $OPTION in
                p)
                        WHERE="[starts-with(tagSet/item[key='Name']/value, '$OPTARG')]"
                        ;;
                s)
                        WHERE="[instanceState/name = '$OPTARG']"
                        ;;
                w)
                        WHERE="$OPTARG"
                        ;;
                o)
                        OUTPUT="$OPTARG"
                        ;;
                d)
                        DELIM="$OPTARG"
                        ;;
                D)
                        DEBUG=1
                        ;;
                
        esac
done
shift $((OPTIND-1)) # drop the options getopts already parsed, leaving any positional arguments

for i in "sel" "-T" "-t" "-m"; do
        push "$i"
done

MATCHEXPR="/DescribeInstancesResponse/reservationSet/item/instancesSet/item$WHERE"
push "$MATCHEXPR";

OLDIFS=$IFS;
IFS=";"
for f in $OUTPUT; do
        FIELDNAME="${f%%:*}"   # label: everything before the first ':'
        FIELDEXPR="${f#*:}"    # XPath: everything after the first ':' (or all of $f if there is none)
        if [[ -z $FIELDEXPR ]]; then
                FIELDEXPR=$FIELDNAME
        fi
        push "-o";
        push "$FIELDNAME$DELIM";
        push "-v";
        push "$FIELDEXPR";
        push "-n";
done
push "-n";
IFS=$OLDIFS


if [[ $DEBUG -eq 1 ]]; then
        echo "$MATCHEXPR" >&2 
        echo "${XMLARGS[@]}" >&2
fi

aws --xml din | strip_xmlns | xml "${XMLARGS[@]}"

I'm sure there are plenty of problems with that script, but at least now I can finally get the information I want about my EC2 instances!

$ ec2-ls -p ndb32 -o "instanceId;privateIp:privateIpAddress"
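For anyone tidying the script up, the push() helper and the IFS save/restore can probably be replaced with bash's own array append and read -a. The following is a sketch under that assumption rather than a drop-in replacement: build_xml_args is a hypothetical name, and it fills the same XMLARGS array from a match expression, an output spec, and a delimiter.

```shell
#!/bin/bash
# Hypothetical rewrite of the argument-building part of the script.
# build_xml_args MATCHEXPR OUTPUT DELIM fills the global XMLARGS array.
build_xml_args() {
    local matchexpr=$1 output=$2 delim=$3
    local -a fields
    local f name expr

    XMLARGS=(sel -T -t -m "$matchexpr")

    # Split the spec on ';' without touching the global IFS.
    IFS=';' read -r -a fields <<< "$output"

    for f in "${fields[@]}"; do
        name=${f%%:*}       # label before the first ':'
        expr=${f#*:}        # XPath after the first ':' (all of $f if none)
        XMLARGS+=(-o "$name$delim" -v "${expr:-$name}" -n)
    done
    XMLARGS+=(-n)
}

# Usage (last line not run here):
#   build_xml_args "$MATCHEXPR" "$OUTPUT" "$DELIM"
#   aws --xml din | strip_xmlns | xml "${XMLARGS[@]}"
```

Keeping the arguments in an array and expanding it as "${XMLARGS[@]}" is what lets expressions containing spaces and brackets reach xml intact, which the original script already gets right.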

About the Author

Kolbe Kegel is a Principal Support Engineer at SkySQL, where he has worked since 2011. Kolbe has worked with MySQL since 2005, first at MySQL, later at Sun Microsystems after its acquisition of MySQL Inc., then at Oracle after its acquisition of Sun.

Sophie Williams (not verified)

looks awesome! need to try! thanks for sharing

ronaldbradford

AWS EC2 Scripts

You can find my scripts at https://github.com/ronaldbradford/aws

The aws_audit.sh script will give you valuable information on instances (EC2) and load balancers (ELB), plus cross-references for instance, public name, IP, etc., and it also produces a parallel-ssh file.

I use these to manage environments with 500+ EC2 instances.