This is the first of a series of posts about setting up a scalable and robust "Hello World" web server using Amazon EC2, an exercise I am working through. The target audience is developers with some experience using EC2. This first installment looks at how to get an EC2 instance to configure itself when it boots up.

Introduction

It is all too easy to treat EC2 instances as if they were ordinary virtual private servers, especially now that we have Elastic Block Store as an alternative to ephemeral instance-store disks. EC2 is supposed to be more than that. Reading about Elastic Load Balancing, Auto Scaling, and CloudWatch gives the impression that Amazon makes it easy to configure clusters that automatically scale and recover from failures. Unfortunately, it is not so easy.

This is the first installment of a series on setting up a simple, scalable and robust "Hello World" web server. If the ultimate target is a dynamic web application supporting quick and easy deployment of new versions to staging or production, the road ahead will be long and hard indeed. For many projects, you will be better off building on a platform-as-a-service such as Heroku. Nevertheless, I will assume that for some reason or other we want to provision and pay for EC2 instances directly, maybe because the project targets a local audience somewhere in Europe and storing and processing the data in Amazon’s EU West region is desirable for legal reasons.

Analysis

If it weren’t for autoscaling, we could allocate new instances manually and just use an SSH-based push-style configuration tool such as Capistrano or Fabric to configure them. Autoscaling changes things, however, because it means that Amazon will automatically start new instances from time to time. These new instances will have to come up running the application without having it pushed to them.

There are a couple of ways to accomplish that. Most people just start up an instance, SSH into it to set it up, and then snapshot it as an Amazon Machine Image (AMI). The problem with that approach is that you need to repeat the process whenever you want to upgrade to a new base AMI; documenting the configuration is not enough, you really need to script the configuration process. The other way is to launch your instances from a stock AMI and have them configure themselves when they start up. The way to do that is to pass some form of configuration information into the instance as EC2 User Data. You may not have noticed it the last time you started up an EC2 instance using the AWS Management Console, but the launch wizard includes a User Data field. The value can be at most 16KB.
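As an aside, whatever you pass as user data can be read back from inside the instance through the EC2 metadata service, which is also how cloud-init retrieves it. A minimal sketch in Python:

#!/usr/bin/python
# Minimal sketch: read this instance's user data back from the EC2
# metadata service. This address only works from within an instance.
import urllib2

USER_DATA_URL = 'http://169.254.169.254/latest/user-data'

def get_user_data():
    return urllib2.urlopen(USER_DATA_URL).read()

if __name__ == '__main__':
    print get_user_data()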

Both approaches are reasonable. I will take the simple and direct approach of launching a stock image and having the instance configure itself at startup. A natural concern is that fetching and installing additional software packages will slow instance startup. In practice this is often not a problem, and when it is, that is a valid reason to create a customized AMI in advance. Another reason would be a preference for baking secret credentials into the AMI.

Both Amazon Linux and the Ubuntu AMIs include Canonical’s cloud-init for configuring the instance at boot time using the provided user data. I have chosen to use Ubuntu.

Cloud-init understands a variety of formats (see the documentation), which can be combined as a MIME multipart file. I experimented with a combination of a cloud-config specification and a shell script but eventually settled on just a simple shell script for the purposes of this exercise. Given the immaturity of cloud-init and the 16KB limit on user data, the idea is to use the user data primarily just to cause cloud-init to fetch the real configuration information over the network.
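For completeness: if you do want to combine a cloud-config part with a shell script, the multipart payload can be assembled with Python's standard email package. A sketch, relying on the MIME subtypes cloud-config and x-shellscript that cloud-init uses to recognize the parts:

#!/usr/bin/python
# Sketch: combine a cloud-config document and a shell script into the
# MIME multipart user data format that cloud-init accepts.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_user_data(cloud_config, shell_script):
    combined = MIMEMultipart()
    combined.attach(MIMEText(cloud_config, 'cloud-config'))
    combined.attach(MIMEText(shell_script, 'x-shellscript'))
    return combined.as_string()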

Fetching the configuration over the network raises two follow-up questions:

  • What should be the format of the configuration information?
  • And how shall we fetch it?

For specifying the system configuration, I decided to use Puppet. It is a mature technology, and this is exactly what it was designed for.

The normal way to fetch a puppet configuration is by using puppet’s agent-master architecture. Although that might be ideal for a large organization running a diversity of systems, it seemed like too much hassle to go through for my purposes. It would also have raised the issue of how to deploy the puppet master. Interestingly, it isn’t necessary to run a puppet master or puppet agents in order to use puppet. But how to fetch the puppet configuration in that case?

One quickly realizes that any puppet configuration should be under version control. So I decided to kill two birds with one stone and use git to fetch the puppet configuration. In effect, I am letting Bitbucket play the role of the puppet master. (I am using Bitbucket instead of GitHub primarily because Bitbucket offers free private repositories.) Realistically, Bitbucket is at least as reliable as any puppet master that I would run. If I really needed to do better in terms of reliability, I would consider fetching the puppet configuration from S3.
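If I did go the S3 route, the fetch would be just a few lines of boto on the instance. A sketch, with a hypothetical bucket name and key:

#!/usr/bin/python
# Sketch: fetch a tarball of the puppet configuration from S3.
# The bucket and key names below are hypothetical placeholders.
import boto

def fetch_puppet_config(bucket_name='example-puppet-config',
                        key_name='etc-puppet.tar.gz',
                        destination='/tmp/etc-puppet.tar.gz'):
    connection = boto.connect_s3()
    bucket = connection.get_bucket(bucket_name)
    key = bucket.get_key(key_name)
    key.get_contents_to_filename(destination)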

The Code

Putting it all together, here is a template for the user data to be injected into the instance (user-data-script.sh):

#!/bin/sh
set -e -x

apt-get --yes --quiet update
apt-get --yes --quiet install git puppet-common

#
# Fetch puppet configuration from public git repository.
#

mv /etc/puppet /etc/puppet.orig
git clone $puppet_source /etc/puppet

#
# Run puppet.
#

puppet apply /etc/puppet/manifests/init.pp

If you replace $puppet_source with the URL of a public git repository containing the desired contents of /etc/puppet and provide this shell script as user data when launching a recent Ubuntu AMI, the instance will execute puppet late in the boot process and configure itself accordingly. I have set up a suitable demo repository. Here is a Python script that launches the instance for you (launch_instance.py):

#!/usr/bin/python

from string import Template

import boto.ec2

PUPPET_SOURCE = 'https://bitbucket.org/rimey/hello-ec2-puppetboot.git'

def get_script(filename='user-data-script.sh'):
    template = open(filename).read()
    return Template(template).substitute(
        puppet_source=PUPPET_SOURCE,
    )

def launch():
    connection = boto.ec2.connect_to_region('us-east-1')
    return connection.run_instances(
        image_id='ami-6ba27502',  # us-east-1 oneiric i386 ebs 20120108
        instance_type='t1.micro',
        key_name='awskey',
        security_groups=['default'],
        user_data=get_script(),
    )

if __name__ == '__main__':
    launch()

Prerequisites:

  • You have boto installed ("apt-get install python-boto" or "pip install boto").
  • Your ~/.boto file specifies your AWS access key and secret key (a minimal example follows this list).
  • You have a key pair named awskey in AWS (or change the name in the Python script).
  • Your default security group allows incoming TCP connections to ports 22 and 80.
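For reference, a minimal ~/.boto file looks like this (substitute your own keys):

[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY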

Use the AWS Management Console to find the Public DNS name of the instance, give the instance a couple of minutes to start, and then point a web browser at it. Don’t forget to terminate the instance from the console when you are done with it.
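If you would rather not look the name up by hand, the reservation returned by launch() can be polled until the instance is running. A sketch that builds on launch_instance.py above:

#!/usr/bin/python
# Sketch: launch the instance and poll it until it leaves the
# 'pending' state, then print its public DNS name.
import time

from launch_instance import launch

def launch_and_wait():
    instance = launch().instances[0]
    while instance.state == 'pending':
        time.sleep(5)
        instance.update()
    print instance.public_dns_name

if __name__ == '__main__':
    launch_and_wait()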

The puppet configuration installs the nginx web server. Here is the key bit:

package { 'nginx':
    name    => 'nginx-light',
    ensure  => installed,
}

file { 'www':
    ensure  => directory,
    path    => '/usr/share/nginx/www',
    source  => '/etc/puppet/private/www',
    recurse => true,
    require => Package['nginx'],
}

service { 'nginx':
    ensure  => running,
    enable  => true,
    require => File['www'],
}
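The require parameters chain the three resources into a fixed order: Puppet installs the nginx-light package first, then recursively copies the document root from the repository’s private/www directory, and only then makes sure the service is enabled and running.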

Using a Private Repository

Finally, the hello-ec2-puppetboot git repository on Bitbucket that I have made available for the demo above is publicly readable. This simplifies the demo, but it is unrealistic, because a real configuration might include proprietary code or data, or even secret API keys. To enable the instance to read a private git repository on Bitbucket, you would proceed as follows:

  1. Register a separate user on Bitbucket solely for deployment purposes. This user will own no repositories.
  2. Grant the deployment user read access to the private repository containing the puppet configuration.
  3. Make a vcs_keys directory and run "ssh-keygen -C deploy -f vcs_keys/id_rsa".
  4. Upload vcs_keys/id_rsa.pub to the set of SSH public keys in the deployment user’s Bitbucket account settings.
  5. Run "ssh-keyscan bitbucket.org >vcs_keys/known_hosts".
  6. Update the value of PUPPET_SOURCE in the Python script to one of the form git@bitbucket.org:user/repo.git and update the definition of get_script() as follows:
    def get_script(filename='user-data-script.sh'):
        template = open(filename).read()
        return Template(template).substitute(
            puppet_source=PUPPET_SOURCE,
            vcs_known_hosts=open('vcs_keys/known_hosts').read().strip(),
            vcs_deploy_public=open('vcs_keys/id_rsa.pub').read().strip(),
            vcs_deploy_private=open('vcs_keys/id_rsa').read().strip(),
        )
  7. Update user-data-script.sh as follows:
    #!/bin/sh
    set -e -x
    
    apt-get --yes --quiet update
    apt-get --yes --quiet install git puppet-common
    
    #
    # Set up ssh keys to enable git read access.
    #
    
    mkdir -p /root/.ssh   # in case root has no .ssh directory yet
    
    cat <<EOF >>/root/.ssh/known_hosts
    $vcs_known_hosts
    EOF
    
    cat <<EOF >/root/.ssh/id_rsa.pub
    $vcs_deploy_public
    EOF
    
    cat <<EOF >/root/.ssh/id_rsa
    $vcs_deploy_private
    EOF
    
    chmod 600 /root/.ssh/id_rsa
    
    #
    # Fetch puppet configuration using git.
    #
    
    mv /etc/puppet /etc/puppet.orig
    git clone $puppet_source /etc/puppet
    
    #
    # Run puppet.
    #
    
    puppet apply /etc/puppet/manifests/init.pp
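If an instance does not come up as expected, look in /var/log/cloud-init.log on the instance, where cloud-init logs its progress; the set -x at the top of the script also makes each command echo to the instance’s console output, which you can read from the AWS Management Console.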

Conclusion

This gives us a practical solution for using EC2 user data to get instances to configure themselves when they start up. Although my background research turned up a number of recommendations in favor of booting from stock AMIs in this way, I found surprisingly little in the way of war stories from people who are actually doing it.

Don’t forget to terminate any EC2 instances that you started up.

The next installment, Locking it Down, will look more carefully at the necessity of including secrets such as id_rsa in the instance metadata and how to ensure that only the root user has access to them.
