Euca2ools: Past, Present, and Future
For those who don’t know, I work on euca2ools, a suite of command line
tools for interacting with Eucalyptus and Amazon Web Services clouds that is
hosted on Launchpad. As of late the project has stagnated somewhat, due in part to the
sheer number of different tools it includes. Nearly every command one can send
to a server that uses Amazon’s APIs should have at least one corresponding
command line tool, making development of euca2ools’ code repetitive and
error-prone.
Today this is going to end.
But before we get to that part, let’s chronicle how euca2ools got to where
they are today.
The Past
Early euca2ools versions employed the popular boto Python library to do
their heavy lifting. Each tool of this sort triggers a long chain of events:
1. The tool translates data from the command line into its internal data
   structures.
2. The tool translates its internal data into the form that boto expects
   and then hands it off to boto.
3. Boto translates the data into the form that the server expects and then
   sends it to the server.
4. When the server responds, boto translates its response into a packaged
   form that is useful for programming and returns it to the tool.
5. The tool immediately tears that version back apart and translates it
   into a text-based form that can go back to the command line.
Things shouldn’t be this convoluted. Not in Python.
The Present
Tackling this problem involved coming up with ways to simplify not only the
code, but also the process through which the tools are written. This led to two
major changes, upon which all of the current euca2ools code is built.
“eucacommand”
The first change was to consolidate all of the code that reads data from the
command line, the first step of the process above, into one location. Each tool
then simply needed to describe what it expected to receive from the command
line, and the shared code would take care of the rest. For example, let’s look
at part of an older command, euca-create-volume:
    class CreateVolume(EucaCommand):
        Description = 'Creates a volume in a specified availability zone.'
        Options = [Param(name='size', short_name='s', long_name='size',
                         optional=True, ptype='integer',
                         doc='size of the volume (in GiB).'),
                   Param(name='snapshot', long_name='snapshot',
                         optional=True, ptype='string',
                         doc="""snapshot id to create the volume from.
                         Either size or snapshot can be specified (not both)."""),
                   Param(name='zone', short_name='z', long_name='zone',
                         optional=False, ptype='string',
                         doc='availability zone to create the volume in')]
Because there are three Params, the shared code library reads three pieces of
information from the command line and hands them to the command’s code, which
then hands them to boto, and so on.
This methodology forms the basis for all of the current euca2ools that
begin with “euca”.
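To make the idea concrete, here is a minimal sketch of how shared code could turn a list of Param declarations into parsed values. The Param class and parse_params function below are hypothetical stand-ins written for this illustration, not the actual euca2ools code, and the zone name is a made-up value. The sketch leans on getopt from the Python standard library.

```python
import getopt


class Param(object):
    """Hypothetical stand-in for the Param used in the example above."""

    def __init__(self, name, long_name, short_name=None, optional=True,
                 ptype='string', doc=''):
        self.name = name
        self.long_name = long_name
        self.short_name = short_name
        self.optional = optional
        self.ptype = ptype
        self.doc = doc


def parse_params(params, argv):
    """Read every declared Param from argv and return a dict of values."""
    # Every option in this sketch takes a value, hence the ':' and '='.
    short_opts = ''.join(p.short_name + ':' for p in params if p.short_name)
    long_opts = [p.long_name + '=' for p in params]
    opts, _ = getopt.getopt(argv, short_opts, long_opts)

    values = {}
    for flag, raw in opts:
        for p in params:
            if flag in ('--' + p.long_name, '-' + str(p.short_name)):
                values[p.name] = int(raw) if p.ptype == 'integer' else raw

    # Enforce optional=False once, here, instead of in every tool.
    missing = [p.name for p in params
               if not p.optional and p.name not in values]
    if missing:
        raise ValueError('missing required option(s): ' + ', '.join(missing))
    return values


OPTIONS = [
    Param(name='size', short_name='s', long_name='size', ptype='integer'),
    Param(name='snapshot', long_name='snapshot'),
    Param(name='zone', short_name='z', long_name='zone', optional=False),
]

print(parse_params(OPTIONS, ['-s', '10', '--zone', 'zone-a']))
```

The point of the design is that a tool only declares its options; type conversion and required-option checks live in one shared place.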
Roboto
For a euca2ools command line tool to be useful it has to gather data from
the command line, send these data to the server, and return data from the
server to the user. Roboto, a little-known boto sub-project written by boto
developer (and former euca2ools developer) Mitch Garnaat, takes this statement
literally and lets tools work at a lower level: instead of translating data
from the command line into an intermediate format to send to boto, tools send
these data directly to the server in the form that the server expects. This
essentially removes boto from the euca2ools code base altogether. With boto out
of the path that data have to take to get from the command line to the server
and back, tools become simpler to write and debug because there is less code to
walk through and understand.
Roboto is the basis for all of the current euca2ools that begin with
“euare”.
The Future
That is the state of the code today. Where do we go from here? While roboto
allows one to create command line tools with a minimal amount of effort, it has
several rough edges which prevented it from taking off and which make it
sub-optimal for building out the hundreds of commands that the euca2ools suite
will soon need to cover:
- User-unfriendly: When a user types something wrong or forgets to include
  something, roboto’s messages are often uselessly terse and unhelpful.
- A steeper learning curve than necessary: Roboto contains a large amount of
  custom code dedicated to fetching information from the command line. This
  steepens the learning curve for people who want to contribute code or fix
  bugs.
- Too much hardcoding: Roboto assumes that all tools do certain things, such
  as ascertaining what keys they should use to access the cloud, the same way.
- Still more work than it has to be: Though it makes writing tools simpler,
  roboto still hands each tool a bucket of information and expects the tool to
  pick out the bits the server needs and send them onward.
Enter requestbuilder
Requestbuilder is a new Python library that attempts to rethink roboto’s
approach so that it is more familiar to the typical Python developer and
requires less custom code to run. The easiest way to illustrate this is with an
example.
A command line tool embodies a specific request to the server, so each such
tool defines a Request that describes how it works:
    class ListUsers(EuareRequest):
        Description = 'List the users who start with a specific path'
        Args = [Arg('-p', '--prefix', dest='PathPrefix', metavar='PREFIX',
                    help='list only users whose paths begin with a prefix'),
                Arg('--max-items', type=int, dest='MaxItems',
                    help='limit the number of results')]

        def main(self):
            return self.send()

        def print_result(self, result):
            for user in result['Users']:
                print(user['Arn'])
Those familiar with Python’s argparse library will recognize the code
inside Arg(...), because requestbuilder does away with roboto’s custom code for
reading things off the command line and instead lets argparse do the work. This
cuts down on the amount of code we need to maintain, makes tool writing easier
for developers who are already familiar with the Python standard library, and
makes command line-related error messages much more user-friendly.
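As a rough illustration of that hand-off, the sketch below assumes that an Arg does nothing more than remember its arguments so they can be passed to argparse's add_argument later. This Arg class is a simplified stand-in, not requestbuilder's actual code, and the '/dev/' prefix is just sample input.

```python
import argparse


class Arg(object):
    """Simplified stand-in: remembers arguments to pass to argparse later."""

    def __init__(self, *pargs, **kwargs):
        self.pargs = pargs
        self.kwargs = kwargs


ARGS = [
    Arg('-p', '--prefix', dest='PathPrefix', metavar='PREFIX',
        help='list only users whose paths begin with a prefix'),
    Arg('--max-items', type=int, dest='MaxItems',
        help='limit the number of results'),
]

parser = argparse.ArgumentParser(description='List users')
for arg in ARGS:
    # Everything inside Arg(...) is handed to argparse unchanged.
    parser.add_argument(*arg.pargs, **arg.kwargs)

# Parse a sample command line; dest names become the dictionary keys.
args = vars(parser.parse_args(['--prefix', '/dev/', '--max-items', '5']))
print(args)
```

Because argparse does the parsing, a bad value such as a non-numeric --max-items is rejected with a usage message that names the offending option, without any extra code in the tool.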
When the tool starts running, requestbuilder uses data from the command
line to fill in a dictionary called args and then runs the tool’s main method.
That method’s job is to process this information, fill in the portions of the
request that will be sent to the server (params, headers, and post_data), and
then run the send method to send it all to the server and retrieve a response.
Attaching each of these sets of data to the request instead of passing them
around between methods allows one to send a request, tweak it, and send the
tweaked version as well.
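Here is a small sketch of why keeping that state on the request is handy. SketchRequest is a hypothetical class for illustration only; its send method merely records what would have gone out rather than contacting a server.

```python
class SketchRequest(object):
    """Hypothetical request that keeps its outgoing data as attributes."""

    def __init__(self, **args):
        self.args = args        # values from the command line
        self.params = {}        # query parameters for the server
        self.headers = {}       # HTTP headers
        self.post_data = None   # request body, if any
        self.sent = []          # record of what "went out" (demo only)

    def send(self):
        # A real tool would contact the server here; this only records a copy.
        snapshot = (dict(self.params), dict(self.headers), self.post_data)
        self.sent.append(snapshot)
        return snapshot

    def main(self):
        self.params['PathPrefix'] = self.args['PathPrefix']
        return self.send()


request = SketchRequest(PathPrefix='/dev/')
request.main()

# Tweak the same request object and send it again.
request.params['MaxItems'] = 5
request.send()
```

Nothing has to be rebuilt for the second send: the request still carries everything from the first one, plus the tweak.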
Why doesn’t the code above fill any of these things in? Since most of the
data that come off the command line go directly to the server, when a tool
runs send, requestbuilder automatically fills in params from the contents of
args so the tool doesn’t have to: whatever the user supplied with --prefix at
the command line gets sent to the server with the name PathPrefix, and so
forth.
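The automatic step could look something like the sketch below. The fill_params helper is hypothetical, and two of its behaviors are assumptions on my part for the sake of the example: options the user did not supply (None) are skipped, and anything a tool already put into params is left alone.

```python
def fill_params(args, params):
    """Copy each supplied command line value into params unless already set."""
    for name, value in args.items():
        # Assumption: None means the user did not supply the option.
        if value is not None and name not in params:
            params[name] = value
    return params


# What argparse might produce for "--prefix /dev/" with no --max-items.
args = {'PathPrefix': '/dev/', 'MaxItems': None}
params = fill_params(args, {})
print(params)
```

Since the dest of each Arg is already the name the server expects, no per-tool mapping code is needed.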
But what if something should not be sent to the server? While data from the
command line go into params to be sent to the server by default, one can tell
requestbuilder to send a particular bit of data elsewhere instead:
Arg('--debug', action='store_true', route_to=None)
None instructs requestbuilder to leave the “debug” flag alone and not
attempt to send it anywhere. Data can also go elsewhere, such as to the
connection that gets set up as the tool contacts the server:
Arg('-I', '--access-key-id', dest='aws_access_key_id', route_to=CONNECTION)
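One way such routing could work is sketched below. The PARAMS and CONNECTION markers, this Arg class, and the route function are all hypothetical names for illustration; only the route_to keyword and the three example options come from the text above. The key shown is a placeholder value.

```python
import argparse

PARAMS = 'params'          # hypothetical marker: send to the server
CONNECTION = 'connection'  # hypothetical marker: hand to the connection


class Arg(object):
    def __init__(self, *pargs, **kwargs):
        # route_to is ours, so remove it before argparse sees the rest.
        self.route_to = kwargs.pop('route_to', PARAMS)
        self.pargs = pargs
        self.kwargs = kwargs


def route(arg_defs, argv):
    """Parse argv, then sort each value into the bucket its Arg names."""
    parser = argparse.ArgumentParser()
    routes = {}
    for arg in arg_defs:
        action = parser.add_argument(*arg.pargs, **arg.kwargs)
        routes[action.dest] = arg.route_to
    parsed = vars(parser.parse_args(argv))

    buckets = {PARAMS: {}, CONNECTION: {}}
    for dest, value in parsed.items():
        target = routes.get(dest)
        # route_to=None means "parse it, but do not send it anywhere".
        if target is not None and value is not None:
            buckets[target][dest] = value
    return buckets


ARGS = [
    Arg('-p', '--prefix', dest='PathPrefix'),
    Arg('--debug', action='store_true', route_to=None),
    Arg('-I', '--access-key-id', dest='aws_access_key_id',
        route_to=CONNECTION),
]

buckets = route(ARGS, ['-p', '/dev/', '--debug', '-I', 'EXAMPLEKEY'])
print(buckets)
```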
Astute readers will note that I haven’t described what EuareRequest in the
earlier example does, so here is the code for that:
    class EuareRequest(BaseRequest):
        ServiceClass = Euare
        Args = [Arg('--delegate', dest='DelegateAccount', metavar='ACCOUNT',
                    help='''[Eucalyptus extension] run this command as another
                    account (only usable by cloud administrators)''')]
Requestbuilder makes tool writers’ jobs easier by allowing one type of
request to inherit its command line options from another type of request and
then add its own by simply listing more of them. This is a little different
from the way Python usually works, where a list defined in a subclass replaces
the parent’s list; requestbuilder does some magic behind the scenes to make
this possible. As a result, everything common to commands that access the EUARE
service (Eucalyptus’s equivalent of Amazon’s IAM service) can go into one place
to be shared with others.
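One way to get this merging behavior, which may or may not match what requestbuilder does internally, is to walk the class's method resolution order and gather the Args list that each class defines for itself. The collect_args method and the stripped-down classes below are a sketch for illustration only.

```python
class Arg(object):
    def __init__(self, *pargs, **kwargs):
        self.pargs = pargs
        self.kwargs = kwargs


class BaseRequest(object):
    Args = []

    @classmethod
    def collect_args(cls):
        """Gather Args from this class and every ancestor, base classes first."""
        collected = []
        for klass in reversed(cls.__mro__):
            # vars() sees only what klass itself defines, not inherited names.
            collected.extend(vars(klass).get('Args', []))
        return collected


class EuareRequest(BaseRequest):
    Args = [Arg('--delegate', dest='DelegateAccount')]


class ListUsers(EuareRequest):
    Args = [Arg('-p', '--prefix', dest='PathPrefix')]


dests = [arg.kwargs['dest'] for arg in ListUsers.collect_args()]
print(dests)
```

With plain attribute lookup, ListUsers.Args would contain only --prefix; collecting along the hierarchy is what lets --delegate come along for free.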
The final piece of information requestbuilder needs is a ServiceClass,
which describes the web service that the tool connects to. A service class is
another simple bit of code that looks like this:
    class Euare(BaseService):
        Description = 'Eucalyptus User, Authorization and Reporting Environment'
        APIVersion = '2010-05-08'
        EnvURL = 'EUARE_URL'
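As a sketch of how a service class like this might be used, the example below assumes that EnvURL names an environment variable holding the service's endpoint, and that a URL given explicitly takes priority over it. Both are assumptions, this BaseService is a hypothetical stand-in, and the URL is a made-up placeholder.

```python
import os


class BaseService(object):
    """Hypothetical stand-in for the base class in the example above."""
    EnvURL = None

    def __init__(self, url=None, environ=None):
        environ = os.environ if environ is None else environ
        # Assumption: an explicit URL wins; otherwise consult the environment.
        env_url = environ.get(self.EnvURL) if self.EnvURL else None
        self.url = url or env_url
        if not self.url:
            raise ValueError('no URL given and %s is not set' % self.EnvURL)


class Euare(BaseService):
    Description = 'Eucalyptus User, Authorization and Reporting Environment'
    APIVersion = '2010-05-08'
    EnvURL = 'EUARE_URL'


# Pass a fake environment so the example does not depend on the real one.
fake_env = {'EUARE_URL': 'https://euare.example.com/services/Euare'}
service = Euare(environ=fake_env)
print(service.url)
```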
The net gain from all this is a smaller but much more flexible code base
that should be able to scale better than anything we have had before.
Requestbuilder’s use of Python’s argparse library also makes the tools’ help
and error messages much more informative to users.