\documentstyle[11pt]{article}
|
|
\newcommand{\Cpp}{C\protect\raisebox{.18ex}{++}}
|
|
|
|
\title{
|
|
Interactively Testing Remote Servers Using the Python Programming Language
|
|
}
|
|
|
|
\author{
|
|
Guido van Rossum \\
|
|
Dept. CST, CWI, P.O. Box 94079 \\
|
|
1090 GB Amsterdam, The Netherlands \\
|
|
E-mail: {\tt guido@cwi.nl}
|
|
\and
|
|
Jelke de Boer \\
|
|
HIO Enschede; P.O.Box 1326 \\
|
|
7500 BH Enschede, The Netherlands
|
|
}
|
|
|
|
\begin{document}
|
|
|
|
\maketitle
|
|
|
|
\begin{abstract}
|
|
This paper describes how two tools that were developed quite
|
|
independently gained in power by a well-designed connection between
|
|
them. The tools are Python, an interpreted prototyping language, and
|
|
AIL, a Remote Procedure Call stub generator. The context is Amoeba, a
|
|
well-known distributed operating system developed jointly by the Free
|
|
University and CWI in Amsterdam.
|
|
|
|
As a consequence of their integration, both tools have profited:
|
|
Python gained usability when used with Amoeba --- for which it was not
|
|
specifically developed --- and AIL users now have a powerful
|
|
interactive tool to test servers and to experiment with new
|
|
client/server interfaces.%
|
|
\footnote{
|
|
An earlier version of this paper was presented at the Spring 1991
|
|
EurOpen Conference in Troms{\o} under the title ``Linking a Stub
|
|
Generator (AIL) to a Prototyping Language (Python).''
|
|
}
|
|
\end{abstract}
|
|
|
|
\section{Introduction}
|
|
|
|
Remote Procedure Call (RPC) interfaces, used in distributed systems
|
|
like Amoeba
|
|
\cite{Amoeba:IEEE,Amoeba:CACM},
|
|
have a much more concrete character than local procedure call
|
|
interfaces in traditional systems. Because clients and servers may
|
|
run on different machines, with possibly different word size, byte
|
|
order, etc., much care is needed to describe interfaces exactly and to
|
|
implement them in such a way that they continue to work when a client
|
|
or server is moved to a different machine. Since machines may fail
|
|
independently, error handling must also be treated more carefully.
|
|
|
|
A common approach to such problems is to use a {\em stub generator}.
|
|
This is a program that takes an interface description and transforms
|
|
it into functions that must be compiled and linked with client and
|
|
server applications. These functions are called by the application
|
|
code to take care of details of interfacing to the system's RPC layer,
|
|
to implement transformations between data representations of different
|
|
machines, to check for errors, etc. They are called `stubs' because
|
|
they don't actually perform the action that they are called for but
|
|
only relay the parameters to the server
|
|
\cite{RPC}.
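The division of labour between a stub and the server can be sketched in a few lines. The following is only an illustration in present-day Python 3, not output of any real stub generator: the wire format, the opcode {\tt OP\_ADD} and the names {\tt add\_stub} and {\tt server\_dispatch} are all assumptions made for the example, and the `transport' is an ordinary function call standing in for the RPC layer.

```python
import struct

# Hypothetical wire format: a 32-bit opcode followed by two 32-bit signed
# integers, always big-endian, so that both ends agree on the layout
# regardless of their native byte order.
REQUEST = struct.Struct(">Iii")
REPLY = struct.Struct(">ii")          # status code, result
OP_ADD = 1

def server_dispatch(message):
    """Stand-in for the remote server: unpack, do the real work, pack a reply."""
    opcode, a, b = REQUEST.unpack(message)
    if opcode != OP_ADD:
        return REPLY.pack(-1, 0)      # unknown request
    return REPLY.pack(0, a + b)

def add_stub(transport, a, b):
    """Client stub: looks like a local function but only relays its parameters."""
    reply = transport(REQUEST.pack(OP_ADD, a, b))
    status, result = REPLY.unpack(reply)
    if status != 0:
        raise RuntimeError("remote call failed with status %d" % status)
    return result

total = add_stub(server_dispatch, 2, 40)
```

The application only ever sees {\tt add\_stub}; the fixed byte order in the format strings is what keeps the call working when client and server run on different machines.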
|
|
|
|
Amoeba's stub generator is called AIL, which stands for Amoeba
|
|
Interface Language
|
|
\cite{AIL}.
|
|
The first version of AIL generated only C functions, but an explicit
|
|
goal of AIL's design was {\em retargetability}: it should be possible
|
|
to add back-ends that generate stubs for different languages from the
|
|
same interface descriptions. Moreover, the stubs generated by
|
|
different back-ends must be {\em interoperable}: a client written in
|
|
Modula-3, say, should be able to use a server written in C, and vice
|
|
versa.
|
|
|
|
This interoperability is the key to the success of the marriage
|
|
between AIL and Python. Python is a versatile interpreted language
|
|
developed by the first author. Originally intended as an alternative
|
|
for the kind of odd jobs that are traditionally solved by a mixture of
|
|
shell scripts, manually given shell commands, and an occasional ad hoc
|
|
C program, Python has evolved into a general interactive prototyping
|
|
language. It has been applied to a wide range of problems, from
|
|
replacements for large shell scripts to fancy graphics demos and
|
|
multimedia applications.
|
|
|
|
One of Python's strengths is the ability for the user to type in some
|
|
code and immediately run it: no compilation or linking is necessary.
|
|
Interactive performance is further enhanced by Python's concise, clear
|
|
syntax, its very-high-level data types, and its lack of declarations
|
|
(which is compensated for by run-time type checking). All this makes
|
|
programming in Python feel like a leisure trip compared to the hard
|
|
work involved in writing and debugging even a smallish C program.
|
|
|
|
It should be clear by now that Python will be the ideal tool to test
|
|
servers and their interfaces. Especially during the development of a
|
|
complex server, one often needs to generate test requests on an ad hoc
|
|
basis, to answer questions like ``what happens if request X arrives
|
|
when the server is in state Y,'' to test the behavior of the server
|
|
with requests that touch its limitations, to check server responses to
|
|
all sorts of wrong requests, etc. Python's ability to immediately
|
|
execute `improvised' code makes it a much better tool for this
|
|
situation than C.
|
|
|
|
The link to AIL extends Python with the necessary functionality to
|
|
connect to arbitrary servers, making the server testbed sketched above
|
|
a reality. Python's high-level data types, general programming
|
|
features, and system interface ensure that it has all the power and
|
|
flexibility needed for the job.
|
|
|
|
One could go even further than this. Current distributed operating
|
|
systems, based on client-server interaction, all lack a good command
|
|
language or `shell' to give adequate access to available services.
|
|
Python has considerable potential for becoming such a shell.
|
|
|
|
\subsection{Overview of this Paper}
|
|
|
|
The rest of this paper contains three major sections and a conclusion.
|
|
First an overview of the Python programming language is given. Next
|
|
comes a short description of AIL, together with some relevant details
|
|
about Amoeba. Finally, the design and construction of the link
|
|
between Python and AIL is described in much detail. The conclusion
|
|
looks back at the work and points out weaknesses and strengths of
|
|
Python and AIL that were discovered in the process.
|
|
|
|
\section{An Overview of Python}
|
|
|
|
Python%
|
|
\footnote{
|
|
Named after the funny TV show, not the nasty reptile.
|
|
}
|
|
owes much to ABC
|
|
\cite{ABC},
|
|
a language developed at CWI as a programming language for non-expert
|
|
computer users. Python borrows freely from ABC's syntax and data
|
|
types, but adds modules, exceptions and classes, extensibility, and
|
|
the ability to call system functions. The concepts of modules,
|
|
exceptions and (to some extent) classes are influenced strongly by
|
|
their occurrence in Modula-3
|
|
\cite{Modula-3}.
|
|
|
|
Although Python resembles ABC in many ways, there is a clear
|
|
difference in application domain. ABC is intended to be the only
|
|
programming language for those who use a computer as a tool, but
|
|
occasionally need to write a program. For this reason, ABC is not
|
|
just a programming language but also a programming environment, which
|
|
comes with an integrated syntax-directed editor and some source
|
|
manipulation commands. Python, on the other hand, aims to be a tool
|
|
for professional (system) programmers, for whom having a choice of
|
|
languages with different feature sets makes it possible to choose `the
|
|
right tool for the job.' The features added to Python make it more
|
|
useful than ABC in an environment where access to system functions
|
|
(such as file and directory manipulations) is common. They also
|
|
support the building of larger systems and libraries. The Python
|
|
implementation offers little in the way of a programming environment,
|
|
but is designed to integrate seamlessly with existing programming
|
|
environments (e.g. UNIX and Emacs).
|
|
|
|
Perhaps the best introduction to Python is a short example. The
|
|
following is a complete Python program to list the contents of a UNIX
|
|
directory.
|
|
\begin{verbatim}
import sys, posix

def ls(dirname): # Print sorted directory contents
    names = posix.listdir(dirname)
    names.sort()
    for name in names:
        if name[0] != '.': print name

ls(sys.argv[1])
\end{verbatim}
|
|
The largest part of this program, in the middle starting with {\tt
|
|
def}, is a function definition. It defines a function named {\tt ls}
|
|
with a single parameter called {\tt dirname}. (Comments in Python
|
|
start with `\#' and extend to the end of the line.) The function body
|
|
is indented: Python uses indentation for statement grouping instead of
|
|
braces or begin/end keywords. This is shorter to type and avoids
|
|
frustrating mismatches between the perception of grouping by the user
|
|
and the parser. Python accepts one statement per line; long
|
|
statements may be broken in pieces using the standard backslash
|
|
convention. If the body of a compound statement is a single, simple
|
|
statement, it may be placed on the same line as the head.
|
|
|
|
The first statement of the function body calls the function {\tt
|
|
listdir} defined in the module {\tt posix}. This function returns a
|
|
list of strings representing the contents of the directory name passed
|
|
as a string argument, here the argument {\tt dirname}. If {\tt
|
|
dirname} were not a valid directory name, or perhaps not even a
|
|
string, {\tt listdir} would raise an exception and the next statement
|
|
would never be reached. (Exceptions can be caught in Python; see
|
|
later.) Assuming {\tt listdir} returns normally, its result is
|
|
assigned to the local variable {\tt names}.
|
|
|
|
The second statement calls the method {\tt sort} of the variable {\tt
|
|
names}. This method is defined for all lists in Python and does the
|
|
obvious thing: the elements of the list are reordered according to
|
|
their natural ordering relationship. Since in our example the list
|
|
contains strings, they are sorted in ascending ASCII order.
|
|
|
|
The last two lines of the function contain a loop that prints all
|
|
elements of the list whose first character isn't a period. In each
|
|
iteration, the {\tt for} statement assigns an element of the list to
|
|
the local variable {\tt name}. The {\tt print} statement is intended
|
|
for simple-minded output; more elaborate formatting is possible with
|
|
Python's string handling functions.
|
|
|
|
The other two parts of the program are easily explained. The first
|
|
line is an {\tt import} statement that tells the interpreter to import
|
|
the modules {\tt sys} and {\tt posix}. As it happens these are both
|
|
built into the interpreter. Importing a module (built-in or
|
|
otherwise) only makes the module name available in the current scope;
|
|
functions and data defined in the module are accessed through the dot
|
|
notation as in {\tt posix.listdir}. The scope rules of Python are
|
|
such that the imported module name {\tt posix} is also available in
|
|
the function {\tt ls} (this will be discussed in more detail later).
|
|
|
|
Finally, the last line of the program calls the {\tt ls} function with
|
|
a definite argument. It must be last since Python objects must be
|
|
defined before they can be used; in particular, the function {\tt ls}
|
|
must be defined before it can be called. The argument to {\tt ls} is
|
|
{\tt sys.argv[1]}, which happens to be the Python equivalent of {\tt
|
|
\$1} in a shell script or {\tt argv[1]} in a C program's {\tt main}
|
|
function.
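For readers who want to try the example today, here is a rough equivalent in present-day Python 3. It is a sketch, not part of the original program: it assumes the {\tt os} module in place of {\tt posix}, and it returns the names instead of printing them so that the result can be inspected.

```python
import os

def ls(dirname):
    """Return the sorted names in dirname, skipping names that start with '.'."""
    names = os.listdir(dirname)
    names.sort()
    return [name for name in names if name[0] != '.']

# Typical use: print the visible entries of the current directory.
# for name in ls('.'):
#     print(name)
```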
|
|
|
|
\subsection{Python Data Types}
|
|
|
|
(This and the following subsections describe Python in quite a lot of
|
|
detail. If you are more interested in AIL, Amoeba and how they are
|
|
linked with Python, you can skip to section 3 now.)
|
|
|
|
Python's syntax may not have big surprises (which is exactly as it
|
|
should be), but its data types are quite different from what is found
|
|
in languages like C, Ada or Modula-3. All data types in Python, even
|
|
integers, are `objects'. All objects participate in a common garbage
|
|
collection scheme (currently implemented using reference counting).
|
|
Assignment is cheap, independent of object size and type: only a
|
|
pointer to the assigned object is stored in the assigned-to variable.
|
|
No type checking is performed on assignment; only specific operations
|
|
like addition test for particular operand types.
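The consequences of this model are easy to observe. The fragment below is a small sketch in present-day Python 3 (the variable names are arbitrary): assignment makes two names refer to one object, rebinding a name to a value of another type is accepted silently, and the type error only surfaces when an operation is attempted.

```python
a = [1, 2, 3]
b = a                 # stores a reference to the same list; nothing is copied
b.append(4)           # the change is visible through both names
shared = a is b       # True: both names refer to one object

x = 42
x = 'forty-two'       # rebinding to a different type is not checked

try:
    x + 1             # the addition itself checks its operand types
except TypeError as exc:
    message = str(exc)
```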
|
|
|
|
The basic object types in Python are numbers, strings, tuples, lists
|
|
and dictionaries. Some other object types are open files, functions,
|
|
modules, classes, and class instances; even types themselves are
|
|
represented as objects. Extension modules written in C can define
|
|
additional object types; examples are objects representing windows and
|
|
Amoeba capabilities. Finally, the implementation itself makes heavy
|
|
use of objects, and defines some private object types that aren't
|
|
normally visible to the user. There is no explicit pointer type in
|
|
Python.
|
|
|
|
{\em Numbers}, both integers and floating point, are pretty
|
|
straightforward. The notation for numeric literals is the same as in
|
|
C, including octal and hexadecimal integers; precision is the same as
|
|
{\tt long} or {\tt double} in C\@. A third numeric type, `long
|
|
integer', written with an `L' suffix, can be used for arbitrary
|
|
precision calculations. All arithmetic, shifting and masking
|
|
operations from C are supported.
|
|
|
|
{\em Strings} are `primitive' objects just like numbers. String
|
|
literals are written between single quotes, using similar escape
|
|
sequences as in C\@. Operations are built into the language to
|
|
concatenate and to replicate strings, to extract substrings, etc.
|
|
There is no limit to the length of the strings created by a program.
|
|
There is no separate character data type; strings of length one do
|
|
nicely.
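As an illustration, the string operations mentioned above look as follows in present-day Python 3; the values are arbitrary.

```python
greeting = 'Hello' + ', ' + 'world'   # concatenation
rule = '-' * 10                       # replication
word = greeting[7:12]                 # substring, taken by slicing
first = greeting[0]                   # a "character" is a string of length one
```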
|
|
|
|
{\em Tuples} are a way to `pack' small amounts of heterogeneous data
|
|
together and carry them around as a unit. Unlike structure members in
|
|
C, tuple items are nameless. Packing and unpacking assignments allow
|
|
access to the items, for example:
|
|
\begin{verbatim}
x = 'Hi', (1, 2), 'World'   # x is a 3-item tuple,
                            # its middle item is (1, 2)
p, q, r = x                 # unpack x into p, q and r
a, b = q                    # unpack q into a and b
\end{verbatim}
|
|
A combination of packing and unpacking assignment can be used as
|
|
parallel assignment, and is the idiom for permutations, e.g.:
|
|
\begin{verbatim}
p, q = q, p         # swap without temporary
a, b, c = b, c, a   # cyclic permutation
\end{verbatim}
|
|
Tuples are also used for function argument lists if there is more than
|
|
one argument. A tuple object, once created, cannot be modified; but
|
|
it is easy enough to unpack it and create a new, modified tuple from
|
|
the unpacked items and assign this to the variable that held the
|
|
original tuple object (which will then be garbage-collected).
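The unpack-and-rebuild idiom for `modifying' a tuple can be sketched as follows, in present-day Python 3 and with arbitrary values. The attempt at item assignment is included only to show that it is rejected.

```python
point = ('Hi', (1, 2), 'World')

try:
    point[0] = 'Bye'          # tuples reject item assignment
    rejected = False
except TypeError:
    rejected = True

p, q, r = point               # unpack the original tuple
point = ('Bye', q, r)         # build a new tuple and rebind the variable
```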
|
|
|
|
{\em Lists} are array-like objects. List items may be arbitrary
|
|
objects and can be accessed and changed using standard subscription
|
|
notation. Lists support item insertion and deletion, and can
|
|
therefore be used as queues, stacks etc.; there is no limit to their
|
|
size.
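A brief sketch of a list used as a stack and as a queue, written in present-day Python 3 with arbitrary values:

```python
stack = []
stack.append('a')             # push
stack.append('b')
top = stack.pop()             # pop returns the most recently added item

queue = ['first', 'second']
queue.append('third')         # enqueue at the end
head = queue.pop(0)           # dequeue from the front
queue.insert(0, 'urgent')     # insertion at an arbitrary position
del queue[1]                  # deletion by subscript removes 'second'
```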
|
|
|
|
Strings, tuples and lists together are {\em sequence} types. These
|
|
share a common notation for generic operations on sequences such as
|
|
subscription, concatenation, slicing (taking subsequences) and
|
|
membership tests. As in C, subscripts start at 0.
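Because the notation is shared, one function can work on all three sequence types. The helper {\tt middle} below is a hypothetical name chosen for this sketch, which is written in present-day Python 3.

```python
def middle(seq):
    """Drop the first and last items of any sequence by slicing."""
    return seq[1:-1]

# The same slice applied to a string, a tuple and a list.
results = (middle('abcd'), middle((1, 2, 3, 4)), middle([1, 2, 3, 4]))

# Membership tests use one notation for all sequence types.
found = ('b' in 'abcd', 3 in (1, 2, 3), 5 in [1, 2, 3])

# Subscripts start at 0.
first_items = ('abcd'[0], (1, 2, 3)[0], [1, 2, 3][0])
```

Note that slicing preserves the type of the sequence: the slice of a string is a string, the slice of a tuple is a tuple, and so on.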
|
|
|
|
{\em Dictionaries} are `mappings' from one domain to another. The
|
|
basic operations on dictionaries are item insertion, extraction and
|
|
deletion, using subscript notation with the key as subscript. (The
|
|
current implementation allows only strings in the key domain, but a
|
|
future version of the language may remove this restriction.)
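The three basic dictionary operations can be illustrated as follows. The sketch is in present-day Python 3; the keys and values are invented for the example, and only string keys are used, in keeping with the restriction mentioned above.

```python
extension = {}
extension['alice'] = 4127     # insertion
extension['bob'] = 5120
number = extension['alice']   # extraction
del extension['bob']          # deletion

try:
    extension['nobody']       # extracting a missing key raises an exception
    missing = False
except KeyError:
    missing = True
```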
|
|
|
|
\subsection{Statements}
|
|
|
|
Python has various kinds of simple statements, such as assignments
|
|
and {\tt print} statements, and several kinds of compound statements,
|
|
like {\tt if} and {\tt for} statements. Formally, function
|
|
definitions and {\tt import} statements are also statements, and there
|
|
are no restrictions on the ordering of statements or their nesting:
|
|
{\tt import} may be used inside a function, functions may be defined
|
|
conditionally using an {\tt if} statement, etc. The effect of a
|
|
declaration-like statement takes place only when it is executed.
|
|
|
|
All statements except assignments and expression statements begin with
|
|
a keyword: this makes the language easy to parse. An overview of the
|
|
most common statement forms in Python follows.
|
|
|
|
An {\em assignment} has the general form
|
|
\vspace{\itemsep}
|
|
|
|
\noindent
|
|
{\em variable $=$ variable $= ... =$ variable $=$ expression}
|
|
\vspace{\itemsep}
|
|
|
|
It assigns the value of the expression to all listed variables. (As
|
|
shown in the section on tuples, variables and expressions can in fact
|
|
be comma-separated lists.) The assignment operator is not an
|
|
expression operator; there are no horrible things in Python like
|
|
\begin{verbatim}
while (p = p->next) { ... }
\end{verbatim}
|
|
Expression syntax is mostly straightforward and will not be explained
|
|
in detail here.
|
|
|
|
An {\em expression statement} is just an expression on a line by
|
|
itself. This writes the value of the expression to standard output,
|
|
in a suitably unambiguous way, unless it is a `procedure call' (a
|
|
function call that returns no value). Writing the value is useful
|
|
when Python is used in `calculator mode', and reminds the programmer
|
|
not to ignore function results.
|
|
|
|
The {\tt if} statement allows conditional execution. It has optional
|
|
{\tt elif} and {\tt else} parts; a construct like {\tt
|
|
if...elif...elif...elif...else} can be used to compensate for the
|
|
absence of a {\em switch} or {\em case} statement.
|
|
|
|
Looping is done with {\tt while} and {\tt for} statements. The latter
|
|
(demonstrated in the `ls' example earlier) iterates over the elements
|
|
of a `sequence' (see the discussion of data types above). It is
|
|
possible to terminate a loop with a {\tt break} statement or to start
|
|
the next iteration with {\tt continue}. Both looping statements have
|
|
an optional {\tt else} clause which is executed after the loop is
|
|
terminated normally, but skipped when it is terminated by {\tt break}.
|
|
This can be handy for searches, to handle the case that the item is
|
|
not found.
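The search idiom just described might look like this. The function name {\tt find} and the convention of returning $-1$ for `not found' are assumptions of this sketch, which is written in present-day Python 3.

```python
def find(items, wanted):
    """Return the index of the first item equal to wanted, or -1 if absent."""
    for index, item in enumerate(items):
        if item == wanted:
            break             # found: the else clause is skipped
    else:
        return -1             # loop ran to completion without a break
    return index
```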
|
|
|
|
Python's {\em exception} mechanism is modelled after that of Modula-3.
|
|
Exceptions are raised by the interpreter when an illegal operation is
|
|
tried. It is also possible to explicitly raise an exception with the
|
|
{\tt raise} statement:
|
|
\vspace{\itemsep}
|
|
|
|
\noindent
|
|
{\tt raise {\em expression}, {\em expression}}
|
|
\vspace{\itemsep}
|
|
|
|
The first expression identifies which exception should be raised;
|
|
there are several built-in exceptions and the user may define
|
|
additional ones. The second, optional expression is passed to the
|
|
handler, e.g. as a detailed error message.
|
|
|
|
Exceptions may be handled (caught) with the {\tt try} statement, which
|
|
has the following general form:
|
|
\vspace{\itemsep}
|
|
|
|
\noindent
|
|
{\tt
\begin{tabular}{l}
try: {\em block} \\
except {\em expression}, {\em variable}: {\em block} \\
except {\em expression}, {\em variable}: {\em block} \\
... \\
except: {\em block}
\end{tabular}
}
|
|
\vspace{\itemsep}
|
|
|
|
When an exception is raised during execution of the first block, a
|
|
search for an exception handler starts. The first {\tt except} clause
|
|
whose {\em expression} matches the exception is executed. The
|
|
expression may specify a list of exceptions to match against. A
|
|
handler without an expression serves as a `catch-all'. If there is no
|
|
match, the search for a handler continues with outer {\tt try}
|
|
statements; if no match is found on the entire invocation stack, an
|
|
error message and stack trace are printed, and the program is
|
|
terminated (interactively, the interpreter returns to its main loop).
|
|
|
|
Note that the form of the {\tt except} clauses encourages a style of
|
|
programming whereby only selected exceptions are caught, passing
|
|
unanticipated exceptions on to the caller and ultimately to the user.
|
|
This is preferable to a simpler `catch-all' error handling
|
|
mechanism, where a simplistic handler intended to catch a single type
|
|
of error like `file not found' can easily mask genuine programming
|
|
errors --- especially in a language like Python which relies strongly
|
|
on run-time checking and allows the catching of almost any type of
|
|
error.
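The selective style can be sketched as follows. The example is written in present-day Python 3, whose spelling differs from the forms shown above: the detail is passed by calling the exception, and a handler names its variable with {\tt as}. The exception {\tt QuotaExceeded} and both functions are hypothetical names invented for the illustration.

```python
class QuotaExceeded(Exception):
    """A user-defined exception (hypothetical, for illustration only)."""

def allocate(blocks, limit=10):
    if blocks > limit:
        # The argument plays the role of the optional second expression.
        raise QuotaExceeded('requested %d blocks, limit is %d' % (blocks, limit))
    return blocks

def try_allocate(blocks):
    try:
        return allocate(blocks)
    except QuotaExceeded as detail:
        return 'refused: %s' % detail
    except (TypeError, ValueError) as detail:   # a list of exceptions to match
        return 'bad request: %s' % detail
    # Anything else propagates to the caller; there is no catch-all here.
```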
|
|
|
|
Other common statement forms, which we have already encountered, are
|
|
function definitions, {\tt import} statements and {\tt print}
|
|
statements. There is also a {\tt del} statement to delete one or more
|
|
variables, a {\tt return} statement to return from a function, and a
|
|
{\tt global} statement to allow assignments to global variables.
|
|
Finally, the {\tt pass} statement is a no-op.
|
|
|
|
\subsection{Execution Model}
|
|
|
|
A Python program is executed by a stack-based interpreter.
|
|
|
|
When a function is called, a new `execution environment' for it is
|
|
pushed onto the stack. An execution environment contains (among other
|
|
data) pointers to two `symbol tables' that are used to hold variables:
|
|
the local and the global symbol table. The local symbol table
|
|
contains local variables of the current function invocation (including
|
|
the function arguments); the global symbol table contains variables
|
|
defined in the module containing the current function.
|
|
|
|
The `global' symbol table is thus only global with respect to the
|
|
current function. There are no system-wide global variables; using
|
|
the {\tt import} statement it is easy enough to reference variables
|
|
that are defined in other modules. A system-wide read-only symbol
|
|
table is used for built-in functions and constants though.
|
|
|
|
On assignment to a variable, by default an entry for it is made in the
|
|
local symbol table of the current execution environment. The {\tt
|
|
global} statement can override this (it is not enough that a global
|
|
variable by the same name already exists). When a variable's value is
|
|
needed, it is searched first in the local symbol table, then in the
|
|
global one, and finally in the symbol table containing built-in
|
|
functions and constants.
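The assignment and lookup rules can be observed directly. The following sketch is in present-day Python 3; all names are arbitrary.

```python
counter = 0                   # lives in the module's (global) symbol table

def bump_local():
    counter = 99              # creates a new local; the global is untouched
    return counter

def bump_global():
    global counter            # assignments now go to the global symbol table
    counter = counter + 1
    return counter

def length_of(seq):
    # 'len' is found in neither the local nor the global table,
    # so the search ends in the table of built-ins.
    return len(seq)

local_result = bump_local()
after_local = counter
global_result = bump_global()
after_global = counter
```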
|
|
|
|
The term `variable' in this context refers to any name: functions and
|
|
imported modules are searched in exactly the same way.
|
|
|
|
Names defined in a module's symbol table survive until the end of the
|
|
program. This approximates the semantics of file-static global
|
|
variables in C or module variables in Modula-3. A module is
|
|
initialized the first time it is imported, by executing the text of
|
|
the module as a parameterless function whose local and global symbol
|
|
tables are the same, so names are defined in the module's symbol table.
|
|
(Modules implemented in C have another way to define symbols.)
|
|
|
|
A Python main program is read from standard input or from a script
|
|
file passed as an argument to the interpreter. It is executed as if
|
|
an anonymous module were imported. Since {\tt import} statements are
|
|
executed like all other statements, the initialization order of the
|
|
modules used in a program is defined by the flow of control through
|
|
the program.
|
|
|
|
The `attribute' notation {\em m.name}, where {\em m} is a module,
|
|
accesses the symbol {\em name} in that module's symbol table. It can
|
|
be assigned to as well. This is in fact a special case of the
|
|
construct {\em x.name} where {\em x} denotes an arbitrary object; the
|
|
type of {\em x} determines how this is to be interpreted, and what
|
|
assignment to it means.
|
|
|
|
For instance, when {\tt a} is a list object, {\tt a.append} yields a
|
|
built-in `method' object which, when called, appends an item to {\tt a}.
|
|
(If {\tt a} and {\tt b} are distinct list objects, {\tt a.append} and
|
|
{\tt b.append} are distinguishable method objects.) Normally, in
|
|
statements like {\tt a.append(x)}, the method object {\tt a.append} is
|
|
called and then discarded, but this is a matter of convention.
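That the method object exists independently of the call can be shown by keeping it around. This is a sketch in present-day Python 3; the {\tt \_\_self\_\_} attribute used to inspect the method object is a feature of modern Python and is not described in this paper.

```python
a = []
b = []

add_to_a = a.append           # keep the method object instead of discarding it
add_to_a('x')
add_to_a('y')

bound_to_a = add_to_a.__self__ is a       # the method remembers its list
not_bound_to_b = add_to_a.__self__ is not b
```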
|
|
|
|
List attributes are read-only --- the user cannot define new list
|
|
methods. Some objects, like numbers and strings, have no attributes
|
|
at all. Like all type checking in Python, the meaning of an attribute
|
|
is determined at run-time --- when the parser sees {\em x.name}, it
|
|
has no idea of the type of {\em x}. Note that {\em x} here does not
|
|
have to be a variable --- it can be an arbitrary (perhaps
|
|
parenthesized) expression.
|
|
|
|
Given the flexibility of the attribute notation, one is tempted to use
|
|
methods to replace all standard operations. Yet, Python has kept a
|
|
small repertoire of built-in functions like {\tt len()} and {\tt
|
|
abs()}. The reason is that in some cases the function notation is
|
|
more familiar than the method notation; just as programs would
|
|
become less readable if all infix operators were replaced by function
|
|
calls, they would become less readable if all function calls had to be
|
|
replaced by method calls (and vice versa!).
|
|
|
|
The choice whether to make something a built-in function or a method
|
|
is a matter of taste. For arithmetic and string operations, function
|
|
notation is preferred, since frequently the argument to such an
|
|
operation is an expression using infix notation, as in {\tt abs(a+b)};
|
|
this definitely looks better than {\tt (a+b).abs()}. The choice
|
|
between making something a built-in function or a function defined in a
|
|
built-in module (requiring {\tt import}) is similarly guided by
|
|
intuition; all in all, only functions needed by `general' programming
|
|
techniques are built-in functions.
|
|
|
|
\subsection{Classes}
|
|
|
|
Python has a class mechanism distinct from the object-orientation
|
|
already explained. A class in Python is not much more than a
|
|
collection of methods and a way to create class instances. Class
|
|
methods are ordinary functions whose first parameter is the class
|
|
instance; they are called using the method notation.
|
|
|
|
For instance, a class can be defined as follows:
|
|
\begin{verbatim}
class Foo:
    def meth1(self, arg): ...
    def meth2(self): ...
\end{verbatim}
|
|
A class instance is created by
|
|
{\tt x = Foo()}
|
|
and its methods can be called thus:
|
|
\begin{verbatim}
x.meth1('Hi There!')
x.meth2()
\end{verbatim}
|
|
The functions used as methods are also available as attributes of the
|
|
class object, and the above method calls could also have been written
|
|
as follows:
|
|
\begin{verbatim}
Foo.meth1(x, 'Hi There!')
Foo.meth2(x)
\end{verbatim}
|
|
Class methods can store instance data by assigning to instance data
|
|
attributes, e.g.:
|
|
\begin{verbatim}
self.size = 100
self.title = 'Dear John'
\end{verbatim}
|
|
Data attributes do not have to be declared; as with local variables,
|
|
they spring into existence when assigned to. It is a matter of
|
|
discretion to avoid name conflicts with method names. This facility
|
|
is also available to class users; instances of a method-less class can
|
|
be used as records with named fields.
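The points made so far can be combined into one small sketch, written in present-day Python 3. The classes {\tt Counter} and {\tt Record} are hypothetical names chosen for the illustration.

```python
class Counter:
    def bump(self, step):
        # The data attribute springs into existence on first assignment.
        self.count = getattr(self, 'count', 0) + step
        return self.count

class Record:
    pass                      # a method-less class

c = Counter()
first = c.bump(2)             # method notation
second = Counter.bump(c, 3)   # the same function, reached through the class

point = Record()              # used as a record with named fields
point.x = 3
point.y = 4
```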
|
|
|
|
There is no built-in mechanism for instance initialization. Classes
|
|
by convention provide an {\tt init()} method which initializes the
|
|
instance and then returns it, so the user can write
|
|
\begin{verbatim}
x = Foo().init('Dr. Strangelove')
\end{verbatim}
|
|
|
|
Any user-defined class can be used as a base class to derive other
|
|
classes. However, built-in types like lists cannot be used as base
|
|
classes. (Incidentally, the same is true in \Cpp{} and Modula-3.) A
|
|
class may override any method of its base classes. Instance methods
|
|
are first searched in the method list of their class, and then,
|
|
recursively, in the method lists of their base class. Initialization
|
|
methods of derived classes should explicitly call the initialization
|
|
methods of their base class.
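The initialization convention and its use with inheritance can be sketched as follows. The example runs in present-day Python 3, which has since acquired a built-in initialization method ({\tt \_\_init\_\_}); the sketch deliberately keeps the {\tt init()} convention described here. The class names are hypothetical.

```python
class Base:
    def init(self, name):
        self.name = name
        return self           # returning the instance allows Base().init(...)

class Derived(Base):
    def init(self, name, title):
        Base.init(self, name) # explicit call to the base class initializer
        self.title = title
        return self

    def describe(self):
        return self.title + ' ' + self.name

x = Derived().init('Strangelove', 'Dr.')
description = x.describe()
```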
|
|
|
|
A simple form of multiple inheritance is also supported: a class can
|
|
have multiple base classes, but the language rules for resolving name
|
|
conflicts are somewhat simplistic, and consequently the feature has so
|
|
far found little usage.
|
|
|
|
\subsection{The Python Library}
|
|
|
|
Python comes with an extensive library, structured as a collection of
|
|
modules. A few modules are built into the interpreter: these
|
|
generally provide access to system libraries implemented in C such as
|
|
mathematical functions or operating system calls. Two built-in
|
|
modules provide access to internals of the interpreter and its
|
|
environment. Even abusing these internals will at most cause an
|
|
exception in the Python program; the interpreter will not dump core
|
|
because of errors in Python code.
|
|
|
|
Most modules however are written in Python and distributed with the
|
|
interpreter; they provide general programming tools like string
|
|
operations and random number generators, provide more convenient
|
|
interfaces to some built-in modules, or provide specialized services
|
|
like a {\em getopt}-style command line option processor for
|
|
stand-alone scripts.
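As an illustration of such a service, the {\tt getopt} module of present-day Python 3 can be used as follows. The option letters and the argument list are invented for the example, and the interface of the module version contemporary with this paper may have differed.

```python
import getopt

# A made-up command line: -v is a flag, -o takes an argument.
argv = ['-v', '-o', 'out.txt', 'input']
options, arguments = getopt.getopt(argv, 'vo:')

verbose = ('-v', '') in options
output = dict(options).get('-o')
```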
|
|
|
|
There are also some modules written in Python that dig deep in the
|
|
internals of the interpreter; there is a module to browse the stack
|
|
backtrace when an unhandled exception has occurred, one to disassemble
|
|
the internal representation of Python code, and even an interactive
|
|
source code debugger which can trace Python code, set breakpoints,
|
|
etc.
|
|
|
|
\subsection{Extensibility}
|
|
|
|
It is easy to add new built-in modules written in C to the Python
|
|
interpreter. Extensions appear to the Python user as built-in
|
|
modules. Using a built-in module is no different from using a module
|
|
written in Python, but obviously the author of a built-in module can
|
|
do things that cannot be implemented purely in Python.
|
|
|
|
In particular, built-in modules can contain Python-callable functions
|
|
that call functions from particular system libraries (`wrapper
|
|
functions'), and they can define new object types. In general, if a
|
|
built-in module defines a new object type, it should also provide at
|
|
least one function that creates such objects. Attributes of such
|
|
object types are also implemented in C; they can return data
|
|
associated with the object or methods, implemented as C functions.
|
|
|
|
For instance, an extension was created for Amoeba: it provides wrapper
|
|
functions for the basic Amoeba name server functions, and defines a
|
|
`capability' object type, whose methods are file server operations.
|
|
Another extension is a built-in module called {\tt posix}; it provides
|
|
wrappers around most UNIX system calls. Extension modules also
|
|
provide access to two different windowing/graphics interfaces: STDWIN
|
|
\cite{STDWIN}
|
|
(which connects to X11 on UNIX and to the Mac Toolbox on the
|
|
Macintosh), and the Graphics Library (GL) for Silicon Graphics
|
|
machines.
|
|
|
|
Any function in an extension module is supposed to type-check its
|
|
arguments; the interpreter contains a convenience function to
|
|
facilitate extracting C values from arguments and type-checking them
|
|
at the same time. Returning values is also painless, using standard
|
|
functions to create Python objects from C values.
|
|
|
|
On some systems extension modules may be dynamically loaded, thus
|
|
avoiding the need to maintain a private copy of the Python interpreter
|
|
in order to use a private extension.
|
|
|
|
\section{A Short Description of AIL and Amoeba}
|
|
|
|
An RPC stub generator takes an interface description as input. The
|
|
designer of a stub generator has at least two choices for the input
|
|
language: use a suitably restricted version of the target language, or
|
|
design a new language. The first solution was chosen, for instance,
|
|
by the designers of Flume, the stub generator for the Topaz
|
|
distributed operating system built at DEC SRC
|
|
\cite{Flume,Evolving}.
|
|
|
|
Flume's one and only target language is Modula-2+ (the predecessor of
|
|
Modula-3). Modula-2+, like Modula-N for any N, has an interface
|
|
syntax that is well suited as a stub generator input language: an
|
|
interface module declares the functions that are `exported' by a
|
|
module implementation, with their parameter and return types, plus the
|
|
types and constants used for the parameters. Therefore, the input to
|
|
Flume is simply a Modula-2+ interface module. But even in this ideal
|
|
situation, an RPC stub generator needs to know things about functions
|
|
that are not stated explicitly in the interface module: for instance,
|
|
the transfer direction of VAR parameters (IN, OUT or both) is not
|
|
given. Flume solves this and other problems by a mixture of
|
|
directives hidden in comments and a convention for the names of
|
|
objects. Thus, one could say that the designers of Flume really
|
|
created a new language, even though it looks remarkably like their
|
|
target language.
|
|
|
|
\subsection{The AIL Input Language}
|
|
|
|
Amoeba uses C as its primary programming language. C function
|
|
declarations (at least in `Classic' C) don't specify the types of
|
|
the parameters, let alone their transfer direction. Using this as
|
|
input for a stub generator would require almost all information for
|
|
the stub generator to be hidden inside comments, which would require a
|
|
rather contorted scanner. Therefore we decided to design the input
|
|
syntax for Amoeba's stub generator `from scratch'. This gave us the
|
|
liberty to invent proper syntax not only for the transfer direction of
|
|
parameters, but also for variable-length arrays.
|
|
|
|
On the other hand we decided not to abuse our freedom, and borrowed as
|
|
much from C as we could. For instance, AIL runs its input through the
|
|
C preprocessor, so we get macros, include files and conditional
|
|
compilation for free. AIL's type declaration syntax is a superset of
|
|
C's, so the user can include C header files to use the types declared
|
|
there as function parameter types --- which are declared using
|
|
function prototypes as in \Cpp{} or Standard C\@. It should be clear by
|
|
now that AIL's lexical conventions are also identical to C's. The
|
|
same is true for its expression syntax.
|
|
|
|
Where does AIL differ from C, then? Function declarations in AIL are
|
|
grouped in {\em classes}. Classes in AIL are mostly intended as a
|
|
grouping mechanism: all functions implemented by a server are grouped
|
|
together in a class. Inheritance is used to form new groups by adding
|
|
elements to existing groups; multiple inheritance is supported to join
|
|
groups together. Classes can also contain constant and type
|
|
definitions, and one form of output that AIL can generate is a header
|
|
file for use by C programmers who wish to use functions from a
|
|
particular AIL class.
|
|
|
|
Let's have a look at some (unrealistically simple) class definitions:
|
|
\begin{verbatim}
#include <amoeba.h>     /* Defines `capability', etc. */

class standard_ops [1000 .. 1999] {
    /* Operations supported by most interfaces */
    std_info(*, out char buf[size:100], out int size);
    std_destroy(*);
};
\end{verbatim}
|
|
This defines a class called `standard\_ops' whose request codes are
|
|
chosen by AIL from the range 1000--1999. Request codes are small
|
|
integers used to identify remote operations. The author of the class
|
|
must specify a range from which AIL chooses, and class authors must
|
|
make sure they avoid conflicts, e.g. by using an `assigned number
|
|
administration office'. In the example, `std\_info' will be assigned
|
|
request code 1000 and `std\_destroy' will get code 1001. There is
|
|
also an option to explicitly assign request codes, for compatibility
|
|
with servers with manually written interfaces.
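The sequential assignment of request codes described above can be illustrated with a small sketch in present-day Python 3 (not the Python of this paper). The helper name {\tt assign\_request\_codes} and its error handling are assumptions made for illustration only; AIL's own implementation is not shown here.

```python
def assign_request_codes(operations, first, last):
    """Assign consecutive request codes from the inclusive range [first, last].

    Hypothetical helper: operations receive codes in declaration order,
    starting at the low end of the range specified for the class.
    """
    if len(operations) > last - first + 1:
        raise ValueError("request code range too small for class")
    return {name: first + i for i, name in enumerate(operations)}


codes = assign_request_codes(["std_info", "std_destroy"], 1000, 1999)
print(codes)
```

A class author who needs explicitly assigned codes would bypass such a helper and supply the numbers by hand.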
|
|
|
|
The class `standard\_ops' defines two operations, `std\_info' and
|
|
`std\_destroy'. The first parameter of each operation is a star
|
|
(`*'); this is a placeholder for a capability that must be passed when
|
|
the operation is called. The description of Amoeba below explains the
|
|
meaning and usage of capabilities; for now, it is sufficient to know
|
|
that a capability is a small structure that uniquely identifies an
|
|
object and a server or service.
|
|
|
|
The standard operation `std\_info' has two output parameters: a
|
|
variable-size character buffer (which will be filled with a short
|
|
descriptive string of the object to which the operation is applied)
|
|
and an integer giving the length of this string. The standard
|
|
operation `std\_destroy' has no further parameters --- it just
|
|
destroys the object, if the caller has the right to do so.
|
|
|
|
The next class is called `tty':
|
|
\begin{verbatim}
class tty [2000 .. 2099] {
    inherit standard_ops;
    const TTY_MAXBUF = 1000;
    tty_write(*, char buf[size:TTY_MAXBUF], int size);
    tty_read(*, out char buf[size:TTY_MAXBUF], out int size);
};
\end{verbatim}
|
|
The request codes for operations defined in this class lie in the
|
|
range 2000--2099; inherited operations use the request codes already
|
|
assigned to them. The operations defined by this class are
|
|
`tty\_read' and `tty\_write', which pass variable-sized data buffers
|
|
between client and server. Class `tty' inherits class
|
|
`standard\_ops', so tty objects also support the operations
|
|
`std\_info' and `std\_destroy'.
|
|
|
|
Only the {\em interface} for `std\_info' and `std\_destroy' is shared
|
|
between tty objects and other objects whose interface inherits
|
|
`standard\_ops'; the implementation may differ. Even multiple
|
|
implementations of the `tty' interface may exist, e.g. a driver for a
|
|
console terminal and a terminal emulator in a window. To expand on
|
|
the latter example, consider:
|
|
\begin{verbatim}
class window [2100 .. 2199] {
    inherit standard_ops;
    win_create(*, int x, int y, int width, int height,
               out capability win_cap);
    win_reconfigure(*, int x, int y, int width, int height);
};

class tty_emulator [2200 .. 2299] {
    inherit tty, window;
};
\end{verbatim}
|
|
Here two new interface classes are defined.
|
|
Class `window' could be used for creating and manipulating windows.
|
|
Note that `win\_create' returns a capability for the new window.
|
|
This request should probably be sent to a generic window
|
|
server capability, or it might create a subwindow when applied to a
|
|
window object.
|
|
|
|
Class `tty\_emulator' demonstrates the essence of multiple inheritance.
|
|
It is presumably the interface to a window-based terminal emulator.
|
|
Inheritance is transitive, so `tty\_emulator' also implicitly inherits
|
|
`standard\_ops'.
|
|
In fact, it inherits it twice: once via `tty' and once via `window'.
|
|
Since AIL class inheritance only means interface sharing, not
|
|
implementation sharing, inheriting the same class multiple times is
|
|
never a problem and has the same effect as inheriting it once.
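The effect of inheriting the same class along several paths can be sketched in present-day Python 3. The dictionary {\tt CLASSES} is a hypothetical in-memory model of the example classes, not a data structure used by AIL; the function simply collects the operations of a class and all its ancestors, counting each operation once.

```python
# Hypothetical model of the AIL example classes: each class lists the
# classes it inherits and the operations it declares itself.
CLASSES = {
    "standard_ops": {"inherit": [], "ops": ["std_info", "std_destroy"]},
    "tty": {"inherit": ["standard_ops"], "ops": ["tty_write", "tty_read"]},
    "window": {"inherit": ["standard_ops"],
               "ops": ["win_create", "win_reconfigure"]},
    "tty_emulator": {"inherit": ["tty", "window"], "ops": []},
}


def all_operations(name, classes=CLASSES):
    """Return every operation in the interface of `name`, inherited ones first.

    A class reached along several inheritance paths contributes its
    operations only once, because only the interface is shared.
    """
    seen = []

    def visit(cls):
        for base in classes[cls]["inherit"]:
            visit(base)
        for op in classes[cls]["ops"]:
            if op not in seen:
                seen.append(op)

    visit(name)
    return seen


print(all_operations("tty_emulator"))
```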
|
|
|
|
Note that the power of AIL classes doesn't go as far as \Cpp{}.
|
|
AIL classes cannot have data members, and there is
|
|
no mechanism for a server that implements a derived class
|
|
to inherit the implementation of the base
|
|
class --- other than copying the source code.
|
|
The syntax for class definitions and inheritance is also different.
|
|
|
|
\subsection{Amoeba}
|
|
|
|
The smell of `object-orientedness' that the use of classes in AIL
|
|
creates matches nicely with Amoeba's object-oriented approach to
|
|
RPC\@. In Amoeba, almost all operating system entities (files,
|
|
directories, processes, devices etc.) are implemented as {\em
|
|
objects}. Objects are managed by {\em services} and represented by
|
|
{\em capabilities}. A capability gives its holder access to the
|
|
object it represents. Capabilities are protected cryptographically
|
|
against forgery and can thus be kept in user space. A capability is a
|
|
128-bit binary string, subdivided as follows:
|
|
|
|
% XXX Need a better version of this picture!
|
|
\begin{verbatim}
        48             24         8           48         Bits
+----------------+------------+--------+---------------+
|    Service     |   Object   | Perm.  |     Check     |
|      port      |   number   |  bits  |     word      |
+----------------+------------+--------+---------------+
\end{verbatim}
|
|
|
|
The service port is used by the RPC implementation in the Amoeba
|
|
kernel to locate a server implementing the service that manages the
|
|
object. In many cases there is a one-to-one correspondence between
|
|
servers and services (each service is implemented by exactly one
|
|
server process), but some services are replicated. For instance,
|
|
Amoeba's directory service, which is crucial for gaining access to most
|
|
other services, is implemented by two servers that listen on the same
|
|
port and know about exactly the same objects.
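As an illustration of the 128-bit layout, the following present-day Python 3 sketch packs the four fields into 16 bytes and unpacks them again. The names {\tt Capability}, {\tt pack\_capability} and {\tt unpack\_capability} are hypothetical, and placing the service port in the most significant bits with big-endian byte order is an assumption; the text above gives only the field widths.

```python
from collections import namedtuple

# Hypothetical field names; widths follow the picture: 48, 24, 8, 48 bits.
Capability = namedtuple("Capability", "port objnum rights check")
WIDTHS = (48, 24, 8, 48)


def pack_capability(cap):
    """Pack the four fields into 16 bytes, first field most significant."""
    value = 0
    for field, width in zip(cap, WIDTHS):
        if not 0 <= field < (1 << width):
            raise ValueError("field does not fit in %d bits" % width)
        value = (value << width) | field
    return value.to_bytes(16, "big")      # byte order is an assumption


def unpack_capability(data):
    """Inverse of pack_capability()."""
    if len(data) != 16:
        raise ValueError("a capability is exactly 128 bits")
    value = int.from_bytes(data, "big")
    fields = []
    for width in reversed(WIDTHS):
        fields.append(value & ((1 << width) - 1))
        value >>= width
    return Capability(*reversed(fields))


cap = Capability(port=0x0123456789AB, objnum=42, rights=0xFF,
                 check=0xCAFEBABE0001)
raw = pack_capability(cap)
assert unpack_capability(raw) == cap
```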
|
|
|
|
The object number in the capability is used by the server receiving
|
|
the request for identifying the object to which the operation applies.
|
|
The permission bits specify which operations the holder of the capability
|
|
may apply. The last part of a capability is a 48-bit `check
|
|
word', which is used to prevent forgery. The check word is computed
|
|
by the server based upon the permission bits and a random key per object
|
|
that it keeps secret. If you change the permission bits you must compute
|
|
the proper check word or else the server will refuse the capability.
|
|
Due to the size of the check word and the nature of the cryptographic
|
|
`one-way function' used to compute it, inverting this function is
|
|
impractical, so forging capabilities is impossible.%
|
|
\footnote{
|
|
As computers become faster, inverting the one-way function becomes
|
|
less impractical.
|
|
Therefore, a next version of Amoeba will have 64-bit check words.
|
|
}
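The validation of check words can be sketched as follows in present-day Python 3. This is only an illustration under stated assumptions: SHA-256 truncated to 48 bits stands in for Amoeba's one-way function, which is not specified here, and the names {\tt check\_word} and {\tt server\_accepts} are hypothetical.

```python
import hashlib
import hmac
import os


def check_word(secret_key, rights):
    """Derive a 48-bit check word from the per-object key and permission bits.

    SHA-256 is a stand-in (assumption) for Amoeba's one-way function.
    """
    digest = hashlib.sha256(secret_key + bytes([rights])).digest()
    return int.from_bytes(digest[:6], "big")


def server_accepts(secret_key, rights, presented_check):
    """Server side: recompute the check word and compare it."""
    expected = check_word(secret_key, rights)
    return hmac.compare_digest(expected.to_bytes(6, "big"),
                               presented_check.to_bytes(6, "big"))


key = os.urandom(6)            # random per-object key, kept secret by the server
good = check_word(key, 0x0F)   # check word issued for permission bits 0x0F
print(server_accepts(key, 0x0F, good))   # capability presented unchanged
print(server_accepts(key, 0xFF, good))   # permission bits altered by the holder
```

A holder who alters the permission bits without knowing the key cannot produce the matching check word, so the second request is refused except by a chance collision.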
|
|
|
|
A working Amoeba system is a collection of diverse servers, managing
|
|
files, directories, processes, devices etc. While most servers have
|
|
their own interface, there are some requests that make sense for some
|
|
or all object types. For instance, the {\em std\_info()} request,
|
|
which returns a short descriptive string, applies to all object types.
|
|
Likewise, {\em std\_destroy()} applies to files, directories and
|
|
processes, but not to devices.
|
|
|
|
Similarly, different file server implementations may want to offer the
|
|
same interface for operations like {\em read()} and {\em write()} to
|
|
their clients. AIL's grouping of requests into classes is ideally
|
|
suited to describe this kind of interface sharing, and a class
|
|
hierarchy results which clearly shows the similarities between server
|
|
interfaces (not necessarily their implementations!).
|
|
|
|
The base class of all classes defines the {\em std\_info()} request.
|
|
Most server interfaces actually inherit a derived class that also
|
|
defines {\em std\_destroy()}. File servers inherit a class that
|
|
defines the common operations on files, etc.
|
|
|
|
\subsection{How AIL Works}
|
|
|
|
The AIL stub generator functions in three phases:
|
|
\begin{itemize}
\item
parsing,
\item
strategy determination,
\item
code generation.
\end{itemize}
|
|
|
|
{\bf Phase one} parses the input and builds a symbol table containing
|
|
everything it knows about the classes and other definitions found in
|
|
the input.
|
|
|
|
{\bf Phase two} determines the strategy to use for each function
|
|
declaration in turn and decides upon the request and reply message
|
|
formats. This is not a simple matter, because of various optimization
|
|
attempts. Amoeba's kernel interface for RPC requests takes a
|
|
fixed-size header and one arbitrary-size buffer. A large part of the
|
|
header holds the capability of the object to which the request is
|
|
directed, but there is some space left for a few integer parameters
|
|
whose interpretation is left up to the server. AIL tries to use these
|
|
slots for simple integer parameters, for two reasons.
|
|
|
|
First, unlike the buffer, header fields are byte-swapped by the RPC
|
|
layer in the kernel if necessary, so it saves a few byte swapping
|
|
instructions in the user code. Second, and more important, a common
|
|
form of request transfers a few integers and one large buffer to or
|
|
from a server. The {\em read()} and {\em write()} requests of most
|
|
file servers have this form, for instance. If it is possible to place
|
|
all integer parameters in the header, the address of the buffer
|
|
parameter can be passed directly to the kernel RPC layer. While AIL
|
|
is perfectly capable of handling requests that do not fit this format,
|
|
the resulting code involves allocating a new buffer and copying all
|
|
parameters into it. It is a top priority to avoid this copying
|
|
(`marshalling') if at all possible, in order to maintain Amoeba's
|
|
famous RPC performance.
|
|
|
|
When AIL resorts to copying parameters into a buffer, it reorders them
|
|
so that integers indicating the lengths of variable-size arrays are
|
|
placed in the buffer before the arrays they describe, since otherwise
|
|
decoding the request would be impossible. It also adds occasional
|
|
padding bytes to ensure integers are aligned properly in the buffer ---
|
|
this can speed up (un)marshalling.
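The reordering and padding rules can be made concrete with a small present-day Python 3 sketch. The message layout used here (a one-byte flag, padding, a 32-bit big-endian length, then the array) is an assumption chosen for illustration and is not the format AIL actually generates.

```python
import struct


def align(buf, boundary=4):
    """Append zero bytes until len(buf) is a multiple of `boundary`."""
    buf.extend(b"\0" * (-len(buf) % boundary))


def marshal(flag, data):
    """Marshal a one-byte flag and a variable-size byte array.

    The array length is written before the array itself, and padding is
    inserted so that the 32-bit length starts on a 4-byte boundary.
    """
    buf = bytearray()
    buf.append(flag)
    align(buf)                             # pad so the integer is aligned
    buf += struct.pack(">i", len(data))    # length first ...
    buf += data                            # ... then the array it describes
    return bytes(buf)


def unmarshal(buf):
    """Decode a buffer produced by marshal()."""
    flag = buf[0]
    (size,) = struct.unpack_from(">i", buf, 4)
    return flag, buf[8:8 + size]


message = marshal(7, b"hello")
assert unmarshal(message) == (7, b"hello")
```

Because the length precedes the array, the decoder knows how many bytes to take before it reaches them.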
|
|
|
|
{\bf Phase three} is the code generator, or back-end. There are in
|
|
fact many different back-ends that may be called in a single run to
|
|
generate different types of output. The most important output types
|
|
are header files (for inclusion by the clients of an interface),
|
|
client stubs, and `server main loop' code. The latter decodes
|
|
incoming requests in the server. The generated code depends on the
|
|
programming language requested, and there are separate back-ends for
|
|
each supported language.
|
|
|
|
It is important that the strategy chosen by phase two is independent
|
|
of the language requested for phase three --- otherwise the
|
|
interoperability of servers and clients written in different languages
|
|
would be compromised.
|
|
|
|
\section{Linking AIL to Python}
|
|
|
|
From the previous section it can be concluded that linking AIL to
|
|
Python is a matter of writing a back-end for Python. This is indeed
|
|
what we did.
|
|
|
|
Considerable time went into the design of the back-end in order to
|
|
make the resulting RPC interface for Python fit as smoothly as
|
|
possible in Python's programming style. For instance, the issues of
|
|
parameter transfer, variable-size arrays, error handling, and call
|
|
syntax were all solved in a manner that favors ease of use in Python
|
|
rather than strict correspondence with the stubs generated for C,
|
|
without compromising network-level compatibility.
|
|
|
|
\subsection{Mapping AIL Entities to Python}
|
|
|
|
For each programming language that AIL is to support, a mapping must
|
|
be designed between the data types in AIL and those in that language.
|
|
Other aspects of the programming languages, such as differences in
|
|
function call semantics, must also be taken care of.
|
|
|
|
While the mapping for C is mostly straightforward, the mapping for
|
|
Python requires a little thinking to get the best results for Python
|
|
programmers.
|
|
|
|
\subsubsection{Parameter Transfer Direction}
|
|
|
|
Perhaps the simplest issue is that of parameter transfer direction.
|
|
Parameters of functions declared in AIL are categorized as being of
|
|
type {\tt in}, {\tt out} or {\tt in} {\tt out} (the same distinction
|
|
as made in Ada). Python only has call-by-value parameter semantics;
|
|
functions can return multiple values as a tuple. This means that,
|
|
unlike the C back-end, the Python back-end cannot always generate
|
|
Python functions with exactly the same parameter list as the AIL
|
|
functions.
|
|
|
|
Instead, the Python parameter list consists of all {\tt in} and {\tt
|
|
in} {\tt out} parameters, in the order in which they occur in the AIL
|
|
parameter list; similarly, the Python function returns a tuple
|
|
containing all {\tt in} {\tt out} and {\tt out} parameters. In fact
|
|
Python packs function parameters into a tuple as well, stressing the
|
|
symmetry between parameters and return value. For example, a stub
|
|
with this AIL parameter list:
|
|
\begin{verbatim}
(*, in int p1, in out int p2, in int p3, out int p4)
\end{verbatim}
|
|
will have the following parameter list and return values in Python:
|
|
\begin{verbatim}
(p1, p2, p3) -> (p2, p4)
\end{verbatim}
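The mapping rule can be written down as a short function in present-day Python 3. The representation of an AIL parameter as a (direction, name) pair and the name {\tt python\_signature} are assumptions made for this sketch.

```python
def python_signature(ail_params):
    """Split AIL parameters into Python arguments and return values.

    `ail_params` is a list of (direction, name) pairs, where direction is
    "in", "out" or "in out"; the capability placeholder `*' is left out.
    """
    args = [name for direction, name in ail_params
            if direction in ("in", "in out")]
    results = [name for direction, name in ail_params
               if direction in ("in out", "out")]
    return args, results


params = [("in", "p1"), ("in out", "p2"), ("in", "p3"), ("out", "p4")]
print(python_signature(params))
```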
|
|
|
|
\subsubsection{Variable-size Entities}
|
|
|
|
The support for variable-size objects in AIL is strongly guided by the
|
|
limitations of C in this matter. Basically, AIL allows what is
|
|
feasible in C: functions may have variable-size arrays as parameters
|
|
(either input or output), provided their length is passed separately.
|
|
In practice this is narrowed to the following rule: for each
|
|
variable-size array parameter, there must be an integer parameter
|
|
giving its length. (An exception for null-terminated strings is
|
|
planned but not yet realized.)
|
|
|
|
Variable-size arrays in AIL or C correspond to {\em sequences} in
|
|
Python: lists, tuples or strings. These are much easier to use than
|
|
their C counterparts. Given a sequence object in Python, it is always
|
|
possible to determine its size: the built-in function {\tt len()}
|
|
returns it. It would be annoying to require the caller of an RPC stub
|
|
with a variable-size parameter to also pass a parameter that
|
|
explicitly gives its size. Therefore we eliminate all parameters from
|
|
the Python parameter list whose value is used as the size of a
|
|
variable-size array. Such parameters are easily found: the array
|
|
bound expression contains the name of the parameter giving its size.
|
|
This requires the stub code to work harder (it has to recover the
|
|
value for size parameters from the corresponding sequence parameter),
|
|
but at least part of this work would otherwise be needed as well, to
|
|
check that the given and actual sizes match.
|
|
|
|
Because of the symmetry in Python between the parameter list and the
|
|
return value of a function, the same elimination is performed on
|
|
return values containing variable-size arrays: integers returned
|
|
solely to tell the client the size of a returned array are not
|
|
returned explicitly to the caller in Python.
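A sketch of the elimination step, in present-day Python 3, is given here for the two {\tt tty} operations. Each parameter is modelled as a hypothetical (direction, name, size parameter) triple, and parameters declared without a direction keyword are assumed to be input parameters.

```python
def eliminate_size_params(ail_params):
    """Drop parameters that only carry the size of a variable-size array.

    Each parameter is a (direction, name, size_param) triple; size_param is
    the name used in the array bound, or None for non-array parameters.
    """
    size_names = {size for _, _, size in ail_params if size is not None}
    kept = [(d, n) for d, n, _ in ail_params if n not in size_names]
    args = [n for d, n in kept if d in ("in", "in out")]
    results = [n for d, n in kept if d in ("in out", "out")]
    return args, results


# tty_write(*, char buf[size:TTY_MAXBUF], int size)
write_sig = eliminate_size_params([("in", "buf", "size"), ("in", "size", None)])
# tty_read(*, out char buf[size:TTY_MAXBUF], out int size)
read_sig = eliminate_size_params([("out", "buf", "size"), ("out", "size", None)])
print(write_sig, read_sig)
```

Under these assumptions the Python caller passes only the buffer to {\tt tty\_write} and receives only the buffer from {\tt tty\_read}.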
|
|
|
|
\subsubsection{Error Handling}
|
|
|
|
Another point where Python is really better than C is the issue of
|
|
error handling. It is a fact of life that everything involving RPC
|
|
may fail, for a variety of reasons outside the user's control: the
|
|
network may be disconnected, the server may be down, etc. Clients
|
|
must be prepared to handle such failures and recover from them, or at
|
|
least print an error message and die. In C this means that every
|
|
function returns an error status that must be checked by the caller,
|
|
causing programs to be cluttered with error checks --- or worse,
|
|
programs that ignore errors and carry on working with garbage data.
|
|
|
|
In Python, errors are generally indicated by exceptions, which can be
|
|
handled out of line from the main flow of control if necessary, and
|
|
cause immediate program termination (with a stack trace) if ignored.
|
|
To profit from this feature, all RPC errors that may be encountered by
|
|
AIL-generated stubs in Python are turned into exceptions. An extra
|
|
value passed together with the exception is used to relay the error
|
|
code returned by the server to the handler. Since in general RPC
|
|
failures are rare, Python test programs can usually ignore exceptions
|
|
--- making the program simpler --- without the risk of occasional
|
|
errors going undetected. (I still remember the embarrassment of a
|
|
hundredfold speed improvement reported, long, long ago, about a new
|
|
version of a certain program, which later had to be attributed to a
|
|
benchmark that silently dumped core...)
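The following present-day Python 3 sketch shows how an error code can travel with an exception and be handled out of line. The class {\tt RPCError}, the stub {\tt failing\_stub} and the error code $-5$ are all hypothetical; the exception actually raised by the generated stubs is not named in this paper.

```python
class RPCError(Exception):
    """Hypothetical exception raised by a stub when an RPC fails."""

    def __init__(self, code):
        super().__init__("RPC failed with error code %d" % code)
        self.code = code      # error code reported by the server


def failing_stub():
    """Stand-in for a generated stub whose server reports error -5."""
    raise RPCError(-5)


try:
    failing_stub()
except RPCError as err:
    # Out-of-line handling: the error code arrives with the exception.
    print("recovered from error", err.code)
```

A test program that omits the {\tt try} statement would instead terminate with a stack trace at the failing call.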
|
|
|
|
\subsubsection{Function Call Syntax}
|
|
|
|
Amoeba RPC operations always need a capability parameter (this is what
|
|
the `*' in the AIL function templates stands for); the service is
|
|
identified by the port field of the capability. In C, the capability
|
|
must always be the first parameter of the stub function, but in Python
|
|
we can do better.
|
|
|
|
A Python capability is an opaque object type in its own right, which
|
|
is used, for instance, as parameter to and return value from Amoeba's
|
|
name server functions. Python objects can have methods, so it is
|
|
convenient to make all AIL-generated stubs methods of capabilities
|
|
instead of just functions. Therefore, instead of writing
|
|
\begin{verbatim}
some_stub(cap, other_parameters)
\end{verbatim}
|
|
as in C, Python programmers can write
|
|
\begin{verbatim}
cap.some_stub(other_parameters)
\end{verbatim}
|
|
This is better because it reduces name conflicts: in Python, no
|
|
confusion is possible between a stub and a local or global variable or
|
|
user-defined function with the same name.
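One way to picture stubs as methods of a capability is the present-day Python 3 sketch below. It is not the mechanism used by the interpreter described here: the class {\tt Capability}, its {\tt \_\_getattr\_\_} lookup and the stand-in {\tt some\_stub} are assumptions chosen to show the call syntax only.

```python
class Capability:
    """Hypothetical capability object exposing stubs as methods."""

    def __init__(self, raw, stubs):
        self._raw = raw          # opaque capability bytes
        self._stubs = stubs      # maps stub names to callables

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: treat the name
        # as a stub and bind this capability as its first argument.
        try:
            stub = self._stubs[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda *args: stub(self._raw, *args)


def some_stub(raw_cap, buf):
    """Stand-in for a real stub; it performs no RPC."""
    return len(buf), 0


cap = Capability(b"\0" * 16, {"some_stub": some_stub})
n_done, status = cap.some_stub("hello")
```

Since the stub name is looked up on the capability, a local variable called {\tt some\_stub} in the calling program would not interfere with the call.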
|
|
|
|
\subsubsection{Example}
|
|
|
|
All the preceding principles can be seen at work in the following
|
|
example. Suppose a function is declared in AIL as follows:
|
|
\begin{verbatim}
some_stub(*, in char buf[size:1000], in int size,
          out int n_done, out int status);
\end{verbatim}
|
|
In C it might be called by the following code (including declarations,
|
|
for clarity, but not initializations):
|
|
\begin{verbatim}
int err, n_done, status;
capability cap;
char buf[500];
...
err = some_stub(&cap, buf, sizeof buf, &n_done, &status);
if (err != 0) return err;
printf("%d done; status = %d\n", n_done, status);
\end{verbatim}
|
|
Equivalent code in Python might be the following:
|
|
\begin{verbatim}
cap = ...
buf = ...
n_done, status = cap.some_stub(buf)
print n_done, 'done;', 'status =', status
\end{verbatim}
|
|
No explicit error check is required in Python: if the RPC fails, an
|
|
exception is raised so the {\tt print} statement is never reached.
|
|
|
|
\subsection{The Implementation}
|
|
|
|
More or less orthogonal to the issue of how to map AIL operations to
|
|
the Python language is the question of how they should be implemented.
|
|
|
|
In principle it would be possible to use the same strategy that is
|
|
used for C: add an interface to Amoeba's low-level RPC primitives to
|
|
Python and generate Python code to marshal parameters into and out of
|
|
a buffer. However, Python's high-level data types are not well suited
|
|
for marshalling: byte-level operations are clumsy and expensive, with
|
|
the result that marshalling a single byte of data can take several
|
|
Python statements. This would mean that a large amount of code would
|
|
be needed to implement a stub, which would cost a lot of time to parse
|
|
and take up a lot of space in `compiled' form (as parse tree or pseudo
|
|
code). Execution of the marshalling code would be sluggish as well.
|
|
|
|
We therefore chose an alternate approach, writing the marshalling in
|
|
C, which is efficient at such byte-level operations. While it is easy
|
|
enough to generate C code that can be linked with the Python
|
|
interpreter, it would obviously not stimulate the use of Python for
|
|
server testing if each change to an interface required relinking the
|
|
interpreter (dynamic loading of C code is not yet available on
|
|
Amoeba). This is circumvented by the following solution: the
|
|
marshalling is handled by a simple {\em virtual machine}, and AIL
|
|
generates instructions for this machine. An interpreter for the
|
|
machine is linked into the Python interpreter and reads its
|
|
instructions from a file written by AIL.
|
|
|
|
The machine language for our virtual machine is dubbed {\em Stubcode}.
|
|
Stubcode is a super-specialized language. There are two sets of
|
|
about a dozen instructions each: one set marshals Python objects
|
|
representing parameters into a buffer, the other set (similar but not
|
|
quite symmetric) unmarshals results from a buffer into Python objects.
|
|
The Stubcode interpreter uses a stack to hold Python intermediate
|
|
results. Other state elements are an Amoeba header and buffer, a
|
|
pointer indicating the current position in the buffer, and of course a
|
|
program counter. Besides (un)marshalling, the virtual machine must
|
|
also implement type checking, and raise a Python exception when a
|
|
parameter does not have the expected type.
|
|
|
|
The Stubcode interpreter marshals Python data types very efficiently,
|
|
since each instruction can marshal a large amount of data. For
|
|
instance, a whole Python string is marshalled by a single Stubcode
|
|
instruction, which (after some checking) executes the most efficient
|
|
byte-copying loop possible --- it calls {\tt memcpy()}.
|
|
|
|
|
|
Construction details of the Stubcode interpreter are straightforward.
|
|
Most complications are caused by the peculiarities of AIL's strategy
|
|
module and Python's type system. By far the most complex single
|
|
instruction is the `loop' instruction, which is used to marshal
|
|
arrays.
|
|
|
|
As an example, here is the complete Stubcode program (with spaces and
|
|
comments added for clarity) generated for the function {\tt
|
|
some\_stub()} of the example above. The stack contains pointers to
|
|
Python objects, and its initial contents are the parameter to the
|
|
function, the string {\tt buf}. The final stack contents will be the
|
|
function return value, the tuple {\tt (n\_done, status)}. The name
|
|
{\tt header} refers to the fixed-size Amoeba RPC header structure.
|
|
\vspace{1em}

{\tt
\begin{tabular}{l l l}
BufSize & 1000 & {\em Allocate RPC buffer of 1000 bytes} \\
Dup & 1 & {\em Duplicate stack top} \\
StringS & & {\em Replace stack top by its string size} \\
PutI & h\_extra int32 & {\em Store top element in }header.h\_extra \\
TStringSlt & 1000 & {\em Assert string size less than 1000} \\
PutVS & & {\em Marshal variable-size string} \\
 & & \\
Trans & 1234 & {\em Execute the RPC (request code 1234)} \\
 & & \\
GetI & h\_extra int32 & {\em Push integer from} header.h\_extra \\
GetI & h\_size int32 & {\em Push integer from} header.h\_size \\
Pack & 2 & {\em Pack top 2 elements into a tuple} \\
\end{tabular}
}

\vspace{1em}
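To show how such a program might be executed, here is a toy interpreter for just these instructions, written in present-day Python 3. It is a sketch under assumptions, not the interpreter linked into Python: the instruction semantics are inferred from the one-line descriptions in the table, the class {\tt Header} models only two header fields, and {\tt fake\_server} replaces the real RPC transaction.

```python
class Header:
    """Stand-in for the fixed-size Amoeba RPC header (two fields only)."""

    def __init__(self):
        self.h_extra = 0
        self.h_size = 0


def run_stubcode(program, argument, transact):
    """Interpret `program` with `argument` as the initial stack contents.

    `transact(code, header, request)` performs the RPC; it may update the
    header in place and returns the reply buffer.
    """
    stack = [argument]
    header = Header()
    buf = bytearray()
    capacity = 0
    for op, *args in program:
        if op == "BufSize":
            capacity = args[0]
        elif op == "Dup":
            stack.append(stack[-args[0]])
        elif op == "StringS":
            stack.append(len(stack.pop()))
        elif op == "PutI":
            setattr(header, args[0], stack.pop())
        elif op == "TStringSlt":
            if not isinstance(stack[-1], bytes):
                raise TypeError("string parameter expected")
            if len(stack[-1]) >= args[0]:
                raise ValueError("string too long")
        elif op == "PutVS":
            data = stack.pop()
            if len(buf) + len(data) > capacity:
                raise ValueError("RPC buffer overflow")
            buf += data
        elif op == "Trans":
            buf = bytearray(transact(args[0], header, bytes(buf)))
        elif op == "GetI":
            stack.append(getattr(header, args[0]))
        elif op == "Pack":
            n = args[0]
            stack[-n:] = [tuple(stack[-n:])]
        else:
            raise ValueError("unknown instruction " + op)
    return stack.pop()


PROGRAM = [
    ("BufSize", 1000), ("Dup", 1), ("StringS",), ("PutI", "h_extra"),
    ("TStringSlt", 1000), ("PutVS",),
    ("Trans", 1234),
    ("GetI", "h_extra"), ("GetI", "h_size"), ("Pack", 2),
]


def fake_server(code, header, request):
    """Pretend server: reports every byte as done and a zero status."""
    header.h_extra = len(request)   # n_done
    header.h_size = 0               # status
    return b""


result = run_stubcode(PROGRAM, b"hello", fake_server)
print(result)
```

The sketch uses byte strings where the paper's Python uses ordinary strings, and it omits the integer size annotations ({\tt int32}) that accompany {\tt PutI} and {\tt GetI}.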
|
|
|
|
As much work as possible is done by the Python back-end in AIL, rather
|
|
than in the Stubcode interpreter, to make the latter both simple and
|
|
fast. For instance, the decision to eliminate an array size parameter
|
|
from the Python parameter list is taken by AIL, and Stubcode
|
|
instructions are generated to recover the size from the actual
|
|
parameter and to marshal it properly. Similarly, there is a special
|
|
alignment instruction (not used in the example) to meet alignment
|
|
requirements.
|
|
|
|
Communication between AIL and the Stubcode interpreter is via the file
|
|
system. For each stub function, AIL creates a file in its output
|
|
directory, named after the stub with a specific suffix. This file
|
|
contains a machine-readable version of the Stubcode program for the
|
|
stub. The Python user can specify a search path containing
|
|
directories which the interpreter searches for a Stubcode file the
|
|
first time the definition for a particular stub is needed.
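The lookup along a search path can be sketched in present-day Python 3 as follows. The suffix {\tt .sc}, the function names and the cache are hypothetical; the text above states only that each stub has its own file with a specific suffix and that the file is sought the first time the stub is needed.

```python
import os
import tempfile

STUB_SUFFIX = ".sc"      # hypothetical suffix; the real one is not given here
_cache = {}              # stub name -> Stubcode bytes already loaded


def find_stubcode(stub_name, search_path):
    """Return the path of the first Stubcode file for `stub_name`, or None."""
    for directory in search_path:
        candidate = os.path.join(directory, stub_name + STUB_SUFFIX)
        if os.path.isfile(candidate):
            return candidate
    return None


def load_stub(stub_name, search_path):
    """Read a stub's Stubcode once; later calls reuse the cached bytes."""
    if stub_name not in _cache:
        path = find_stubcode(stub_name, search_path)
        if path is None:
            raise LookupError("no Stubcode file for " + stub_name)
        with open(path, "rb") as f:
            _cache[stub_name] = f.read()
    return _cache[stub_name]


# Demonstration with a temporary directory standing in for AIL's output.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "some_stub" + STUB_SUFFIX), "wb") as f:
        f.write(b"\x01\x02")
    code = load_stub("some_stub", [os.path.join(d, "missing"), d])
```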
|
|
|
|
The transformations on the parameter list and data types needed to map
|
|
AIL data types to Python data types make it necessary to help the
|
|
Python programmer a bit in figuring out the parameters to a call.
|
|
Although in most cases the rules are simple enough, it is sometimes
|
|
hard to figure out exactly what the parameter and return values of a
|
|
particular stub are. There are two sources of help in this case:
|
|
first, the exception raised on a type mismatch contains enough information
for the user to figure out what type was expected; second, AIL's Python back-end
|
|
optionally generates a human-readable `interface specification' file.
|
|
|
|
\section{Conclusion}
|
|
|
|
We have succeeded in creating a useful extension to Python that
|
|
enables Amoeba server writers to test and experiment with their server
|
|
in a much more interactive manner. We hope that this facility will
|
|
add to the popularity of AIL amongst Amoeba programmers.
|
|
|
|
Python's extensibility was proven convincingly by the exercise
|
|
(performed by the second author) of adding the Stubcode interpreter to
|
|
Python. Standard data abstraction techniques are used to insulate
|
|
extension modules from details of the rest of the Python interpreter.
|
|
In the case of the Stubcode interpreter this worked well enough that
|
|
it survived a major overhaul of the main Python interpreter virtually
|
|
unchanged.
|
|
|
|
On the other hand, adding a new back-end to AIL turned out to be quite
|
|
a bit of work. One problem, specific to Python, was to be expected:
|
|
Python's variable-size data types differ considerably from the
|
|
C-derived data model that AIL favors. Two additional problems we
|
|
encountered were the complexity of the interface between AIL's second
|
|
and third phases, and a number of remaining bugs in the second phase
|
|
that surfaced when the implementation of the Python back-end was
|
|
tested. The bugs have been tracked down and fixed, but nothing
|
|
has been done about the complexity of the interface.
|
|
|
|
\subsection{Future Plans}
|
|
|
|
AIL's C back-end generates server main loop code as well as client
|
|
stubs. The Python back-end currently only generates client stubs, so
|
|
it is not yet possible to write servers in Python. While it is
|
|
clearly more important to be able to use Python as a client than as a
|
|
server, the ability to write server prototypes in Python would be a
|
|
valuable addition: it allows server designers to experiment with
|
|
interfaces in a much earlier stage of the design, with a much smaller
|
|
programming effort. This makes it possible to concentrate on concepts
|
|
first, before worrying about efficient implementation.
|
|
|
|
The unmarshalling done in the server is almost symmetric with the
|
|
marshalling in the client, and vice versa, so relatively small
|
|
extensions to the Stubcode virtual machine will allow its use in a
|
|
server main loop. We hope to find the time to add this feature to a
|
|
future version of Python.
|
|
|
|
\section{Availability}
|
|
|
|
The Python source distribution is available to Internet users by
|
|
anonymous ftp to site {\tt ftp.cwi.nl} [IP address 192.16.184.180]
|
|
from directory {\tt /pub}, file name {\tt python*.tar.Z} (where the
|
|
{\tt *} stands for a version number). This is a compressed UNIX tar
|
|
file containing the C source and \LaTeX{} documentation for the Python
|
|
interpreter. It includes the Python library modules and the {\em
|
|
Stubcode} interpreter, as well as many example Python programs. Total
|
|
disk space occupied by the distribution is about 3 MB; compilation
requires 1--3 MB depending on the configuration built, the compile
|
|
options, etc.
|
|
|
|
\bibliographystyle{plain}
|
|
|
|
\bibliography{quabib}
|
|
|
|
\end{document}
|