Infrastructure Perspective: mod_python Handlers

Introduction

This documentation path follows the processing of a request through the mod_python handlers. It should be readable without a detailed understanding of Apache handlers and mod_python, but a careful reading of the mod_python documentation is a prerequisite for making modifications to the handlers. Don't worry -- it's actually simpler than CGI or PHP.

Apache Handlers

Apache handles HTTP requests in stages. The functions responsible for each stage are called handlers. Apache modules can register handlers for various processing stages.

When a handler is invoked, it has a single argument: the request object. This object contains all information about the current request, such as the method, URI, headers, hostname, etc. In some stages, the handler is expected to fill in fields in the request object. For example, one processing stage involves the translation of the URI from the request (e.g., /docs/misc/API.html) into a filename (e.g., /usr/www/docs/docs/misc/API.html). The handler at this stage is responsible for calculating the filename and inserting it into the correct field in the request object.
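The translation stage described above can be sketched in a few lines. This is only an illustration, not the site's code: FakeRequest is a stand-in for the Apache request object, OK mirrors the value of mod_python's apache.OK, and DOCUMENT_ROOT is an assumed document root chosen to match the example path.

```python
import posixpath

OK = 0  # stand-in for mod_python's apache.OK

DOCUMENT_ROOT = "/usr/www/docs"  # assumed document root, for illustration only


class FakeRequest:
    """Minimal stand-in for the Apache request object."""

    def __init__(self, uri):
        self.uri = uri
        self.filename = None  # the field this stage is expected to fill in


def translate_handler(req):
    # Normalize the URI, then map it to a path under the document root.
    relative = posixpath.normpath(req.uri).lstrip("/")
    req.filename = posixpath.join(DOCUMENT_ROOT, relative)
    return OK


req = FakeRequest("/docs/misc/API.html")
status = translate_handler(req)
print(status, req.filename)
```

The handler's return value reports the outcome of the stage, while the result of the translation travels in the request object itself.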

A handler can have one of three outcomes. It can:

  • handle the stage,
  • decline to handle the stage, or
  • signal an error condition.

When a handler handles its stage, processing proceeds immediately to the next stage. If a handler declines to handle a stage, Apache finds another handler for the same stage. Apache's built-in handlers are the handlers of last resort, and will not decline. A handler may also signal an error condition (such as 404 Not Found or 500 Internal Server Error), in which case Apache begins construction of an error page (and may invoke more handlers in the process).
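The three outcomes can be modeled with a small loop. The sketch below is a simplified simulation of how a single stage might be resolved, not Apache's actual implementation. The constants mirror the values of mod_python's apache.OK, apache.DECLINED and apache.HTTP_NOT_FOUND; the two handlers and KNOWN_FILES are hypothetical.

```python
OK = 0                # the handler handled the stage
DECLINED = -1         # the handler declined; try the next one
HTTP_NOT_FOUND = 404  # an error condition

KNOWN_FILES = {"/index.html"}  # hypothetical set of files the server can serve


def python_handler(req):
    # Decline anything that is not a Python script.
    if not req["uri"].endswith(".py"):
        return DECLINED
    req["handled_by"] = "python"
    return OK


def builtin_handler(req):
    # Handler of last resort: it never declines.
    if req["uri"] not in KNOWN_FILES:
        return HTTP_NOT_FOUND
    req["handled_by"] = "builtin"
    return OK


def run_stage(handlers, req):
    """Call each handler in turn until one does not decline."""
    for handler in handlers:
        status = handler(req)
        if status != DECLINED:
            return status  # either OK or an error status
    raise RuntimeError("every handler declined")


STAGE = [python_handler, builtin_handler]
print(run_stage(STAGE, {"uri": "/report.py"}))
print(run_stage(STAGE, {"uri": "/missing.html"}))
```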

When thinking about handlers, keep in mind that Apache is a preforked server, meaning that there are many independent Apache processes running at any time, and an incoming request could go to any one of those processes. So two requests, even if they are from the same user, may go to two entirely different processes. This makes it impossible to rely on a handler's process memory for information about a user: a value stored while serving one request may not be visible to the process that serves the next. In this site, all such information is stored in the database, which is shared among all Apache processes.
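A toy simulation makes the consequence concrete. Here each Worker object stands in for one Apache process and a plain dict stands in for the shared database; both are assumptions made for illustration, not part of the site's code.

```python
class Worker:
    """Stand-in for one Apache process with its own private memory."""

    def __init__(self, database):
        self.memory = {}          # private to this process
        self.database = database  # shared among all processes

    def remember(self, user, value):
        self.memory[user] = value
        self.database[user] = value

    def recall_from_memory(self, user):
        return self.memory.get(user)

    def recall_from_database(self, user):
        return self.database.get(user)


database = {}
first, second = Worker(database), Worker(database)

# The first request from alice lands on one process...
first.remember("alice", "logged-in")

# ...and her next request lands on another.
print(second.recall_from_memory("alice"))    # the private memory has no record
print(second.recall_from_database("alice"))  # the shared store does
```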

More information on Apache handlers is available in the Apache API notes. The mod_python documentation also has a nice overview of Apache handlers.

mod_python Handlers

mod_python is an Apache module which allows us to write Apache handlers in Python. The documentation for mod_python is excellent, and gives a flavor of the many possibilities this module offers.

The CS site currently implements handlers for three stages. The Init handler initializes the infrastructure. The Translate handler translates the request URI into the filename of a Python script and performs preliminary decoding of other information in the request. The Dispatch handler invokes the Python script which will produce the response.

The Init Handler

The first handler invoked (as PythonInitHandler) is the Init handler, found in handlers/init.py (note on filenames). This handler is the first one called for a request, and is responsible for getting the rest of the Python infrastructure into shape. In particular, it initializes:

The Translate Handler

The Translate handler is invoked as PythonTransHandler to translate the URI from the HTTP request into the filename of a Python script. It does that, and a lot more.

First, the handler checks the PythonDontHandle Apache configuration directive to see if this is a request that the Python site should decline.

If the request is not one to decline, the handler does the following:

  • Instantiates an object of the URL class, which will represent the current URL. The constructor for this class breaks down the URL as described in Infrastructure Perspective: HTTP Transaction. The constructed object is stored in req.url.
  • Sets req.secure to true iff this is an SSL request.
  • Instantiates an object of the URLs class, which is used to create new URLs through percent substitution. The constructed object is stored in req.urls.
  • Copies the filename of the Python script and the module path (the dot-separated package path for use in a Python import statement) from req.url to req. Apache expects the handler at this stage to fill in the filename.
  • Parses any cookies using the utils.cookies module, which uses the Python standard library module Cookie, and places the result in req.cookies.
  • Checks for a site login using the utils.session module. If a user is logged in, req.login is her username; otherwise, req.login is None.
  • Logs the request using the debugging module.
  • Instructs Apache to invoke the Dispatch handler when the time comes to send a document back to the user.
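Two of the steps above, cookie parsing and the secure flag, can be sketched as follows. The sketch uses http.cookies, the Python 3 name of the standard library's Cookie module. The request object is a SimpleNamespace stand-in, and its headers_in contents and is_https attribute are assumptions; the site's utils.cookies module may well work differently.

```python
from http.cookies import SimpleCookie
from types import SimpleNamespace


def parse_cookies(header):
    """Turn a Cookie request header into a plain name -> value dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


# Stand-in request object; the attribute names are hypothetical.
req = SimpleNamespace(
    headers_in={"Cookie": "session=abc123; theme=dark"},
    is_https=True,
)

req.cookies = parse_cookies(req.headers_in.get("Cookie", ""))
req.secure = bool(req.is_https)

print(req.cookies, req.secure)
```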

If a Python exception occurs during any of this processing, the handler triggers a 404 Not Found error, which does not give the user much information about exactly what went wrong. The entire Python exception, however, is sent to the debugging log.
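One way to get this behavior is to wrap the processing in a single try block. The following is a minimal sketch under assumptions: the logger name "debugging", the guarded() wrapper and the ListHandler used to capture the log output are all hypothetical, and the status constants are stand-ins for mod_python's.

```python
import logging

OK = 0
HTTP_NOT_FOUND = 404

log = logging.getLogger("debugging")  # hypothetical logger name


class ListHandler(logging.Handler):
    """Collects formatted log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(self.format(record))


def guarded(step):
    """Wrap a processing step so any exception becomes a 404."""

    def handler(req):
        try:
            step(req)
        except Exception:
            # log.exception() records the full traceback with the message.
            log.exception("translate failed for %s", req.get("uri"))
            return HTTP_NOT_FOUND
        return OK

    return handler


def broken_step(req):
    raise KeyError("url")


capture = ListHandler()
log.addHandler(capture)
status = guarded(broken_step)({"uri": "/x"})
print(status)
print(capture.records[0])
```

The user sees only the generic error, while the log entry keeps the exception type and the traceback.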

The Dispatch Handler

The Dispatch handler, which lives in handlers/dispatch.py, is responsible for invoking the chosen Python script, and getting its output sent back to the browser. To begin, it tries to import the module that was chosen by the Translate handler, using req.modpath. The Python import mechanism is complex beyond belief; if you'd like to explore its inner workings, I suggest reading the C code in the Python interpreter. Or just assume it works and move on.

If an exception occurs during the import, Dispatch sends the text of the exception to the debugging output, and signals Apache to terminate this Apache process. Dispatch does this because a botched import can sometimes leave the Python interpreter in an unstable state. The Python interpreter terminates with the Apache process, and Apache creates a new process and embedded interpreter to take its place. Dispatch finishes handling the exception by sending a 404 Not Found error back to the browser.

If the import succeeds, Dispatch invokes the module's new function, passing it the request object. This function returns the object which will actually handle the request, referred to below as the display object.
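The import step and the call to new can be sketched with importlib. Everything here other than importlib.import_module is assumed for illustration: the module name cs_demo_page, the Display class, and the recycle_process flag, which merely stands in for whatever signal Dispatch uses to have Apache retire the process.

```python
import importlib
import sys
import types
from types import SimpleNamespace

HTTP_NOT_FOUND = 404


class Display:
    """Hypothetical display object returned by a page module's new()."""

    def __init__(self, req):
        self.req = req


# Register a throwaway module so the example is self-contained.
demo = types.ModuleType("cs_demo_page")
demo.new = lambda req: Display(req)
sys.modules["cs_demo_page"] = demo


def load_display(req):
    """Import req.modpath and build its display object, or report a 404."""
    try:
        module = importlib.import_module(req.modpath)
    except Exception:
        req.recycle_process = True  # hypothetical flag, see the note above
        return None, HTTP_NOT_FOUND
    return module.new(req), None


good = SimpleNamespace(modpath="cs_demo_page")
bad = SimpleNamespace(modpath="cs_no_such_page_xyz")
print(load_display(good))
print(load_display(bad))
```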

Next, Dispatch examines the URL to determine the type of request it is to handle. There are two types: actions and regular pages. The utils.actions module identifies actions.

If the request is for an action, then Dispatch calls the display object's perform_action() method, which should execute the action and bounce to a new URL.

If the request is for a regular page, then Dispatch calls the display object's check_security() method, which can be used to make sure a page is always in SSL mode, etc. Then it constructs a new CSPage output object and passes it as an argument to the display object's make_page(page) method. Finally, it renders the page into HTML and sends it off to the browser.
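The two request types lead to two short code paths. In the sketch below, only the method names perform_action, check_security and make_page come from the description above. The CSPage class shown is a stand-in with an invented add/render interface, and returning the bounce URL from perform_action is an assumption about how the redirect is communicated.

```python
class CSPage:
    """Stand-in output object; the real CSPage interface is not shown here."""

    def __init__(self):
        self.parts = []

    def add(self, text):
        self.parts.append(text)

    def render(self):
        return "<html><body>" + "".join(self.parts) + "</body></html>"


class HelloDisplay:
    """Hypothetical display object."""

    def check_security(self):
        pass  # e.g. verify that the page is being served over SSL

    def make_page(self, page):
        page.add("<p>Hello</p>")

    def perform_action(self):
        return "/thanks"  # assumed: the URL to bounce to


def dispatch(display, is_action):
    if is_action:
        return ("redirect", display.perform_action())
    display.check_security()
    page = CSPage()
    display.make_page(page)
    return ("html", page.render())


print(dispatch(HelloDisplay(), is_action=True))
print(dispatch(HelloDisplay(), is_action=False))
```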

Several types of exceptions can occur. A utils.web_exc.WebError carries a short message to inform the user of some problem with their request. For example, if a user somehow tries to perform an action for which she does not have permission, the display object raises a WebError with the message "Permission Denied".

A utils.web_exc.SendLiteral exception directs Dispatch to send the given data with the given content-type directly to the browser without any interpretation. This exception can be used, for example, to produce plain-text output, or to dynamically generate images.

Any other type of exception is handled as an error. Dispatch produces a 500 Internal Server Error page, and sends the Python traceback to the debugging log.
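The three cases map naturally onto three except clauses. This sketch defines its own WebError and SendLiteral classes rather than importing utils.web_exc, so their constructors are assumptions. The (status, content type, body) return shape, the 200 status used for a WebError, and the debug_log list standing in for the debugging log are likewise illustrative.

```python
import html
import traceback


class WebError(Exception):
    """Short message for the user about a problem with the request."""


class SendLiteral(Exception):
    """Carries data to send to the browser without interpretation."""

    def __init__(self, data, content_type):
        super().__init__(content_type)
        self.data = data
        self.content_type = content_type


debug_log = []  # stand-in for the debugging log


def respond(make_response):
    """Return (status, content type, body) for one request."""
    try:
        return (200, "text/html", make_response())
    except WebError as exc:
        return (200, "text/html", "<p>%s</p>" % html.escape(str(exc)))
    except SendLiteral as exc:
        return (200, exc.content_type, exc.data)
    except Exception:
        debug_log.append(traceback.format_exc())
        return (500, "text/html", "Internal Server Error")


def denied():
    raise WebError("Permission Denied")


def image():
    raise SendLiteral(b"GIF89a", "image/gif")


def bug():
    return 1 / 0


for page in (denied, image, bug):
    print(respond(page))
```

The order of the except clauses matters: the two specific exception types must be listed before the catch-all.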