Infrastructure Perspective: mod_python Handlers
Introduction
Apache Handlers
When a handler is invoked, it has a single argument: the request object. This object contains all information about the current request, such as the method, URI, headers, hostname, etc. In some stages, the handler is expected to fill in fields in the request object. For example, one processing stage involves the translation of the URI from the request (e.g., /docs/misc/API.html) into a filename (e.g., /usr/www/docs/docs/misc/API.html). The handler at this stage is responsible for calculating the filename and inserting it into the correct field in the request object.
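As a rough illustration of the translation stage, the sketch below maps a request URI to a filename using the example paths from the paragraph above. The `FakeRequest` class, the `DOCUMENT_ROOT` value, and the `translate` function are stand-ins invented for this example, not the site's actual code; `OK` is defined locally with the same value as mod_python's `apache.OK` so the snippet runs without Apache.

```python
import posixpath

OK = 0  # same value as mod_python's apache.OK; defined locally to run standalone

DOCUMENT_ROOT = "/usr/www/docs"  # assumed document root, inferred from the example path


class FakeRequest:
    """Minimal stand-in for the request object (hypothetical, for illustration)."""

    def __init__(self, uri):
        self.uri = uri
        self.filename = None


def translate(req):
    """Fill in req.filename from req.uri, as a translation-stage handler would."""
    # Normalize the path so "." and ".." segments are collapsed, then drop the
    # leading slash so the result can be joined onto the document root.
    relative = posixpath.normpath(req.uri).lstrip("/")
    req.filename = posixpath.join(DOCUMENT_ROOT, relative)
    return OK


req = FakeRequest("/docs/misc/API.html")
status = translate(req)
print(status, req.filename)
```

The handler communicates its result in two ways: the return value tells Apache that the stage was handled, and the field it set on the request object carries the computed filename forward to later stages.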
A handler can have one of three outcomes. It can:
- handle the stage,
- decline to handle the stage, or
- signal an error condition.
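In mod_python, these three outcomes correspond to the value a handler returns. The sketch below uses locally defined constants with the same values as `apache.OK`, `apache.DECLINED`, and `apache.HTTP_FORBIDDEN`; the routing rule inside `example_handler` is made up purely to show each outcome.

```python
OK = 0                # same value as apache.OK
DECLINED = -1         # same value as apache.DECLINED
HTTP_FORBIDDEN = 403  # same value as apache.HTTP_FORBIDDEN


def example_handler(uri):
    """Return a status for one stage, based on a made-up routing rule."""
    if uri.startswith("/private/"):
        return HTTP_FORBIDDEN  # signal an error condition
    if not uri.endswith(".py"):
        return DECLINED        # decline, so another handler can take the stage
    return OK                  # this handler handled the stage


for uri in ("/index.py", "/img/logo.png", "/private/notes.py"):
    print(uri, example_handler(uri))
```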
When thinking about handlers, keep in mind that Apache is a preforking server: many independent Apache processes run at any time, and an incoming request may go to any one of them. Two requests, even from the same user, may therefore be served by two entirely different processes, so a handler cannot rely on information about a user that it stored in its own process's memory. In this site, all such information is stored in the database, which is shared among all Apache processes.
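The idea of keeping per-user state in a shared store can be sketched as follows. This is only an illustration: SQLite stands in for the site's database (which the text does not name), and the table layout and the `save_login` and `load_login` helpers are hypothetical.

```python
import os
import sqlite3
import tempfile

# A temporary database file plays the role of the shared site database.
db_path = os.path.join(tempfile.mkdtemp(), "sessions.db")


def connect():
    """Open a fresh connection, as each independent process would."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, username TEXT)"
    )
    return conn


def save_login(session_id, username):
    """Record a login in the shared store rather than in process memory."""
    conn = connect()
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO sessions (id, username) VALUES (?, ?)",
            (session_id, username),
        )
    conn.close()


def load_login(session_id):
    """Look up a login; every connection to the same database sees the same row."""
    conn = connect()
    row = conn.execute(
        "SELECT username FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    conn.close()
    return row[0] if row else None


save_login("abc123", "alice")  # imagine this request handled by one process
print(load_login("abc123"))    # and this one handled by a different process
```

Because each call opens its own connection, neither function depends on anything remembered from an earlier request, which is the property a preforking server requires.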
More information on Apache handlers is available in the Apache API notes. The mod_python documentation also has a nice overview of Apache handlers.
mod_python Handlers
mod_python is an Apache module that allows us to write Apache handlers in Python. The mod_python documentation is excellent and gives a flavor of the many possibilities the module offers.
The CS site currently implements handlers for three stages. The Init handler initializes the infrastructure. The Translate handler translates the request URI into the filename of a Python script and performs preliminary decoding of other information in the request. The Dispatch handler invokes the Python script that produces the response.
The Init Handler
- the site pathname configuration (config.py),
- debugging (debug.py),
- the reloader (reloader.py), and
- the SQL query interface (queries.py).
The Translate Handler
First, the handler checks the PythonDontHandle Apache configuration directive to see if this is a request that the Python site should decline.
If it's not, it does the following:
- Instantiates an object of the URL class, which will represent the current URL. The constructor for this class breaks down the URL as described in Infrastructure Perspective: HTTP Transaction. The constructed object is stored in req.url.
- Sets req.secure to true iff this is an SSL request.
- Instantiates an object of the URLs class, which is used to create new URLs through percent substitution. The constructed object is stored in req.urls.
- Copies the filename of the Python script and the module path (the dot-separated package path for use in a Python import statement) from req.url to req, because Apache expects this handler to supply a filename.
- Parses any cookies using the utils.cookies module, which uses the Python standard library module Cookie, and places the result in req.cookies.
- Checks for a site login using the utils.session module. If a user is logged in, req.login is her username; otherwise, req.login is None.
- Logs the request using the debugging module.
- Instructs Apache to invoke the Dispatch handler when the time comes to send a document back to the user.
If a Python exception occurs during any of this processing, the handler triggers a 404 Not Found error, which does not give the user much information about exactly what went wrong. The entire Python exception, however, is sent to the debugging log.
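A few of these steps, together with the error handling just described, might look roughly like the sketch below. It is not the site's code: `FakeRequest`, the `sessions` mapping, the cookie name `session`, and the logger name are all assumptions made for illustration, and the constants mirror `apache.OK` and `apache.HTTP_NOT_FOUND`. The cookie parsing uses `http.cookies`, the Python 3 name for the standard library's `Cookie` module.

```python
import logging
import traceback
from http.cookies import SimpleCookie

OK = 0                # same value as apache.OK
HTTP_NOT_FOUND = 404  # same value as apache.HTTP_NOT_FOUND

log = logging.getLogger("site.debug")  # hypothetical stand-in for the debugging module


class FakeRequest:
    """Minimal stand-in for the request object (hypothetical)."""

    def __init__(self, uri, cookie_header=""):
        self.uri = uri
        self.cookie_header = cookie_header
        self.cookies = {}
        self.login = None


def parse_cookies(header):
    """Turn a Cookie header string into a plain name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {name: morsel.value for name, morsel in cookie.items()}


def translate(req, sessions):
    """Decode cookies and login state; any exception becomes a 404."""
    try:
        req.cookies = parse_cookies(req.cookie_header)
        # Look up the session id in a mapping of session id -> username (assumed).
        req.login = sessions.get(req.cookies.get("session"))
        log.debug("request %s login=%s", req.uri, req.login)
        return OK
    except Exception:
        # The user sees only a 404; the full traceback goes to the debugging log.
        log.error(traceback.format_exc())
        return HTTP_NOT_FOUND


req = FakeRequest("/home", "session=abc; theme=dark")
print(translate(req, {"abc": "alice"}), req.login, req.cookies)
```

Wrapping the whole body in a single `try` block keeps the behavior uniform: whichever step fails, the browser receives the same response and the details stay in the log.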
The Dispatch Handler
If an exception occurs during the import, Dispatch sends the text of the exception to the debugging output, and signals Apache to terminate this Apache process. Dispatch does this because a botched import can sometimes leave the Python interpreter in an unstable state. The Python interpreter terminates with the Apache process, and Apache creates a new process and embedded interpreter to take its place. Dispatch finishes handling the exception by sending a 404 Not Found error back to the browser.
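The import failure path could be sketched as below. The `retire_process` callback is a hypothetical placeholder for whatever mechanism signals Apache to end the current process; the text does not say how that is done, so the sketch only records that it was requested. `HTTP_NOT_FOUND` mirrors `apache.HTTP_NOT_FOUND`.

```python
import importlib
import logging
import traceback

HTTP_NOT_FOUND = 404  # same value as apache.HTTP_NOT_FOUND

log = logging.getLogger("site.debug")  # hypothetical stand-in for the debugging output


def import_page_module(module_path, retire_process):
    """Import the page module; return (module, None) or (None, error status)."""
    try:
        return importlib.import_module(module_path), None
    except Exception:
        # Send the exception text to the debugging output.
        log.error(traceback.format_exc())
        # Ask for this process to be replaced, since a failed import may have
        # left the interpreter in an unstable state.
        retire_process()
        return None, HTTP_NOT_FOUND


retired = []
module, status = import_page_module(
    "no_such_package_xyz.page", lambda: retired.append(True)
)
print(module, status, retired)
```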
If the import succeeds, Dispatch invokes the module's new function, passing it the request object. This function must return an object, referred to below as the display object, that actually handles the request.
Next, Dispatch examines the URL to determine the type of request it is to handle. There are two types: actions and regular pages. The utils.actions module identifies actions.
If the request is for an action, then Dispatch calls the display object's perform_action() method, which should execute the action and bounce to a new URL.
If the request is for a regular page, then Dispatch calls the display object's check_security() method, which can be used to make sure a page is always in SSL mode, etc. Then it constructs a new CSPage output object and passes it as an argument to the display object's make_page(page) method. Finally, it renders the page into HTML and sends it off to the browser.
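The flow for actions and regular pages can be sketched as follows. Only the names `new`, `perform_action`, `check_security`, `make_page`, and `CSPage` come from the text; the `CSPage` methods `add` and `render`, the `is_action` flag (standing in for the check done by utils.actions), and the tuple return values are assumptions for illustration.

```python
class CSPage:
    """Hypothetical stand-in for the output object; its real interface is not shown."""

    def __init__(self):
        self.parts = []

    def add(self, text):
        self.parts.append(text)

    def render(self):
        return "<html><body>%s</body></html>" % "".join(self.parts)


class HelloDisplay:
    """Example display object, as a page module's new() might return."""

    def __init__(self, req):
        self.req = req
        self.checked = False

    def check_security(self):
        # A real page could require SSL here; this one just records the call.
        self.checked = True

    def make_page(self, page):
        page.add("<p>Hello</p>")

    def perform_action(self):
        return "/done"  # URL to bounce to once the action has run


def new(req):
    """Module-level factory that Dispatch calls with the request object."""
    return HelloDisplay(req)


def dispatch(req, is_action):
    display = new(req)
    if is_action:
        return ("redirect", display.perform_action())
    display.check_security()
    page = CSPage()
    display.make_page(page)
    return ("html", page.render())


print(dispatch(None, True))
print(dispatch(None, False))
```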
Several types of exceptions can occur during this processing. A utils.web_exc.WebError carries a short message that informs the user of some problem with their request. For example, if a user somehow tries to perform an action for which she does not have permission, the display object raises a WebError with the message "Permission Denied".
A utils.web_exc.SendLiteral exception directs Dispatch to send the given data with the given content-type directly to the browser without any interpretation. This exception can be used, for example, to produce plain-text output, or to dynamically generate images.
Any other type of exception is handled as an error. Dispatch produces a 500 Internal Server Error page and sends the Python traceback to the debugging log.
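The three exception cases can be sketched as a single try/except chain. The two exception classes below are stand-ins for those in utils.web_exc; their constructor arguments and attribute names, the `run_page` helper, and the status used for a WebError response are assumptions, since the text does not specify them.

```python
import html
import logging
import traceback

log = logging.getLogger("site.debug")  # hypothetical stand-in for the debugging log


class WebError(Exception):
    """Stand-in for utils.web_exc.WebError: a short message for the user."""


class SendLiteral(Exception):
    """Stand-in for utils.web_exc.SendLiteral; attribute names are assumed."""

    def __init__(self, data, content_type):
        super().__init__(content_type)
        self.data = data
        self.content_type = content_type


def run_page(produce):
    """Call produce() and return (status, content_type, body)."""
    try:
        return 200, "text/html", produce()
    except WebError as exc:
        # Show the short message to the user (status 200 is an assumption).
        return 200, "text/html", "<p>%s</p>" % html.escape(str(exc))
    except SendLiteral as exc:
        # Pass the data through untouched with the requested content type.
        return 200, exc.content_type, exc.data
    except Exception:
        # Anything else is an error: log the traceback, send a 500 page.
        log.error(traceback.format_exc())
        return 500, "text/html", "<h1>Internal Server Error</h1>"


def denied():
    raise WebError("Permission Denied")


def literal():
    raise SendLiteral("plain text body", "text/plain")


def broken():
    raise ValueError("unexpected failure")


for page in (denied, literal, broken):
    print(run_page(page))
```

Listing the specific exception classes before the generic `except Exception` clause matters here: Python tries the clauses in order, so the catch-all must come last or it would swallow the two expected cases.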