Infrastructure Perspective: Forest Controls

Overview

A Forest Control is used to display hierarchical data that is stored in the database in a flat form. It uses static strings to store the actual data, so you might want to read up on static strings before you delve into this.

Consider the problem of listing all existing software. Software naturally falls into different categories. A simple example follows:

+ Linux
   - Administrative
   - Databases
   - Publishing
       o LaTeX
       o Star Office
   - Web
       o Netscape
       o Mozilla
   - Email
+ FreeBSD
+ Windows
+ VMWare
+ DOS

and so on. This can be visualised as a tree (a forest is, by definition, a collection of unrelated trees), or as a file system. Each node (an inode, if you think in terms of a file system) stores pointers that lead to the actual information: the information for each node is kept in static strings, and the node stores pointers to them (their s_str_id values). Hierarchical data of this kind has no natural representation in a relational database, so we flatten the tree into a list and store that list in the database. In file system terms, we store the full path name of every file in the file system. This flattened information is enough to reconstruct the tree (given the full path names of all the files, we can reconstruct the directory structure). The Forest Control implements this unflattening algorithm and also renders the result as a web page.
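The unflattening idea can be illustrated with a short Python sketch. It is not the Forest Control's actual code: the node layout (`name` plus a `children` dict), the helper names `unflatten` and `render`, the alphabetical ordering of children, and the decision to create parents that have no row of their own are all assumptions made for illustration.

```python
from typing import Any, Dict, List

Tree = Dict[str, Any]


def unflatten(paths: List[str]) -> Tree:
    """Rebuild a nested tree from a flat list of full path names.

    Every node is a dict {"name": ..., "children": {name: node}}.
    Parents that have no path of their own are created on demand.
    """
    root: Tree = {"name": "", "children": {}}
    for path in paths:
        node = root
        # "/linux/web/mozilla" -> ["linux", "web", "mozilla"]
        for part in path.strip("/").split("/"):
            if part:  # skip empty components, e.g. from "//"
                node = node["children"].setdefault(part, {"name": part, "children": {}})
    return root


def render(node: Tree, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, children sorted by name."""
    lines: List[str] = []
    for name in sorted(node["children"]):
        lines.append("  " * depth + name)
        lines.extend(render(node["children"][name], depth + 1))
    return lines


rows = ["/linux", "/windows", "/linux/web", "/linux/web/mozilla", "/linux/publish/latex"]
print("\n".join(render(unflatten(rows))))
```

The order of the paths does not matter: `/linux/publish/latex` arrives without any `/linux/publish` row, and the missing parent is simply created when the path is walked.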

The faqs table

The flattened information is stored in the table faqs. The structure is as follows:

+-------------+--------------+
| Field       | Type         |
+-------------+--------------+
| doc_node_id | int(11)      |
| path_desc   | varchar(255) |
| keywords    | text         |
| long_num    | int(11)      |
| short_num   | int(11)      |
| desc_text   | text         |
| private     | int(11)      |
+-------------+--------------+

The doc_node_id field is the primary key and exists purely for technical reasons. The path_desc field contains the full path name of the node. The keywords field contains keywords for the node, to make searching easier. The desc_text field contains a short description of the node's contents. The private field indicates whether the node is available for public viewing or for techstaff viewing only. The actual data of the node is stored in two static strings, whose ids are short_num and long_num respectively. Except for path_desc, all other information is optional.
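To make the "only path_desc is required" rule concrete, one row can be modelled as a Python dataclass in which every other field defaults to None. This is a sketch, not part of the original code: the class name `FaqRow` is hypothetical, and both the idea that doc_node_id is assigned by the database and the encoding of private (0 for public, non-zero for techstaff only) are assumptions, since the text does not spell them out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaqRow:
    """One row of the faqs table; only path_desc is required."""

    path_desc: str                     # full path name, e.g. "/linux/web"
    doc_node_id: Optional[int] = None  # primary key (assumed: assigned by the database)
    keywords: Optional[str] = None     # keywords used for searching
    long_num: Optional[int] = None     # static string id of the long text
    short_num: Optional[int] = None    # static string id of the short text
    desc_text: Optional[str] = None    # short description of the node
    private: int = 0                   # assumed encoding: non-zero = techstaff only

    def is_public(self) -> bool:
        """True if the node may be shown to the public (under the assumed encoding)."""
        return self.private == 0


web = FaqRow(path_desc="/linux/web", desc_text="Web")
staff = FaqRow(path_desc="/linux/adm", desc_text="Administrative", private=1)
print(web.is_public(), staff.is_public())
```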

The exact use of the short_num and long_num fields may vary. In our software listing example, the static string keyed under short_num may contain a short description of the data (more detailed than desc_text), while long_num contains the full description. If we use this to represent a FAQ, short_num's static string could hold the question the FAQ answers and long_num's static string the answer. The contents of the table representing the software listing shown above are:

(path_desc),                  (desc_text)
/linux,                       Linux
/windows,                     Windows
/vmware,                      VMWare
/dos,                         DOS
/freebsd,                     FreeBSD
/linux/adm,                   Administrative
/linux/web,                   Web
/linux/email,                 Email
/linux/publish,               Publishing
/linux/publish/latex,         LaTeX
/linux/web/mozilla,           Mozilla
........
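For the FAQ use described above, a record's short_num and long_num are just keys into the static strings. The sketch below assumes, purely for illustration, that static strings can be looked up like a Python dict keyed by s_str_id; `STATIC_STRINGS`, the ids 101 and 102, the placeholder texts, and `faq_question_and_answer` are all hypothetical, and the real static string API is not described in this section.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical stand-in for the static string store, keyed by s_str_id.
STATIC_STRINGS: Dict[int, str] = {
    101: "<question text>",
    102: "<answer text>",
}


def faq_question_and_answer(record: Dict[str, Any]) -> Optional[Tuple[str, str]]:
    """Resolve a faqs record's short_num and long_num into (question, answer).

    Returns None if either id is missing (both fields are optional) or
    does not name a known static string.
    """
    q_id = record.get("short_num")
    a_id = record.get("long_num")
    if q_id not in STATIC_STRINGS or a_id not in STATIC_STRINGS:
        return None
    return STATIC_STRINGS[q_id], STATIC_STRINGS[a_id]


faq = {"path_desc": "/linux/publish/latex", "desc_text": "LaTeX", "short_num": 101, "long_num": 102}
print(faq_question_and_answer(faq))
print(faq_question_and_answer({"path_desc": "/dos", "desc_text": "DOS"}))
```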

How does the Control work?

Every control has an output function which returns a representation of the HTML to be displayed (i.e. raw HTML strings, or OUTPUT objects). When the output(self,display,container) method is called, it does the following:

  1. Calls display.fetch_records(). It returns a list of dictionaries. Usually fetch_records executes an SQL query, modifies each dictionary (if need be), and returns the list of dictionaries.
  2. Unflattens this list and constructs a tree representation of the data. Each node keeps all the information that was originally there.
  3. Creates instances of TheNodeControl (some descendant of NodeControl), which was passed in as an argument to this control's __init__. NodeControls can have children which are controls, so this is used to create a tree of NodeControls in which each node is instantiated with the corresponding data.
  4. Finally, it calls the output method of the top-level instance of TheNodeControl. Each instance of NodeControl is responsible for making sure that its children get called (if need be).
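The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the real Forest Control: the class names `ForestControl`, `SimpleNodeControl` and `SoftwareDisplay` are hypothetical, the node control is a deliberately simplified stand-in, and the root being level 0, children sorted by name, and parents without a row being created empty are all choices made here. The path field name is passed to fetch_records, as described in the section on fetch_records.

```python
from typing import Any, Dict, List, Optional


class SimpleNodeControl:
    """Stand-in for a NodeControl: writes its desc_text, then runs its children."""

    def __init__(self, node: Dict[str, Any], subcontrols: List["SimpleNodeControl"]):
        self.node = node                # {"data": record or None, "stat": {"level": n}}
        self.subcontrols = subcontrols

    def output(self, display: Any, container: List[str]) -> None:
        data: Optional[Dict[str, Any]] = self.node["data"]
        if data is not None:            # the root and missing parents carry no record
            indent = "  " * (self.node["stat"]["level"] - 1)
            container.append(indent + data["desc_text"])
        for child in self.subcontrols:
            child.output(display, container)


class ForestControl:
    """Hypothetical sketch of the four steps."""

    def __init__(self, node_control_class: type, path_field: str = "path_desc"):
        self.node_control_class = node_control_class  # plays the role of TheNodeControl
        self.path_field = path_field

    def output(self, display: Any, container: List[str]) -> None:
        # 1. Fetch the flat records, telling fetch_records which field holds the path.
        records = display.fetch_records(self.path_field)
        # 2. Unflatten: walk each path, creating intermediate nodes as needed.
        root: Dict[str, Any] = {"record": None, "children": {}}
        for rec in records:
            node = root
            for part in rec[self.path_field].strip("/").split("/"):
                if part:
                    node = node["children"].setdefault(part, {"record": None, "children": {}})
            node["record"] = rec
        # 3. Instantiate one node control per tree node, root at level 0.
        top = self._build(root, level=0)
        # 4. Only the top-level control is called; it calls its children.
        top.output(display, container)

    def _build(self, tree: Dict[str, Any], level: int) -> Any:
        children = [self._build(tree["children"][name], level + 1)
                    for name in sorted(tree["children"])]
        node = {"data": tree["record"], "stat": {"level": level}}
        return self.node_control_class(node, children)


class SoftwareDisplay:
    """Hypothetical display class; a real one would run an SQL query."""

    def fetch_records(self, path_field: str) -> List[Dict[str, Any]]:
        return [
            {"path_desc": "/linux", "desc_text": "Linux"},
            {"path_desc": "/linux/web", "desc_text": "Web"},
            {"path_desc": "/dos", "desc_text": "DOS"},
        ]


page: List[str] = []
ForestControl(SimpleNodeControl).output(SoftwareDisplay(), page)
print("\n".join(page))
```

Because the node control class is a constructor argument, the same ForestControl can render the same records in completely different ways; only the class passed in changes.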

What does the NodeControl do?

When a NodeControl is created, it is already populated with the data in self.node['data'] and additional information (such as the level of the tree at which the node resides) in self.node['stat']. It is also populated with its children in self.subcontrols. The output function is defined in NodeControl and should never be overridden; the node-specific output is done by the function node_output. The code for output does the following:

  1. Call self.allow_output to see whether this control should output anything at all.
  2. If so, call node_output. node_output should return the container into which all its children should output. This can be the container node_output received, a new container (which should already have been inserted into the original one), or None.
  3. If the return value of node_output (or the original container, if allow_output returned false) is not None, the output functions of the children are called with that container as their container.
  4. Otherwise, the children don't get to output anything at all.

Hence one call to the output method of the top level NodeControl ends up calling the output methods of the entire tree. All the creativity for a particular page goes into designing the node_output functions.
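The output protocol above can be sketched in Python, using nested lists in place of HTML containers. The base class mirrors the four steps; the subclasses `ListNodeControl` and `CutNodeControl`, the `title` key in the data, and the helper `node` are hypothetical and exist only to show the three possible return values of node_output.

```python
from typing import Any, Dict, List, Optional


class NodeControl:
    """Sketch of the output protocol: output() is fixed, subclasses override node_output()."""

    def __init__(self, node: Dict[str, Any], subcontrols: Optional[List["NodeControl"]] = None):
        self.node = node
        self.subcontrols = subcontrols or []

    def allow_output(self) -> bool:
        return True

    def node_output(self, display: Any, container: list) -> Optional[list]:
        return container  # default: output nothing, children share the container

    def output(self, display: Any, container: list) -> None:
        # Steps 1-2: node_output runs only if allow_output() says so;
        # otherwise the children get the original container.
        if self.allow_output():
            child_container = self.node_output(display, container)
        else:
            child_container = container
        # Steps 3-4: a None container silences the whole subtree.
        if child_container is not None:
            for child in self.subcontrols:
                child.output(display, child_container)


class ListNodeControl(NodeControl):
    """Writes its title, then inserts a nested list as its children's container."""

    def node_output(self, display, container):
        container.append(self.node["data"]["title"])
        sublist: list = []
        container.append(sublist)  # the new container lives inside the old one
        return sublist


class CutNodeControl(NodeControl):
    """Writes its title but returns None, so its children output nothing."""

    def node_output(self, display, container):
        container.append(self.node["data"]["title"])
        return None


def node(title: str, level: int) -> Dict[str, Any]:
    return {"data": {"title": title}, "stat": {"level": level}}


hidden = ListNodeControl(node("never shown", 3))
top = ListNodeControl(node("Linux", 1), [
    ListNodeControl(node("Web", 2)),
    CutNodeControl(node("Email", 2), [hidden]),
])
page: list = []
top.output(None, page)
print(page)
```

Here "Web" opens an empty nested list because it has no children, while "Email" returns None, so the control below it never runs.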

What is a CaseNodeControl object?

Often (meaning almost always) you will want the node_output function to do something based on the level of the current node. E.g. all level 1 nodes display their long_num info, level 2 nodes display short_num info, and level 3 nodes display desc_text, or something like that. Instead of writing a complicated node_output function that has to handle every situation, we have the CaseNodeControl object, which is a NodeControl object but masquerades as a different NodeControl object depending on the level of the current node.

import faqcontrols
# CaseNodeControl must also be in scope; import it from the module that defines it.

class MyNodeControl(CaseNodeControl,
                    faqcontrols.TextOnlyNodeControl,
                    faqcontrols.TextULNodeControl):
  """This is the node control which masquerades as a
  TextULNodeControl at level 1, and as a TextOnlyNodeControl at level 2.
  All the masquerading code is in CaseNodeControl. You have to inherit
  from the other controls for type correctness."""

  # Class attribute: maps a tree level to the control to masquerade as.
  DictClasses = { 1 : faqcontrols.TextULNodeControl,
                  2 : faqcontrols.TextOnlyNodeControl }

creates a NodeControl object which, when displaying nodes at level 1, does what TextULNodeControl does, at level 2 does what TextOnlyNodeControl does, and at all other levels does not display anything.
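One way such masquerading could work in Python is sketched below: CaseNodeControl looks up the node's level in DictClasses and calls that class's node_output with the current instance as self. This is an assumption about CaseNodeControl's internals, not its actual code. The stand-in TextULNodeControl and TextOnlyNodeControl just append tagged strings, the helper `make` is hypothetical, and levels missing from DictClasses are handled through allow_output (assumed behaviour), so those nodes print nothing while their children still get the original container.

```python
from typing import Any, Dict, Optional


class NodeControl:
    """Stripped-down NodeControl with the output protocol described above."""

    def __init__(self, node: Dict[str, Any], subcontrols=None):
        self.node = node
        self.subcontrols = subcontrols or []

    def allow_output(self) -> bool:
        return True

    def node_output(self, display: Any, container: list) -> Optional[list]:
        return container

    def output(self, display: Any, container: list) -> None:
        target = self.node_output(display, container) if self.allow_output() else container
        if target is not None:
            for child in self.subcontrols:
                child.output(display, target)


class CaseNodeControl(NodeControl):
    """Dispatches node_output to the class registered for the node's level."""

    DictClasses: Dict[int, type] = {}

    def _level(self) -> int:
        return self.node["stat"]["level"]

    def allow_output(self) -> bool:
        # Levels without an entry display nothing (assumed behaviour).
        return self._level() in self.DictClasses

    def node_output(self, display, container):
        # Call the registered class's node_output with *this* instance as self.
        return self.DictClasses[self._level()].node_output(self, display, container)


class TextULNodeControl(NodeControl):
    def node_output(self, display, container):
        container.append("UL: " + self.node["data"]["desc_text"])
        return container


class TextOnlyNodeControl(NodeControl):
    def node_output(self, display, container):
        container.append("TEXT: " + self.node["data"]["desc_text"])
        return container


class MyNodeControl(CaseNodeControl, TextOnlyNodeControl, TextULNodeControl):
    DictClasses = {1: TextULNodeControl, 2: TextOnlyNodeControl}


def make(level, text, children=()):
    return MyNodeControl({"data": {"desc_text": text}, "stat": {"level": level}}, list(children))


tree = make(0, "root", [make(1, "Linux", [make(2, "Web", [make(3, "Mozilla")])])])
out: list = []
tree.output(None, out)
print(out)
```

Since CaseNodeControl comes first in the base-class list, its node_output wins in Python's method resolution order; the other base classes are only consulted through DictClasses.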

What can I do in fetch_records?

You will not want to display the entire tree on every page. For example, on the Linux page one does not want the Windows, VMWare and FreeBSD links to show up. What we want is to prune the tree and get the subtree rooted at linux. This is easily accomplished in fetch_records: just select all the records from the table whose absolute path starts with "/linux/". The trailing "/" makes sure you don't get "/linuxdoc" or some such thing. Optionally, you might also want the record for "/linux" itself. This still does not accomplish our task, because the forest control will consider /linux to be at level 2 instead of being the root. This is where post-processing in fetch_records is useful. Since the forest control takes the name of the field containing the absolute path name as an argument, we can create a new field, say "node_path", which holds the absolute path names with the leading "/linux" stripped off. This gives the effect of considering only the subtree rooted at linux. Don't change the original field "path_desc", because it is still needed to generate hyperlinks.
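The pruning and re-rooting can be sketched with Python's built-in sqlite3 module. This is a standalone function taking a connection and a root prefix, which is an assumption for the sake of a self-contained example: the real fetch_records is a method of the display class, the real database need not be SQLite, and the faqs table is cut down to two columns here. The "/linuxdoc" row exists only to show that the trailing "/" keeps it out.

```python
import sqlite3
from typing import Any, Dict, List


def fetch_records(conn: sqlite3.Connection, root: str = "/linux",
                  path_field: str = "node_path") -> List[Dict[str, Any]]:
    """Return the subtree rooted at `root`, with a re-rooted copy of each path."""
    cur = conn.execute(
        # The trailing "/" in the LIKE pattern keeps out "/linuxdoc".
        # Note: LIKE treats "%" and "_" in `root` as wildcards.
        "SELECT path_desc, desc_text FROM faqs "
        "WHERE path_desc = ? OR path_desc LIKE ? ORDER BY path_desc",
        (root, root + "/%"),
    )
    records = []
    for path_desc, desc_text in cur.fetchall():
        records.append({
            "path_desc": path_desc,              # unchanged: needed for hyperlinks
            "desc_text": desc_text,
            path_field: path_desc[len(root):],   # "/linux/web" -> "/web", "/linux" -> ""
        })
    return records


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faqs (path_desc VARCHAR(255), desc_text TEXT)")
conn.executemany("INSERT INTO faqs VALUES (?, ?)", [
    ("/linux", "Linux"), ("/linux/web", "Web"), ("/linux/web/mozilla", "Mozilla"),
    ("/linuxdoc", "Linux docs"), ("/windows", "Windows"),
])
for rec in fetch_records(conn):
    print(repr(rec["node_path"]), rec["path_desc"])
```

The "/linux" record itself ends up with an empty node_path, i.e. it becomes the root of the pruned tree.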

There can be more than one forest control on a page (I don't have an actual instance of such a page), and each of them calls the same fetch_records function, which is part of your display class. To help tell them apart, fetch_records is passed a parameter containing the name of the field that holds the path information. So in the previous example, fetch_records is passed the value "node_path" as an argument. You can use this to distinguish between the controls. I don't expect the need for multiple forest controls on one page anytime soon, though.
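A display class could use that parameter as sketched below. The class `SoftwareDisplay`, the field names "linux_path" and "windows_path", the `ROOTS` mapping and the sample rows are hypothetical, and rows are filtered in Python rather than with SQL only to keep the example self-contained.

```python
from typing import Any, Dict, List


class SoftwareDisplay:
    """Hypothetical display class with two forest controls on one page."""

    # Assumed: each control gets its own path field name, mapped to its subtree.
    ROOTS = {"linux_path": "/linux", "windows_path": "/windows"}

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows  # stands in for the faqs table

    def fetch_records(self, path_field: str) -> List[Dict[str, Any]]:
        root = self.ROOTS[path_field]  # which control is asking?
        records = []
        for row in self.rows:
            path = row["path_desc"]
            if path == root or path.startswith(root + "/"):
                rec = dict(row)                     # leave path_desc untouched
                rec[path_field] = path[len(root):]  # re-rooted path for this control
                records.append(rec)
        return records


rows = [{"path_desc": p} for p in ("/linux", "/linux/web", "/windows", "/windows/office")]
display = SoftwareDisplay(rows)
print([r["linux_path"] for r in display.fetch_records("linux_path")])
print([r["windows_path"] for r in display.fetch_records("windows_path")])
```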

A working example

For a working example, see www/tsdocs. It displays the entire subtree rooted at /tsdocs.