Object Detection Grammars

Pedro F. Felzenszwalb; David McAllester. 11 February, 2010.
Communicated by Pedro Felzenszwalb.


We formulate a general grammar model motivated by the problem of object detection in computer vision. We focus on four aspects of modeling objects for the purpose of object detection. First, we are interested in modeling objects as having parts which are themselves (recursively) objects. For example, a person can be represented as being composed of a face, a trunk, arms, and legs, where a face is composed of eyes, a nose, and a mouth. Second, we are interested in modeling object (and part) categories as being composed of subcategories or subtypes. For example, we might distinguish sitting people from standing people and smiling faces from frowning faces. Third, we are interested in modeling the relative positions of the parts that make up an object. For example, in a person, the position of the hand is related to the position of the lower arm, which is related to the position of the upper arm, which is related to the position of the torso. Fourth, we are interested in modeling the appearance of objects so that we can find them in images. For example, a pattern of edges in a particular location of an image might give evidence for, or against, the presence of a part at that location. These four aspects of models (parts, subtypes, positions, and appearance) can be represented in a single grammar formalism that we call an object detection grammar.
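
As a rough illustration of how the four aspects could sit in one structure, here is a minimal Python sketch. It is not the formalism defined in the report: the names `Part`, `Rule`, `ToyGrammar`, `expand`, and `score` are hypothetical, part positions are simplified to fixed integer offsets from the parent, and appearance is reduced to a lookup table of per-location evidence values supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Position = Tuple[int, int]


@dataclass
class Part:
    symbol: str        # category of the part (itself possibly composed of parts)
    offset: Position   # assumed fixed (dx, dy) relative to the parent


@dataclass
class Rule:
    parts: List[Part] = field(default_factory=list)


@dataclass
class ToyGrammar:
    # Each symbol maps to alternative rules; the alternatives act as subtypes.
    rules: Dict[str, List[Rule]] = field(default_factory=dict)

    def expand(self, symbol: str, origin: Position = (0, 0),
               choices: Optional[Dict[str, int]] = None) -> List[Tuple[str, Position]]:
        """Recursively place parts; symbols without rules are terminals."""
        choices = choices or {}
        if symbol not in self.rules:
            return [(symbol, origin)]
        rule = self.rules[symbol][choices.get(symbol, 0)]  # pick a subtype
        placed = []
        for part in rule.parts:
            pos = (origin[0] + part.offset[0], origin[1] + part.offset[1])
            placed.extend(self.expand(part.symbol, pos, choices))
        return placed


def score(placed, evidence: Dict[Tuple[str, Position], float]) -> float:
    """Sum appearance evidence (positive or negative) over placed terminals."""
    return sum(evidence.get(item, 0.0) for item in placed)


eyes_and_nose = [Part("eye", (-1, -1)), Part("eye", (1, -1)), Part("nose", (0, 0))]
grammar = ToyGrammar(rules={
    "person": [Rule([Part("face", (0, -4)), Part("trunk", (0, 0))])],
    "face": [
        Rule(eyes_and_nose + [Part("smiling_mouth", (0, 1))]),   # subtype 0
        Rule(eyes_and_nose + [Part("frowning_mouth", (0, 1))]),  # subtype 1
    ],
})

smiling = grammar.expand("person")
frowning = grammar.expand("person", choices={"face": 1})
```

In this sketch, `expand` walks the part hierarchy recursively, `choices` selects a subtype for a symbol, offsets accumulate to give each terminal a position, and `score` adds up evidence for or against each placed terminal. A real detector would search over placements and subtypes rather than use fixed offsets.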

Original Document

The original document is available in PDF (uploaded 11 February, 2010 by Pedro Felzenszwalb).