TR-2010-02
Object Detection Grammars
Pedro F. Felzenszwalb; David McAllester. 11 February, 2010.
Communicated by Pedro Felzenszwalb.
Abstract
We formulate a general grammar model motivated by the problem of
object detection in computer vision. We focus on four aspects of
modeling objects for the purpose of object detection. First, we are
interested in modeling objects as having parts which are themselves
(recursively) objects. For example, a person can be represented as
being composed of a face, a trunk, arms, and legs where a face is
composed of eyes, a nose, and a mouth. Second, we are interested in
modeling object (and part) categories as being composed of
subcategories or subtypes. For example, we might distinguish sitting
people from standing people and smiling faces from frowning faces.
Third, we are interested in modeling the relative positions of the
parts that make up an object. For example, in a person, the position
of the hand is related to the position of the lower arm which is
related to the position of the upper arm which is related to the
position of the torso. Fourth, we are interested in modeling the
appearance of objects so that we can find them in images. For
example, a pattern of edges in a particular location of an image might
give evidence for, or against, the presence of a part at that
location. These four aspects of models (parts, subtypes,
positions, and appearance) can be represented in a single grammar
formalism that we call an object detection grammar.
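
As a rough illustration of how the four aspects could fit into one
structure, the Python sketch below is a toy and not the formalism
defined in the report. It assumes fixed part offsets with no
deformation, integer grid positions, and additive scores. The names
Part, Rule, score, grammar, and evidence are all hypothetical. Several
rules sharing a head stand for subtypes, a rule with several parts
stands for composition, offsets stand for relative positions, and the
appearance callback stands for image evidence.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Position = Tuple[int, int]


@dataclass
class Part:
    symbol: str        # name of the part (terminal or nonterminal)
    offset: Position   # assumed fixed position relative to the parent


@dataclass
class Rule:
    head: str          # symbol being expanded
    parts: List[Part]  # several parts = composition; one part = subtype


Grammar = Dict[str, List[Rule]]
Appearance = Callable[[str, Position], float]


def score(symbol: str, pos: Position, grammar: Grammar,
          appearance: Appearance) -> float:
    """Best score for placing `symbol` at `pos` (toy recursion)."""
    if symbol not in grammar:
        # Terminal: scored directly from image evidence.
        return appearance(symbol, pos)
    best = float("-inf")
    for rule in grammar[symbol]:
        # Sum the scores of all parts placed at their offsets.
        total = sum(
            score(p.symbol, (pos[0] + p.offset[0], pos[1] + p.offset[1]),
                  grammar, appearance)
            for p in rule.parts
        )
        # Alternative rules for the same head compete (subtypes).
        best = max(best, total)
    return best


# Hypothetical toy grammar: a person is standing or sitting, and a
# face is made of eyes and a mouth.
grammar: Grammar = {
    "person": [Rule("person", [Part("standing", (0, 0))]),
               Rule("person", [Part("sitting", (0, 0))])],
    "standing": [Rule("standing", [Part("face", (0, 0)),
                                   Part("trunk", (0, 2))])],
    "sitting": [Rule("sitting", [Part("face", (0, 0)),
                                 Part("trunk", (1, 1))])],
    "face": [Rule("face", [Part("eyes", (0, 0)),
                           Part("mouth", (0, 1))])],
}

# Made-up evidence: positive where a part "looks present", -1 elsewhere.
evidence = {("eyes", (5, 5)): 1.0,
            ("mouth", (5, 6)): 1.0,
            ("trunk", (5, 7)): 2.0}


def appearance(symbol: str, pos: Position) -> float:
    return evidence.get((symbol, pos), -1.0)


best = score("person", (5, 5), grammar, appearance)
```

With this made-up evidence the standing subtype explains the image
better than the sitting one, so it determines the score of "person".
A more realistic model would let parts move around their ideal
offsets at some cost, which this sketch leaves out.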
Original Document
The original document is available in PDF (uploaded 11 February, 2010 by
Pedro Felzenszwalb).