TR-2004-11

Graph Model Selection using the Minimum Description Length Principle

Ivona Bezakova; Adam Kalai; Rahul Santhanam. 14 December, 2004.
Communicated by Lance Fortnow.

Abstract

In recent years, there has been a proliferation of theoretical graph models, e.g., preferential attachment, motivated by real-world graphs such as the Web or Internet topology. Typically these models are designed to mimic particular properties observed in the graphs, such as power-law degree distribution or the small-world phenomenon. The mainstream approach to comparing models for these graphs has been somewhat subjective and very application dependent - comparisons are often based on ad hoc graph properties.

We use the Minimum Description Length principle to compare graph models: models are scored based on the degree of compression that they achieve on real data. This criterion is popular across fields for other types of model selection because it is objective and not application specific. Unfortunately, computing such scores is usually a daunting algorithmic task.

We design and implement efficient algorithms for computing the description length for four natural models: a power-law random graph model, a preferential attachment model, a small-world model, and a uniform random graph model. Based on experiments on the Internet topology graph, we find that the preferential attachment model ranks highest, while as we might expect, the uniform random graph model performs the worst.

The contributions of this paper are: (1) proposing description length as an information-theoretically sound metric for ranking graph models, and (2) designing efficient algorithms for four graph models and applying them to three snapshots of the Internet topology graph.

Original Document

The original document is available in PDF (uploaded 14 December, 2004 by Lance Fortnow).