TR-2002-12
Semi-supervised learning on manifolds
Mikhail Belkin; Partha Niyogi. 29 November, 2002.
Communicated by Partha Niyogi.
Abstract
We consider the general problem of utilizing both labeled and
unlabeled data to improve classification accuracy. Under the
assumption that the data lie on a submanifold in a high-dimensional
space, we develop an algorithmic framework to classify a partially
labeled data set in a principled manner. The central idea of our
approach is that classification functions are naturally defined only
on the submanifold in question rather than the total ambient space.
Using the Laplace-Beltrami operator, one produces a basis for a
Hilbert space of square integrable functions on the submanifold. To
recover such a basis, only unlabeled examples are required. Once
such a basis is obtained, training can be performed using the
labeled data set.
Our algorithm models the manifold using the adjacency graph for the
data and approximates the Laplace-Beltrami operator by the graph
Laplacian. We provide details of the algorithm, its theoretical
justification, and several practical applications for image, speech,
and text classification.
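
The pipeline summarized above can be illustrated with a minimal sketch. The abstract does not fix the details, so the following are assumptions made only for illustration: a symmetrized k-nearest-neighbor adjacency graph with 0/1 weights, the unnormalized graph Laplacian L = D - W, a least-squares fit of the labels in the span of the lowest eigenvectors, and binary labels in {-1, +1}. The function name and its parameters (`classify_on_graph`, `n_neighbors`, `n_eig`) are hypothetical and do not come from the report.

```python
import numpy as np

def classify_on_graph(X, labeled_idx, labels, n_neighbors=5, n_eig=10):
    """Sketch: label all rows of X given labels (+1/-1) for rows labeled_idx."""
    n = X.shape[0]

    # Pairwise squared Euclidean distances; the diagonal is set to infinity
    # so that a point is never chosen as its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)

    # Adjacency graph (assumed here: k nearest neighbors, 0/1 weights),
    # symmetrized so that an edge exists if either endpoint selects the other.
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, nn.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Unnormalized graph Laplacian, built from all points (no labels needed).
    L = np.diag(W.sum(axis=1)) - W

    # Eigenvectors for the smallest eigenvalues serve as the basis;
    # eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    E = vecs[:, :n_eig]

    # Training step: least-squares fit of the labels using only labeled rows.
    coef, *_ = np.linalg.lstsq(E[labeled_idx], labels, rcond=None)

    # Classify every point by the sign of the fitted function.
    return np.where(E @ coef >= 0, 1, -1)
```

Only the last two steps use labels: the graph, its Laplacian, and the eigenvector basis are computed from all points, labeled or not. In this sketch `n_eig` should not exceed the number of labeled points, since otherwise the least-squares problem is underdetermined.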
Original Document
The original document is available in PostScript (uploaded 29 November, 2002 by
Partha Niyogi).