A Geometric Perspective on Speech Sounds

Aren Jansen; Partha Niyogi. 27 June, 2005.
Communicated by Partha Niyogi.


To approach high-dimensional pattern recognition problems effectively, one seeks to understand and exploit any inherent low-dimensional structure. Recently, a number of manifold learning algorithms have been motivated by a geometric point of view that models high-dimensional data as lying near a low-dimensional submanifold of the original space. Our paper has two main goals:

(i) to investigate this manifold assumption for natural speech data. It seems intuitive that a human speech-producing apparatus with few degrees of freedom would not produce sounds that fill up the acoustic space. We formalize this intuition by considering a concatenated acoustic tube model of the vocal tract and showing that the sounds generated by such a system lie on a low-dimensional curved submanifold of the ambient acoustic space. To the extent that this model captures the essence of human speech production, the manifold assumption holds for natural speech data.

(ii) to explore the implications of this geometric point of view for human speech. We show that the manifold structure of speech sounds may be exploited for dimensionality reduction, semi-supervised learning, and speech representation, sometimes with striking performance improvements on simulated and real speech data. The non-linear geometry of speech sounds suggests new interpretations of phenomena such as the perceptual magnet effect or quantal theory.
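
The intuition behind goal (i) can be illustrated with a small sketch. It is not the model analyzed in the paper: it assumes a lossless concatenated tube whose junction reflection coefficients are turned into an all-pole filter by the standard step-up recursion, and the function names, the sign convention for the reflection coefficients, the area values, and the 64-bin spectral grid are all hypothetical choices made for illustration. A handful of tube areas is mapped to a 64-dimensional log spectrum, and the map is visibly non-linear: the midpoint of two spectra differs from the spectrum of the midpoint areas, which is suggestive of curvature but is not a proof.

```python
import cmath
import math

def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent tube sections
    (sign convention is an assumption; conventions differ between texts)."""
    return [(a_next - a) / (a_next + a) for a, a_next in zip(areas, areas[1:])]

def allpole_polynomial(ks):
    """Step-up recursion: reflection coefficients -> coefficients of A(z)."""
    a = [1.0]
    for k in ks:
        padded = a + [0.0]
        a = [p + k * q for p, q in zip(padded, reversed(padded))]
    return a

def log_spectrum(areas, n_bins=64):
    """Log-magnitude of 1/A(z) sampled at n_bins frequencies in (0, pi)."""
    a = allpole_polynomial(reflection_coefficients(areas))
    out = []
    for i in range(n_bins):
        w = math.pi * (i + 0.5) / n_bins
        A = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(a))
        out.append(-math.log(abs(A)))
    return out

# Two hypothetical tube shapes (cross-sectional areas, arbitrary units).
p = [1.0, 0.5, 2.0, 1.0]
q = [1.0, 2.0, 0.5, 1.0]
mid = [(x + y) / 2 for x, y in zip(p, q)]

# Midpoint of the two spectra versus the spectrum of the midpoint shape.
chord_mid = [(x + y) / 2 for x, y in zip(log_spectrum(p), log_spectrum(q))]
gap = max(abs(x - y) for x, y in zip(chord_mid, log_spectrum(mid)))
print(f"max gap between chord midpoint and image of midpoint: {gap:.3f}")
```

In this sketch every spectrum is determined by only four areas, so the set of reachable 64-dimensional vectors has at most four degrees of freedom. A uniform tube gives a flat spectrum, which is a quick sanity check on the recursion.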

Original Document

The original document is available in Postscript (uploaded 27 June, 2005 by Partha Niyogi).

Additional Document Formats

The document is also available in PDF (uploaded 27 June, 2005 by Partha Niyogi).

NOTE: The author warrants that these additional documents are identical to the original to the extent permitted by the translation between the various formats. However, the webmaster has made no effort to verify this claim. If the authenticity of the document is an issue, please always refer to the "Original document." If you find significant alterations, please report to