Point Process Models for Event-Based Speech Recognition

Aren Jansen; Partha Niyogi. 27 February, 2008.
Communicated by Partha Niyogi.


Several strands of research in the fields of linguistics, speech perception, and neuroethology suggest that durational modelling of an acoustic event landmark-based representation is a scientifically plausible approach to the automatic speech recognition (ASR) problem. Adopting a point process representation of the speech signal opens up ASR to a large class of statistical models that have seen wide application in the neuroscience community. In this paper, we formulate several point process models for application to speech recognition, designed to operate on sparse detector-based representations of the speech signal. We find that even with a noisy and extremely sparse phone-based point process representation, obstruent phones can be decoded at accuracy levels comparable to a basic hidden Markov model baseline and with improved robustness. We conclude by outlining various avenues for future development of our methodology.

Original Document

The original document is available in PDF (uploaded 27 February, 2008 by Partha Niyogi).