TR-2008-04
Point Process Models for Event-Based Speech Recognition
Aren Jansen; Partha Niyogi. 27 February, 2008.
Communicated by Partha Niyogi.
Abstract
Several strands of research in the fields of linguistics, speech
perception, and neuroethology suggest that durational modelling of an
acoustic event landmark-based representation is a scientifically
plausible approach to the automatic speech recognition (ASR) problem.
Adopting a point process representation of the speech signal opens up
ASR to a large class of statistical models that have seen wide
application in the neuroscience community. In this paper, we formulate
several point process models for application to speech recognition,
designed to operate on sparse detector-based representations of the
speech signal. We find that even with a noisy and extremely sparse
phone-based point process representation, obstruent phones can be
decoded at accuracy levels comparable to a basic hidden Markov model
baseline and with improved robustness. We conclude by outlining various
avenues for future development of our methodology.
Original Document
The original document is available in PDF (uploaded 27 February, 2008 by
Partha Niyogi).