Point Process Models for Spotting Keywords in Continuous Speech

Aren Jansen; Partha Niyogi. 19 September, 2008.
Communicated by Partha Niyogi.


We investigate the hypothesis that the linguistic content underlying human speech may be coded in the pattern of timings of various acoustic "events" (landmarks) in the speech signal. This hypothesis is supported by several strands of research in the fields of linguistics, speech perception, and neuroscience. In this paper, we put these scientific motivations to the test by formulating a point process-based computational framework for the task of spotting keywords in continuous speech. We find that even with a noisy and extremely sparse phone landmark-based point process representation, keywords can be spotted with accuracy levels comparable to recently studied hidden Markov model-based keyword spotting systems. We show that the performance of our keyword spotting system in the high-precision regime is better predicted by the median duration of the keyword than simply by the number of its constituent syllables or phonemes. When we are confronted with very few (in the extreme case, zero) examples of the keyword in question, we find that constructing a keyword detector from its component syllable detectors provides a viable approach.

Original Document

The original document is available in PDF (uploaded 19 September, 2008 by Partha Niyogi).