TR-2008-09
Point Process Models for Spotting Keywords in Continuous Speech
Aren Jansen; Partha Niyogi. 19 September, 2008.
Communicated by Partha Niyogi.
Abstract
We investigate the hypothesis that the linguistic content underlying
human speech may be coded in the pattern of timings of various
acoustic "events" (landmarks) in the speech signal. This hypothesis
is supported by several strands of research in the fields of
linguistics, speech perception, and neuroscience. In this paper, we
put these scientific motivations to the test by formulating a point
process-based computational framework for the task of spotting
keywords in continuous speech. We find that even with a noisy and
extremely sparse, phone landmark-based point process representation,
keywords can be spotted with accuracy levels comparable to recently
studied hidden Markov model-based keyword spotting systems. We show
that the performance of our keyword spotting system in the
high-precision regime is better predicted by the median duration of
the keyword than by the number of its constituent syllables or
phonemes. When we are confronted with very few (in the extreme case,
zero) examples of the keyword in question, we find that constructing a
keyword detector from its component syllable detectors provides a
viable approach.
Original Document
The original document is available in PDF (uploaded 19 September, 2008 by
Partha Niyogi).