TR-2009-03
Two statistical approaches to finding vowel harmony
Adam C. Baker. 28 July, 2009.
Communicated by John Goldsmith.
Abstract
The present study examines two methods for learning and modeling vowel
harmony from text corpora. The first uses Expectation Maximization
with Hidden Markov Models to find the most probable HMM for a training
corpus. The second uses pointwise mutual information between distant
vowels in a Boltzmann distribution, together with the Minimum Description
Length principle, to find and model vowel harmony. Both methods
correctly detect vowel harmony in Finnish and Turkish, and correctly
recognize that English and Italian have no vowel harmony. HMMs easily
model the transparent neutral vowels in Finnish vowel harmony, but
have difficulty modeling secondary roundness harmony in Turkish. The
Boltzmann model correctly captures secondary roundness harmony and the
opacity of low vowels in roundness harmony in Turkish, but has more
trouble capturing the transparency of neutral vowels in Finnish.
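
As a rough illustration of the quantity behind the second method, the sketch below estimates pointwise mutual information between vowels that are adjacent on the vowel tier (that is, with intervening consonants ignored). It is only a minimal sketch and not the report's model: the function name vowel_pair_pmi, the toy word list, and the restriction to tier-adjacent pairs are assumptions made for illustration, and neither the Boltzmann distribution nor the Minimum Description Length criterion is implemented here.

```python
import math
from collections import Counter


def vowel_pair_pmi(words, vowels):
    """Return PMI in bits for each ordered pair of tier-adjacent vowels.

    Hypothetical helper for illustration; not taken from the report.
    """
    pair_counts = Counter()
    first_counts = Counter()
    second_counts = Counter()
    for word in words:
        # Project the word onto its vowel tier by dropping non-vowels.
        tier = [ch for ch in word if ch in vowels]
        for v1, v2 in zip(tier, tier[1:]):
            pair_counts[(v1, v2)] += 1
            first_counts[v1] += 1
            second_counts[v2] += 1
    total = sum(pair_counts.values())
    pmi = {}
    for (v1, v2), n in pair_counts.items():
        # PMI = log2( p(v1, v2) / (p(v1) * p(v2)) ), with all
        # probabilities estimated from the observed pair counts.
        pmi[(v1, v2)] = math.log2(n * total / (first_counts[v1] * second_counts[v2]))
    return pmi


# Toy word list (an assumption, far too small to support any real conclusion).
toy_words = ["kala", "tupa", "kylä", "pöly"]
scores = vowel_pair_pmi(toy_words, set("aouäöy"))
```

Under these assumptions, a vowel pair with positive PMI co-occurs more often than chance would predict, which is the kind of signal a harmony detector can exploit. Pairs that never occur in the data are simply absent from the result rather than being assigned a score.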
Original Document
The original document is available in PDF (uploaded 28 July, 2009 by
John Goldsmith).