Two statistical approaches to finding vowel harmony

Adam C. Baker. 28 July, 2009.
Communicated by John Goldsmith.


The present study examines two methods for learning and modeling vowel harmony from text corpora. The first uses Expectation Maximization with Hidden Markov Models (HMMs) to find the most probable HMM for a training corpus. The second uses pointwise mutual information between distant vowels in a Boltzmann distribution, together with the Minimum Description Length principle, to find and model vowel harmony. Both methods correctly detect vowel harmony in Finnish and Turkish, and both correctly recognize that English and Italian have no vowel harmony. HMMs easily model the transparent neutral vowels in Finnish vowel harmony, but have difficulty modeling secondary roundness harmony in Turkish. The Boltzmann model correctly captures both secondary roundness harmony and the opacity of low vowels to roundness harmony in Turkish, but has more trouble capturing the transparency of neutral vowels in Finnish.
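
As a rough illustration of the pointwise mutual information idea mentioned above, the sketch below scores pairs of successive vowels in a word list, skipping any consonants between them. It is not the paper's Boltzmann model and does not apply the description length criterion. The vowel inventory, the four-word toy corpus, and the names `vowel_pairs` and `pmi_table` are assumptions made for this example only.

```python
import math
from collections import Counter

# Assumed Finnish-like vowel inventory, for illustration only.
VOWELS = set("aeiouyäö")

def vowel_pairs(word):
    """Return pairs of successive vowels, skipping intervening consonants."""
    vowels = [ch for ch in word if ch in VOWELS]
    return list(zip(vowels, vowels[1:]))

def pmi_table(words):
    """Map each observed vowel pair (a, b) to log2(p(a, b) / (p(a) * p(b)))."""
    pair_counts = Counter()
    first_counts = Counter()
    second_counts = Counter()
    for word in words:
        for a, b in vowel_pairs(word):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
            second_counts[b] += 1
    total = sum(pair_counts.values())
    table = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total
        p_a = first_counts[a] / total
        p_b = second_counts[b] / total
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table

# Toy corpus (an assumption, far too small for real estimates).
words = ["talo", "katu", "kylä", "pöly"]
table = pmi_table(words)
for pair, score in sorted(table.items()):
    print(pair, round(score, 3))
```

On a realistic corpus of a harmonic language, pairs of vowels from the same harmony class would be expected to score above zero and mixed pairs below zero. Pairs that never occur are simply absent from the table in this sketch, so a fuller treatment would need smoothing.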

Original Document

The original document is available in PDF (uploaded 28 July, 2009 by John Goldsmith).