TR-2005-04
A heuristic for morpheme discovery based on string edit distance
John Goldsmith; Yu Hu; Irina Matveeva; Colin Sprague. 13 May, 2005.
Communicated by John Goldsmith.
Abstract
We propose a new heuristic for the automatic discovery of morphemes and of morphological structure from an arbitrary corpus. At present the best-known heuristic of this sort is based on Zellig Harris's proposal (1955), which employs the notion of successor frequency. We define a different heuristic based on the string edit distance algorithm, and test its consequences for the automatic discovery of morpheme boundaries in Swahili, an important Bantu language of East Africa. We show that the results of the new heuristic are superior to those obtained with earlier methods.
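For readers unfamiliar with the string edit distance algorithm mentioned in the abstract, the following is a minimal sketch of the standard Levenshtein distance with unit costs for insertion, deletion, and substitution. It is not the heuristic of the report itself: the abstract does not say how edit distance is turned into morpheme boundaries, so the function name `edit_distance`, the unit costs, and the example words are assumptions made here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance with unit costs (an assumption;
    the report may weight operations differently)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1       # drop ca from a
            insertion = curr[j - 1] + 1  # add cb to a
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Illustrative Swahili verb forms (chosen for this sketch, not taken
# from the report): ni-na-soma "I am reading", u-na-soma "you are reading".
print(edit_distance("ninasoma", "unasoma"))  # 2
```

The two forms differ only at the left edge, where the subject prefixes `ni` and `u` stand, while the shared material `nasoma` aligns at zero cost. Pairs of words with this kind of low-cost alignment are the sort of evidence an edit-distance-based approach can draw on, though the report's actual procedure should be taken from the original document.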
Original Document
The original document is available in PDF (uploaded 13 May, 2005 by
John Goldsmith).