From Signatures to Finite State Automata

John Goldsmith; Yu Hu. 16 May, 2005.
Communicated by John Goldsmith.


In this paper, we outline the design of a non-deterministic finite state automaton (NFSA) for natural language morphology, and compare it to previous work in unsupervised learning of morphology. In Section 2, we describe the nature of a system for unsupervised learning of morphology based on Minimum Description Length (MDL), using the signature-based model of Goldsmith 2001 as an example, and we describe some drawbacks of that model. In Section 3, we present an alternative model, a non-deterministic finite state automaton that distinguishes between convergent and divergent states, a difference that corresponds to inflectional versus derivational morphology, and we specify an MDL model based on it. In Section 4, we review the ways in which several authors have used a Patricia trie as a means of bootstrapping the discovery of morphemes. The final sections describe how we are working toward obtaining layers of morphological structure.

