TR-2005-05

From Signatures to Finite State Automata

John Goldsmith; Yu Hu. 16 May, 2005.
Communicated by John Goldsmith.

Abstract

In this paper, we outline the design of a non-deterministic finite state automaton (NFSA) for natural language morphology and compare it to previous work on unsupervised learning of morphology. In Section 2, we describe the nature of an MDL-based system for unsupervised learning of morphology, using the signature-based model of Goldsmith 2001 as an example, and we discuss some drawbacks of the signature-based model. In Section 3, we present an alternative model, a non-deterministic finite state automaton that distinguishes between convergent and divergent states (a distinction that corresponds to the difference between inflectional and derivational morphology), and we specify an MDL model based on it. In Section 4, we review the ways in which several authors have used a Patricia trie as a bootstrapping device for finding morphemes. The final sections describe our current work on obtaining layers of morphological structure.
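To make the notion of a signature concrete, the sketch below groups stems by the exact set of suffixes they occur with in a small word list, writing the empty suffix as NULL. It is a minimal illustration under loudly stated assumptions, not the method of Goldsmith 2001: the function name `find_signatures`, the fixed candidate suffix list, and the `min_suffixes` threshold are all hypothetical. A real MDL-based learner would discover candidate suffixes itself and choose among analyses by description length, which this sketch does not attempt.

```python
from collections import defaultdict


def find_signatures(words, suffixes, min_suffixes=2):
    """Group stems by the set of suffixes they take (hypothetical helper).

    Assumes `suffixes` is a fixed candidate list supplied by the caller;
    no MDL scoring is performed.
    """
    # Map each candidate stem to the set of suffixes it is attested with.
    stem_suffixes = defaultdict(set)
    for word in set(words):
        for suf in suffixes:
            # Require a non-empty stem; the empty suffix yields the word itself.
            if word.endswith(suf) and len(word) > len(suf):
                stem_suffixes[word[: len(word) - len(suf)]].add(suf)

    # Stems that share exactly the same suffix set share a signature.
    sigs = defaultdict(list)
    for stem, sufs in stem_suffixes.items():
        if len(sufs) >= min_suffixes:  # hypothetical noise filter
            label = ".".join(s if s else "NULL" for s in sorted(sufs))
            sigs[label].append(stem)
    return {label: sorted(stems) for label, stems in sigs.items()}


words = ["jump", "jumps", "jumped", "jumping",
         "walk", "walks", "walked", "walking",
         "cat", "cats", "the"]
print(find_signatures(words, ["", "s", "ed", "ing"]))
```

On this toy input, "jump" and "walk" share the signature NULL.ed.ing.s and "cat" falls under NULL.s, while "the" is filtered out because it occurs with only one suffix. The exact-set grouping is what makes signatures brittle: a stem missing a single attested form lands in a different signature, which is one motivation for moving to a more flexible automaton-based representation.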

Original Document

The original document is available in PDF (uploaded 16 May, 2005 by John Goldsmith).