Automatic morphological alignment and clustering

Jackson L. Lee. 2 May, 2014.
Communicated by John Goldsmith.


This paper describes an unsupervised algorithm that uses no language-specific knowledge. It takes a list of morphological paradigms and explores cross-paradigmatic structure through two computational tasks: alignment and clustering. Using complexity computation in a minimum description length approach, the algorithm learns the relationships across paradigms purely from surface strings. Alignment formalizes the intuitive idea that, for instance, "jumping" and "loving" belong to the same morphological category. Clustering, learned simultaneously, yields morphological groupings of the paradigms akin to conjugation and declension classes, and the clustering analysis also reveals finer-grained hierarchical structure among the inflectional classes. The algorithm is applied to verbal paradigms from English and Spanish. The results are useful for further work on unsupervised learning and prediction-oriented research on paradigmatic structure. We also show the value of computational techniques in linguistics for both explicitly evaluating competing analyses and rigorously implementing them.
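
To make the alignment task concrete, here is a minimal toy sketch in Python. It is not the algorithm of the paper: in place of the minimum description length computation it swaps in a plain edit distance over suffixes, and it tries every ordering by brute force. The function names (`edit_distance`, `suffixes`, `align`), the stem heuristic (longest common prefix), and the example word lists are all assumptions made for illustration; both paradigms are assumed to have the same number of forms.

```python
from itertools import permutations
from os.path import commonprefix


def edit_distance(a, b):
    """Levenshtein distance between two strings (illustrative cost only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def suffixes(paradigm):
    """Strip the longest common prefix (a crude stem) from every form."""
    stem = commonprefix(paradigm)
    return [form[len(stem):] for form in paradigm]


def align(reference, other):
    """Reorder `other` so its forms line up with the slots of `reference`.

    Hypothetical helper: picks the ordering with the smallest total
    suffix edit distance, as a stand-in for a description-length cost.
    """
    ref_suf = suffixes(reference)
    oth_suf = suffixes(other)

    def cost(order):
        return sum(edit_distance(r, oth_suf[k]) for r, k in zip(ref_suf, order))

    best = min(permutations(range(len(other))), key=cost)
    return [other[k] for k in best]


# Surface strings only; the second paradigm is deliberately shuffled.
jump = ["jump", "jumps", "jumped", "jumping"]
love = ["loving", "love", "loved", "loves"]
aligned = align(jump, love)
print(list(zip(jump, aligned)))
```

On this toy input the sketch pairs "jumping" with "loving" and "jumped" with "loved". Brute-force search over orderings is only feasible for very small paradigms, and a longest-common-prefix stem breaks down on stem alternations, so this should be read as an illustration of the task rather than of the method.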

Original Document

The original document is available in PDF (uploaded 2 May, 2014 by John Goldsmith).