TR-2014-07
Automatic morphological alignment and clustering
Jackson L. Lee. 2 May, 2014.
Communicated by John Goldsmith.
Abstract
This paper describes an unsupervised algorithm that uses no language-specific
knowledge. It takes a list of morphological paradigms and explores
cross-paradigmatic structure in terms of two computational tasks: alignment and
clustering. Using complexity computation in a minimum description length
approach, the proposed algorithm learns the relationships across paradigms
purely from surface strings and formalizes the intuitive idea that, for instance,
"jumping" and "loving" belong to the same morphological category -- this
is alignment. Moreover, the algorithm simultaneously learns morphological
groupings of the paradigms akin to conjugation and declension classes -- this is
clustering. The clustering analysis also reveals more fine-grained hierarchical
structure among the inflectional classes. The algorithm is applied to verbal
paradigms from English and Spanish. The results are useful for further work
on unsupervised learning of paradigmatic structure and for prediction-oriented
research on it. We also show the value of computational techniques in linguistics
for both explicitly evaluating competing analyses and rigorously implementing
analyses.
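To make the alignment task concrete, the following is a minimal toy sketch in Python. It is not the algorithm of the report. It assumes that each paradigm is a short list of surface forms, that the longest common prefix can stand in for the stem, and that a crude character count can stand in for description length. All function names are hypothetical. The sketch tries every ordering of the second paradigm and keeps the one whose slot-by-slot suffix pairs are cheapest to write down.

```python
from itertools import permutations
from os.path import commonprefix


def suffixes(paradigm):
    """Strip the longest common prefix (a stand-in for the stem) from each form."""
    stem = commonprefix(paradigm)
    return [form[len(stem):] for form in paradigm]


def slot_cost(a, b):
    """Toy description length of one aligned slot (an assumption, not the
    report's measure): characters of both suffixes, with any shared ending
    counted only once."""
    shared = len(commonprefix([a[::-1], b[::-1]]))
    return len(a) + len(b) - shared


def best_alignment(paradigm_a, paradigm_b):
    """Brute-force search over orderings of paradigm_b; feasible only for
    very small paradigms of equal size."""
    suf_a = suffixes(paradigm_a)
    suf_b = suffixes(paradigm_b)
    best = min(
        permutations(range(len(paradigm_b))),
        key=lambda order: sum(slot_cost(a, suf_b[i]) for a, i in zip(suf_a, order)),
    )
    return [(paradigm_a[k], paradigm_b[i]) for k, i in enumerate(best)]


jump = ["jump", "jumps", "jumped", "jumping"]
love = ["loving", "loved", "love", "loves"]  # deliberately shuffled
for pair in best_alignment(jump, love):
    print(pair)
```

On this input the cheapest ordering pairs "jumping" with "loving", "jumped" with "loved", "jumps" with "loves", and "jump" with "love", using surface strings alone. The exhaustive search grows factorially with paradigm size, so it serves only as an illustration of the idea.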
Original Document
The original document is available in PDF (uploaded 2 May, 2014 by
John Goldsmith).