Quantitative Lexical Comparison

University of Leipzig, Fall 2009


The comparison of lexical material between languages is one of the central pillars of historical-comparative reconstruction, and thus of our knowledge about the genealogical relations between languages. In practice, this kind of research consists to a large extent of manually searching through dictionaries and wordlists, paired with a great amount of knowledge about the languages in question. Such laborious work seems predestined to be assisted by modern computational power, though this has not (yet) happened. In this course, we will discuss previous work, what seems feasible in the short run, and what is still missing for quantitative approaches to really take off.

First, we will go through the somewhat hidden (because mostly not published in the linguistic mainstream) history of quantitative approaches to lexical comparison. Then, we will discuss approaches from computer science and bioinformatics that seem relevant to lexical comparison, although they were not devised with an application in linguistics in mind. A central point of reflection will thus be to what extent these methods are relevant, and to what extent they have to be changed for application in linguistics. Finally, we will ask what linguists will have to do to make lexical data more amenable to automatic analysis.

Although this course is about mathematical approaches to language comparison, the main goal will be to make linguists without a profound mathematical background familiar with the concepts and basic principles. I do not expect the participants to immerse themselves in the algorithmic details, though some basic technical abstraction will be necessary.