Monday, January 4, 2010

Glottochronology

A couple of papers on glottochronology recently came across my desk: "Measures of lexical distance between languages" and "Lexical evolution rates by automated stability measure", both by Filippo Petroni and Maurizio Serva.

Rather than rely on human judgements of which terms are cognates, they compute normalized Levenshtein distances between pairs of words from the Swadesh list and then use another automated procedure to turn those distances into phylogenetic trees. The resulting trees are comparable to those produced in the traditional way, and the papers include some analysis of the stability of the technique. The interesting part is that, since this is a purely mechanical operation, one can grind through a lot of languages given nothing more than some basic vocabulary lists. The word lists and resulting trees are on-line for the interested. (Note: the time axis in the trees runs, annoyingly, from 0 to 5000 plus some unspecified number of trailing years, so mapping the branch points to approximate dates requires some mental arithmetic.)
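
To make the mechanical part concrete, here is a rough sketch of the kind of computation involved. It is my own illustration, not the authors' code: I am assuming the normalization divides the edit distance by the length of the longer word, and that the distance between two languages is the average over matching entries in their word lists. The function names and the tiny word lists are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1].
    Assumption: normalize by the length of the longer word."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def language_distance(words_a: list[str], words_b: list[str]) -> float:
    """Assumption: average the normalized distance over word pairs that
    express the same meaning (same position in each list)."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and the same length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)


if __name__ == "__main__":
    # Hypothetical two-entry "word lists"; a real run would use the
    # full Swadesh list in a consistent transcription.
    english = ["water", "night"]
    german = ["wasser", "nacht"]
    print(language_distance(english, german))
```

Doing this for every pair of languages gives a distance matrix, which is the sort of input a tree-building procedure works from. Note that the result depends heavily on how the words are transcribed, which is one reason to treat this as a sketch only.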

There is something deeply appealing to me about applying mathematical approaches to linguistics in this way, but some note of caution is warranted as well. The drawing of phylogenetic trees (as the name should indicate) is borrowed from the use of matching techniques on DNA sequences to construct (true) phylogenetic trees. The application in genetics is more straightforward, however, in that snakes don't invade dog-land and replace half their genes. In addition, while DNA sequences capture the essence of inherited traits in living creatures, there is a lot more to a language than 200 core words. Plus, while all living things appear to apply exactly the same rules in interpreting DNA sequences, languages have a lot of different takes on how they use phonemes to generate meaning: it isn't clear that Levenshtein distance is a great measure to use in comparing languages that use vowel variations for inflections (think: strong verbs) against those that use endings (think: weak verbs) when applied to some default form. In sum: there are a lot more features to language than words, we can't be sure that the comparison rules are consistent, and language contact and borrowing play a role that they don't (outside the microbial world) in genetics.

This is not to say this is a useless enterprise, not by any means, and certainly the practitioners in this field are well aware of these issues. Indeed, the mathematical approach provides some hope of including additional language features and untangling some of the mess. The details of the comparisons can give clues to some of the impacts of contact.

Speaking of contact, Santa left me John McWhorter's "Our Magnificent Bastard Tongue", in which McWhorter applies his work on creoles to the history of English and rails against the focus on words, pointing out that the grammar of English (and Proto-Germanic) shows some very interesting things about the history of English as a language formed in language contact, and not just by borrowing a lot of words. In particular, he points to a couple of odd-ball features of English grammar (the meaningless do, and the use of progressive for present tense) as coming from Welsh. He argues that the supposed Celtic genocide never happened (and cites some recent genetic data to back him up). Later on, the Danelaw led, not just to the adoption of some words, but to the streamlining of the grammar. Fascinating stuff. The argument looks weakest in trying to explain why the changes in written English are so out of step with the supposed timing of the changes on the ground.

Where things turn truly speculative is when he points to some slender evidence for Semitic (Phoenician?) influence on Proto-Germanic: the loss of inflectional endings, the consonant shift, and the introduction of a lot of words that lack cognates in the rest of Indo-European. Tantalizing, for sure, but surely speculative.

Bringing us back to the top, having a fuller set of features to compare languages should be able to show us these things. Does Proto-Germanic show affinity for Phoenician? How much?
