Some Thoughts on “Facts and Fallacies in Historical Linguistics”
Categories: Ancient, Medieval, Modern

The cover of a book showing a map of Eurasia in the background and a tree of Indo-European languages in the foreground

Asya Pereltsvaig, Martin W. Lewis, The Indo-European Controversy: Facts and Fallacies in Historical Linguistics (Cambridge University Press: Cambridge, 2015) ISBN 978-1107054530

A few years ago, some very bad linguistics was published in some very famous journals and credulously reported by newspapers which are very widely read. Usually, academics respond to nonsense by ignoring it, because proving something wrong is much more work than claiming it in the first place (Brandolini’s Law), and because the authors of bad research rarely respond well to criticism and fans of that research are not always interested in a second opinion. But two blogging philologists, Martin Lewis and Asya Pereltsvaig, have written an entire book exploring the problems with these papers and standing up for the importance of geography and historical linguistics in any attempt to understand past languages and cultures.

Pereltsvaig and Lewis did an impressive job of organizing their criticisms of a work which operates from very different premises than mainstream linguistics. “Our key objection to the Gray-Atkinson approach to Indo-European origins and expansion is not that it merely fails, but rather that it fails repeatedly, returning incorrect results at almost every turn.” (p. 114) Gray and Atkinson (and their paper “Mapping the Origins and Expansion of the Indo-European Language Family”) represent languages as collections of words for basic meanings and the breakup of a language into a family of languages as words in these collections ceasing to be cognates of one another at a variable random rate. Changes in sound and grammar, like those which turned Latin pons ferrī into põ du fer in Paris but póntey di férro in Florence, are not represented. They then generate a vast number of possible language trees, including living, historical, and reconstructed languages, and try to find the most probable one through a vast number of iterations. This is combined with a spatial model which represents the spread of a proto-language and its split into different languages until their arrival at their current locations in terms of a random walk. This model is key to their claim that they have “decisive support for an Anatolian origin over a steppe origin.” As Lewis and Pereltsvaig explain, while this approach uses powerful computers and Bayesian statistics to sound modern and scientific, it is in fact based on ideas from the 19th century to the 1980s which no longer convince many linguists.

A Google Earth map of Eurasia overlaid with a tree of Indo-European languages and various coloured blobs representing areas certain to have already been occupied by Indo-European speakers.
The minimum area occupied by Indo-European speakers in 64 BCE, according to Bouckaert et al. 2012 “Mapping the Origins” Movie S1. According to their model, Indo-European languages have not certainly reached western Gaul, Iberia, Sicily, or Crete but have certainly filled Syria and northern Iraq for thousands of years. Somehow, the ancient texts I have read give a very different picture …

The assumption that languages spread with the growth of the population that speaks them (demic diffusion) comes from Colin Renfrew, the founder and main supporter of the Anatolian hypothesis and a believer that the Indo-European languages spread into Europe early with the first farmers. In Renfrew’s view, Indo-European speakers advanced a few miles at a time, outbreeding the peoples who had lived in the area before them. As the supplementary materials of “Mapping the Origins” acknowledge, “The rapid expansion of a single language and nodes associated with branches not represented in our sample will not be reflected in this figure.” However, the assumption that languages spread in this way is precisely one of the things in which supporters of the Anatolian hypothesis differ from most linguists! If the spread of the Indo-European languages had something to do with the technology of horse breeding and wheeled vehicles, as most scholars believe, then they could easily have spread faster. Pereltsvaig and Lewis provide many examples of peoples who migrated long distances in prehistory and the ancient world, or who spread their language and customs by convincing speakers of other languages to adopt them (acculturation), and evidence that even the introduction of agriculture into Europe occurred too fast to simply represent each new village being placed a mile or two deeper into the woods than the last. When “Mapping the Origins” assumes that Indo-European spread by demic diffusion and not migration or acculturation, it assumes what it is trying to prove.

Estimating the date of the most recent common ancestor of two languages by the proportion of cognates with a “basic meaning” is a method known as glottochronology. This was invented by Morris Swadesh in the 1950s, but linguists found evidence against his assumptions and practical challenges in applying his method. In particular, choosing which words to use for those “basic meanings” can have a large impact on results. Given a picture of canis domesticus, most German speakers will say Hund, and most English speakers “dog,” but a few will say “hound” or “hound dog” or “puppy.” Does English share a cognate for this basic meaning with German, or not? Now imagine that you are trying to decide on the basis of a battered King James Version and a handful of gravestones, without being able to ask a native speaker “is a dog a kind of hound?” Several lists of “basic meanings” are commonly used, and Pereltsvaig and Lewis present evidence that running the same program on a different list or different choice of words would produce a significantly different result. Some linguists are more open to glottochronology than Pereltsvaig and Lewis are (at least as a rough guideline) but they are more likely to describe their results as “tentative” than “decisive.”
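The arithmetic behind classical glottochronology is short enough to sketch, which also shows how much hangs on each coding decision. This is a minimal illustration of the textbook formula t = ln(c) / (2 ln(r)), not the Bayesian method used in “Mapping the Origins”. The retention rate of 0.86 per millennium is the figure conventionally quoted for a 100-item list and should be treated as an assumption, as should the function name and the cognate counts of 60 and 61.

```python
import math


def divergence_millennia(shared_cognates, list_size, retention=0.86):
    """Classical glottochronological estimate of time since divergence.

    shared_cognates / list_size is the proportion c of basic meanings
    expressed by cognate words in both languages. retention is the
    assumed share of the list that one language keeps per millennium
    (0.86 is the conventional value for a 100-item list, used here as
    an assumption). The factor of 2 reflects that both languages have
    been changing independently since the split.
    """
    c = shared_cognates / list_size
    return math.log(c) / (2.0 * math.log(retention))


# Hypothetical counts: the only difference is whether one ambiguous
# item (say, "dog" versus "hound") is coded as a shared cognate.
for shared in (60, 61):
    years = 1000 * divergence_millennia(shared, 100)
    print(f"{shared}/100 shared cognates -> roughly {years:.0f} years")
```

A single judgment call on a 100-item list moves the estimate by decades, and real comparisons involve many such calls, plus the choice of list and the assumption that the retention rate is constant in the first place.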

A diagram of electoral districts parodied as a misshapen monster, and a map of 'Kurdish-speaking regions' compared to a lounging rodent or galloping dino
Gerrymanders are not just for 19th-century elections in the land of the red, white, and blue! Image by Martin W. Lewis 2012

The geographical model used in “Mapping the Origins” assumes that the territory of each language is contiguous (Supplementary Materials, 2 Location data). As a result, Classical Greek is modelled as confined to Attica (rather than spread around the islands and shores of the Mediterranean), Kurdish is mapped as a kind of gerrymander which links the different pockets of speakers, and most of Switzerland and Northern Italy are not assigned any language at all. Pereltsvaig and Lewis say that the model often differs from its supposed source Ethnologue, and that the boundaries between languages (and the limits of Indo-European expansion) often correspond suspiciously well to national borders as of circa 1991. Despite all the deportations, massacres, and subjugations of the last hundred years, the language situation in most countries is more complicated than that!

The geographical model in “Mapping the Origins” assumes that people move at a low but random rate and choose each direction at random, although optionally they can be made to prefer moving onto land over water. As a result, the speakers of Proto-Tocharian are modelled advancing along the crest of the Tien Shan Mountains at an elevation of more than 6000 metres, while the future Icelanders are represented as spending centuries living in the middle of the Atlantic before they finally reach their destination. Since “Mapping the Origins” extrapolates backwards from the last known location of a language, errors in the first step towards the homeland can obviously multiply through subsequent steps.
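A toy simulation shows why a walker that is blind to terrain ends up in such odd places. This is a sketch under loud assumptions, not the spatial model of “Mapping the Origins”: the one-line geography (everything west of x = 0 is sea), the exponential step lengths, the starting point, and the names is_water, random_walk, and fraction_in_water are all invented for illustration.

```python
import math
import random


def is_water(x, y):
    """Toy geography (an assumption): everything west of x = 0 is sea."""
    return x < 0


def random_walk(steps, mean_step_km=1.0, land_preference=0.0, seed=0):
    """Walk from (5, 0) with a random direction and random step length.

    land_preference is the probability that a step which would end in
    water is rejected, in which case the walker stays put for that step.
    With land_preference=0.0 the walker ignores terrain entirely.
    """
    rng = random.Random(seed)
    x, y = 5.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        angle = rng.uniform(0.0, 2.0 * math.pi)       # direction at random
        length = rng.expovariate(1.0 / mean_step_km)  # low but random rate
        nx = x + length * math.cos(angle)
        ny = y + length * math.sin(angle)
        if is_water(nx, ny) and rng.random() < land_preference:
            nx, ny = x, y  # step into the sea rejected; stay in place
        x, y = nx, ny
        path.append((x, y))
    return path


def fraction_in_water(path):
    """Share of recorded positions that lie in the sea."""
    return sum(is_water(x, y) for x, y in path) / len(path)


for preference in (0.0, 0.5, 1.0):
    walk = random_walk(2000, land_preference=preference)
    print(f"land_preference={preference}: "
          f"{fraction_in_water(walk):.0%} of positions at sea")
```

Only a strict preference keeps the walker dry. Nothing in the walk knows about boats, mountain passes, or the difference between a day's sail and a century of treading water, which is the kind of knowledge a geographer or historian would bring to the problem.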

The geographical model in “Mapping the Origins” has difficulty representing the spread of Indo-European languages into Sweden, Iberia, Ukraine, and Russia (all of which are modelled as not certainly Indo-European-speaking until a few hundred years ago). As the authors of “Mapping the Origins” confess (supplementary materials, figure S4):

The rapid expansion of a single language and nodes associated with branches not represented in our sample will not be reflected in this figure. For example, the lack of Continental Celtic variants in our sample means we miss the Celtic incursion into Iberia and instead infer a later arrival into the Iberian peninsular associated with the break-up of the Romance languages (and not the initial rapid expansion of Latin).

However, this naturally raises the question of whether the lack of steppe languages in their sample influenced their model! While most researchers (and especially advocates of the Revised Steppes Hypothesis) think that Indo-European languages were widely spoken on the steppe from prehistory into classical times, none of these languages has left a large body of texts which allows scholars to compile a vocabulary of everyday words. On the other hand, Anatolian Hittite is very well documented at an early date, because its speakers wrote it on durable clay.

A strange tree of ancient and living languages, such that named languages are always 'leaves' and the 'forks' are always unnamed
A tree of Indic languages from figure S2 of the supplementary materials of Bouckaert et al. “Mapping the Origins” (as revised in December 2013). Their word-based method predicts that the ancestor of Romani separated from the ancestor of modern Indian languages about 2500 years ago, but Romani grammar shows many small changes which occurred in India about a thousand years ago. Most linguists think that the ancestors of the Roma left India during the Middle Ages, and lost much of their Indian vocabulary while living among speakers of Armenian, Greek, and Persian.

Even if watching academics savage each other’s ideas is not your favourite way to spend an afternoon, this book is full of gems. From Max Müller’s sad warning in 1887 that “an ethnologist who speaks of Aryan race, Aryan blood, Aryan eyes and hair, is as great a sinner as a linguist who speaks of a dolichocephalic dictionary or a brachycephalic grammar” (p. 24: there was a fad for measuring skulls in his day), to an atlas published in 1946 which earnestly declared that South Sudan, Uganda, and parts of the DRC were mainly populated with Caucasians who happened to be a bit dark of complexion (p. 29), to the idea that the ancestors of the Roma were a caste of camp followers who crossed the Hindu Kush and settled for a time under the Seljuks of Rum (pp. 166, 167), there are plenty of details to entertain a reader and keep their local interlibrary loan department gainfully occupied. Something about the search for the origin of the Indo-European languages has always attracted people with more energy and learning than sense, and there is plenty to smile at as long as we remember that the theories fashionable in our own day may come to sound just as quaint. Pereltsvaig and Lewis are wise enough to warn that while they think the evidence strongly favours the Revised Steppes Hypothesis, it is probably not the last word.

The Indo-European Controversy is written in a clear academic style. While some sections feel more philological and others more geographical, the tone and style are harder to tell apart. At first glance, this book appears lightly cited with references to one or two key works on each topic. However, it contains a bibliography of 28 pages and 450 entries. Six figures illustrate different trees of the Indo-European languages, and 38 maps explore the geographical aspects of the model. These are very well designed (as one would expect in a book by a geographer), despite the limits of greyscale, but they are buried at the back on unnumbered pages and in ‘landscape’ orientation. Including some of these in the body of the book and rotating them into the same orientation as the text might have made readers more likely to use them (although the authors were no doubt eager to get their book into print quickly enough to be an effective response, and some of the maps may have been designed to be displayed on a projector screen then adapted for printing). There is a glossary to help readers with linguistic terminology. The binding feels solid and the paper takes notes well.

This book is an excellent example of a scholarly response to bad science. It explains the major problems with “Mapping the Origins” in a clear and comprehensive way, without losing its temper or waving its hands (in the body of the book, Lewis and Pereltsvaig never ask readers to accept a vague appeal to “centuries of research” or “dozens of sources” without providing at least one example and a place to learn more). The format is appropriate for one major audience of the ideas it rebuts (students and academics in other fields who heard of “Mapping the Origins” and want a more formal response than the authors’ lectures and blog posts). At the same time, The Indo-European Controversy provides an introduction to Indo-European philology, one of the greatest products of the modern university. The philologists may have been victims of their own success, as they inspired the public to run wild with dreams of heroic charioteers or savage Kurgans then wandered off into increasingly technical arguments about details (and a growing conviction that those dreams were wrong in important ways). Outside of linguistics departments, the methods of historical linguists have been so thoroughly forgotten that the editors of Science failed to see the problems with this research (or to check the geographical and linguistic data as carefully as the mathematics). Proto-Indo-European is a linguistic hypothesis, and it should be studied with the tools of linguistics, not with little knowledge and plenty of computing power.

Further Reading:

  • David W. Anthony, The Horse, The Wheel, and Language (2007) {the statement of the latest version of the Revised Steppes Hypothesis}
  • Lyle Campbell, Historical Linguistics: An Introduction (Edinburgh University Press and MIT Press, two editions available) {English-language introduction to historical linguistics}
  • David Henige, “Truth or Hope? Stimulus and Response in Academic Publishing,” Journal of Scholarly Publishing (January 2011) doi: 10.3138/jsp.42.2.205 {on scientists’ response to pseudoscience, as defined by Henige}
  • Bouckaert et al. “Mapping the Origins and Expansion of the Indo-European Language Family,” Science 2012 DOI: 10.1126/science.1219669

Update 2017-01-28: Added a snapshot from Movie S1 to break up the wall-of-text; s/constant/low but random;

Update 2017-01-29: Since this post is popular, added an example of the language trees in Bouckaert et al. 2012 and why linguists are not impressed by them

Edit 2023-03-09: block editor
