Big Data in World History: Seshat vs. DRH
Ancient historians have been in the big open data business for almost 200 years, with Mommsen’s establishment of the Corpus Inscriptionum Latinarum to publish all surviving ancient Latin inscriptions in 1853. Right now there are two competing projects to create an encyclopedia of quantitative data on world religious history which could be subjected to statistical tests: the Database of Religious History at UBC, and Peter Turchin’s Seshat project in the USA. Turchin belongs to a Russian tradition of social scientists such as Andrey Vitalievich Korotayev who want to find predictive, mathematical laws of history, often in the form of cycles. A recent paper based on Seshat data has provoked not one but two responses only six weeks after publication.
- Harvey Whitehouse et al., “Complex Societies Precede Moralizing Gods Throughout World History,” Nature 568 (20 March 2019) pp. 226-229 https://doi.org/10.1038/s41586-019-1043-4
- Edward Slingerland et al., “Historians Respond to Whitehouse et al. (2019), ‘Complex Societies Precede Moralizing Gods Throughout World History’,” PsyArXiv Preprints, 2 May 2019 https://doi.org/10.31234/osf.io/2amjz
- Bret Beheim, Quentin Atkinson (yes, that Atkinson), et al., “Corrected analyses show that moralizing gods precede complex societies but serious data concerns remain,” PsyArXiv Preprints, 2 May 2019 https://doi.org/10.31234/osf.io/jwa2n
Whitehouse and his Seshat colleagues ask whether “belief in morally concerned supernatural agents culturally evolved to facilitate cooperation among strangers in large-scale societies.” They examine which appears first in 414 societies over the past 10,000 years, and argue that the sequence tends to be first large-scale societies and then moralizing gods.
The respondents, some of them associated with the DRH and others independent, have many concerns. The first is a point about the relationship between past societies and the written traces they left behind:
Unlike proxies of social complexity such as polity size and population density, their definition of MG requires written evidence in order for MGs to be detected. Yet, as one proceeds back through the archeo-historical record, both literacy and written materials become less common. Thus, the earliest surviving documentary evidence of MGs will likely be much later than their actual emergence and differentially ‘forward biased’ relative to the physical evidence of social complexity. For example, Hawaii’s population history is well-documented archaeologically, but MGs only appear in the Hawaiian Seshat records upon the arrival of Europeans with quills. In light of Pacific ethnography, MGs likely existed in Hawaii far earlier than post-contact accounts.
When they repeat the analysis on the assumption that people in a region may believe in moralizing gods before the first surviving text documents this, the results reverse and the usual sequence is first moralizing gods and then large-scale societies. This strikes me as plausible, because moralizing gods seem to me the kind of thing that people imagine throughout world history, often alongside gods which are above human concerns.
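The forward-bias argument can be illustrated with a toy sketch. Everything in it is my own assumption for illustration: the four societies and their dates are invented (they are not Seshat or DRH data), and the rule that moralizing gods become detectable only once writing exists is a deliberate simplification of the critics' point, not their actual analysis.

```python
from dataclasses import dataclass


@dataclass
class Society:
    name: str
    mg_emerges: int   # year moralizing gods actually appear (invented)
    complexity: int   # year large-scale society becomes visible archaeologically (invented)
    writing: int      # year written records begin (invented)


def observed_mg(society: Society) -> int:
    """Year moralizing gods first show up in the record.

    Simplifying assumption: they cannot be detected before writing exists.
    """
    return max(society.mg_emerges, society.writing)


def share_complexity_first(societies, use_observed: bool) -> float:
    """Fraction of societies where complexity is dated before moralizing gods."""
    count = 0
    for s in societies:
        mg_year = observed_mg(s) if use_observed else s.mg_emerges
        if s.complexity < mg_year:
            count += 1
    return count / len(societies)


# Hypothetical societies; negative years are BCE.
TOY = [
    Society("A", mg_emerges=-500, complexity=-200, writing=100),
    Society("B", mg_emerges=300, complexity=600, writing=900),
    Society("C", mg_emerges=1000, complexity=800, writing=700),
    Society("D", mg_emerges=-100, complexity=0, writing=-300),
]

true_share = share_complexity_first(TOY, use_observed=False)
seen_share = share_complexity_first(TOY, use_observed=True)
print(true_share, seen_share)
```

In this invented example complexity really comes first in only one society out of four, but it appears to come first in three out of four once late-arriving writing delays the first documented moralizing god.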
Moreover, they argue that the Seshat data is fundamentally flawed: created by busy research assistants without training in the specific fields, copied and pasted across periods more than a thousand years long or from an imperial power to a newly claimed territory, granted authority by citing vetters who deny that they were involved, and failing to follow standard version-control practices as they change their main data set (the data submitted with the Nature article has been taken down and replaced with an updated version, so readers can’t look at the original data and decide whether the criticisms are fair). They don’t object to a database of religious history in principle, but they think that the Seshat data needs, say, ten times more work before it’s a trustworthy representation of what experts know and can’t know. In short, they accuse the paper in Nature of being not just bad history, but bad statistics and bad quantitative social science.
I can say that the Seshat entries for Achaemenid Susiana and Achaemenid Sogdiana are more of a ‘notebook’ than a polished article: all the fields are filled in, but the only vetting was for the ‘moralizing high god’ religious data, and the supporting essay is patched together from quotations from modern authorities, some of them describing places as distant as Egypt. They also want to code concepts as binaries (either there is promotion by merit in the bureaucracy, or there is not), which seems like a false dilemma. On the other hand, I like their choice of 30 regions to use as the framework for their world history.
People studying the ancient world are often at the cutting edge of methods, whether their leap into electronic and digital publication in the 1990s or their sophisticated work on house sizes and inequality. Both of these projects agree that an open encyclopedia of data on world history is a good thing, and that the kinds of statistical methods which quantitative social scientists like are worth trying. But at the same time, it’s wise not to assume that adapting those methods to another kind of evidence will be quick and easy, or to overlook the need to convince people who already study the subject that these new methods are valid.
Several people on the Seshat/Nature side of the debate are responding in blog posts. They have had so little time to read and absorb their critics’ arguments that I would recommend focusing on reading their original article and the two responses. Note that Nature is a natural science journal, so what they call the article is more of an ‘executive summary’ and the actual contents and evidence are in the supplements.
Edit 2019-08-15: After a comment by one Roepke I would list some of the influences on Turchin’s theories as:
- (edit 2019-01-09 after reading more of Turchin’s article and a chat with T. Greer) fourteenth-century Arab polymath Ibn Khaldun’s theory that settled life caused conquerors to lose the solidarity which had enabled them to conquer their new homes
- Reverend Thomas Malthus’ observation that population growth often out-paces food production leading to famine and poverty
- the demographers who discovered synchronized cycles in population and standard of living: as population falls, standard of living rises, causing the birthrate to rise and the death rate to fall
- the Russians like Kondratiev (and the Russian-American thinker Pitirim A. Sorokin, ed. 2019-12-23, thanks to the late L. Sprague de Camp in Astounding Science Fiction July 1940) who see lots of other cycles in historical economic and demographic data
- the Marxist and social science focus on institutions and social structures
- Quetelet’s social physics
- his own idea of elite overproduction to explain why internal political crises are often out of sync with rising and falling populations, and why some of the conflicts thrown up by structures and institutions shatter societies while others are managed
He is trying to discover something on the scale of plate tectonics or evolution by natural selection, but for a harder problem (even geologists won’t claim that they can predict when the next earthquake will happen, but he thinks he can predict the development of crises in modern societies within a year or two).
Edit 2019-10-05: Archeothoughts says that the SESHAT team has published a response to one of their critics. I have not read it, but he is not impressed.
Edit 2019-11-23: And Turchin posts a second draft response, and Bret Beheim responds:
“Thanks for the sneak peek, and commitment to transparency. It seems to be the same problem all as the last paper, though – you’re assuming that absence of evidence of religious beliefs is evidence of absence. Take Hawaii’ – in the original Whitehouse, et al. paper (and on the Seshat website), Seshat doesn’t seem to know anything about Hawaiian religion around 1200 CE (see here: http://seshatdatabank.info/data/polities/big-island-of-hawaii-Hawaii2). Everything is ‘unknown.’ Yet in the new dataset associated with this paper … [all of the data for moralizing gods are ‘inferred absent’]. Where is all this new data coming from? It seems like you’re just assuming all those unknowns are absences all over again. … Seshat has somehow gone from 100% missing values for pre-1580 Hawaii (what you call Hawaii1 and Hawaii2) in Whitehouse, et al. to now a state where there are no missing values at all. The articles don’t say where the data on ancient Hawai’i is coming from, so I’m asking you.”
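Beheim’s point about unknowns can be made concrete with a small sketch. The three codes, the function names, and the polity with its years are all hypothetical choices of mine, not the Seshat schema or data; the only thing the sketch shows is how silently recoding ‘unknown’ as ‘absent’ moves the earliest date at which a trait could have existed.

```python
def first_present(records):
    """Start year of the first period coded 'present', or None if there is none."""
    for year, code in records:
        if code == "present":
            return year
    return None


def earliest_possible(records):
    """Earliest year the trait could have existed, given the coding.

    Any run of 'unknown' periods directly before the first 'present'
    period is treated as possibly present; an 'absent' period rules
    out everything before it.
    """
    earliest = None
    for year, code in records:
        if code == "present":
            return year if earliest is None else earliest
        if code == "unknown":
            if earliest is None:
                earliest = year
        else:  # 'absent' resets the run of unknowns
            earliest = None
    return None


def recode_unknown_as_absent(records):
    """The move the critics object to: treat every 'unknown' as 'absent'."""
    return [(year, "absent" if code == "unknown" else code) for year, code in records]


# Hypothetical polity: (start year of period, coding for moralizing gods).
POLITY = [(1000, "unknown"), (1200, "unknown"), (1400, "unknown"), (1600, "present")]

print(first_present(POLITY))                                # first documented appearance
print(earliest_possible(POLITY))                            # honest lower bound
print(earliest_possible(recode_unknown_as_absent(POLITY)))  # bound after recoding
```

With the unknowns left as unknowns, the honest statement is that the trait appeared somewhere between the earliest unknown period and the first documented one. After recoding, the dataset claims to know that it appeared only at the first documented date.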
Edit 2020-11-27: Since I have written several posts with my comments on attempts to quantify world history, I now have a tag to connect them: https://bookandsword.com/tag/historical-datasets/ See also Michael E. Smith’s “When Big Data are Bad Data” (2018) https://publishingarchaeology.blogspot.com/2018/12/when-big-data-are-bad-data.html and Alexey Guzey’s “Can we trust Peter Turchin?” https://guzey.com/people/peter-turchin/ (I have not confirmed Guzey’s criticisms).
Edit 2021-04-09: I am told that Trevor Dean’s Crime in Medieval Europe (p. 113) criticizes the attempt to estimate the murder rate in 1340s Oxford in C. I. Hammer, Jr., “Patterns of Homicide in a Medieval University Town: Fourteenth-Century Oxford”, Past and Present, no. 78 (Feb. 1978), pp. 3-23. He argues that the rate could be more like 66 per 100,000 persons per year than 110 per 100,000 persons per year (in most Euro and settler countries in the early 21st century, the rate is about 1 murder per 100,000 persons per year). There is also an argument that the invention of first aid, motorized ambulances, blood transfusions, and public health in the 20th century turns what would have been murders into assaults, although it’s hard to believe that that explains a difference of 50:1 or 100:1.
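To see why the medical explanation strains belief, here is a back-of-the-envelope calculation. It rests on a simplifying assumption of my own: that the number of potentially lethal attacks per person is the same in both periods, and that the whole gap comes from modern medicine saving victims who would once have died. The rates are the ones quoted above; the function name is just a label I made up.

```python
def required_survival_fraction(old_rate: float, modern_rate: float) -> float:
    """Share of would-be murder victims that medicine must save.

    Assumes (hypothetically) an equal rate of lethal attacks in both
    periods, so that modern_rate = old_rate * (1 - saved_fraction).
    """
    return 1 - modern_rate / old_rate


# Rates per 100,000 persons per year, as quoted in the paragraph above.
for old_rate in (66, 110):
    share = required_survival_fraction(old_rate, modern_rate=1)
    print(f"{old_rate} per 100,000: medicine would have to save {share:.1%} of victims")
```

Under that assumption, medicine would have to turn roughly 98 or 99 out of every 100 medieval-style killings into survivable assaults, which is the sense in which the explanation seems insufficient on its own.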
Edit 2022-08-22: Andre Costopoulos informs me that in a new paper the authors have doubled down on their claim to know the absence of moralizing gods in many societies which left few or no writings https://archeothoughts.wordpress.com/2022/08/03/moralizing-gods-redux/ As has been known since Aristotle’s day, proving the negative is hard.
Edit 2023-06-17: block editor