Does the history of our languages match the history of our genes? Charles Darwin thought yes, others said no. An interdisciplinary team of researchers from the Max Planck Institute for Evolutionary Anthropology, the University of Zurich and Harvard University has put together GeLaTo, a global database linking linguistic and genetic data. They found a large number of matches but also widespread and systematic mismatches.
More than 7000 languages are spoken in the world. This linguistic diversity is transmitted from one generation to the next, as we learn the language of our parents. The transmission of languages can be compared to the transmission of biological traits and genes, suggesting similar evolutionary paths. However, languages can be learned not only from parents but also from other groups and peers. Are languages and genes tied together in human history? A new study tested this for the first time on a global scale.
A joint venture initiated by the Max Planck Institute for Evolutionary Anthropology in Leipzig in collaboration with the University of Zurich and Harvard University has created a database with the tasty name GeLaTo (GEnes and LAnguages TOgether) – the Italian term for ice cream – that combines genomic and linguistic data to study the global evolution of language. In this new study, the researchers examined and quantified the extent to which the linguistic and genetic histories of populations coincided.
“We focused for the first time on gene-language mismatches, cases where the biological and linguistic patterns differed: how often and where do they occur, which types can we identify?”, said Chiara Barbieri, a geneticist from the University of Zurich who led the study. “It is clear that people who speak related languages tend to be genetically related, confirming a shared biological and cultural history. But this is not always the case: about every fifth gene-language relation in our database is a mismatch, and they occur worldwide.”
Mismatches between genetics and linguistics
Most mismatches result from populations shifting to the language of a genetically different neighbouring population. Cases in which people keep their original linguistic identity despite genetic assimilation with their neighbours are rarer, but they do occur. For example, the Hungarian people are genetically similar to their neighbours, but their language is related to languages of Siberia.
“Once we know where such language shifts happened, we can try to answer why they happened”, explains Russell Gray, Director of the Department of Linguistic and Cultural Evolution at the Max Planck Institute for Evolutionary Anthropology and initiator of the project. “This combined approach allows us to dig into our past, and understand the role of language in shaping human diversity — a diversity that is magnitudes larger than in other primates.”
Systematic studies with large data
GeLaTo contains genetic information from more than 4000 individuals speaking almost 300 languages. This global database contains enough information to disentangle demographic and linguistic histories. The resource is linked to other linguistic and cultural databases developed in Gray’s department. Robert Forkel, scientific programmer at the Max Planck Institute for Evolutionary Anthropology, and one of the authors of the study, adds: “It was encouraging to see that our research data framework could serve as the backbone to link to genetic data as well”.
The scope of GeLaTo is global, exploring gene-language relationships across continents. “The most studied case of gene-language relationship has been Indo-European, the language family most diffused in Europe and parts of Asia, which comprises languages like French, German, Spanish, Farsi and Greek. We find that the level of matches for this family is particularly high, in comparison to other regions and language families in our dataset. This might have given the impression that gene-language matches are the norm”, states Damián Blasi, a researcher at the Department of Human Evolutionary Biology at Harvard University who co-led the study. “The availability of genetic data is biased towards Western countries. It is important to include genetic and linguistic data from populations all over the world to understand language evolution”.