Scientists have created software that can reconstruct long-dead languages. The new tool can rebuild protolanguages – the ancient tongues from which our modern languages derived.
To test the system, the team took 637 languages currently spoken in Asia and the Pacific and recreated the early language from which they descended. The work is published in the Proceedings of the National Academy of Sciences.
Until now, linguists have carried out language reconstructions by hand – a slow and labour-intensive task. Dan Klein, an associate professor at the University of California, Berkeley, said: “It’s very time consuming for humans to look at all the data. There are thousands of languages in the world, with thousands of words each, not to mention all of those languages’ ancestors. (…) It would take hundreds of lifetimes to pore over all those languages, cross-referencing all the different changes that happened across such an expanse of space – and of time. But this is where computers shine.”
Over thousands of years, tiny variations in the way that we produce sounds have meant that early languages have morphed into many different descendants. Dr Klein explains: “These sound changes are almost always regular, with similar words changing in similar ways, so patterns are left that a human or a computer can find. The trick is to identify these patterns of change and then to ‘reverse’ them, basically evolving words backwards in time.”
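To give a feel for what "reversing" a regular sound change means, here is a toy sketch in Python. It is not the researchers' method: the real system has to find the patterns in the data, whereas this example simply hard-codes them. The rule table `REVERSE_RULES`, the function `evolve_backwards` and the word forms are all invented for illustration.

```python
# Toy illustration only: the correspondences and words below are invented,
# not taken from the study or from any real language.

# Each entry maps a sound in a hypothetical modern language to the
# ancestral sound it is assumed to have come from.
REVERSE_RULES = {"f": "p", "h": "k"}


def evolve_backwards(word, rules):
    """Undo each regular sound change, one character at a time."""
    return "".join(rules.get(sound, sound) for sound in word)


modern_words = ["fata", "hafu"]
ancestral_guesses = [evolve_backwards(w, REVERSE_RULES) for w in modern_words]
print(ancestral_guesses)  # ['pata', 'kapu']
```

Because the change is regular, every "f" is treated the same way in every word, which is what makes the pattern detectable in the first place.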
The scientists demonstrated their system by looking at a group of Austronesian languages that are currently spoken in southeast Asia, parts of continental Asia and the Pacific. From a database of 142,000 words, the system was able to recreate the early language from which these modern tongues evolved. The scientists believe it would have been spoken about 7,000 years ago.
They then compared the computer’s findings to those of linguists, finding that 85% of the early words that the software presented were within one “character” – or sound – of the words that the language experts had identified.
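One way to picture a "within one character" comparison is with edit distance: the number of single-sound insertions, deletions or substitutions needed to turn one word into another. The sketch below assumes that this is roughly how such a score could be computed; the article does not say exactly which measure the team used. The names `edit_distance` and `share_within_one` and the word pairs are invented for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def share_within_one(pairs):
    """Fraction of (software, expert) pairs at most one edit apart."""
    close = sum(1 for software, expert in pairs
                if edit_distance(software, expert) <= 1)
    return close / len(pairs)


# Invented pairs: (software reconstruction, expert reconstruction).
pairs = [("mako", "mako"), ("sela", "sila"), ("tuno", "dunok"), ("pira", "pira")]
print(share_within_one(pairs))  # 0.75
```

In this made-up sample, three of the four pairs differ by at most one character, giving 75%.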
While the computerised method was much faster, the scientists said it would not put the experts out of a job. The software can churn through large amounts of data quickly, but it does not bring the same degree of accuracy as a linguist’s expertise.
Dr Klein said: “Our system still has shortcomings. For example, it can’t handle morphological changes or re-duplications – how a word like ‘cat’ becomes ‘kitty-cat’. At a much deeper level, our system doesn’t explain why or how certain changes happened, only that they probably did happen.”
While researchers are able to reconstruct languages that date back thousands of years, there is still a question mark over whether it would ever be possible to go even further back to recreate the very first protolanguage from which all others evolved.