
The Babylonian Engine – Why this Akkadian AI translator is a groundbreaking tool for scholars

The use of artificial intelligence is seeing a meteoric rise, with increasingly diverse applications. In this age of Google Translate and ChatGPT, we're accustomed to transforming complex information at the push of a button. So why is this Akkadian translator so revolutionary for our understanding of the ancient world?


Translation is a tricky skill. Not only must a translator be a technical master of two or more languages, but they must also understand the way people who use those languages think and frame their sentences. In 1929, influential American linguist Edward Sapir wrote, "the worlds in which different societies live are distinct worlds, not merely the same world with different labels attached." Together with his student, American linguist Benjamin Whorf, Sapir is credited with the linguistic relativity hypothesis, which suggests that the language you speak frames your perception and thoughts.


Edward Sapir circa 1910 (Wikimedia Commons)

A commonly cited example of this hypothesis in action is the Inuit group of languages and their descriptions of snow. While English has one basic word for snow, Inuit languages have multiple terms which differentiate kinds of snow, equivalent to "clinging snow", "wet snow" and so forth. Another example is colour perception. Languages differ in their number of terms for specific colours, and the number of basic colour terms is far smaller than the number of colours the human eye can perceive. The language we speak doesn’t limit our worldview, but it does sharpen it in specific ways and direct our attention towards different aspects of the world.


When a translator goes to work, they must understand all the nuances a native speaker would, in both languages at hand – and the task of translating to or from ancient languages adds yet another layer of difficulty. Attempting to understand the cultural milieu of a civilisation long gone is not an enviable task, and resurrecting an extinct language from purely written sources and fragmented remains takes a lot of time and effort.


Margaret & Arthur Evans, 1888 (Wikimedia Commons)

British archaeologist Sir Arthur Evans (1851–1941) spent decades trying to decipher the scripts he unearthed at Knossos on Crete. One of these came to be known as Linear A, which was used by the ancient Minoans in governmental and religious writing. It was succeeded by another script, Linear B, which recorded an early form of Greek used by the Mycenaeans and was not deciphered until 1952, after Evans' death. Linear A has a similar form to Egyptian hieroglyphics but developed independently. Even now, despite Evans' life's work and the work of those who have followed, Linear A remains undeciphered.

Rosetta Stone (© The Trustees of the British Museum)

The most helpful artefact for deciphering a script is one that gives you the same text in a language you already know. For many people, the most familiar example of this is the Rosetta Stone. No one could read Egyptian hieroglyphs until this artefact was unearthed in 1799, featuring the same inscription in three different scripts: Egyptian hieroglyphs, Egyptian Demotic and ancient Greek. Ancient Greek was known to scholars of the time, who set about attempting to relate it to the then-unreadable hieroglyphs. It took them 23 years to decode the secrets of hieroglyphs from the Stone's clues.


Akkadian is the language of the Akkadian Empire, which was founded by Sargon the Great and existed in Mesopotamia circa 2350–2150 BCE. At its peak, this empire stretched from Anatolia in the north to Arabia in the south, and from Iran in the east to the Mediterranean in the west. Akkadian is the earliest attested Semitic language, and it is no longer spoken today. It split into Assyrian and Babylonian dialects before being replaced by Aramaic as the everyday language early in the first millennium BCE.


Bronze head of an Akkadian ruler (Rijksmuseum van Oudheden via Wikimedia Commons)

We are fortunate to have many surviving examples of Akkadian script, for two primary reasons. Firstly, many ancient scholars continued to write in Akkadian cuneiform even after Aramaic became common. Secondly, the script's wedge-shaped signs were written by pressing a reed stylus into a wet clay tablet, which was then dried or baked. As clay and stone artefacts can weather more damage than papyrus or parchment, a greater number survive – in fact, some of Mesopotamia’s greatest libraries were preserved by the very fires that destroyed them, which baked and hardened the clay tablets for humanity to find thousands of years later. The first understandable Akkadian texts are from Ur, circa the 28th century BCE.


The cuneiform writing system has around 1,000 signs, but not all were in use at once, as the repertoire varied by region, period and genre. On top of this, cuneiform is a difficult script to parse because its signs are polyvalent, meaning they have multiple possible readings depending on the context. Experts cannot directly translate Akkadian to a modern language – it must first be transliterated, which means deciding from context how each sign in a given sequence should be read.

Ancient cuneiform tablets at the Museum of Anatolian Civilisation, Ankara, Turkey

Cuneiform signs can function in three ways: as logograms, signs that represent a whole word; as determinatives, signs written before or after a word to signal that it belongs to a particular semantic group; and as syllabograms, signs that represent the syllables of a word.
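To make polyvalence a little more concrete, here is a minimal sketch in Python of how the possible readings of a single sign might be recorded. It uses the well-known sign AN, which can stand for the word "god", the word "sky", the marker placed before divine names, or simply the syllable "an". The `Reading` class, the `SIGN_READINGS` table and the `readings_for` helper are illustrative names invented for this example; they are not taken from The Babylonian Engine or from any real sign list, and the table holds a single toy entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    function: str  # "logogram", "determinative" or "syllabogram"
    value: str     # conventional transliteration of this reading
    gloss: str     # rough English meaning or role


# Toy entry for one polyvalent sign, AN (Unicode U+1202D).
# Hypothetical structure for illustration only, not a complete sign list.
SIGN_READINGS = {
    "\U0001202D": [
        Reading("logogram", "DINGIR", "god"),
        Reading("logogram", "AN", "sky, heaven"),
        Reading("determinative", "d", "marks the following word as a divine name"),
        Reading("syllabogram", "an", "the syllable 'an'"),
    ],
}


def readings_for(sign, function=None):
    """Return the possible readings of a sign, optionally filtered by function."""
    readings = SIGN_READINGS.get(sign, [])
    if function is None:
        return list(readings)
    return [r for r in readings if r.function == function]


if __name__ == "__main__":
    for reading in readings_for("\U0001202D"):
        print(f"{reading.function:14} {reading.value:7} {reading.gloss}")
```

A lookup like this only lists the candidates. Choosing the right one still depends on the neighbouring signs and the kind of text, which is exactly the judgement a transliterator has to make.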


The result of transliteration is a rendering of the cuneiform signs in the Latin alphabet, which can then be translated. This process takes years of training, practice and dedication, and the global pool of scholars who are proficient in translating Akkadian is too small to get through the hundreds of thousands of texts that have been found. We have found ourselves with a wealth of knowledge about the history of ancient Mesopotamia that we are unable to access.


Enter artificial intelligence. To tackle this backlog, researchers have created The Babylonian Engine, a neural machine translation model that can assist scholars in translating Akkadian to English. The AI model was trained on text samples from the Open Richly Annotated Cuneiform Corpus (ORACC) and was taught two ways to translate Akkadian: from transliterations of original texts, and from cuneiform signs directly.


A selection of Akkadian cuneiform translated with AI (Image from Gai Gutherz, a computer scientist who was part of the team that developed the program, via The Times of Israel)

The model was able to handle the nuances of a sample’s genre, as well as the variations in cuneiform script across millennia. It has also been tested with BLEU4, a version of the Bilingual Evaluation Understudy (BLEU) metric that is often used to assess the quality of machine-translated text. The model scored above the target baseline, in the range of a high-quality translation, both from transliteration to English and from cuneiform to English. The results have been published in the peer-reviewed journal PNAS Nexus, and the research and source code have been released on GitHub at Akkademia.
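For readers curious about what a BLEU4 score actually measures, the sketch below is a simplified, sentence-level version of the calculation in Python. It compares a machine translation with a single human reference by counting shared runs of one, two, three and four words, then penalises translations that are too short. This is a teaching sketch under stated assumptions (one reference, whitespace tokenisation, no smoothing), not the evaluation code used by the researchers, and the two English sentences are invented for the example rather than taken from the corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Simplified sentence-level BLEU-4: one reference, no smoothing."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: candidates shorter than the reference are marked down.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four precisions, scaled by the brevity penalty.
    return brevity * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    reference = "the king built a temple for the god"   # invented example
    candidate = "the king built a house for the god"    # invented example
    print(round(bleu4(candidate, reference), 3))
```

In this example a single wrong word in an eight-word sentence brings the score down to about 0.5, because that word breaks several of the longer word runs at once. Published BLEU scores are usually reported on a 0 to 100 scale and averaged over a whole test set rather than computed sentence by sentence.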


The program is not without limitations: it has been created to work in tandem with scholars and students, and is not self-sufficient. Like all AI models, it is prone to intrinsic and extrinsic hallucinations* which result in mistranslated sentences. (*In AI, hallucination is a term for responses with no connection to the source.) The program works best with formulaic genres like royal decrees and administrative records, and with short-to-medium-length sentences. It functions as a ‘human-machine collaboration’ to assist academics in translation, and requires a human to check its results.


But, while it may not be perfect, The Babylonian Engine is a groundbreaking advancement for researchers striving towards the ‘preservation and dissemination of the cultural heritage of ancient Mesopotamia.’ As the model improves with time – and as the number of digitised texts continues to grow – it will allow scholars to make much quicker work of deciphering the texts left behind, and of uncovering the exciting discoveries that may be hiding within.


 
