How the language industry is dealing with AI and large language models.
Ever-shifting language
In March 2023 the Oxford English Dictionary released an update. This included the addition to its pages of over 700 new words and phrases and meant terms like ‘deepfake’, ‘chonky’ and ‘groomzilla’ all became a part of the recognized English lexicon. The update also focused on World Englishes and in particular the revising of Māori, Scottish and Indian pronunciation to reflect today’s usage.
In 2021 the Leibniz Institute for the German Language announced it had gathered more than 1,200 new words all linked to the coronavirus pandemic. Between 2019 and 2022 French dictionaries added at least 500 new words, many of them created in response to the climate emergency and pandemic. Japanese also recorded new words in 2022, a majority also related to current events like the war in Ukraine and the climate. In recognition of its growth and increasing cultural influence, in 2021, the United Nations declared that 7th July would become World Kiswahili Language Day.
But dictionaries and organizations are just playing catch-up – these changes have already happened by the time they get official recognition.
Language is always on the move. The words we use and how we use them are in constant flux and languages evolve quickly in relation to cultural shifts and global events. Although we often talk about languages ‘dying’ or ‘thriving’, the reality is that they fall out of use or become more widely spoken or are rediscovered because of how we humans behave and change.
The way humans use and develop language is unique. Our ability to communicate and express ourselves through the formation of words, both in oral and written format, makes us stand out from other animal life forms on earth. Our language shapes and is shaped by our wonderfully diverse and rich cultures and reflects the human condition in all its complexity.
How is it possible then, that machines can grasp this sophisticated, intricate and ever-changing form of communication and use it as we do? How can computers ever begin to understand enough about how our languages work to translate between them?
Dazzled by ChatGPT
In our last blog post we looked at recent developments in generative Artificial Intelligence (GenAI), particularly in regard to language, and at how ChatGPT has taken the world by storm. Although this type of technology has been progressing steadily over a number of years, it is the online chatbot developed by OpenAI that has captured the wider imagination. Suddenly everyone is talking about how dazzled they are by ChatGPT’s impressive output and how large language models (LLMs) like this are going to revolutionize business communication and the creation of language in general.
ChatGPT is even being trumpeted as better than Google Translate and as the solution to the age-old problem of translating between languages.
So, just how good a translator is GenAI and should professional linguists be worried?
Before we get into that let’s take a quick look at how technology has been used in the language business for many years and how, as an industry, we have a habit of making new translation tech work for us.
Translators quickly adopt technology
Since ancient times, the value of transferring meaning from one language to another has been keenly recognized and the quest to do this as efficiently as possible has been ongoing. Easing the burden on the human translator or interpreter has long been an objective in the world of translation. Could we even argue that the Rosetta Stone, the ancient slab dating to 196 BC and inscribed with the same decree in three scripts (hieroglyphic, Demotic and ancient Greek), was an early version of translation technology?
Certainly, ever since IBM’s computer translation of 60 Russian sentences into English in the 1950s, the race to develop a machine to replace humans has been a hot topic. From rule-based machine translation to statistical MT and more recently neural MT, research has been active and the progress steady.
And what’s more, translators and translation agencies have always been quick to adopt the help that was there.
The digital age has meant rapid advances in technology, and translators have adapted: first to electronic dictionaries and glossaries, then to online and cloud-based translation management systems and, today, to machine translation powered by neural networks. These technologies have helped streamline and speed up the processes involved and bring value for money to customers.
CAT, TMS and MT
Today the tools translators have at their disposal are already cutting-edge.
CAT, or computer-aided translation, tools offer assistance in several ways. They store previously translated text and automatically fill in a segment of the target text when the source segment is an exact match for one translated before. They contain project-specific terminology bases, offer additional information from previous translators or project managers, and provide a single translation environment for various file formats. In other words, they go a long way towards making the job less disjointed and more systematic.
A translation management system (TMS) connects all the teams involved in a language project: account managers, vendors, project coordinators and translators can all access the relevant information on a TMS. A TMS takes the project from the quote stage to the delivery of the final translation and makes it possible to manage projects at scale for different language pairs and markets.
Machine translation is now being widely used in the language industry. The arrival of neural network technology from the mid-2010s meant that machine translation systems became more accurate, and over the last decade MT has evolved into a viable form of professional translation for certain types of text and project. The premise of machine translation is that it is able to produce a translation without input from a human linguist.
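For readers who like to see the idea in code, the stored-translation lookup described above can be sketched in a few lines of Python. This is only a rough illustration, not how any particular CAT tool works: the `TranslationMemory` class, its method names and the 0.75 similarity threshold are all assumptions made up for this example, and the similarity score comes from Python's standard `difflib` module rather than from the matching algorithms commercial tools use.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Hypothetical, minimal translation memory: source segment -> target segment."""

    def __init__(self):
        self._segments = {}

    def add(self, source, target):
        # Store a previously translated segment pair.
        self._segments[source] = target

    def lookup(self, source, fuzzy_threshold=0.75):
        """Return (suggested_target, similarity) for a new source segment.

        An exact match returns a similarity of 1.0. Otherwise the most
        similar stored segment is suggested if it reaches the (assumed)
        threshold; if nothing is close enough, the suggestion is None.
        """
        if source in self._segments:
            return self._segments[source], 1.0

        best_target, best_score = None, 0.0
        for stored_source, stored_target in self._segments.items():
            score = SequenceMatcher(None, source, stored_source).ratio()
            if score > best_score:
                best_target, best_score = stored_target, score

        if best_score >= fuzzy_threshold:
            return best_target, best_score
        return None, best_score


tm = TranslationMemory()
tm.add("Press the power button.", "Drücken Sie die Ein/Aus-Taste.")

# Exact match: the stored translation can be filled in automatically.
exact_target, exact_score = tm.lookup("Press the power button.")

# Near match: the stored translation is offered for the translator to adapt.
fuzzy_target, fuzzy_score = tm.lookup("Press the power button twice.")

# No useful match: the translator starts from scratch.
none_target, none_score = tm.lookup("Hello")
```

In this sketch an exact match is simply reused, while a near match is only ever a suggestion: the translator still decides whether and how to adapt it.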
Technology is so closely tied to the language industry that many companies also develop their own tools, depending on their fields of expertise. t’works has created its own specific technology tools for terminology management and SAP translation, as well as connectors for content management systems (CMS), all designed to make our customers’ projects run as efficiently and as seamlessly as possible.
The language industry therefore has a brilliant track record of introducing technology into its workflows. It has always been ready to integrate systems that will improve productivity and quality – it embraces disruption with open arms!
GenAI and translation
Leaving aside the concerns around data privacy and security which we touched on in our last blog, let’s look at how good a translator GenAI like ChatGPT is, and if its current limitations, mainly the ‘hallucination’ and inherent bias problems, prevent it from being a viable translation option.
Tests from a variety of reliable sources have arrived at similar conclusions. The language technology experts at CSA Research have looked at the question in detail. They conclude that ChatGPT produces ‘strikingly good results’ when the languages involved are served by large amounts of training data (high-resource languages) but that low-resource languages fare less well. They also found that ChatGPT was stronger than current machine translation for tasks involving ‘noisy’ data (that is to say, text that contains spelling errors or lots of colloquial and unusual language) and that it dealt better with defining context. Being able to use prompts to create more elaborate instructions for the system to work with was also seen as a plus point.
The language industry commentator Slator looked at recent studies by Tencent, Intento and Microsoft and reported that all three rated ChatGPT as ‘competitive’ for translation involving high-resource languages. Language consultants Nimdzi found generative AI language models (like ChatGPT) to be ‘more fluent but less accurate and unpredictable’ in comparison with neural MT. However, Nimdzi also concluded that the potential applications of these AI models were much greater than those of traditional MT, particularly in areas such as transcreation or detecting gender bias.
These assessments are helpful but we should remember that they are largely comparing generative AI’s translation ability to that of current machine translation, which itself is far from perfect. In all cases, both AI language models and MT still have significant imperfections and a very long way to go before they can replace humans. To quote Arle Lommel from CSA Research, the quality of ChatGPT’s translation, ‘…depends on the language, on the subject matter, and sometimes on the phase of the moon or some other imponderable factor.’
Customer benefits
It would be fair to say that ChatGPT is keeping the language industry on its toes. It is yet another disruptor in a sector that has been subject to advancing technology throughout its history. For the moment its translation abilities are good but not good enough to trust it with translations that matter. If you need to translate important business texts, literature, targeted marketing copy or similar, human expertise is crucial. ChatGPT can be a useful tool for tasks such as getting a quick understanding of a text in another language or for generating content ideas, but as yet its flaws are too significant to let it take on important translation work unsupervised.
How big a role it will play is as yet uncertain. What we do know is that generative AI is changing the technology landscape and ignoring it is not an option. For now we hope the doomsayers are mistaken and that this incredible technology will have a positive impact on the world of language and the people who work in it. For those of us who play with words for a living, well, we’ll keep an open mind and continue to welcome technology tools when they bring direct benefits for our customers.