For my first exercise at the Whitaker Lab, Kirstie, immediately detecting my social frailty, has asked me to introduce myself to some strangers at the Alan Turing Institute and ask them about their work.
Hieu Hoang looks after Moses, a major open-source machine translation system. Machine translation is the automatic translation of text from one natural language (English, Yoruba, Cantonese and so on) into another, à la Google Translate. A machine translation system learns from large corpora of texts that already exist in more than one language. This training data may come from sources as diverse as EU law, UN documents, Linux manuals and film subtitles. It can be hard to find corpora that pair two less widely spoken languages directly, so in these cases a text might be translated into English and then from English into the target language, which apparently works better than you might expect. Asked what might change in his field, Hieu said that neural networks are gathering momentum as a tool for finding the best translations, now that there is more data and better hardware to support them.
When I described my assignment (talking to people) to Darren, he asked me whether I knew what a test to distinguish a machine from a human via conversation is called. Touché, Darren.
Darren manages the data-centric engineering programme at the Alan Turing Institute. The goal is to apply the cutting-edge capabilities of data science to engineering. Darren notes that large systems such as cities, buildings and transport networks generate masses of data. That data can be used to improve these systems, making them safer, more reliable and more efficient. As he points out, autonomous transport is an area where we are already seeing this change. A large part of his job is connecting people from different disciplines. He says that, as a new organization, the Alan Turing Institute attracts a lot of novel challenges, which he enjoys.
Understanding how language has changed throughout history is a problem that has traditionally been tackled qualitatively, and there is good reason for that. Meaning is shaped by, among other things, the reader's knowledge of the world, which makes historical data a great challenge for a computer to make sense of. By combining the expertise of traditional scholars with advances in data science, Barbara intends to study these changes quantitatively and on a much larger scale. A program can follow a word through scores of historical texts, looking for the changes in context that signal a change in meaning. Her work is concerned with both the remote and the recent past.
A huge thank you to Hieu, Darren and Barbara for chatting to me.