Lost In Translation

Google Translate has become the ubiquitous service letting people communicate across language divides on a scale never before possible. This last fall Google announced a major update to the Google Translate backend, one that takes advantage of Google’s growing expertise with neural networks. The new system is more flexible, able to capture more of the depth and nuance of the sentences being translated, rather than simply doing a “brute force” word-for-word substitution the way anyone would with a translation dictionary.

As they rolled out the new system, the engineers were looking for a way to scale it to all 103 supported languages as quickly as possible. Rather than training a separate model for every language pair, they trained one shared model and prepended an artificial “token” to each sentence being translated. This “token” isn’t a word in any one language; it simply tells the model which target language to produce, which lets a single model share what it learns about meaning across many languages without privileging any one of them.
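To make the trick concrete, here is a minimal sketch of it in Python. This is not Google’s actual code (their system is a large neural network, and the whitespace tokenizer here is deliberately naive); the `<2es>`-style token format follows the convention described in Google’s research write-up, and `prepare_input` is a hypothetical helper invented for illustration.

```python
# A minimal sketch of the multilingual trick: the only change to the
# input is an artificial token naming the *target* language. The rest
# of the input is ordinary source-language text.

def prepare_input(sentence: str, target_lang: str) -> list[str]:
    """Prepend a target-language token to the source tokens."""
    target_token = f"<2{target_lang}>"        # e.g. "<2es>" for Spanish
    return [target_token] + sentence.split()  # naive whitespace tokenization

# One shared model sees examples like these from every language pair:
print(prepare_input("How are you?", "es"))
# ['<2es>', 'How', 'are', 'you?']
print(prepare_input("¿Cómo estás?", "ja"))
# ['<2ja>', '¿Cómo', 'estás?']
```

Notice the same function works no matter what the source language is; the model is never told what language it is reading, only what language it should write.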

The engineers were pleased to find that this setup let them translate between language pairs the system had never been explicitly trained on, a trick known as “zero-shot” translation. When they peered closer into how the network was solving these translations, they found evidence that it was forming its own internal representation of concepts, separate from any of the human languages involved. In effect, the algorithm was forming its own intermediate language, which the engineers have dubbed an “interlingua”.
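For intuition, here is a toy illustration of what that shared representation looks like geometrically. The vectors below are invented by hand purely for the example; in the real system they are high-dimensional activations learned by the network. The point is only that sentences cluster by meaning rather than by language, which is the pattern the engineers observed.

```python
# Toy illustration (not Google's system): sentences with the same
# meaning, in different languages, land near the same point in a
# shared vector space. The vectors are hand-picked for the demo.
import math

embeddings = {
    ("en", "the cat sleeps"): [0.90, 0.10, 0.00],
    ("es", "el gato duerme"): [0.88, 0.12, 0.02],  # near the English cat sentence
    ("en", "the dog runs"):   [0.10, 0.90, 0.05],
    ("es", "el perro corre"): [0.12, 0.88, 0.04],  # near the English dog sentence
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Find the English sentence closest to a Spanish one in the shared space.
query = embeddings[("es", "el gato duerme")]
best = max(
    (key for key in embeddings if key[0] == "en"),
    key=lambda key: cosine(query, embeddings[key]),
)
print(best)  # ('en', 'the cat sleeps'): proximity follows meaning, not language
```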

Is this super cool? Yes. Is it super confusing? Probably. Fortunately this post does a fantastic job of explaining what’s going on with incredibly relatable analogies. One person read the Google post and made a number of very understandable mistakes, leading him to describe what Google Translate was achieving in distinctly sentient terms.

“Google Translate invented its own language to help it translate more effectively.
What’s more, nobody told it to. It didn’t develop a language (or interlingua, as Google call it) because it was coded to. It developed a new language because the software determined over time that this was the most efficient way to solve the problem of translation.”

All this blogger has done is assign intent to code where there simply was none. This innocent mistake, I think, highlights a very difficult issue we will face over the next century. As computers and their software continue to develop, it is going to be extraordinarily easy for us to assign properties of sentience to them. As we move forward, it is important to keep in perspective that we have not even begun to successfully model something like curiosity in our machines. Our innate desire to assign such human traits to what amounts to the execution of some code could easily play out in some bizarre ways in our current legal and political atmosphere.