In a weirdly positive moment for Facebook, the company’s AI division revealed the first multilingual machine translation (MMT) model that doesn’t center on English. The M2M-100 can translate directly between any pair of 100 languages without first translating to English as an intermediate step. While Google Translate appears to offer direct translation like this on the user’s end, it has done so through its own middle language and by matching languages with English. The M2M-100 is not in production yet, but the model is a surprising step towards decolonized technology.
The M2M-100 — The system was trained on 7.5 billion sentence pairs from 100 languages. Some languages had more sentence pairs to work with, so efficacy is still rooted in cultures with larger available datasets to pull content from.
“The major challenge is really, how do we take the translation systems we have, and then actually meet the demand of people around the world,” Angela Fan, a research associate at Facebook AI, told Engadget before noting two-thirds of Facebook posts are not in English. “So you are translating into all of the languages and across all of the directions that people actually want [but] the existing translation systems rely heavily on English-only data.”
The languages were also sorted into fourteen groups based on linguistic, regional, and cultural similarities to improve translations around the world. When the model knows one language written in Cyrillic, like Russian, for example, it's easier to translate a related one with less data to pull from, like Belarusian. The model is open-source, but we’re currently relying on Facebook’s own tests showing that the M2M-100 significantly outperforms other models in both academic benchmarks and evaluations by human testers.
Lol, jk, stay suspicious of Facebook — It’s no secret that everyone in tech is trying to stake their claim in the digital colonization of Africa. In a continent where an individual country might have dozens if not hundreds of languages, the race to bridge communication gaps is heating up.
Google has its own AI lab in Ghana working on African language inclusion for Translate, but it still struggles with major ones like Yorùbá, in no small part because English-centric models flub diacritical marks. Fan also told Engadget that Facebook’s M2M-100 can handle Swahili and Afrikaans well, but the team is working on better translations for isiZulu and other languages that are difficult to research. Reminder, this is a company that trains AI that lets hate speech slip through the cracks and thinks onions are sexy. So we're going to keep a close eye on its translations and motivations for a while yet.