Tired of your typos, Bing expands spelling corrections to over 100 languages

100+: the number of languages in which Microsoft can now fix spelling mistakes.

Microsoft wants to make Bing search more inclusive and efficient for its users. The company has announced that it is expanding spelling correction to more than 100 languages.

The company says that 15 percent of its search queries have spelling mistakes, which directly affect the quality of the results for users. The new technology, Speller100, is an attempt to fix this language hiccup with an impressive amalgam of artificial intelligence, zero-shot learning, and linguistic research and theory.

What is zero-shot learning? To make spelling correction work across so many languages, Microsoft relied on a machine learning technique known as zero-shot learning. In this approach, a model is expected to make correct predictions for categories it never saw labeled examples of during training, by transferring what it learned from related data. Zero-shot learning is especially useful in language training, according to Microsoft, as it "allows a model to accurately learn and correct spelling without any additional language-specific labeled training data."

"Imagine someone had taught you how to spell in English and you automatically learned to also spell in German, Dutch, Afrikaans, Scots, and Luxembourgish," the company explains. "That is what zero-shot learning enables, and it is a key component in Speller100 that allows us to expand to languages with very little to no data."

Handling mistakes like a pro: Microsoft's approach treats spelling correction as a sequence-to-sequence problem, in which the model reads the misspelled text as one sequence and produces the corrected text as another. Microsoft says it was inspired by work from Facebook AI Research and decided to apply the same kind of modeling to its spelling corrector. The result is inclusive, nuanced, and surprisingly educational.
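
To make the sequence-to-sequence framing concrete, here is a minimal Python sketch of what the training data for such a model could look like: pairs of (misspelled input, correct output). Generating the misspellings by randomly corrupting correct words is an assumption made for illustration, and the helper names `corrupt` and `make_pairs` are hypothetical. Nothing here is taken from Speller100 itself.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def corrupt(word, rng):
    """Return `word` with at most one random character-level edit.

    Illustrative assumption: typos are simulated with four simple
    operations (swap, delete, insert, replace). A swap of two equal
    letters, or a replace that picks the same letter, leaves the
    word unchanged.
    """
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["swap", "delete", "insert", "replace"])
    if op == "swap":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(LETTERS) + word[i:]
    return word[:i] + rng.choice(LETTERS) + word[i + 1:]


def make_pairs(words, seed=0):
    """Build (noisy source, clean target) pairs for a seq2seq model."""
    rng = random.Random(seed)
    return [(corrupt(w, rng), w) for w in words]


if __name__ == "__main__":
    for noisy, clean in make_pairs(["spelling", "correction", "language"]):
        print(noisy, "->", clean)
```

A sequence-to-sequence model would then be trained to map each noisy source string back to its clean target. The model itself is out of scope for this sketch.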

What makes Microsoft's recent changes even more interesting is that they take a dive into language history. The approach builds on the fact that many languages share a common ancestor, and on how their orthography (the conventions for how words are spelled and written) connects one language to another. For example, English has orthographic similarities with other Germanic languages. "Two" in English is "twee" in Dutch, "zwei" in German, and "zwee" in Luxembourgish. "Blood" is "bloed" in Dutch and Afrikaans, "Blut" in German, and "Blutt" in Luxembourgish.
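
One rough way to see how close these spellings are is to count the single-character edits needed to turn one word into another (the Levenshtein edit distance). The Python sketch below uses only the example words quoted above. The `COGNATES` table and `edit_distance` helper are illustrative names, and edit distance is a simple stand-in for orthographic similarity, not a description of how Speller100 measures it.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Words taken from the article's examples, lowercased for comparison.
COGNATES = {
    "two": {"Dutch": "twee", "German": "zwei", "Luxembourgish": "zwee"},
    "blood": {"Dutch": "bloed", "Afrikaans": "bloed",
              "German": "blut", "Luxembourgish": "blutt"},
}

if __name__ == "__main__":
    for english, others in COGNATES.items():
        for language, word in others.items():
            print(english, language, word, edit_distance(english, word))
```

Small distances between related words hint at why a model trained on one language can carry some of its spelling knowledge over to its relatives.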

You get the drift. It makes spelling correction a lot more convenient — and learning languages a lot more fun.