Why scientists are so hyped about DeepMind's protein-folding AI

Deep learning just solved a 50-year research problem.

An artificial intelligence program from DeepMind has cracked the code to a problem scientists have been stewing on for nearly 50 years. In what’s said to be a game-changing breakthrough, the Alphabet-owned lab’s AlphaFold system demonstrated the ability to predict with great accuracy the 3D structure of proteins based on their amino acid sequences. "Proteins" as in those compounds essential for life. That’s no small feat. As the team notes, Nobel Prize-winning biochemist Christian Anfinsen postulated in his 1972 acceptance speech that a protein’s amino acid sequence should fully determine its structure, and molecular biologist Cyrus Levinthal had estimated in 1969 that a typical protein could adopt on the order of 10 to the 300th power possible configurations. That is a 1 followed by 300 zeros.

The solution presented by DeepMind has transformative potential for medicine and biological research, achieving in days what would normally take years (and millions of dollars’ worth) of hard work.

Two of AlphaFold's predicted protein structures (blue) compared to experimental results (green). Image: DeepMind

Not playing games anymore — Scientists’ praise for AlphaFold has been nothing short of effusive. The system was demonstrated during CASP14, this year’s (virtual) meeting of the biennial, community-run Critical Assessment of protein Structure Prediction challenge. It’s AlphaFold’s second time participating.

“It’s a game changer,” Max Planck evolutionary biologist Andrei Lupas, one of the CASP judges, told Nature. “This will change medicine. It will change research. It will change bioengineering. It will change everything.”

AlphaFold boasts a median accuracy score of 92.4 GDT (the “Global Distance Test” metric that CASP uses, which runs from 0 to 100) in predicting the outcome of protein folding. When it misses the mark, the error on average works out to about 1.6 angstroms, roughly the width of an atom (!!). Professor Ewan Birney, Deputy Director-General of the European Molecular Biology Laboratory, said in a statement that he “nearly fell off [his] chair” upon seeing the results.
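
For a feel of what a GDT score measures, here is a minimal Python sketch. It assumes the commonly used GDT_TS variant, which averages the share of residues whose alpha-carbon atom lands within 1, 2, 4, and 8 angstroms of the experimentally determined position. It also assumes the two structures are already superimposed, whereas the real CASP procedure searches over many superpositions to find the best one. The function name and the toy coordinates are made up for illustration.

```python
import math

# Distance cutoffs in angstroms used by the GDT_TS variant of the score.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, experimental):
    """Simplified GDT_TS for two already-superimposed lists of (x, y, z)
    alpha-carbon positions, one entry per residue, in angstroms."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between the predicted and experimental position.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues that fall within each cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS
    ]
    # Average the four fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Toy data (not a real protein): four residues, predicted positions
# off by 0.5, 1.5, 3.0 and 10.0 angstroms respectively.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts(predicted, experimental))     # 56.25
print(gdt_ts(experimental, experimental))  # 100.0
```

In the toy case, one residue in four is within 1 angstrom, two are within 2, and three are within both 4 and 8, so the score is the average of 25, 50, 75, and 75. A median of 92.4 means most residues land within even the tightest cutoffs.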

It feels like only yesterday that DeepMind’s AI systems were busying themselves with the art of undermining chess grandmasters’ life achievements and mastering the ancient, intricate game of Go. Now, the group is tackling the fundamental components of life. They grow up so fast.

AlphaFold boasts a median accuracy score of 92.4 GDT in predicting protein folding. That's 92.4 out of 100... not too shabby. Image: DeepMind

The nitty-gritty — AlphaFold was trained on approximately 170,000 protein structures from a publicly available data bank, eclipsing the previous version (and the competition) in accuracy thanks to its new deep learning architecture. It’s a complex solution to match the complex problem of protein folding. As the AlphaFold team explains:

A folded protein can be thought of as a “spatial graph,” where residues are the nodes and edges connect the residues in close proximity. This graph is important for understanding the physical interactions within proteins, as well as their evolutionary history. For the latest AlphaFold, used at CASP14, we created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building. It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph. By iterating this process, the system develops strong predictions of the underlying physical structure of the protein and is able to determine highly-accurate structures in a matter of days.
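
To make the “spatial graph” idea concrete, here is a minimal Python sketch of the data structure alone, not of AlphaFold's neural network. It treats each residue as a node and adds an edge when two residues' alpha-carbon atoms sit within a distance cutoff. The 8-angstrom cutoff, the function name, and the hairpin-shaped toy coordinates are all assumptions chosen for illustration; they do not come from DeepMind.

```python
import math


def contact_graph(ca_coords, cutoff=8.0):
    """Build an adjacency dict mapping each residue index to the set of
    residue indices whose alpha-carbon lies within `cutoff` angstroms."""
    graph = {i: set() for i in range(len(ca_coords))}
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                # Edges are undirected, so record both directions.
                graph[i].add(j)
                graph[j].add(i)
    return graph


# Toy hairpin (made-up coordinates in angstroms): the chain runs out
# along x, turns, and comes back alongside itself.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
    (13.0, 3.4, 0.0),
    (11.4, 6.8, 0.0), (7.6, 6.8, 0.0), (3.8, 6.8, 0.0), (0.0, 6.8, 0.0),
]

graph = contact_graph(hairpin)
print(sorted(graph[0]))  # [1, 2, 7, 8]

# Contacts between residues far apart in the sequence but close in space.
long_range = sorted(j for j in graph[0] if abs(j - 0) > 2)
print(long_range)  # [7, 8]
```

Residues 0 and 8 sit at opposite ends of the sequence, yet the fold brings them next to each other. Edges like that one are what make the graph informative about the 3D shape.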

Scientists currently rely on painstaking experimental techniques such as X-ray crystallography, nuclear magnetic resonance, and cryo-electron microscopy to get these types of results, and understanding the 3D structure of proteins is critical to disease research and drug development, among other things. That we are currently riding out a deadly pandemic only adds to the weight of AlphaFold’s significance. The DeepMind team notes that this year it applied its system to proteins of SARS-CoV-2, the virus behind COVID-19, and can now say its predictions demonstrated “a high degree of accuracy.”

For the time being, DeepMind will work with a small number of groups to assist research on the parasitic diseases malaria, sleeping sickness, and leishmaniasis, according to the Guardian. AlphaFold’s capabilities could also find use in the search for enzymes that can break down industrial waste. It’s really just the beginning.