
Mizzou team shines at computational protein prediction competition

January 09, 2019

A 3D visualization of massive sets of amino acids. It looks like a single long ribbon tangled in distinct clusters, and each cluster is further distinguished by a different color.

MULTICOM uses deep learning to spot correlations in massive sets of amino acids and relates those correlations to the spatial distances between the amino acids in order to compute their likely folding pattern. Photo courtesy of Jianlin Cheng.

Accurately predicting how protein sequences will fold into 3D structures is key to determining their biological function and essential in areas such as protein design, protein engineering, drug design, disease research, and precision medicine. Jianlin Cheng, MU Engineering’s William and Nancy Thompson Distinguished Professor of Electrical Engineering and Computer Science, and his MULTICOM team are at the forefront of this nascent field, with the accolades to match.

Cheng and the team, which currently includes graduate students Jie Hou and Tianqi Wu, finished third in protein tertiary structure modeling at the recent 13th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP13) competition. Only AlphaFold, powered by Google’s DeepMind AI, and the University of Michigan placed higher. MULTICOM also finished first in the category of predicting the accuracy of protein structural models.


CASP is a global competition, held every two years since 1994, that evaluates progress in the computational prediction of protein structures, and it is intensely competitive. Finishing in the top three in the main category and first in another is akin to medaling at the Olympics in this particular field.

“It is great for us, a small academic group with very limited resources, to stand in the forefront together with an industry giant such as Google’s DeepMind to solve one of the most important and challenging scientific problems in basic life-science research and human health,” Cheng said.

MULTICOM was among the first to use deep learning to computationally predict the structure of proteins from their sequences. The same principles allow machines to recognize speech (e.g., Apple’s Siri and Amazon Echo) and to perform at a level that lets them beat grandmasters at chess and champions at trivia.

Cheng uses these principles to spot correlations in massive sets of amino acids and to relate those correlations to the spatial distances between amino acids in order to compute a protein’s likely folding pattern. This helps reconstruct 3D shapes for the multitude of proteins whose structures are unknown. Since MULTICOM’s development, deep learning has become the standard methodology in this field.
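
For readers curious about what "spatial distances between amino acids" means in practice, the short Python sketch below may help. It is not MULTICOM's code and does not perform any prediction. It only illustrates, under stated assumptions, the kind of quantity such methods aim to predict: a table of pairwise distances between residues, plus a yes/no "contact" map derived from it. The coordinates are invented toy values, the function names are hypothetical, and the 8-angstrom cutoff is a commonly used convention assumed here rather than a figure taken from this article.

```python
import math

def distance_matrix(coords):
    """Return pairwise Euclidean distances between residue positions.

    coords: list of (x, y, z) tuples, one per residue, in angstroms.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dist, threshold=8.0):
    """Mark residue pairs closer than `threshold` angstroms as contacts.

    The 8.0 default is an assumed, commonly used cutoff.
    A residue is never counted as being in contact with itself.
    """
    n = len(dist)
    return [[i != j and dist[i][j] < threshold for j in range(n)]
            for i in range(n)]

# Toy, made-up coordinates for five residues (not from any real protein).
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (20.0, 0.0, 0.0),
]

dist = distance_matrix(coords)
contacts = contact_map(dist)

for i, row in enumerate(contacts):
    partners = [j for j, close in enumerate(row) if close]
    print(f"residue {i} is in contact with residues {partners}")
```

A deep learning predictor works in the opposite direction: it estimates a table like `dist` from sequence information alone, and the estimated distances then constrain how the chain can fold in 3D.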

“Without my current and past students’ hard work and the support from the National Institutes of Health (NIH) and National Science Foundation (NSF), we could not have made such a big progress in the last several years,” Cheng explained. “I am pleased to see the deep learning that we helped introduce into the field in 2012 has become the key method for protein structure prediction. Now, the technology (deep learning) and specific direction (residue-residue distance prediction) to solve this fundamental protein folding problem in the foreseeable future has become very clear after the successful demonstration by MULTICOM, AlphaFold and other groups in CASP13.

“This great achievement is the culmination of the many years of effort by many people in the protein folding field, particularly, the CASP community. The community needs to continue to develop better deep learning methods down the road until the protein folding problem is completely solved.”