CASP is a highly competitive global competition, held every two years since 1994, that evaluates progress in the computational prediction of protein structures. Finishing in the top three in the main category and first in another is akin to medaling at the Olympics in this field.
“It is great for us, a small academic group with very limited resources, to stand in the forefront together with an industry giant such as Google’s DeepMind to solve one of the most important and challenging scientific problems in basic life-science research and human health,” Cheng said.
MULTICOM was among the first to use deep learning to computationally predict protein structures from their sequences. These are the same principles that allow machines to recognize speech (e.g., Apple’s Siri and Amazon Echo) and to perform well enough to beat grandmasters at chess and champions at trivia.
In Cheng’s case, he uses them to spot correlations in massive sets of amino acid data and relate those correlations to the spatial distances between amino acids, which makes it possible to compute a protein’s likely folding pattern. This helps reconstruct 3D shapes for the multitude of proteins whose structures are unknown. Since MULTICOM’s development, deep learning has become the standard methodology in this field.
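For readers curious what "spatial distances between amino acids" looks like as data, here is a minimal Python sketch. It is not MULTICOM's code. It only computes, from known 3D coordinates, the kind of residue-residue distance matrix that a deep learning method would try to predict from sequence alone. The coordinates are made up for illustration, the function names are hypothetical, and the 8-angstrom contact cutoff is an assumed, commonly used convention rather than something stated in this article.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between residues."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

def contact_map(dist, threshold=8.0):
    """Mark residue pairs closer than `threshold` (assumed to be in angstroms)."""
    return [[d < threshold for d in row] for row in dist]

# Hypothetical coordinates: one 3D point per residue (illustrative values only).
coords = [
    (0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0),
    (6.0, 8.0, 0.0),
    (0.0, 0.0, 12.0),
]

dist = distance_matrix(coords)
contacts = contact_map(dist)

for row in dist:
    print(["%.1f" % d for d in row])
```

A predictor works in the opposite direction: it estimates a matrix like `dist` from the amino acid sequence, and a separate step then builds a 3D shape consistent with those estimated distances.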
“Without my current and past students’ hard work and the support from the National Institutes of Health (NIH) and National Science Foundation (NSF), we could not have made such a big progress in the last several years,” Cheng explained. “I am pleased to see the deep learning that we helped introduce into the field in 2012 has become the key method for protein structure prediction. Now, the technology (deep learning) and specific direction (residue-residue distance prediction) to solve this fundamental protein folding problem in the foreseeable future has become very clear after the successful demonstration by MULTICOM, AlphaFold and other groups in CASP13.
“This great achievement is the culmination of the many years of effort by many people in the protein folding field, particularly, the CASP community. The community needs to continue to develop better deep learning methods down the road until the protein folding problem is completely solved.”