The protein composition of cells depends on the cells’ function and current condition. Mass spectrometry (MS) can determine the identity and quantity of the proteins found in a sample. However, the data analysis of this method is time- and resource-intensive. Researchers at the Max Planck Institute of Biochemistry (MPIB) have collaborated with data science specialists from Verily in the USA to develop a machine learning approach – continuously self-improving algorithms – to facilitate the analysis of mass spectrometry data. Their results, which simplify MS applications and also led to the discovery of new chemical patterns in proteins, are published in the journal Nature Methods.
The final trip of many old cars leads them to a junk yard where they are deconstructed into potentially salvageable parts. By looking at the entirety of car components, a trained and experienced employee of the junk yard may be able to reconstruct the identity of the scrapped car. Mass spectrometers (MS), which identify and quantify proteins in a sample, are like molecular junk yards.
The proteins are first broken into smaller fragments – the peptides. The information on the identity and the abundance of the peptides can be recorded as mass spectrometry spectra. To reconstruct the information on the proteins in the analyzed sample, the characteristics of these spectra are then compared to previously recorded libraries – a process that requires a lot of computational power.
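The library comparison described above can be pictured with a small sketch. It is only an illustration under stated assumptions, not the method used in the study: spectra are represented as hypothetical dictionaries that map a rounded mass-to-charge bin to an intensity, the similarity measure is a plain cosine score, and the library entries and their labels are invented.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z bin: intensity} dicts."""
    shared_bins = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared_bins)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)

def best_library_match(query, library):
    """Return (label, score) of the library spectrum most similar to the query."""
    scored = ((label, cosine_similarity(query, ref)) for label, ref in library.items())
    return max(scored, key=lambda pair: pair[1])

# Invented toy data: two reference spectra and one measured spectrum.
library = {
    "peptide_1": {100: 1.0, 200: 0.5, 300: 0.2},
    "peptide_2": {150: 0.8, 250: 1.0, 350: 0.1},
}
query = {100: 0.9, 200: 0.6, 300: 0.1}

label, score = best_library_match(query, library)
print(label, round(score, 3))
```

Every measured spectrum has to be scored against many reference spectra in this way, which gives a sense of why the comparison becomes computationally expensive for real data sets.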
Machine learning supports data analysis
In cooperation with Verily, the life sciences company of Alphabet, researchers at the MPIB have now developed the model DeepMass:Prism to facilitate the interpretation of mass spectrometry spectra. They used machine learning to train algorithms to “translate” proteins into MS spectra. Translating such abstract data is challenging for artificial intelligence and is best performed by “deep learning” algorithms.
Similar deep learning approaches are used in the automatic translation of languages. But rather than translating from English to German or vice versa, DeepMass:Prism is trained to translate between proteins and the spectra that are usually generated in MS analysis.
“The key to success in this project was the fusion of our expertise in mass spectrometry with Verily’s expertise in deep learning, particularly in the fields of biology and life sciences,” says Jürgen Cox, independent group leader at the MPIB.
Their program DeepMass:Prism was trained with more than 60 million peptide spectra from publicly accessible databases. The program recognizes patterns in the training spectra and applies them to the analysis of new samples.
The computational biologist Cox highlights that DeepMass:Prism improves different applications of mass spectrometry. One potential use of MS is the characterization of samples whose composition is entirely unknown. The new algorithms can increase the number of peptides that are identified in this approach.
Alternatively, large groups of samples with a similar general composition can be compared for individual differences in protein quantity. For instance, blood samples from patients generally have a similar protein composition, but it is important to detect altered protein levels to diagnose diseases.
“This is where our DeepMass:Prism has made the greatest strides,” says Cox. “Rather than experimentally determining the reference libraries to which the samples are compared, the model can now predict them – a shortcut that saves a lot of time and resources.”
Finding needles in the peptide haystack
The more than 200 cell types in the human body are characterized not only by the presence of different proteins but also by differences in the abundance of the same proteins. These differing quantities are particularly challenging for MS analyses. Jürgen Cox explains the importance of measuring protein quantities with an automobile analogy: “When you completely disassemble a car, the piles of parts can look quite similar. Therefore, knowing the abundance of certain parts can help in the identification. When you find six cylinders in a pile, you know that it cannot be a car with a four-cylinder engine.”
Similarly, finding only three tires in a pile points to possible damage to the car. The same principle can be applied to the analysis of cells or tissues. Diseases can cause certain proteins to be more or less abundant than in healthy control samples.
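A minimal sketch of such a comparison, assuming protein abundances are already available as plain numbers: it computes the log2 ratio between a patient sample and a healthy control and flags proteins whose level changed at least two-fold. The protein names, the abundance values and the two-fold threshold are all invented for illustration and are not taken from the study.

```python
import math

def log2_fold_changes(patient, control):
    """log2(patient / control) for proteins with a positive abundance in both samples."""
    changes = {}
    for protein in patient.keys() & control.keys():
        p, c = patient[protein], control[protein]
        if p > 0 and c > 0:
            changes[protein] = math.log2(p / c)
    return changes

def flag_altered(changes, threshold=1.0):
    """Proteins whose absolute log2 fold change reaches the (assumed) threshold."""
    return sorted(name for name, lfc in changes.items() if abs(lfc) >= threshold)

# Invented abundances in arbitrary units.
control = {"protein_A": 100.0, "protein_B": 50.0, "protein_C": 20.0}
patient = {"protein_A": 105.0, "protein_B": 200.0, "protein_C": 5.0}

changes = log2_fold_changes(patient, control)
altered = flag_altered(changes)
print(altered)
```

A log2 scale treats a doubling and a halving symmetrically, which is why it is a common choice for this kind of comparison.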
Many diagnostic procedures rely on the measurement of proteins in patient samples by mass spectrometry. “We need highly accurate MS to discover new biomarkers – indicators of disease. Sometimes even a small variation in a certain biomarker can signal disease progression, so the prediction must be precise and reproducible,” says Peter Cimermancic, Senior Scientist at Verily. With DeepMass:Prism, the researchers strongly improved the correlation between the predicted spectra and the actually measured spectra. Cimermancic is optimistic that the model will lead to the development of new diagnostic tools.
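One common way to express how well a predicted spectrum agrees with a measured one is the Pearson correlation of their fragment intensities. The sketch below assumes that both spectra have already been aligned to the same list of fragment peaks. The intensity values are invented, and the article does not state which correlation measure the researchers used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long intensity lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant intensities")
    return cov / math.sqrt(var_x * var_y)

# Invented relative intensities for the same four fragment peaks.
predicted = [0.10, 0.40, 1.00, 0.30]
measured = [0.12, 0.35, 0.95, 0.33]

r = pearson(predicted, measured)
print(round(r, 3))
```

A value close to 1 means the predicted peak intensities rise and fall together with the measured ones. A value near 0 would indicate that the prediction carries little information about the measurement.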
Even though DeepMass:Prism was not trained with chemical knowledge, it discovered new chemical rules that determine how the peptides break into smaller fragments. “The previous library-based approach could only reproduce what it already knew. DeepMass:Prism is able to generate new knowledge by combining information and drawing its own conclusions. This is a very exciting finding,” says Cox. “It’s like a junk yard employee who understands where a certain part of the car is installed, even though he has never seen this type of car before.
“The predictions by DeepMass:Prism have led to the identification of a new kind of interaction within proteins. We believe that this discovery is only the beginning of what deep learning can do for research in life sciences.” DeepMass:Prism will be available for download on Google Cloud. [CW]
Prof. Jürgen Cox, PhD
Computational Systems Biochemistry
Max Planck Institute of Biochemistry
Am Klopferspitz 18
S. Tiwary*, R. Levy*, P. Gutenbrunner*, F.S. Soto, K. Palaniappan, L. Deming, M. Berndl, A. Brant, P. Cimermancic and J. Cox: High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nature Methods, May 2019 (*equal contributions)
Dr. Christiane Menzfeld | Max-Planck-Institut für Biochemie