The protein composition of cells depends on the cells’ function and current condition. Mass spectrometry (MS) can determine the identity and quantity of the proteins found in a sample. However, the data analysis of this method is time- and resource-intensive. Researchers at the Max Planck Institute of Biochemistry (MPIB) have collaborated with data science specialists from Verily in the USA to develop a machine learning approach – continuously self-improving algorithms – to facilitate the analysis of mass spectrometry data. Their results, which simplify MS applications and also led to the discovery of new chemical patterns in proteins, are published in the journal Nature Methods.
The final trip of many old cars leads them to a junk yard, where they are dismantled into potentially salvageable parts. By looking at the entirety of car components, a trained and experienced employee of the junk yard may be able to reconstruct the identity of the scrapped car. Mass spectrometers, which identify and quantify proteins in a sample, are like molecular junk yards.
The proteins are first broken into smaller fragments – the peptides. The information on the identity and the abundance of the peptides can be recorded as mass spectrometry spectra. To reconstruct the information on the proteins in the analyzed sample, the characteristics of these spectra are then compared to previously recorded libraries – a process that requires a lot of computational power.
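The library comparison described above can be illustrated with a small sketch. It is not the method used by DeepMass:Prism or any specific search engine; it only assumes that each spectrum is stored as a mapping from a binned mass-to-charge value to an intensity, and that similarity is scored with cosine similarity, one common choice. The peptide labels and all numbers are invented for illustration.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z bin: intensity} dicts."""
    shared_bins = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared_bins)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(observed, library):
    """Return (label, score) for the library spectrum most similar to `observed`."""
    scored = ((label, cosine_similarity(observed, spec)) for label, spec in library.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference library: two peptides with made-up peak intensities.
library = {
    "peptide_1": {100: 1.0, 200: 0.5, 300: 0.2},
    "peptide_2": {150: 0.8, 250: 1.0, 350: 0.4},
}

# Hypothetical measured spectrum that resembles peptide_1.
observed = {100: 0.9, 200: 0.6, 300: 0.1}

match, score = best_library_match(observed, library)
print(match, round(score, 3))
```

Every measured spectrum has to be scored against many library entries in this way, which gives a sense of why the comparison step is computationally expensive for real data sets.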
Machine learning supports data analysis
In cooperation with Verily, the life sciences company of Alphabet, researchers at the MPIB have now developed the model DeepMass:Prism to facilitate the interpretation of mass spectrometry spectra. They used machine learning to train algorithms to “translate” proteins into MS spectra. The translation of these abstract data is challenging for artificial intelligence and is best performed by “deep learning” algorithms.
Similar deep learning approaches are used in the automatic translation of languages. But rather than translating from English to German or vice versa, DeepMass:Prism is trained to translate between proteins and the spectra that are usually generated in MS analysis.
“The key to success in this project was the fusion of our expertise in mass spectrometry with Verily’s expertise in deep learning, particularly in the fields of biology and life sciences,” says Jürgen Cox, independent group leader at the MPIB.
Their program DeepMass:Prism was trained with more than 60 million peptide spectra from publicly accessible databases. The program recognizes patterns in the training spectra and applies them to the analysis of new samples.
The computational biologist Cox highlights that DeepMass:Prism improves different applications of mass spectrometry. One potential use of MS is the characterization of samples whose composition is entirely unknown. The new algorithms can increase the number of peptides that are identified in this approach.
Alternatively, large groups of samples with a similar general composition can be compared to find individual differences in protein quantity. For instance, blood samples from patients generally have a similar protein composition, but it is important to detect altered protein levels to diagnose diseases.
“This is where our DeepMass:Prism has made the greatest strides,” says Cox. “Rather than experimentally determining the reference libraries to which the samples are compared, the model can now predict them – a shortcut that saves a lot of time and resources.”
Finding needles in the peptide haystack
The more than 200 cell types in the human body are characterized by the presence of different proteins, but also by differences in the abundance of identical proteins. These differing protein quantities are particularly challenging for MS analyses. Jürgen Cox explains the importance of measuring protein quantities with an automobile analogy: “When you completely disassemble a car, the piles of parts can look quite similar. Therefore, knowing the abundance of certain parts can help in the identification.
When you find six cylinders in a pile, you know that it cannot be a car with a four-cylinder engine.” Similarly, finding only three tires in a pile suggests that the car may have been damaged. The same principle can be applied to the analysis of cells or tissues. Diseases can cause certain proteins to be more or less abundant than in healthy control samples.
Many diagnostic procedures rely on the measurement of proteins in patient samples by mass spectrometry. “We need highly accurate MS to discover new biomarkers – indicators of disease. Sometimes, even a small variation of a certain biomarker can signal disease progression; therefore, the prediction must be precise and reproducible,” says Peter Cimermancic, Senior Scientist at Verily. With DeepMass:Prism, the researchers strongly improved the correlation between the predicted spectra and the actual measured spectra. Cimermancic is optimistic that the model will lead to the development of new diagnostic tools.
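How such an agreement between predicted and measured spectra might be quantified can be sketched in a few lines. The article does not say which correlation measure the researchers used; the example below assumes the Pearson correlation coefficient, computed over fragment intensities that are listed in the same order for both spectra. All intensity values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Hypothetical relative intensities of the same four fragment ions.
predicted = [0.10, 0.80, 0.30, 1.00]
measured = [0.12, 0.75, 0.35, 0.95]

r = pearson(predicted, measured)
print(f"correlation between predicted and measured intensities: {r:.3f}")
```

A value close to 1 means that the predicted peak pattern closely follows the measured one; values near 0 indicate little agreement.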
Even though DeepMass:Prism was not trained with chemical knowledge, it discovered new chemical rules that determine how the peptides break into smaller fragments. “The previous library-based approach could only reproduce what it already knew. DeepMass:Prism is able to generate new knowledge by combining information and drawing its own conclusions. This is a very exciting finding,” says Cox. “It’s like a junk yard employee who understands where a certain part of the car is installed, even though he has never seen this type of car before.
The predictions by DeepMass:Prism have led to the identification of a new kind of interaction within proteins. We believe that this discovery is only the beginning of what deep learning can do for research in life sciences.” DeepMass:Prism will be available for download on Google Cloud. [CW]
Prof. Jürgen Cox, PhD
Computational Systems Biochemistry
Max Planck Institute of Biochemistry
Am Klopferspitz 18
S. Tiwary*, R. Levy*, P. Gutenbrunner*, F.S. Soto, K. Palaniappan, L. Deming, M. Berndl, A. Brant, P. Cimermancic and J. Cox: High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nature Methods, May 2019 (*equal contributions)
Dr. Christiane Menzfeld | Max-Planck-Institut für Biochemie