Detecting millions of consistently unidentified spectra in vast tracts of proteomics data is possible with a new algorithm developed at EMBL-EBI
A new algorithm clusters the millions of peptide mass spectra in the PRIDE Archive public database, making it easier to detect millions of consistently unidentified spectra across different datasets. Published in Nature Methods, the new tool is an important step towards fully exploiting data produced in discovery proteomics experiments.
On average, almost three quarters of spectra measured in discovery proteomics experiments remain unidentified, regardless of the quality of the experiment, as they cannot be interpreted by standard sequence-based search engines.
Alternative approaches to improve the rate of identification exist, but are fraught with disadvantages including ambiguous results. In today's study, researchers working on the PRIDE Archive public repository of proteomics data present a large-scale 'spectrum clustering' solution that takes advantage of the growing number of mass spectrometry (MS) datasets to systematically study millions of unidentified spectra.
"MS experiments produce huge amounts of data, but identifying meaningful sequences that could be assigned to specific biological functions can be troublesome," says Johannes Griss, formerly at EMBL-EBI in the UK and now at the Medical University of Vienna, Austria.
One of the challenges with these technologies is that a large proportion of the data generated can't be interpreted, as they correspond to peptides that have not yet been observed and are not available in databases. Such spectra could correspond to peptide variants derived from individual genetic variation, or to peptides containing post-translational modifications, which are essential for the biological functions of proteins.
"What we have now is an algorithm that shows us patterns, or groups of spectra, that we've consistently missed, and helps us figure out which ones are good enough to pursue," adds Johannes. "It's a valuable tool that helps us unpick what's going on in proteomics, so we can better understand basic biological processes."
The team used the approach to recognise 9 million consistently unidentified spectra, which can make post-translational modifications and peptides containing sequence variants more discoverable. They identified three distinct sets of spectra: those that have been incorrectly identified, those that are not of high enough quality to identify properly, and those that are truly unidentified. They also combined their new approach with other methods to identify roughly 20% of the originally unidentified spectra in the public archive.
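To give readers a feel for the idea, here is a minimal sketch of spectrum clustering. It is not the PRIDE Cluster algorithm described in the paper. It assumes a simple scheme in which peaks are binned by m/z, spectra are compared by cosine similarity, and each spectrum joins the first sufficiently similar cluster. The function names, the record fields (`dataset`, `peptide`, `peaks`), the 1.0 bin width, the 0.8 threshold and the toy data are all hypothetical choices made for illustration. A real pipeline would also take the precursor mass into account, among other things.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins and scale to unit length."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in binned.values()))
    return {b: v / norm for b, v in binned.items()} if norm else {}

def cosine(a, b):
    """Dot product of two unit-length binned spectra."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def cluster_spectra(spectra, threshold=0.8):
    """Greedy clustering: join the first cluster whose representative is similar enough."""
    clusters = []
    for spec in spectra:
        vec = bin_spectrum(spec["peaks"])
        for cluster in clusters:
            if cosine(vec, cluster["rep"]) >= threshold:
                cluster["members"].append(spec)
                break
        else:
            # No existing cluster matched, so this spectrum starts a new one.
            clusters.append({"rep": vec, "members": [spec]})
    return clusters

def consistently_unidentified(clusters, min_datasets=2):
    """Keep clusters with no identified member that recur in several datasets."""
    result = []
    for cluster in clusters:
        members = cluster["members"]
        datasets = {m["dataset"] for m in members}
        if all(m["peptide"] is None for m in members) and len(datasets) >= min_datasets:
            result.append(cluster)
    return result

# Toy data (invented): two similar unidentified spectra from different
# datasets, plus one identified spectrum with a different peak pattern.
spectra = [
    {"dataset": "A", "peptide": None, "peaks": [(100.1, 10.0), (200.2, 50.0), (300.3, 20.0)]},
    {"dataset": "B", "peptide": None, "peaks": [(100.2, 12.0), (200.1, 48.0), (300.4, 22.0)]},
    {"dataset": "A", "peptide": "PEPTIDEK", "peaks": [(150.0, 30.0), (250.0, 30.0)]},
]
clusters = cluster_spectra(spectra)
unidentified = consistently_unidentified(clusters)
```

In this toy case the two unidentified spectra end up in one cluster that spans two datasets, which is the kind of group the article calls "consistently unidentified". The identified spectrum forms its own cluster and is filtered out.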
"Discovery proteomics is a mature technology, and it's crucial that we are able to exploit the data efficiently - but creating a sensible subset of spectra to start an in-depth analysis of unidentified spectra has been very challenging," says Juan Antonio Vizcaíno, who leads the Proteomics team at EMBL-EBI. "We developed a comparatively lightweight computational approach that makes it much easier to detect sequences that have been incorrectly identified, or consistently observed but not identified. These ready-to-use collections of commonly unidentified spectra are a resource for the community, so that we can all pool our efforts to find lasting solutions for proteomics research."
The new algorithm will be used to improve quality control in the PRIDE Archive. The complete spectrum clustering results are available through the PRIDE Cluster resource, which aims to simplify further investigation into unidentified spectra.
Source article: Griss J., et al. (2016). Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets. Nature Methods (in press). DOI: 10.1038/nmeth.3902
Mary Todd Bergman | EurekAlert!