Seeking structure with metagenome sequences


Metagenomics database helps fill in 12 percent of previously unknown protein structures

For proteins, appearance matters. These important molecules largely form a cell's structures and carry out its functions: proteins control growth and influence mobility, serve as catalysts, and transport or store other molecules. Proteins are made of long amino acid chains, and written out on paper the one-dimensional amino acid sequence may seem meaningless. Yet when a protein is viewed in three dimensions, researchers can see its structure and how that structure, particularly the way it folds, determines its functions.

In a study published Jan. 20, 2017 in Science, a team led by University of Washington researchers and including DOE JGI researchers reports that structural models have been generated for 12 percent of the protein families that previously had no structural information available. This is a brief overview of the work. Top: Researchers gathering samples from Great Boiling Spring in Nevada. Left: A snapshot of aligned metagenomic sequences. Each row is a different sequence (the different colors are the different amino acid groups). Each position (or column) is compared to all other positions to detect patterns of co-evolution. Bottom: The strength of the top co-evolving residues is shown as blue dots; these are also shown as colored lines on the structure above. The goal is to make a structure that makes as many of these contacts as possible. Right: A cartoon of the predicted protein structure. The protein domain shown is from Pfam DUF3794; this domain is part of a spore coat assembly protein, SafA.

Image of Great Boiling Spring by Brian Hedlund, UNLV. Protein structure and composite image by Sergey Ovchinnikov, UW

There are close to 15,000 protein families - groups of proteins that share an evolutionary origin - in the database Pfam. For nearly a third (4,752) of these protein families, at least one protein in the family already has an experimentally determined structure. For another third (4,886) of the protein families, comparative models could be built with some degree of confidence. For the final third (5,211) of the protein families in the database, however, no structural information exists.

In the January 20, 2017 issue of Science, a team led by University of Washington's David Baker, in collaboration with researchers at the U.S. Department of Energy Joint Genome Institute (DOE JGI), a DOE Office of Science User Facility, reports that structural models have been generated for 614, or 12 percent, of the protein families that previously had no structural information available.
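The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the text; the variable names are illustrative and are not taken from the study.

```python
# Pfam family counts as quoted in the article
with_experimental_structure = 4752   # at least one experimentally solved structure
with_comparative_model = 4886        # comparative model could be built
without_structure = 5211             # no structural information
newly_modeled = 614                  # families given a model in the study

# "Close to 15,000" families in total
total_families = (
    with_experimental_structure + with_comparative_model + without_structure
)

# Share of the previously unmodeled families that now have a model
share_newly_modeled = newly_modeled / without_structure

print(total_families)                 # 14849
print(f"{share_newly_modeled:.1%}")   # 11.8%
```

The three groups sum to 14,849 families, and 614 of 5,211 is about 11.8 percent, which rounds to the 12 percent reported.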

"That this could be accomplished using computational modeling methods was not at all apparent 5 years ago," the team noted in their paper. This accomplishment was made possible through a collaboration in which the Baker lab's protein structure prediction server Rosetta analyzed the metagenomic sequences publicly available on the Integrated Microbial Genomes (IMG) system run by the DOE JGI.

"A large number of protein families (in Pfam) have low number of sequences," said study first author Sergey Ovchinnikov, a graduate student in the Baker lab. "This resulted in two consequences: 1) nobody cared about these families (since they were small); and, 2) co-evolution methods could not be applied to study them. With metagenomics, we found that some of these neglected families with only a handful of sequences so far, can now become as large as some of the most studied ones, when metagenomics data are taken into account! Moreover, we can offer a 3D model of a representative sequence from the family. We hope this will spark interest in some of these families."

Armed with genome sequences, researchers like Baker have been able to identify sets of amino acids that evolve simultaneously, even though they are nowhere near each other on the unfolded chain. Such events suggest that these amino acids are neighbors in the folded protein, offering researchers hints as to the protein's structure. Structural proximity can suggest a functional relationship, and thus natural selection, acting on the function, can favor not just one amino acid but all that are in the set.
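The idea of amino acids that "evolve simultaneously" can be illustrated with a toy calculation. The article does not name the statistical method the team used, so the sketch below substitutes plain mutual information between alignment columns as a simple stand-in; the alignment, function names, and the `min_separation` parameter are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def rank_column_pairs(alignment, min_separation=3):
    """Score every pair of columns at least min_separation apart, best first."""
    length = len(alignment[0])
    scores = []
    for i, j in combinations(range(length), 2):
        if j - i >= min_separation:
            score = mutual_information(column(alignment, i), column(alignment, j))
            scores.append((score, i, j))
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: positions 1 and 6 change together
# (K pairs with E, R pairs with D); position 3 varies independently.
toy_alignment = [
    "AKGLVSEP",
    "AKGLVSEP",
    "ARGLVSDP",
    "ARGLVSDP",
    "AKGIVSEP",
    "ARGIVSDP",
]

top_score, top_i, top_j = rank_column_pairs(toy_alignment)[0]
print(top_i, top_j, top_score)
```

In this toy case, positions 1 and 6 score highest even though they are five residues apart on the chain, which is the kind of signal that hints at a contact in the folded protein. With only a handful of sequences such statistics are unreliable, which is why enlarging small families with metagenome sequences matters.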

Nikos Kyrpides, DOE JGI Prokaryote Super Program head, said the collaboration between the Baker lab and the DOE JGI allowed the team to come up with a powerful way of predicting structures and structural alignments. "Such efforts, were previously restricted on protein families generated from sequences found on the isolate genome only. These genomes comprise about 200 million sequences. As expected, when we added on those our metagenomics data, harnessing the 5 billion assembled metagenome sequences available on our IMG/M database, we were able to dramatically increase the coverage of many of the known protein families. Efforts like this one heavily depend on the availability of assembled metagenomics sequences, which is an advantage the DOE JGI brings to the table with our high quality assemblies."

Kyrpides added that this work, which also involved DOE JGI researchers Neha Varghese and George Pavlopoulos, embodies another kind of collaboration that he'd like to see encouraged. "People came to us because we are maintaining the largest integration of assembled metagenomes. The application of such tools on our data provides a great example of how the larger community can utilize JGI resources for discovery. We would very much like to see more success stories like this one through a new Data Science call between the JGI and the National Energy Research Scientific Computing Center (NERSC)."

The JGI-NERSC Microbiome Data Science call will enable users to perform state-of-the-art computational genomics and metagenomics research and help them translate sequence information, generated by the DOE JGI or elsewhere, into biological discovery. This proposal call builds upon the success of the "Facilities Integrating Collaborations for User Science" (FICUS) initiative, established to encourage and enable researchers to more easily integrate the expertise and capabilities of multiple national user facilities into their research. Applications for the JGI-NERSC collaborative science call are being accepted until March 1, 2017. For more information about the call, go to:


The U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility at Lawrence Berkeley National Laboratory, is committed to advancing genomics in support of DOE missions related to clean energy generation and environmental characterization and cleanup. DOE JGI, headquartered in Walnut Creek, Calif., provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges. Follow @doe_jgi on Twitter.

DOE's Office of Science is the largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit

Massie Ballon | EurekAlert!
