Establishing standard definitions for genome sequences

13.10.2009
In 1996, researchers from major genome sequencing centers around the world convened on the island of Bermuda and defined a finished genome as a gapless sequence with a nucleotide error rate of no more than one in 10,000 bases. This effectively set the quality target for the human genome effort and was quickly applied to other genome projects. A genome sequence that did not meet this stringent criterion was simply considered a "draft."
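As a rough illustration of the two-part criterion described above (gapless, and at most one error per 10,000 bases), the check can be sketched in a few lines of Python. The function name and its inputs (`gap_count`, `error_count`, `base_count`) are hypothetical and chosen only for this example; sequencing centers estimate error rates in their own ways, which this sketch does not attempt to model.

```python
# Ceiling from the 1996 definition: no more than one error per 10,000 bases.
BASES_PER_ALLOWED_ERROR = 10_000


def meets_bermuda_finished(gap_count: int, error_count: int, base_count: int) -> bool:
    """Return True if a sequence is gapless and within the error-rate ceiling.

    All three inputs are assumed to be known counts; how they are measured
    is outside the scope of this sketch.
    """
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    if gap_count < 0 or error_count < 0:
        raise ValueError("counts must be non-negative")
    # Integer comparison avoids floating-point rounding at the threshold:
    # error_count / base_count <= 1 / 10,000
    within_error_ceiling = error_count * BASES_PER_ALLOWED_ERROR <= base_count
    return gap_count == 0 and within_error_ceiling
```

Under this sketch, a gapless 10,000-base sequence with one error passes, while the same sequence with two errors, or with a single gap, falls back to "draft."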

More than a decade later, researchers are finding that, with the advent of the latest sequencing technologies, the terms "draft" and "finished" no longer suffice to describe the varying levels of genome sequence quality being produced. Quality matters to any researcher who wants to use a sequence and needs to know its integrity and reliability.

This is of even greater concern for reference genome sequences, such as those produced by genome projects conducted in support of the U.S. Department of Energy (DOE) missions of bioenergy and environmental clean-up, because they provide the foundational knowledge of gene content and of how these organisms interact with the environment.

As the proverbial "fire hose of data" becomes a Niagara torrent, with conservative estimates of 12,000 draft genomes hitting the public databases by 2012, researchers may be surprised to find that these datasets describe genomes that are not complete. Recognizing the problem, a group of researchers from several sequencing centers, including the DOE Joint Genome Institute (JGI), the Sanger Institute and the Human Microbiome Project (HMP) Jumpstart Consortium sequencing institutes, has proposed a new set of standards that expand upon the so-called "Bermuda standard." In the October 9 issue of the journal Science, they propose four additional categories between "draft" and "finished" status that reflect varying levels of completeness.

"In the past we've been limited to two options, requiring us and the other centers to come up with internal definitions," said DOE JGI metagenomics researcher Patrick Chain at Los Alamos National Laboratory (LANL), first author of the Science paper. "But these are not clear and they're not propagated to the databases to which we submit sequences. So when users try to download genomes they get data of unknown quality with no information, or a complete genome that they assume has been checked for missing-data errors."

Chain said that when he and the other organizers of the Sequencing, Finishing, Analysis in the Future meeting hosted by LANL first gathered in 2005, they were concerned by the varying quality of the new genomes being submitted to public archives. Because the meeting organizers represented all the major sequencing centers (and smaller groups as well), these concerns led them to start the genome projects standards group at LANL.

The six categories defined by the group are:

"Standard draft," which is the minimum amount of information needed for submission to a public database;
"High quality draft," which is typically generated by large sequencing centers such as DOE JGI, and which has little or no manual review;
"Improved high quality draft," which consists of data reviewed by either people or machines to some extent so most of the genetic data is assembled correctly, but some errors may still be present;
"Annotation-directed improvement," which is a sequenced segment that presents all the information in various gene regions as accurately as possible;
"Noncontiguous finished," which includes sequences that have been reviewed by both people and machines and would be considered complete except for "recalcitrant regions" that are proving problematic;
"Finished," which defines complete sequences that have minimal errors, if any.
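For readers who handle genome records programmatically, the six categories can be sketched as an ordered enumeration. This is a minimal illustration, not a format defined by the group: the class name, the numeric ranks, and the helper function are hypothetical, and treating the list order as a strict ranking from least to most complete is an assumption made for the example.

```python
from enum import IntEnum


class GenomeStatus(IntEnum):
    """The six proposed categories; numeric ranks are an assumption."""
    STANDARD_DRAFT = 1
    HIGH_QUALITY_DRAFT = 2
    IMPROVED_HIGH_QUALITY_DRAFT = 3
    ANNOTATION_DIRECTED_IMPROVEMENT = 4
    NONCONTIGUOUS_FINISHED = 5
    FINISHED = 6


# Human-readable labels as they appear in the list above.
LABELS = {
    GenomeStatus.STANDARD_DRAFT: "Standard draft",
    GenomeStatus.HIGH_QUALITY_DRAFT: "High quality draft",
    GenomeStatus.IMPROVED_HIGH_QUALITY_DRAFT: "Improved high quality draft",
    GenomeStatus.ANNOTATION_DIRECTED_IMPROVEMENT: "Annotation-directed improvement",
    GenomeStatus.NONCONTIGUOUS_FINISHED: "Noncontiguous finished",
    GenomeStatus.FINISHED: "Finished",
}


def at_least(status: GenomeStatus, minimum: GenomeStatus) -> bool:
    """Return True if `status` ranks at or above `minimum` in the assumed order."""
    return status >= minimum
```

A user filtering downloads might, for example, keep only records for which `at_least(status, GenomeStatus.IMPROVED_HIGH_QUALITY_DRAFT)` holds.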

DOE JGI's Chris Detter, one of the paper's senior authors, and head of the LANL Genome Science group, said that the definitions provided in the Science paper are fairly flexible because the group wanted the proposed standards to apply regardless of the genome project or sequencing technologies employed.

"My hope is all the major genome centers and advanced genomics groups use the gradations that fit their needs," he said. "Some centers may want all six, while some may only want three, but as long as they keep them intact we are in good shape. Then, my hope is that the smaller genomics groups adopt the classes as written to help the rest of the scientific community know what they are generating and submitting."

Chain added that arriving at the proposed standards was not easy, since all major centers "have different pipelines, different sequencing techniques, different internal standards." The group also recognized that the attempt to develop a "one size fits all" set of standards is still a work in progress. The definitions were accordingly kept flexible enough to meet each group's needs.

"We do expect that a number of people will comment on these standards, and possibly expand on the categories," he said, "but we feel we've covered all the bases with these six categories."

Chain said the group plans to team with the Genomic Standards Consortium, a grassroots movement begun by scientists who were concerned about the need for data collection standards in genome projects. The group has also talked to public archives such as GenBank about appending the proposed standards to their entries so that researchers can tell whether the sequences will be useful to them. "Standards are a major issue to be tackled in genomics right now," Chain said. "These proposals are guideposts meant to inform users and generators."

Other DOE JGI authors on the study include David Bruce, Phil Hugenholtz, Nikos Kyrpides, Alla Lapidus, Sam Pitluck and Jeremy Schmutz. Other collaborating institutions are the Sanger Institute and the HMP Jumpstart Consortium sequencing centers (Washington University School of Medicine, the Broad Institute, the J. Craig Venter Institute, and Baylor College of Medicine), as well as Michigan State University, the Ontario Institute for Cancer Research, National Center for Biotechnology Information, Seattle Children's Hospital and Research Institute, Emory GRA and the Naval Medical Research Center.

The U.S. Department of Energy Joint Genome Institute, supported by DOE's Office of Science, is committed to advancing genomics in support of DOE missions related to clean energy generation and environmental characterization and cleanup. DOE JGI, headquartered in Walnut Creek, Calif., provides integrated high-throughput sequencing and computational analysis that enable systems-based scientific approaches to these challenges.

David Gilbert | EurekAlert!
Further information:
http://www.lbl.gov
