Gene finding remains an important problem in biology as scientists are still far from fully mapping the set of human genes. Furthermore, gene maps for other vertebrates, including important model organisms such as mouse, are much more incomplete than the human annotation. The new technique, known as CONTRAST (CONditionally TRAined Search for Transcripts), works by comparing a genome of interest to the genomes of several related species.
CONTRAST exploits the fact that the functional role protein-coding genes play a specific part within a cell and are therefore subjected to characteristic evolutionary pressures. For example, mutations that alter an important part of a protein's structure are likely to be deleterious and thus selected against. On the other hand, mutations that preserve a protein's amino acid sequence are normally well tolerated. Thus, protein-coding genes can be identified by searching a genome for regions that show evidence such patterns of selection. However, learning to recognize such patterns when more than two species are compared has proved difficult.
Previous systems for gene prediction were able to effectively make use of one additional 'informant' genome. For example, when searching for human genes, taking into account information from the mouse genome led to a substantial increase in accuracy. But, no system was able to leverage additional informant genomes to improve upon state-of-the-art performance using mouse alone, although it was expected that adding informants would make patterns of selection clearer. CONTRAST solves this problem by learning to recognize the signature of protein-coding gene selection in a fundamentally different way from previous approaches. Instead of constructing a model of sequence evolution, CONTRAST directly 'learns' which features of a genomic alignment are most useful for recognizing genes. This approach leads to overall higher levels of accuracy and is able to extract useful information from several informant sequences.
In a test on the human genome, CONTRAST exactly predicted the full structure of 59% of the genes in the test set, compared with the previous best result of 36%. Its exact exon sensitivity of 93%, compared with a previous best of 84%, translates into many thousands of exons correctly predicted by CONTRAST but missed by previous methods. Importantly, CONTRAST's accuracy using a combination of eleven informant genomes was significantly higher than its accuracy using any single informant. The substantial advance in predictive accuracy represented by CONTRAST will further efforts to complete protein-coding gene maps for human and other organisms.
Further information about existing gene-prediction methods and the advance CONTRAST brings to the field can be found in a minireview by Paul Flicek, which accompanies the article by Batzoglou and colleagues.
A one-way street for salt
21.09.2018 | Julius-Maximilians-Universität Würzburg
Nerve cells in the human brain can “count”
21.09.2018 | Rheinische Friedrich-Wilhelms-Universität Bonn
The building blocks of matter in our universe were formed in the first 10 microseconds of its existence, according to the currently accepted scientific picture. After the Big Bang about 13.7 billion years ago, matter consisted mainly of quarks and gluons, two types of elementary particles whose interactions are governed by quantum chromodynamics (QCD), the theory of strong interaction. In the early universe, these particles moved (nearly) freely in a quark-gluon plasma.
This is a joint press release of University Muenster and Heidelberg as well as the GSI Helmholtzzentrum für Schwerionenforschung in Darmstadt.
Then, in a phase transition, they combined and formed hadrons, among them the building blocks of atomic nuclei, protons and neutrons. In the current issue of...
Thin-film solar cells made of crystalline silicon are inexpensive and achieve efficiencies of a good 14 percent. However, they could do even better if their shiny surfaces reflected less light. A team led by Prof. Christiane Becker from the Helmholtz-Zentrum Berlin (HZB) has now patented a sophisticated new solution to this problem.
"It is not enough simply to bring more light into the cell," says Christiane Becker. Such surface structures can even ultimately reduce the efficiency by...
A study in the journal Bulletin of Marine Science describes a new, blood-red species of octocoral found in Panama. The species in the genus Thesea was discovered in the threatened low-light reef environment on Hannibal Bank, 60 kilometers off mainland Pacific Panama, by researchers at the Smithsonian Tropical Research Institute in Panama (STRI) and the Centro de Investigación en Ciencias del Mar y Limnología (CIMAR) at the University of Costa Rica.
Scientists established the new species, Thesea dalioi, by comparing its physical traits, such as branch thickness and the bright red colony color, with the...
Scientists have succeeded in observing the first long-distance transfer of information in a magnetic group of materials known as antiferromagnets.
An international team of researchers has mapped Nemo's genome, providing the research community with an invaluable resource to decode the response of fish to...
03.09.2018 | Event News
27.08.2018 | Event News
17.08.2018 | Event News
21.09.2018 | Trade Fair News
21.09.2018 | Earth Sciences
21.09.2018 | Health and Medicine