Gene finding remains an important problem in biology as scientists are still far from fully mapping the set of human genes. Furthermore, gene maps for other vertebrates, including important model organisms such as mouse, are much more incomplete than the human annotation. The new technique, known as CONTRAST (CONditionally TRAined Search for Transcripts), works by comparing a genome of interest to the genomes of several related species.
CONTRAST exploits the fact that the functional role protein-coding genes play a specific part within a cell and are therefore subjected to characteristic evolutionary pressures. For example, mutations that alter an important part of a protein's structure are likely to be deleterious and thus selected against. On the other hand, mutations that preserve a protein's amino acid sequence are normally well tolerated. Thus, protein-coding genes can be identified by searching a genome for regions that show evidence such patterns of selection. However, learning to recognize such patterns when more than two species are compared has proved difficult.
Previous systems for gene prediction were able to effectively make use of one additional 'informant' genome. For example, when searching for human genes, taking into account information from the mouse genome led to a substantial increase in accuracy. But, no system was able to leverage additional informant genomes to improve upon state-of-the-art performance using mouse alone, although it was expected that adding informants would make patterns of selection clearer. CONTRAST solves this problem by learning to recognize the signature of protein-coding gene selection in a fundamentally different way from previous approaches. Instead of constructing a model of sequence evolution, CONTRAST directly 'learns' which features of a genomic alignment are most useful for recognizing genes. This approach leads to overall higher levels of accuracy and is able to extract useful information from several informant sequences.
In a test on the human genome, CONTRAST exactly predicted the full structure of 59% of the genes in the test set, compared with the previous best result of 36%. Its exact exon sensitivity of 93%, compared with a previous best of 84%, translates into many thousands of exons correctly predicted by CONTRAST but missed by previous methods. Importantly, CONTRAST's accuracy using a combination of eleven informant genomes was significantly higher than its accuracy using any single informant. The substantial advance in predictive accuracy represented by CONTRAST will further efforts to complete protein-coding gene maps for human and other organisms.
Further information about existing gene-prediction methods and the advance CONTRAST brings to the field can be found in a minireview by Paul Flicek, which accompanies the article by Batzoglou and colleagues.
Two Group A Streptococcus genes linked to 'flesh-eating' bacterial infections
25.09.2017 | University of Maryland
Rainbow colors reveal cell history: Uncovering β-cell heterogeneity
22.09.2017 | DFG-Forschungszentrum für Regenerative Therapien TU Dresden
At the productronica trade fair in Munich this November, the Fraunhofer Institute for Laser Technology ILT will be presenting Laser-Based Tape-Automated Bonding, LaserTAB for short. The experts from Aachen will be demonstrating how new battery cells and power electronics can be micro-welded more efficiently and precisely than ever before thanks to new optics and robot support.
Fraunhofer ILT from Aachen relies on a clever combination of robotics and a laser scanner with new optics as well as process monitoring, which it has developed...
Plants and algae use the enzyme Rubisco to fix carbon dioxide, removing it from the atmosphere and converting it into biomass. Algae have figured out a way to increase the efficiency of carbon fixation. They gather most of their Rubisco into a ball-shaped microcompartment called the pyrenoid, which they flood with a high local concentration of carbon dioxide. A team of scientists at Princeton University, the Carnegie Institution for Science, Stanford University and the Max Plank Institute of Biochemistry have unravelled the mysteries of how the pyrenoid is assembled. These insights can help to engineer crops that remove more carbon dioxide from the atmosphere while producing more food.
A warming planet
Our brains house extremely complex neuronal circuits, whose detailed structures are still largely unknown. This is especially true for the so-called cerebral cortex of mammals, where among other things vision, thoughts or spatial orientation are being computed. Here the rules by which nerve cells are connected to each other are only partly understood. A team of scientists around Moritz Helmstaedter at the Frankfiurt Max Planck Institute for Brain Research and Helene Schmidt (Humboldt University in Berlin) have now discovered a surprisingly precise nerve cell connectivity pattern in the part of the cerebral cortex that is responsible for orienting the individual animal or human in space.
The researchers report online in Nature (Schmidt et al., 2017. Axonal synapse sorting in medial entorhinal cortex, DOI: 10.1038/nature24005) that synapses in...
Whispering gallery mode (WGM) resonators are used to make tiny micro-lasers, sensors, switches, routers and other devices. These tiny structures rely on a...
Using ultrafast flashes of laser and x-ray radiation, scientists at the Max Planck Institute of Quantum Optics (Garching, Germany) took snapshots of the briefest electron motion inside a solid material to date. The electron motion lasted only 750 billionths of the billionth of a second before it fainted, setting a new record of human capability to capture ultrafast processes inside solids!
When x-rays shine onto solid materials or large molecules, an electron is pushed away from its original place near the nucleus of the atom, leaving a hole...
19.09.2017 | Event News
12.09.2017 | Event News
06.09.2017 | Event News
25.09.2017 | Power and Electrical Engineering
25.09.2017 | Health and Medicine
25.09.2017 | Physics and Astronomy