Computer scientist locates more than 1,000 novel genes in mouse and human

‘Best laid plans of mice and men’

Using both the mouse and human genomes, a computer scientist at Washington University in St. Louis and international collaborators have developed a method for predicting novel genes in both genomes. With the method the scientists have discovered 1,019 novel genes that are found in both man and mouse. The breakthrough is expected to speed up discovery of genes in both genomes as well as those of other mammals. Because it is efficient and cost-effective, laboratories are likely to use it and pursue genetic studies on a number of major fronts.

“Whereas it might have taken 7,000 experiments to verify a thousand genes, with our method it now will take only about 1,500,” said Michael R. Brent, Ph.D., associate professor of computer science at Washington University in St. Louis.
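The figures Brent quotes imply a rough success rate per experiment. The sketch below works through that arithmetic. It assumes that each experiment tests a single predicted gene and that the quoted numbers are round approximations; the function name is illustrative and does not come from the paper.

```python
def implied_success_rate(genes_verified, experiments):
    """Fraction of experiments that confirm a gene, assuming one gene per experiment."""
    if experiments <= 0:
        raise ValueError("experiments must be positive")
    return genes_verified / experiments

# Round figures taken from the quote above.
old_rate = implied_success_rate(1000, 7000)   # earlier approach
new_rate = implied_success_rate(1000, 1500)   # with the new method

# Ratio of experiments needed before versus after.
reduction_factor = 7000 / 1500

print(f"old: {old_rate:.1%}, new: {new_rate:.1%}, "
      f"about {reduction_factor:.1f}x fewer experiments")
```

Under these assumptions, roughly one experiment in seven succeeded before, against about two in three with the new method.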

Brent developed TWINSCAN, one of the programs used to predict genes by looking at both the alignment between the two genomes and statistical patterns in the individual DNA sequences of each genome. DNA is composed of four varieties of bases (commonly abbreviated as A, T, G, C). The myriad arrangements of these bases, or sequences, are the instructions for making proteins, which in turn give rise to physiological traits such as color, hair type and muscle variations. DNA looks like a long string of unintelligible letters, but programs such as Brent’s highlight the genes in the sequence, making sense of it for biomedical researchers.

Simply put, what Brent and his colleagues did was develop computer programs that use patterns of evolutionary conservation — DNA sequences that have not changed since the common ancestor of mouse and man — to improve the accuracy of gene prediction. They identified a set of 1,019 predicted novel mouse genes and showed that genes in this set can be verified experimentally with a very high success rate.
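The idea of using conservation as a signal can be illustrated with a toy example. The sketch below is not TWINSCAN and not the authors' method; it simply slides a window along two pre-aligned sequences and reports windows where most positions are identical. The function name, window size, threshold and the two short sequences are all made up for illustration.

```python
def conservation_windows(seq_a, seq_b, window=6, threshold=0.8):
    """Return (start, identity) for each window whose identity meets the threshold.

    seq_a and seq_b are assumed to be already aligned, so they must have the
    same length; '-' marks an alignment gap and never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        identity = matches / window
        if identity >= threshold:
            hits.append((start, identity))
    return hits

# Hypothetical aligned fragments, invented for this example.
human = "ATGGCCTTAGACGT"
mouse = "ATGGCCATAGTCTT"

for start, identity in conservation_windows(human, mouse):
    print(f"window at {start}: {identity:.0%} identical")
```

A real gene predictor combines a conservation signal like this with statistical models of the sequence itself, as described above; the sketch shows only the conservation part.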

A paper describing the results was published in the Feb. 4, 2003, issue of the Proceedings of the National Academy of Sciences. Brent’s collaborators included researchers in Barcelona, Spain; Geneva, Switzerland; the United Kingdom; and at GlaxoSmithKline in King of Prussia, Pa.

Among the genes the researchers believe they have found are a new relative of the dystrophin gene, which is mutated in Duchenne muscular dystrophy, a number of genes involved in neural development, and several immune system genes.

The human and mouse genomes each contain between 25,000 and 30,000 genes, with no more than 500 genes separating the two mammals. “We know the locations of about 15,000 to 22,000 genes,” Brent said. “There is a big chunk of genes that we know are missing, some of them multi-exon genes. (Exons are the segments of a gene that contain its protein-coding portion.) We now have this very sensitive and specific method for finding, predicting and testing multi-exon genes in mammals, and we think that the method provides a very good tool for completing the catalog of multi-exon genes in humans.”

An unknown portion of the missing genes consists of single-exon genes, which present a different problem for gene prediction, partly because single-exon genes can be confused with a class of sequences called processed pseudogenes. Beyond delineating the human and mouse genomes, Brent conjectured that the method of gene prediction would enhance analysis of genomes more closely related to the human genome, such as those of monkeys and other primates, as well as the chicken and rat genomes.

Brent received a bachelor’s degree in mathematics from MIT in 1985 and a Ph.D. in Computer Science in 1991. His doctoral research at the MIT Artificial Intelligence Lab focused on machine learning of human languages. From 1991 to 1999 he served as Assistant and then Associate Professor of Cognitive Science at Johns Hopkins University, where his research focused on mathematical models of how children learn their native languages. After moving to the Department of Computer Science and Engineering at Washington University in 1999, Brent began a new research program in computational biology focusing on mathematical models for predicting the locations and structures of genes in genome sequences. He currently holds a joint appointment in the Washington University School of Medicine Department of Genetics and devotes all of his effort to computational gene prediction and experimental gene verification.

Media Contact

Tony Fitzpatrick EurekAlert!

More Information:

http://www.wustl.edu/
