Computer scientist locates more than 1,000 novel genes in mouse and human
Best laid plans of mice and men
Using both the mouse and human genomes, a computer scientist at Washington University in St. Louis and international collaborators have developed a method for predicting novel genes in both genomes. With the method the scientists have discovered 1,019 novel genes that are found in both man and mouse. The breakthrough is expected to speed up discovery of genes in both genomes as well as those of other mammals. Because it is efficient and cost-effective, laboratories are likely to use it and pursue genetic studies on a number of major fronts.
"Whereas it might have taken 7,000 experiments to verify a thousand genes, with our method it now will take only about 1,500," said Michael R. Brent, Ph.D., associate professor of computer science at Washington University in St. Louis.
Brent developed TWINSCAN, one of the programs used to predict genes by looking at both the alignment between the two genomes and statistical patterns in the individual DNA sequences of each genome. DNA is composed of four varieties of bases (commonly abbreviated as A, T, G, C). The myriad different arrangements of these bases -- or sequences -- are the instructions for making proteins, which in turn give rise to physiological traits such as color, hair type, muscle variations, etc. DNA looks like a long string of unintelligible letters, but programs such as Brent's highlight the genes in the sequence, making sense of it for biomedical researchers.
Simply put, what Brent and his colleagues did was develop computer programs that use patterns of evolutionary conservation -- DNA sequences that have not changed since the common ancestor of mouse and man -- to improve the accuracy of gene prediction. They identified a set of 1,019 predicted novel mouse genes and showed that genes in this set can be verified experimentally with a very high success rate.
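To make the idea of evolutionary conservation concrete, here is a toy Python sketch. It is not the TWINSCAN code, and the function names and the two short sequences are invented for illustration. It assumes the mouse and human fragments have already been aligned, and it simply marks each mouse base as identical, different, or unaligned in the human sequence. A real gene predictor would combine a signal like this with statistical models of gene structure.

```python
def conservation_string(target: str, informant: str) -> str:
    """Label each target base: '|' match, ':' mismatch, '.' no aligned base.

    Both inputs are rows of one pairwise alignment, with '-' marking gaps.
    The label alphabet is an assumption made for this illustration.
    """
    if len(target) != len(informant):
        raise ValueError("aligned sequences must have equal length")
    labels = []
    for t, i in zip(target.upper(), informant.upper()):
        if t == "-":
            continue  # column has no target base, so there is nothing to label
        if i == "-":
            labels.append(".")  # target base has no partner in the other genome
        elif t == i:
            labels.append("|")  # base unchanged between the two genomes
        else:
            labels.append(":")  # base aligned but different
    return "".join(labels)


def conserved_fraction(labels: str) -> float:
    """Fraction of target positions whose base is identical in the other genome."""
    return labels.count("|") / len(labels) if labels else 0.0


# Made-up aligned fragments; not real genomic data.
mouse = "ATGGCC-TTAGA"
human = "ATGGCTATT-GA"
labels = conservation_string(mouse, human)
print(labels)
print(round(conserved_fraction(labels), 2))
```

Stretches with a high fraction of unchanged bases are the kind of evidence that can point to sequence preserved since the common ancestor of mouse and man, which is where genes are more likely to be found.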
A paper describing the results was published in the Feb. 4, 2003, issue of the Proceedings of the National Academy of Sciences. Brent's collaborators included researchers in Barcelona, Spain; Geneva, Switzerland; the United Kingdom; and GlaxoSmithKline, in King of Prussia, Pa.
Among the genes the researchers believe they have found are a new relative of the dystrophin gene, which is mutated in Duchenne muscular dystrophy, a number of genes involved in neural development, and several immune system genes.
The human and mouse genomes each contain between 25,000 and 30,000 genes, with no more than 500 genes separating the two mammals. "We know the locations of about 15,000 to 22,000 genes," Brent said. "There is a big chunk of genes that we know are missing, some of them multi-exon genes. (Exons are the segments of a gene that contain the protein-coding portion.) We now have this very sensitive and specific method for finding, predicting and testing multi-exon genes in mammals, and we think that the method provides a very good tool for completing the catalog of multi-exon genes in humans."
An unknown portion of the missing genes consists of single-exon genes, which present a different problem for gene prediction, partly because single-exon genes can be confused with a class of sequences called processed pseudogenes. Beyond delineating the human and mouse genomes, Brent conjectured that the method of gene prediction would enhance analysis of genomes more closely related to the human genome, such as those of the monkey and other primates, as well as the chicken and rat genomes.
Brent received a bachelor's degree in mathematics from MIT in 1985 and a Ph.D. in computer science in 1991. His doctoral research at the MIT Artificial Intelligence Lab focused on machine learning of human languages. From 1991 to 1999 he served as assistant and then associate professor of cognitive science at Johns Hopkins University, where his research focused on mathematical models of how children learn their native languages. After moving to the Department of Computer Science and Engineering at Washington University in 1999, Brent began a new research program in computational biology focusing on mathematical models for predicting the locations and structures of genes in genome sequences. He currently holds a joint appointment in the Washington University School of Medicine Department of Genetics and devotes all of his effort to computational gene prediction and experimental gene verification.
Tony Fitzpatrick | EurekAlert!