Computers spot shape clues
Computer power is unravelling complex proteins.
Two techniques may help deduce proteins’ functions.
Imagine trying to guess what machines do just by looking at them. Even a can-opener would pose problems, if you didn’t know about cans. This is the challenge that faces molecular biologists as they try to make sense of protein molecules in the cell.
Two new techniques may help. One deduces a protein’s function from its shape; the other deduces its shape from a list of component parts [1,2].
Having read most of the human genome, researchers can, in principle, deduce a protein’s sequence - the chain of amino-acid building blocks of which it is made. In a functioning protein, this chain folds up in a particular three-dimensional way.
The first step in understanding a protein’s job is therefore to work out its shape. Predicting protein folding is, on the face of it, an enormous challenge. Most proteins contain dozens or hundreds of amino acids, so there is an astronomical number of ways in which these might be arranged into a compact, folded structure.
Fortunately, only a tiny fraction of these folds - perhaps a thousand - are found in natural proteins. The challenge is to deduce the best fit of a particular protein sequence to one of these folds. This is called the protein-threading problem.
Traditionally, the problem is tackled by assuming that each amino acid prefers to be surrounded by others of a specific kind, and then looking for the best compromise between the needs of all the amino acids. Success using this approach depends on how well we know what the amino acids prefer.
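The idea of scoring one sequence against several candidate folds can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' method: the two-letter alphabet (H for hydrophobic, P for polar), the energy values and the two folds `fold_A` and `fold_B` are all invented here, and real threading must also cope with alignment gaps, which this sketch ignores.

```python
# Hypothetical contact energies for a two-letter alphabet.
# Lower is more favourable; the numbers are illustrative only.
ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}

# Hypothetical folds: each lists the pairs of chain positions in contact.
FOLDS = {
    "fold_A": [(0, 3), (1, 4), (2, 5)],
    "fold_B": [(0, 5), (1, 4), (0, 2)],
}


def contact_energy(a, b):
    """Energy of one contact, regardless of the order of the two residues."""
    return ENERGY[tuple(sorted((a, b)))]


def thread_energy(sequence, contacts):
    """Total energy of a sequence laid onto a fold's contact list."""
    return sum(contact_energy(sequence[i], sequence[j]) for i, j in contacts)


def best_fold(sequence, folds):
    """Name of the fold on which the sequence has the lowest total energy."""
    return min(folds, key=lambda name: thread_energy(sequence, folds[name]))
```

Here `best_fold("HPPHPP", FOLDS)` picks the fold that brings the two H residues into contact. The quality of the answer rests entirely on the numbers in `ENERGY`, which is the dependence the text describes.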
Instead of trying to deduce this from physical and chemical principles, Jayanth Banavar of Pennsylvania State University and colleagues use a set of known protein structures to train a computer program to recognize the preferences of each amino acid. Once trained, the program, a neural network, can then predict unknown structures.
This learning-based method is more successful than one based on a priori assumptions about amino-acid preferences, the researchers show. The network correctly predicted the structures of 190 out of 213 test proteins (about 89%); the conventional approach got only 137 structures (about 64%) right.
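Learning preferences from known structures, rather than assuming them, can be illustrated with a perceptron-style update. This is a minimal sketch and an assumption about the general flavour of such training, not the scheme the researchers used: the contact energies start at zero and are nudged whenever a known (native) fold fails to score better than an alternative. The alphabet, folds, training examples and learning rate are all hypothetical.

```python
from collections import Counter

PAIR_TYPES = [("H", "H"), ("H", "P"), ("P", "P")]

# Hypothetical folds (contact lists) and training set of (sequence, known fold).
FOLDS = {
    "fold_A": [(0, 3), (1, 4), (2, 5)],
    "fold_B": [(0, 5), (1, 4), (0, 2)],
}
EXAMPLES = [("HPPHPP", "fold_A"), ("HPHPPH", "fold_B")]


def contact_counts(sequence, contacts):
    """How many contacts of each pair type the sequence makes on a fold."""
    counts = Counter(tuple(sorted((sequence[i], sequence[j]))) for i, j in contacts)
    return [counts[pair] for pair in PAIR_TYPES]


def energy(weights, counts):
    """Total energy: learned weight per pair type times number of such contacts."""
    return sum(w * c for w, c in zip(weights, counts))


def train(examples, folds, epochs=20, rate=0.1):
    """Adjust weights until each known fold scores lower than the alternatives."""
    weights = [0.0] * len(PAIR_TYPES)
    for _ in range(epochs):
        for sequence, native in examples:
            native_counts = contact_counts(sequence, folds[native])
            for name, contacts in folds.items():
                if name == native:
                    continue
                decoy_counts = contact_counts(sequence, contacts)
                if energy(weights, native_counts) >= energy(weights, decoy_counts):
                    # Lower the energy of the native fold relative to the decoy.
                    weights = [w - rate * (n - d)
                               for w, n, d in zip(weights, native_counts, decoy_counts)]
    return weights


def predict(weights, sequence, folds):
    """Fold with the lowest energy under the learned weights."""
    return min(folds, key=lambda name: energy(weights, contact_counts(sequence, folds[name])))


weights = train(EXAMPLES, FOLDS)
```

After training, `predict(weights, sequence, FOLDS)` can be applied to sequences that were not in the training set. With only two examples this proves nothing about accuracy; it merely shows where the preferences come from.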
The next stage of the problem, going from structure to function, is what Mary Jo Ondrechen of Northeastern University in Boston, Massachusetts, and colleagues have looked at [2]. Many proteins are enzymes - they facilitate a chemical reaction. The priority of function hunters is to find the region where this transformation takes place, called the active site.
Many amino-acid groups in proteins can act as acids or bases - they can accept or release hydrogen ions. Usually this take-up or release is fairly abrupt as the acidity (pH) of the protein solution is altered - the amino acid switches from having the ion attached to being free of it over a narrow pH range.
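The abrupt switch can be made concrete with the standard Henderson-Hasselbalch relation for a single, independent acid group. The sketch below assumes a hypothetical pKa of 4.0; the function name is ours.

```python
def protonated_fraction(ph, pka):
    """Fraction of an independent acid group that still holds its hydrogen ion."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA = 4.0  # hypothetical value, chosen only for illustration

# Tabulate the fraction one pH unit either side of the pKa and at the pKa itself.
for ph in (PKA - 1.0, PKA, PKA + 1.0):
    print(f"pH {ph:.1f}: protonated fraction {protonated_fraction(ph, PKA):.3f}")
```

The group goes from more than 90% protonated to less than 10% protonated within about two pH units centred on its pKa, which is the narrow range referred to above.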
Ondrechen’s team have found that amino acids at active sites don’t act in this simple way. Here, the behaviour of one unit affects that of the others.
A computer program can spot this different behaviour of amino acids at active sites. It uses the known structure to predict how each amino acid in the protein sheds or acquires hydrogen ions when the pH is changed.
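Why coupling between neighbours stretches out the switch can be seen in a toy model of just two interacting groups. This is a textbook statistical-weights sketch, not the calculation performed by the researchers' program: the pKa values and the `coupling` penalty (in pK units) for having both groups protonated at once are hypothetical.

```python
def independent_fraction(ph, pka):
    """Protonated fraction of a group that ignores its neighbours."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def coupled_fraction(ph, pka_1, pka_2, coupling):
    """Mean protonation of site 1 when it interacts with site 2.

    `coupling` (hypothetical, in pK units) penalises the state in which
    both sites are protonated; coupling = 0 recovers independent sites.
    """
    a1 = 10.0 ** (pka_1 - ph)                # weight: only site 1 protonated
    a2 = 10.0 ** (pka_2 - ph)                # weight: only site 2 protonated
    both = a1 * a2 * 10.0 ** (-coupling)     # weight: both sites protonated
    partition = 1.0 + a1 + a2 + both         # sum over all four states
    return (a1 + both) / partition
```

With both pKa values set to 7.0 and a coupling of 2.0, site 1 is only about two-thirds protonated at pH 5.0, where an independent group would be about 99% protonated. The transition is spread over a much wider pH range, which is the kind of departure from simple behaviour described above.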
Anomalous behaviour, say the researchers, doesn’t necessarily indicate that an amino acid lies at the active site. But several such units close together are almost certainly indicators of the active site. The team says that their method could be automated to identify active sites rapidly - hopefully transforming a suite of protein structures into a list of their functions.
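The final step, looking for several anomalous units close together, might be automated along the following lines. This is a sketch under stated assumptions, not the team's procedure: it takes as given a set of residues already flagged as anomalous, with three-dimensional coordinates, and groups those that lie within a distance cutoff of one another. The residue labels, coordinates, 6.0 cutoff and minimum cluster size of two are all hypothetical.

```python
from math import dist

# Hypothetical flagged residues with (x, y, z) coordinates.
FLAGGED = {
    "A": (0.0, 0.0, 0.0),
    "B": (3.0, 0.0, 0.0),
    "C": (5.0, 2.0, 0.0),
    "D": (30.0, 30.0, 30.0),
}


def clusters(flagged, cutoff=6.0):
    """Group flagged residues into clusters linked by distances <= cutoff."""
    unvisited = set(flagged)
    groups = []
    while unvisited:
        start = unvisited.pop()
        group, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            near = {name for name in unvisited
                    if dist(flagged[current], flagged[name]) <= cutoff}
            unvisited -= near
            group |= near
            frontier.extend(near)
        groups.append(group)
    return groups


def candidate_sites(flagged, cutoff=6.0, min_size=2):
    """Keep only clusters with several anomalous residues close together."""
    return [group for group in clusters(flagged, cutoff) if len(group) >= min_size]
```

In this invented example, residues A, B and C form one candidate site, while the isolated residue D is discarded, mirroring the point that a single anomalous unit is not enough.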
1. Chang, I., Cieplak, M., Dima, R. I., Maritan, A. & Banavar, J. R. Protein threading by learning. Proceedings of the National Academy of Sciences USA, in the press (2001).
2. Ondrechen, M. J., Clifton, J. G. & Ringe, D. THEMATICS: a simple computational predictor of enzyme function from structure. Proceedings of the National Academy of Sciences USA, 98, 12473 - 12478 (2001).
PHILIP BALL | Nature News Service