The most widely used technique for sequencing proteins is the Edman degradation, a procedure developed by Pehr Edman in the 1950s. The reaction steps used in this method have since been fully automated. The procedure uses special reagents under alternating basic and acidic conditions to remove one amino acid at a time from the protein's N-terminus. The amino acid released in each cycle of degradation is identified by chromatography, a separation technique that relies on properties such as an amino acid's size and electrical charge to distinguish it from the other nineteen amino acids.
In many automated approaches, high-performance liquid chromatography (HPLC) is used to tell which amino acid has been released; the time a released amino acid takes to travel through the HPLC column is characteristic of that amino acid. Up to fifty amino acids from the N-terminus can be identified using Edman degradation. If a scientist is trying to identify a previously sequenced protein, usually only the first fifteen to twenty amino acids of the purified protein need to be sequenced. That information can then be entered into a database and matched with known proteins having identical or related sequences.
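The database lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the protein names and sequences in `KNOWN_PROTEINS` are invented, the function `match_n_terminus` is hypothetical, and real sequence databases use far more elaborate search methods than a position-by-position comparison.

```python
# Toy database. The names and sequences are invented for illustration only.
KNOWN_PROTEINS = {
    "hypothetical_protein_A": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "hypothetical_protein_B": "MKTAYIAKQRGGSFVKAAFSRQ",
    "hypothetical_protein_C": "GSHMLEDPVAGTKRRW",
}


def match_n_terminus(read, database, max_mismatches=0):
    """Return the names of entries whose N-terminus matches `read`.

    `read` holds the residues called by successive Edman cycles, in
    one-letter code. A nonzero `max_mismatches` also admits related
    (not just identical) sequences.
    """
    hits = []
    for name, sequence in database.items():
        if len(sequence) < len(read):
            continue  # entry is shorter than the read, so it cannot match
        # Count positions where the read and the entry disagree.
        mismatches = sum(a != b for a, b in zip(read, sequence))
        if mismatches <= max_mismatches:
            hits.append(name)
    return sorted(hits)


# Fifteen residues, as might be read from the N-terminus of a purified protein.
edman_read = "MKTAYIAKQRQISFV"
identical = match_n_terminus(edman_read, KNOWN_PROTEINS)
related = match_n_terminus(edman_read, KNOWN_PROTEINS, max_mismatches=2)
```

With this toy data, `identical` contains only the entry that agrees at all fifteen positions, while `related` also picks up the entry that differs at two positions.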
Sequencing a protein from its C-terminus is particularly challenging, and no available technique is as robust as Edman degradation. Some limited amino acid sequence information can be obtained, however, using enzymes called carboxypeptidases, which remove individual C-terminal amino acids. These enzymes tend to cleave only specific amino acids from the C-terminus.
For example, carboxypeptidase B, isolated from cow pancreas, releases the amino acids arginine and lysine from the C-terminus of a protein. Carboxypeptidase A, also isolated from cow pancreas, fails to release arginine, lysine, or proline, but can cleave off the other seventeen amino acids. Carboxypeptidases isolated from citrus leaves and yeast can cleave off any amino acid from the C-terminus of a protein, although the rate at which they do so depends on the particular amino acid. If one amino acid is released slowly and the next one in the chain is released very quickly, the two might appear to be cleaved at the same time, making it difficult to establish their order. C-terminal amino acid identification using enzymes, therefore, is not practical beyond the first several positions.
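The specificities of the two pancreatic enzymes can be illustrated with a short Python sketch. It is an idealized model, not a description of real enzyme kinetics: it assumes each enzyme removes residues one at a time until it meets a residue it cannot release, and it ignores the differences in release rate that complicate real experiments. The function name `digest_c_terminus` and the example peptides are hypothetical.

```python
# Specificities as described in the text, in one-letter amino acid codes.
ALL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
RELEASED_BY = {
    # Releases only arginine (R) and lysine (K).
    "carboxypeptidase_B": {"R", "K"},
    # Releases everything except arginine, lysine, and proline (P).
    "carboxypeptidase_A": ALL_AMINO_ACIDS - {"R", "K", "P"},
}


def digest_c_terminus(sequence, enzyme):
    """Return the residues released from the C-terminus, in order of release.

    Idealized assumption: digestion proceeds one residue at a time and
    stops at the first residue the enzyme cannot remove.
    """
    allowed = RELEASED_BY[enzyme]
    released = []
    for residue in reversed(sequence):  # walk inward from the C-terminus
        if residue not in allowed:
            break
        released.append(residue)
    return released


# Hypothetical peptides, written from N-terminus to C-terminus.
b_result = digest_c_terminus("GASPLVKR", "carboxypeptidase_B")
a_blocked = digest_c_terminus("GASPLVKR", "carboxypeptidase_A")
a_result = digest_c_terminus("GASPLVF", "carboxypeptidase_A")
```

In this model, carboxypeptidase B removes the terminal arginine and lysine of the first peptide and then stops, carboxypeptidase A releases nothing from that same peptide, and on the second peptide carboxypeptidase A proceeds inward until it reaches proline.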
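Another method of protein sequencing, mass spectrometry, gives a protein or its peptide fragments an electric charge and breaks them apart inside the instrument. The charged fragments reach a detector sorted by their mass-to-charge ratio. Because successive fragments differ by a single amino acid, each amino acid can be identified from its characteristic mass; leucine and isoleucine, which have identical masses, are an exception.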
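Identification by mass can be sketched as a simple table lookup. The Python example below assumes the standard monoisotopic masses of amino acid residues (the amino acid minus one water molecule) and a hypothetical helper, `residues_for_mass`, with an arbitrary default tolerance. It is a minimal illustration, not how a real instrument's software works.

```python
# Standard monoisotopic residue masses in daltons, one-letter codes.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_for_mass(observed_mass, tolerance=0.01):
    """Return every residue whose mass lies within `tolerance` of the observation.

    The default tolerance is an arbitrary value chosen for this sketch.
    """
    return sorted(
        residue
        for residue, mass in RESIDUE_MASS.items()
        if abs(mass - observed_mass) <= tolerance
    )


glycine_call = residues_for_mass(57.021)
ambiguous_call = residues_for_mass(113.084)
coarse_call = residues_for_mass(128.08, tolerance=0.05)
```

Most observed masses map to a single residue, but leucine and isoleucine share the same mass and cannot be separated by mass alone, and a coarse tolerance can also blur close pairs such as lysine and glutamine.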
Sequencing of the human genome has allowed a giant leap in the understanding of how the human species evolved and how genetic diseases arise. Advances in DNA sequencing technology led to this grand accomplishment. The next frontier is to decipher how all the proteins encoded by the genome interact to carry out the processes of life. This is the study of proteomics. Advances in mass spectrometry and protein sequencing instrumentation are bringing this challenging problem closer to its resolution.
Frank H. Stephenson and Maria Cristina Abilock