Proteomics

Laboratory Techniques, Databases And Computational Approaches

Proteomics is the science of studying the multitude of proteomes found in living organisms. A proteome is the entire collection of proteins expressed by a genome or in a tissue. The contents of a proteome can differ in various tissue types, and it can change as a result of aging, disease, drug treatment, or environmental effects.

This contrasts with the concept of a genome, which is an organism's complete collection of DNA. A genome's composition remains more or less constant from tissue to tissue, apart from the mutations and polymorphisms that can occur.

The word "proteome" was first coined in late 1994. By 1997 there were a number of research conferences focusing on proteomics.

According to the first draft of the human genome, based on the work by the Human Genome Project and by Celera Inc., there are only between thirty thousand and seventy thousand genes in the human genome, many fewer than had been estimated previously. However, as of 2002 there were still groups that believed that there are at least 120,000 genes. Regardless of which of these estimates proves more accurate, the number of potential proteins in the human proteome is quite large. Although the first draft of the human genome reduced the estimates for the total number of human genes, it also predicted a greater amount of alternative splicing of genes, and therefore more distinct protein products per gene, than had been anticipated.

At its simplest level, proteomics is the study of protein expression in a proteome: an attempt to understand the relative levels (amounts) of each protein within the mixture. Proteomics attempts to characterize proteins, compare variations in their expression levels in normal and disease states, study their interactions with other proteins, and identify their functional roles.
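As a rough illustration of comparing expression levels between normal and disease states, the sketch below computes a log2 fold change for each protein. The protein names, the abundance values, and the pseudocount are hypothetical and chosen only for illustration. Real proteomics data would also need normalization and statistical testing, which are omitted here.

```python
import math


def log2_fold_change(normal, disease, pseudocount=1.0):
    """Return log2(disease / normal) for each protein present in both samples.

    The pseudocount (an assumption of this sketch) avoids division by zero
    when a protein is not detected in one of the samples.
    """
    changes = {}
    for protein, normal_level in normal.items():
        if protein in disease:
            ratio = (disease[protein] + pseudocount) / (normal_level + pseudocount)
            changes[protein] = math.log2(ratio)
    return changes


# Hypothetical abundances in arbitrary units, not real measurements.
normal = {"protein_A": 99.0, "protein_B": 49.0, "protein_C": 7.0}
disease = {"protein_A": 399.0, "protein_B": 49.0, "protein_C": 1.0}

changes = log2_fold_change(normal, disease)
for protein, change in sorted(changes.items()):
    # Positive values mean higher abundance in the disease sample.
    print(f"{protein}: {change:+.2f}")
```

With these made-up numbers, protein_A is four times more abundant in the disease sample (a log2 fold change of +2), protein_B is unchanged, and protein_C is four times less abundant (a log2 fold change of -2).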

Unlike the traditional approach of studying individual proteins one at a time, proteomics uses an automated, high-throughput approach. High-throughput refers to the number of items (in this case, proteins) that can be analyzed or studied per unit of time. New technologies and substantial bioinformatics tools are required to compare entire proteomes. Expansion of the field of proteomics into the realm of "big science" (meaning many dollars invested by a large number of companies and universities) is several years behind the expansion of genomics. This is primarily because proteins are more difficult to work with in a laboratory setting than are nucleic acids such as DNA.

The development of protein analysis technologies is more difficult than the development of DNA analysis technologies for three reasons. First, the basic alphabet for encoding proteins consists of twenty amino acids, whereas the alphabet of DNA consists of only four different nucleotides. Second, the messenger RNA (mRNA) for some genes can be differentially spliced, meaning that multiple messages can be made from a single gene, resulting in multiple, distinct protein products. Finally, many proteins are modified once they have been synthesized. This is known as post-translational modification. There are a number of types of post-translational modifications, such as the addition of sugar, phosphate, sulfate, lipid, acetyl, or methyl groups. Each of these modifications has the ability to change the functional activity of a protein. Mass spectrometry systems are used to help scientists analyze the various proteomes within an organism.
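The arithmetic behind these three reasons can be sketched in a few lines. The function names and the example numbers below are hypothetical, and the modification model is deliberately simplified: it assumes each modifiable site is independently either modified or unmodified, with one modification type per site.

```python
def sequence_space(alphabet_size, length):
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


def modified_forms(isoforms, modifiable_sites):
    """Upper bound on distinct protein forms from one gene.

    Simplifying assumption: every splice isoform has the same number of
    modifiable sites, and each site is independently on or off.
    """
    return isoforms * 2 ** modifiable_sites


# Four nucleotides versus twenty amino acids, for a sequence of length 10.
dna_sequences = sequence_space(4, 10)       # 1,048,576
protein_sequences = sequence_space(20, 10)  # 10,240,000,000,000

# A hypothetical gene with 3 splice isoforms and 4 modifiable sites.
forms = modified_forms(isoforms=3, modifiable_sites=4)  # 48

print(dna_sequences, protein_sequences, forms)
```

Even under this simple model, a single gene with a few splice variants and a handful of modification sites can give rise to dozens of distinct protein forms.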

These issues have made the development of reliable, high-throughput techniques for characterizing proteins, including their expression levels, on a proteome-wide scale a major challenge. As a result, techniques for high-throughput DNA sequencing and gene expression studies, for example, were developed and commercialized on a large scale sooner than comparable protein analysis techniques. This is not to imply that all of the techniques involved in proteomics are new. Some, such as two-dimensional gel electrophoresis, have been around since the 1970s. However, the need to adapt these techniques to a large "proteome" scale brings with it a unique set of challenges.

For researchers involved in areas such as drug discovery, proteomics approaches will need to be used to obtain a greater understanding of disease mechanisms and drugs' mechanisms of action. Large-scale studies looking at gene expression via quantification of mRNA abundance are already possible and well commercialized. These technologies are very powerful, and the highest throughput approaches are capable of analyzing tens of thousands of genes per experiment. Sophisticated bioinformatics systems have been, and continue to be, developed to analyze these vast amounts of data. However, studies have shown that mRNA levels do not necessarily correlate well with protein levels.
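One common way to check how well mRNA levels track protein levels is to compute a correlation coefficient across genes. The sketch below implements the Pearson correlation by hand. The measurements are hypothetical values invented for illustration, not data from any study, and real analyses often use rank-based or log-transformed comparisons instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical levels for five genes (arbitrary units), one entry per gene.
mrna_levels = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_levels = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(mrna_levels, protein_levels)
print(f"Pearson r = {r:.2f}")  # 0.80 for these made-up values
```

A value near 1 would mean that mRNA abundance predicts protein abundance well. Lower values indicate that measuring mRNA alone can be misleading, which is part of the motivation for measuring proteins directly.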

Researchers must understand proteins and their roles, since proteins are the functional units within cells. As of 2002, the vast majority of drug targets were proteins. There are a handful of drugs, including some chemotherapeutic agents, that bind to DNA, but most drugs bind to specific protein targets. In the cases where the target is a protein, the drugs themselves are primarily small organic molecules or, in some cases, small proteins, such as hormones, that bind to a larger protein target in the body. Some drugs are actually therapeutic proteins that are delivered to the site of the disease. Proteomics can help researchers understand how proteins interact in cells.
