DNA Microarrays - The Role Of Bioinformatics

One of the major difficulties in any kind of expression analysis is the manipulation of very large amounts of biological data. The field of study devoted to this problem is called bioinformatics. The usefulness of gene expression data depends on how much information is available for each identified gene. In other words, the identity of the gene associated with each spot on a microarray must be accessible while the analysis is being done.
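
As a rough illustration of keeping gene identities accessible during analysis, the following Python sketch looks up an annotation record for each spot and flags any spot that has none. The spot identifiers, the layout of the table, and the assignment of genes to spots are hypothetical and chosen only for the example; a real array would take its annotations from the manufacturer or from a curated database.

```python
# Hypothetical annotation table: spot identifier -> gene record.
# The spot-to-gene assignments below are invented for illustration.
annotations = {
    "spot_0001": {"symbol": "TP53", "description": "tumor protein p53"},
    "spot_0002": {"symbol": "MYC", "description": "MYC proto-oncogene"},
}


def annotate(spot_ids, annotations):
    """Attach gene records to spots and collect spots with no annotation."""
    annotated, unknown = [], []
    for spot in spot_ids:
        record = annotations.get(spot)
        if record is None:
            unknown.append(spot)  # identity not available for this spot
        else:
            annotated.append((spot, record["symbol"], record["description"]))
    return annotated, unknown


hits, unknown = annotate(["spot_0002", "spot_0999"], annotations)
```

Spots that end up in the unknown list carry expression values that cannot be interpreted biologically, which is why the annotation step matters as much as the measurement itself.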

Descriptions and classifications of each gene on the array must be readily available, as no researcher can remember such details for the tens of thousands of genes that may be involved in an analysis. An analysis might be run many times, with slight changes in the parameters of the clustering algorithm each time. The genes that cluster together are examined at the end of each run to look for reproducible patterns. This examination must be done with a full understanding of the biology of the system being studied, because clusters of genes are most informative if they group in a biologically reasonable way. For this reason, microarray expression analysis is frequently exploratory: the results are used to suggest additional, corroborative experiments.
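
The idea of repeating a clustering run with slightly different parameters and keeping only the reproducible groupings can be sketched in Python as follows. This is a minimal illustration and not a standard microarray method: the expression profiles are made up, and the clustering rule (linking genes whose Pearson correlation meets a threshold) is a simple stand-in for whichever algorithm a real study would use.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles: gene -> measurements across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cluster(profiles, threshold):
    """Group genes into components linked by correlation >= threshold."""
    parent = {g: g for g in profiles}

    def find(g):
        while parent[g] != g:
            g = parent[g]
        return g

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups
    groups = {}
    for g in profiles:
        groups.setdefault(find(g), set()).add(g)
    return list(groups.values())


def stable_pairs(profiles, thresholds):
    """Return gene pairs that share a cluster at every threshold tried."""
    counts = {}
    for t in thresholds:
        for group in cluster(profiles, t):
            for pair in combinations(sorted(group), 2):
                counts[pair] = counts.get(pair, 0) + 1
    return sorted(p for p, n in counts.items() if n == len(thresholds))


# The parameter being varied here is the correlation threshold.
reproducible = stable_pairs(profiles, [0.7, 0.8, 0.9])
```

Pairs that survive every parameter setting are candidates for closer inspection. Whether they are biologically reasonable still has to be judged against what is known about the genes involved.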

Another bioinformatics challenge in gene expression studies is collecting information about the samples under analysis and storing it in databases. If the gene expression patterns of one hundred different tumor samples are being examined, it may be necessary to restrict the analysis to subgroups of the tumors in order to observe patterns in the data. This subgrouping, or stratification, of the samples is best performed on the basis of independently determined properties of those samples. For example, samples from metastatic cancer cells could be grouped together and compared with those from nonmetastatic cancer cells, or the patient's age at the onset of disease could be used to separate the samples into groups. Such subgroup analysis can be done only if complete information is collected and stored for all samples.
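
Stratification of this kind might be sketched in Python as shown below. The sample records, field names, and the age cutoff of 50 are all hypothetical and serve only to illustrate the idea. The sketch also shows why complete records matter: a sample with a missing property cannot be placed in any subgroup for that property.

```python
from collections import defaultdict

# Hypothetical sample annotations stored alongside the expression data.
samples = [
    {"id": "T01", "metastatic": True, "age_at_onset": 42},
    {"id": "T02", "metastatic": False, "age_at_onset": 67},
    {"id": "T03", "metastatic": True, "age_at_onset": 58},
    {"id": "T04", "metastatic": False, "age_at_onset": None},  # incomplete record
]


def stratify(samples, key):
    """Group sample ids by key(sample); ids whose key is None are set aside."""
    groups = defaultdict(list)
    unclassified = []
    for sample in samples:
        label = key(sample)
        if label is None:
            unclassified.append(sample["id"])
        else:
            groups[label].append(sample["id"])
    return dict(groups), unclassified


def age_band(sample):
    """Assign an age band; the cutoff of 50 is an arbitrary illustrative choice."""
    age = sample["age_at_onset"]
    if age is None:
        return None
    return "under_50" if age < 50 else "50_and_over"


by_status, _ = stratify(samples, lambda s: s["metastatic"])
by_age, missing_age = stratify(samples, age_band)
```

Here sample T04 can be used in the metastatic versus nonmetastatic comparison but drops out of the age-based comparison, because its age at onset was never recorded.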

