The CpG dinucleotide does not occur in the genome at the frequency that would be expected by chance. Instead, it is greatly underrepresented in eukaryotic genomes, occurring at approximately 5 to 10 percent of its predicted frequency, according to some estimates. Of the CpG dinucleotides that do occur, an estimated 70 to 80 percent are methylated. This underrepresentation may result from the spontaneous conversion of 5-methylcytosine to thymine by a process known as deamination, in which an amino group (in this case, NH2) is removed from the base. Because thymine is a normal component of DNA, the change can escape repair and, over evolutionary time, deplete the genome of CpG sites. For this reason, methylated cytosines represent potential sites of spontaneous DNA mutation in the genome.
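The "predicted frequency" mentioned above is usually taken to be the frequency expected if cytosines and guanines were arranged independently, that is, the product of the individual C and G frequencies. The following Python sketch illustrates that calculation for a single DNA strand; the function name `cpg_observed_expected` is hypothetical, and the independence model is an assumption rather than a method given in the text.

```python
def cpg_observed_expected(seq: str) -> float:
    """Return observed CpG count divided by the count expected by chance.

    Assumption: the expected count is (number of C * number of G) / length,
    i.e. C and G are treated as occurring independently of each other.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if n == 0 or c == 0 or g == 0:
        # No CpG is possible, so report a ratio of zero.
        return 0.0
    observed = seq.count("CG")   # C followed directly by G on this strand
    expected = c * g / n         # chance expectation under independence
    return observed / expected


if __name__ == "__main__":
    # One CpG where one is expected by chance gives a ratio of 1.0.
    print(cpg_observed_expected("CCGG"))
```

A ratio well below 1 indicates CpG depletion of the kind described above; a ratio near or above 1 indicates that CpG occurs about as often as base composition alone would predict.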
There are, however, small regions of DNA that are very rich in linked cytosines and guanines (CpG dinucleotides) but are unmethylated. These regions, which can span 500 to 5,000 base pairs, are referred to as CpG islands. CpG islands commonly occur in the promoter regions of genes (regions where RNA polymerase binds to start transcription), which are located at the 5′ ("five prime") end of genes. In fact, about 50 percent of all genes contain a CpG island in their promoter regions. The lack of methylation in CpG islands is associated with a less compact chromatin structure and generally allows for active gene expression. Conversely, the aberrant methylation of normally unmethylated CpG islands can silence genes required for proper control of cell growth and is a common mechanism in the development of many types of cancer.
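As a rough illustration of how CpG-rich regions might be located in a sequence, the Python sketch below scans non-overlapping windows and reports those with a high GC content and a high observed-to-expected CpG ratio. The function name `find_cpg_rich_windows` is hypothetical, and the default thresholds (200-base windows, GC content above 0.5, ratio above 0.6) are assumed here as a commonly used heuristic; they are not taken from the text. Sequence alone also says nothing about methylation status, which must be measured experimentally.

```python
def find_cpg_rich_windows(seq, window=200, min_gc=0.5, min_ratio=0.6):
    """Return (start, end) pairs for windows that look CpG-rich.

    Assumptions: windows do not overlap, thresholds are illustrative,
    and a CpG that straddles a window boundary is not counted.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        c = chunk.count("C")
        g = chunk.count("G")
        gc_content = (c + g) / window
        expected = c * g / window            # chance expectation for CpG
        ratio = chunk.count("CG") / expected if expected else 0.0
        if gc_content > min_gc and ratio > min_ratio:
            hits.append((start, start + window))
    return hits


if __name__ == "__main__":
    # Synthetic example: a CpG-rich stretch flanked by AT-only sequence.
    demo = "AT" * 100 + "CG" * 100 + "AT" * 100
    print(find_cpg_rich_windows(demo))
```

Non-overlapping windows keep the sketch short; a finer scan would slide the window by a smaller step and merge adjacent hits into a single region.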