To study homologous sequences, researchers use computer programs such as BLAST (Basic Local Alignment Search Tool) to compare a DNA or protein sequence against a collection of other sequences. One such collection is GenBank, the genetic sequence database maintained by the National Center for Biotechnology Information (NCBI), which holds publicly available DNA sequences together with their protein translations. Biologists search databases such as GenBank to determine whether a test sequence matches any known sequence, how well it matches, and which portions of the sequence match.
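
The three questions a database search answers (is there a match, how good is it, and which portions match) can be illustrated with a small local alignment. The sketch below uses the classic Smith-Waterman dynamic-programming method, not BLAST's own faster heuristic. The function name `local_align`, the scoring values, and the toy sequences are assumptions chosen for illustration only.

```python
def local_align(query, subject, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment (illustrative scoring values).

    Returns (best_score, query_span, subject_span), where each span is a
    half-open (start, end) pair of 0-based positions in that sequence.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix; a cell never drops below 0, which is what
    # makes the alignment local rather than end-to-end.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in the subject
                score[i][j - 1] + gap,       # gap in the query
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score falls to 0 to find
    # which portions of the two sequences take part in the match.
    i, j = best_pos
    end_i, end_j = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if query[i - 1] == subject[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_i), (j, end_j)


# Toy sequences, invented for this example (not real database entries).
best, q_span, s_span = local_align("ACGT", "TTACGTTT")
print(best, q_span, s_span)
```

With these toy inputs, the score reports how well the sequences match, and the two spans report which portion of each sequence takes part in the match. A high score shows similarity only; it does not by itself establish common ancestry.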

Computer programs identify matching sequences by similarity. However, similar sequences are not always homologous, because they may not share a common origin. Although many sequences that show similarity did evolve from a common ancestor, similar sequences can also arise through independent events, a process known as convergent evolution. For example, mutations frequently occur in the gene for the envelope protein of HIV-1, the virus that causes AIDS, changing the amino acid sequence of the protein. The human immune system recognizes and destroys unmutated viruses but spares those carrying mutations that make them unrecognizable, thereby selecting for the mutant viruses. As a result, viruses from different patients can show identical mutations in the envelope protein, even though the patients were infected by different strains of the virus.

