The sequence of the complete human genome is now available. To take full advantage of this genomic sequence data, computational and/or experimental methods are needed to distinguish sequences that have biological function from those that do not. For example, it is estimated that only 5% of the human genome contains coding regions. The value of identifying coding sequences is clear, because variation in a coding sequence can directly affect the encoded protein and the function of the gene; thus, there is a tremendous effort in the genomics community to identify such coding sequences. However, the genome also contains non-coding sequences that are of great importance in determining gene function. These important non-coding sequences include regulatory regions, such as promoters, enhancers, ribosome binding sites, transcription termination sites, and the like. Sifting through the 95% of the genome composed of non-coding sequences to identify the small fraction of non-coding elements with biological importance is an even greater challenge than identifying genes. Therefore, methods are needed to rapidly identify putative functional non-coding sequences in the human genome or in the genome of any organism.
In addition, it is of interest to understand and study how very closely related organisms differ from one another genetically. Such organism-differentiating sequences give a particular organism its unique characteristics. For example, comparing the genomes of two closely related corn hybrids may allow one to identify the genetic sequence that makes one of the hybrids robust even in times of drought or resistant to a particular parasite.
Thus, it is of great interest in the field of genetics to determine the genome sequences of many different organisms and to identify the functional regions and organism-differentiating sequences therein. One way to identify such sequences is to compare the sequence of one organism with that of another. However, methods known to date require that both sequences be known before such a comparison can be made. Although a great deal of sequencing has been done for many organisms in the past 10 years, the entire genomes of only a handful of organisms are known.