The sequencing of the human and other genomes has ushered in the age of functional genomics. Substantial resources are being devoted to the analysis of these genomes. The drug discovery process is currently undergoing a fundamental revolution as it embraces functional genomics, that is, high-throughput genome- or gene-based biology. This approach is rapidly superseding earlier approaches based on positional cloning, in which a phenotype (e.g., a biological function or genetic disease) is identified and then tracked back to the responsible gene based on its genetic map position.
Functional genomics relies heavily on the various tools of bioinformatics to identify gene sequences of potential interest from the many molecular biology databases now available. There is a continuing need to identify and characterize further genes and their related polypeptides/proteins, as targets for drug discovery.
Most of the current methods used to mine data from sequence information rely on computer algorithms. These algorithms are designed to identify a variety of features in DNA sequences, including open reading frames of putative gene coding sequences. Once an open reading frame is identified, algorithms are utilized to provide putative protein sequences based on the presence of start and stop codons. These algorithms also attempt to define splice signals and thus excise introns from the sequences.
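By way of illustration only, the open reading frame search described above may be sketched as follows. This is a minimal sketch and not any particular published gene-finding algorithm: it assumes the standard genetic code, scans only the three forward-strand reading frames, and makes no attempt to detect splice signals. The names `find_orfs`, `translate`, and `min_codons` are hypothetical and chosen for this example.

```python
# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon; `end` is exclusive and includes the stop codon.
    `min_codons` (hypothetical parameter) counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the frame and keep scanning
    return orfs


def translate(orf_seq):
    """Translate an ORF into a putative protein; unknown codons become 'X'."""
    protein = "".join(
        CODON_TABLE.get(orf_seq[i:i + 3].upper(), "X")
        for i in range(0, len(orf_seq) - 2, 3)
    )
    return protein.rstrip("*")  # drop the terminal stop symbol


if __name__ == "__main__":
    seq = "CCATGGCTTGGTAACC"
    for start, end in find_orfs(seq):
        print(start, end, translate(seq[start:end]))
```

Such a purely sequence-based scan reports every start-to-stop stretch in a frame and cannot determine which of those stretches are actually expressed or how their transcripts are processed.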
However, these algorithms, no matter how powerful, cannot replicate the actual expression and processing of genes. The algorithms may fail to identify all expressed genes. In particular, splice variants and genes expressed from alternative start sites may be missed by the algorithms. Accordingly, what is needed in the art are biologically based methods of screening large amounts of genomic sequence data in a high-throughput fashion.