The tremendous efforts at sequencing the genomes of human beings and other organisms have produced a vast amount of nucleic acid and protein sequence information for additional analysis. Much of the sequence information is now, or will be, the subject of both biochemical and functional characterization. The sequence information also serves as the raw material for “bioinformatics”, where the sequence itself is used in comparisons with other sequences for which the structure, function, or other characteristics have been previously identified. The great hope and expectation for these efforts is that with the identification of functionalities encoded by genetic sequences, additional therapeutic products and treatments can be developed for diseases in humans and other organisms.
The effort to identify functions encoded by genetic sequences has focussed, at least initially, on sequences that encode actual gene products, or “genes”. Earlier approaches sought to clone and sequence only genes, relying on positional cloning tools and strategies to map and clone them. While labor-intensive, positional cloning has been successful in locating genes associated with various diseases. Initially, genetic mapping is performed based on large families of related individuals to locate a disease-associated gene at the level of chromosomal location and in the range of centimorgans. Next, and with a significant increase in effort, the work becomes one of physically mapping the genes so that centimorgans are reduced to megabasepairs and then finally to particular nucleotides. Examples of successes with positional cloning include the identification of genes associated with cystic fibrosis and Huntington's disease.
Other approaches to the isolation of genes include exon trapping (Buckler et al. (1991) P.N.A.S. 88:4005-4009) and direct selection (Morgan et al. (1992) N.A.R. 20:5173-5179). These methods identify potential genes in large genomic regions which are then sequenced and used in confirming the genes as actually expressed. In some cases, cells that normally express the potential gene are unknown, and it remains necessary to confirm the expression of the genes and identify the functionality of the encoded product.
An initial advantage of positional cloning over the above two methods is that it requires no knowledge of the functional or physiological role of the gene product of the identified gene. The identification is made by following a phenotypic trait and then studying genetic segregation of a particular sequence with the trait. But after identification, there may still be difficulties in determining the functional role of the gene product for the design of appropriate therapies. Without knowing the functional role of the encoded product, it remains difficult, for example, to identify suitable agents to use as pharmaceuticals to appropriately target the gene product. Additionally, it remains unknown how the identified gene is involved in the progression of the disease from onset to its later stages.
A more recent approach to the isolation of genes has been based on massive sequencing efforts designed to identify all expressed sequences in a genome. Completion of such efforts in the human and Drosophila genomes, as well as those of some microorganisms, has recently been reported. But with the production of such large amounts of sequence information, the need for a rapid and efficient means for identifying the functionality of encoded gene products increases further. This need has led to intensive commercial and industrial activity for additional methods to identify gene function.
One means for identifying function is through bioinformatics, which seeks to determine functionality based on similarities between a new sequence and other sequences for which the structure, function, or other characteristics have been previously identified. Bioinformatics is most often performed with computer programs and thus has been termed to occur “in silico”. One drawback of bioinformatics, however, is that it only provides a starting point for possibly validating a postulated functionality of a gene sequence. Until a new sequence is actually expressed and characterized within a living cell or organism, the supposed functionality remains a hypothesis to be proven.
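By way of illustration only, the similarity-based inference described above can be sketched in a few lines of Python. The sketch is not any particular bioinformatics program: it assumes a simple k-mer overlap (Jaccard) score in place of a true alignment, and the sequences, reference names, and annotations are made up for the example. The annotation it returns is, as noted above, only a hypothesis to be tested.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(query, references, k=3):
    """Return (name, annotation, score) for the most similar reference."""
    q = kmers(query, k)
    scored = [
        (jaccard(q, kmers(seq, k)), name, annotation)
        for name, (seq, annotation) in references.items()
    ]
    score, name, annotation = max(scored)
    return name, annotation, score


# Hypothetical, made-up sequences and annotations (illustrative only).
REFERENCES = {
    "ref_A": ("ATGGCCAAGCTGACCGAT", "hypothetical kinase"),
    "ref_B": ("TTTTCCCCGGGGAAAATT", "hypothetical transporter"),
}

# Hypothetical new sequence of unknown function.
QUERY = "ATGGCCAAGCTGACCGAA"

if __name__ == "__main__":
    name, annotation, score = best_match(QUERY, REFERENCES)
    print(f"closest reference: {name} ({annotation}), similarity {score:.2f}")
```

In this toy setting the new sequence is provisionally assigned the annotation of its closest reference; nothing in the computation confirms that the encoded product actually has that function in a living cell.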
An approach to validate an assigned gene function is via the use of small animal models. For example, transgenic mice have been used for the overexpression of gene sequences in attempts to identify the encoded functionality. Gene sequences have also been used in the production of “knockout” mice where the endogenous mouse sequence is no longer expressed. But the time and cost of transgenic approaches have limited their usefulness to studies of only a few sequences at a time.
Another approach has been to make use of cell cultures to overexpress a gene sequence of interest. Unfortunately, there is no rapid and efficient means for reliably producing a “knockout” cell where the endogenous cellular sequence is not expressed or overexpressed. Overexpression methods, for their part, are limited by the vector system used to deliver and express the gene. As an initial matter, known vector systems limit the number of cells that are transfected with the gene. For example, plasmid vectors have low transfection efficiencies and thus require the use of a selectable marker to isolate transfected cells. But the expression of a marker gene from the plasmid vector tends to skew the phenotype detected because the gene of interest is not the only gene being overexpressed in the cell. Stated differently, expression of the gene of interest is not the only initial perturbation occurring in the cell. As such, the determination of gene function may be significantly mistaken due to skewing by expression of the marker gene. The same selectable-marker-mediated skewing is seen with some viral vectors, such as onco-retroviral vectors.
Higher transfection efficiencies are available from other viral vectors, such as adenovirus-based vectors, but these vectors often fail to provide stable expression of the gene of interest. More importantly, such vectors often have large numbers of their own genes to express or suffer the risk of contamination due to co-infection by helper virus. The expression of vector and/or helper virus genes again perturbs the intracellular environment, skews the detected phenotype, and thus affects the determination of gene function.
An additional limitation on the use of vector-based overexpression lies in the uncertainty as to what resultant phenotype should be, or can be, detected in the transfected cell. Moreover, such methods rarely use primary cells but instead use cell lines or diseased cells, where any identified gene function remains suspect because of the abnormal cellular environment.
Citation of the above documents is not intended as an admission that any of the foregoing is pertinent prior art. All statements as to the date or representation as to the contents of these documents are based on the information available to the applicant and do not constitute any admission as to the correctness of the dates or contents of these documents.