The invention relates to high throughput methods for identifying the function of sample nucleic acids and their products. The invention is exemplified by the use of the E1-complementing adenoviral packaging cell line PER.C6 in combination with an E1-deleted plasmid-based generation system to produce recombinant adenoviral vectors in a high throughput setting and thereby assign a function to the product of a sample nucleic acid.
The ultimate goal of the Human Genome Project is to sequence the entire human genome. The expected outcome of this effort is a precise map of the 70,000-100,000 genes that are expressed in man. However, a fairly complete inventory of human coding sequences will most likely be publicly available sooner. Since the early 1990s, government and private research organizations have obtained a large number of Expressed Sequence Tags (ESTs), which are partial DNA sequences read from the ends of complementary DNA (cDNA) molecules. A hallmark of these endeavors, carried out by a collaboration between the Washington University Genome Sequencing Center and members of the IMAGE (Integrated Molecular Analysis of Gene Expression) consortium (http://www-bio.llnl.gov/bbrp/image/image.html), has been the rapid deposition of the sequences into the public domain and the concomitant availability of the sequence-tagged cDNA clones from several distributors (Marra, et al. (1998) Trends Genet. 14 (1):4-7). At present, the collection of cDNAs is believed to represent approximately 50,000 different human genes expressed in a variety of tissues, including liver, brain, spleen, B-cells, kidney, muscle, heart, alimentary tract, retina, and hypothalamus, and the number is growing daily.
Recent initiatives, such as that of the Cancer Genome Anatomy Project, support an effort to obtain full-length sequences of clones in the UniGene set (a publicly available set of cDNA clones) by the year 1999. At the same time, commercial entities propose to validate 40,000 full-length cDNA clones by 1999. These individual clones will then be available to any interested party. The speed with which the coding sequences of novel genes are identified contrasts sharply with the rate at which the functions of these genes are elucidated. Assigning functions to the cDNAs in the databases, known as functional genomics, is a major challenge in biotechnology today.
For decades, novel genes were identified as a result of research designed to explain a biological process or hereditary disease, so the function of a gene was known before the gene itself was identified. In functional genomics the order is reversed: coding sequences of genes are first cloned and sequenced, and the sequences are then used to find functions. Although other organisms such as Drosophila, C. elegans, and zebrafish are highly useful for the analysis of fundamental genes, animal model systems are indispensable for studying complex mammalian physiological traits (blood glucose, cardiovascular disease, inflammation). However, the slow reproduction rate and high housing costs of animal models are a major limitation to high throughput functional analysis of genes. Labor-intensive efforts are being made to establish libraries of mouse strains with chemically or genetically mutated genes, in a search for phenotypes that reveal gene function or relate to human diseases; nevertheless, a systematic analysis of the complete spectrum of mammalian genes, human or animal, remains a formidable task.
To keep pace with the volume of sequence data, the field of functional genomics needs the ability to perform high throughput analysis of true gene function. Recently, a number of techniques have been developed that are designed to link tissue- and cell-specific gene expression to gene function. These include cDNA microarrays, gene chip technology, and differential display of messenger RNA (mRNA). Serial Analysis of Gene Expression (SAGE) or differential display of mRNA can identify genes that are expressed in tumor tissue but are absent in the corresponding normal, healthy tissue. In this way, potential genes with regulatory functions can be separated from the excess of ubiquitously expressed genes, which are less likely to be useful for small-molecule drug screening or gene therapy projects. Gene chip technology makes it possible to monitor the expression of a large number of genes by measuring their mRNA levels in cells in only a few hours. Cells cultured under a variety of conditions can be analyzed for their mRNA expression patterns and compared. DNA microarray chips carrying 40,000 non-redundant human genes are currently being produced and are expected to be on the market in 1999 (Editorial (1998) Nat. Genet. 18(3):195-7). However, these techniques are primarily designed for screening cancer cells and not for screening for specific gene functions.
Double or triple hybrid systems are also used to add functional data to the genomic databases. These techniques assay for protein-protein, protein-RNA, or protein-DNA interactions in yeast or mammalian cells (Brent and Finley (1997) Annu. Rev. Genet. 31:663-704). However, this technology provides no means to assay for many other gene functions, such as differentiation, motility, signal transduction, and enzyme and transport activity. Yeast expression systems have been developed to screen for naturally secreted and membrane proteins of mammalian origin (Klein, et al. (1996) Proc. Natl. Acad. Sci. USA 93 (14):7108-13). Such a system also allows large libraries to be collapsed into smaller libraries with certain characteristics that aid in the identification of specific genes and gene products. One disadvantage of this system is that it selects primarily for genes encoding secreted proteins. A second disadvantage is that the library may be biased, because yeast is a heterologous expression system in which some mammalian gene products will not fold properly.
Other current strategies include the creation of transgenic or knockout mice. A successful example of gene discovery by such an approach is the identification of the osteoprotegerin gene. DNA databases were screened to select ESTs with features suggesting that the cognate genes encoded secreted proteins. The biological functions of these genes were assessed by placing the corresponding full-length cDNAs under the control of a liver-specific promoter, so that transgenic mice created with each construct had high plasma levels of the relevant protein. The transgenic animals were then subjected to a battery of qualitative and quantitative phenotypic investigations. One of the transgenes produced mice with increased bone density, which led to the discovery of a potent anti-osteoporosis factor (Simonet, et al. (1997) Cell. 89(2):309-19). The disadvantages of this approach are its high cost and the considerable time it requires.
The challenge in functional genomics is to develop and refine all of the above-described techniques and to integrate their results with existing data in a well-developed database. Such a database would help build a picture of how gene function contributes to cellular metabolism and provide a means to put this knowledge to use in developing novel medicinal products. The current technologies have limitations and do not necessarily yield true functional data. Therefore, there is a need for a method that allows direct measurement of the function of a single gene from a collection of genes (gene pools or individual clones) in a high throughput setting, in appropriate in vitro assay systems and animal models.
The development of high throughput screens is discussed in Jayawickreme and Kost (1997) Curr. Opin. Biotechnol. 8:629-634. A high throughput screen for rarely transcribed, differentially expressed genes is described in von Stein et al. (1997) Nucleic Acids Res. 25:2598-2602. High throughput genotyping is disclosed in Hall et al. (1996) Genome Res. 6:781-790. Methods for screening transdominant intracellular effector peptides and RNA molecules are disclosed in Nolan, WO 97/27212 and WO 97/27213.
The invention includes methods, and compositions for use therein, for directly, rapidly, and unambiguously measuring the function of sample nucleic acids of unknown function in a high throughput setting, using a plasmid-based E1-deleted adenoviral vector system and an E1-complementing host cell. The method includes constructing a set of adapter plasmids by inserting a set of cDNAs, DNAs, ESTs, genes, synthetic oligonucleotides, or a library of nucleic acids into E1-deleted adapter plasmids, and cotransfecting an E1-complementing cell line with the set or library of adapter plasmids and at least one plasmid that has sequences homologous to sequences in the adapter plasmids and that carries all adenoviral genes necessary for replication and packaging that are not provided by the complementing cell line or the adapter plasmids. This produces a set or library of recombinant adenoviral vectors, preferably in a miniaturized, high throughput setting. To identify and assign a function to the product(s) encoded by the sample nucleic acids, a host is transduced in a high throughput setting with the recombinant adenoviral vectors, which express the product(s) of the sample nucleic acids and thereby alter a phenotype of the host. The altered phenotype is identified and used as the basis for assigning a function to the product(s) encoded by the sample nucleic acids. The plasmid-based system is used to rapidly produce adenoviral vector libraries for high throughput screening that are preferably free of replication-competent adenovirus ("RCA"). Each step of the method can be performed in a multiwell format and automated to further increase the capacity of the system. This high throughput system facilitates expression analysis of a large number of sample nucleic acids from humans and other organisms, both in vitro and in vivo, and is a significant improvement over other available techniques in the field.
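The bookkeeping that a multiwell format depends on, keeping each sample nucleic acid linked to its well from adapter-plasmid construction through transduction and phenotype readout, can be sketched as follows. This is a hypothetical illustration only, not part of the method as described: the 96-well layout, the names `assign_wells` and `call_hits`, and the z-score criterion that compares each well with control (for example, empty-vector) wells are all assumptions, and the scores in the example are simulated rather than experimental.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Assumed plate format: 96 wells, 8 rows (A-H) x 12 columns (1-12).
ROWS = "ABCDEFGH"
COLS = range(1, 13)


@dataclass
class Well:
    plate: int      # 1-based plate number
    position: str   # well position such as "A1"
    clone_id: str   # hypothetical identifier of the sample nucleic acid


def assign_wells(clone_ids):
    """Place one adapter-plasmid clone per well, filling each plate row by row."""
    positions = [f"{r}{c}" for r in ROWS for c in COLS]
    wells = []
    for i, clone in enumerate(clone_ids):
        plate, idx = divmod(i, len(positions))
        wells.append(Well(plate + 1, positions[idx], clone))
    return wells


def call_hits(wells, scores, controls, z_cutoff=3.0):
    """Return (clone_id, z) for wells whose phenotype score lies more than
    z_cutoff sample standard deviations from the mean of the control wells."""
    mu, sd = mean(controls), stdev(controls)
    hits = []
    for well in wells:
        z = (scores[(well.plate, well.position)] - mu) / sd
        if abs(z) > z_cutoff:
            hits.append((well.clone_id, round(z, 2)))
    return hits


# Example with simulated data: 100 clones spill over onto a second plate,
# and a single well (plate 1, A3) is given an artificially strong phenotype.
clones = [f"clone_{n:03d}" for n in range(100)]
wells = assign_wells(clones)
scores = {(w.plate, w.position): 1.0 for w in wells}
scores[(1, "A3")] = 9.0
controls = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
hits = call_hits(wells, scores, controls)
```

Mapping each hit back through its well to a clone identifier mirrors the step in which an altered phenotype is used to assign a function to the product of a particular sample nucleic acid; a real screen would choose its own controls, replicates, and hit criterion.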