The human genome contains approximately 3×10⁹ nucleotides, but only about 10,000-40,000 genes are expressed at one time in any given cell type. Massive research efforts have been directed to identifying and isolating genes and their related regulatory DNA. These efforts have been confounded by the limitations of existing technology.
Transcription of eukaryotic genes is regulated by cis-acting DNA sequences. Promoters are located immediately upstream of transcriptional start sites and control transcriptional initiation by RNA polymerase. Enhancer elements increase the rate of transcriptional initiation, and to a certain extent, function irrespective of their position or orientation.
Most promoters have been isolated from genomic libraries using cDNA probes to identify sequences downstream of the transcriptional start site and by testing nearby sequences for promoter activity. However, isolating cellular promoters can be difficult because nearly full-length cDNA clones may be required to identify genomic sequences near the sites of transcriptional initiation, and transcribed genomic sequences may be hard to distinguish from untranscribed pseudogenes.
Several investigators have used moveable elements to isolate cellular promoters or enhancers by linking random DNA fragments to the coding sequence of a selectable marker (e.g., transforming or antibiotic resistance genes), introducing the DNA into recipient cells, and selecting for cell clones that result if the gene is expressed. However, this approach suffers from several limitations that the present strategy avoids. First, DNA-mediated gene transfer is not the most efficient form of transduction, particularly in certain cell types. Second, introduced genes are frequently amplified in cells surviving selection. This increases background and necessitates screening multiple clones or performing secondary transfections in order to identify clones containing only one gene copy. Third, potential promoter/enhancer elements identified following DNA transfer are not expressed in their normal chromosomal locations.
Transfected enhancerless genes have been used to identify transcriptionally active chromosome regions. In some cases, expression appeared to be regulated in a tissue-specific manner. Similarly, integration-specific activation of proviral genes has been observed in cells in which the LTR is transcriptionally inactive. However, cloning transcriptional activators using this approach is difficult, because elements such as enhancers may be located at a considerable distance from, and on either side of, the integrant.
The present invention exploits the ability of retroviruses to move genes into random sites of mammalian genomes.
Retroviruses are RNA viruses that replicate through a DNA intermediate. Flanking the ends of the viral RNA genome are short sequence repeats (R) and unique sequences (U5 and U3) that control DNA synthesis, integration, transcription, and RNA processing. Between the control regions are coding sequences for the major structural proteins of the virus particle (gag and env) and for the enzymes found in particles (pol, which encodes protease, reverse transcriptase and integrase) (FIG. 1).
Shortly after infection, viral RNA is converted into DNA by reverse transcriptase. Prior to integration, terminal sequences of the viral genome are duplicated such that the retroviral genome is flanked by long terminal repeats (LTRs), each containing the U3, R and U5 regions. Then integration occurs.
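The rearrangement of terminal regions described above can be illustrated with a minimal sketch. It assumes the standard region order of the RNA genome (R-U5 at the 5' end, U3-R at the 3' end) and treats each region as a label only; the function name `ltr_flanked_layout` is hypothetical, and the sketch does not model the mechanics of reverse transcription.

```python
def ltr_flanked_layout(rna_regions):
    """Return the region order of the double-stranded viral DNA.

    rna_regions: region labels of the RNA genome, 5' to 3',
    expected to start with R, U5 and end with U3, R.
    """
    if rna_regions[:2] != ["R", "U5"] or rna_regions[-2:] != ["U3", "R"]:
        raise ValueError("expected an RNA genome of the form R-U5-...-U3-R")
    internal = rna_regions[2:-2]   # coding regions between the control regions
    ltr = ["U3", "R", "U5"]        # each LTR contains U3, R and U5
    return ltr + internal + ltr


rna = ["R", "U5", "gag", "pol", "env", "U3", "R"]
print("-".join(ltr_flanked_layout(rna)))
```

The point of the sketch is only that duplication of the terminal sequences leaves the same U3-R-U5 unit at both ends of the DNA form.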
The exact mechanism of integration is unclear. There is evidence that formation of circular molecules with two tandem LTRs creates cis-acting recognition sequences for the enzyme(s) catalyzing integration. However, several investigators have shown that linear viral DNA can integrate directly without forming a circularized intermediate, at least in vitro.
LTR sequences are maintained in the integrated retrovirus, also termed a "provirus", except that two nucleotides (nt) are lost from each end. Cellular DNA sequences also are unaltered except that, upon integration, 4-6 nt are duplicated such that the provirus is flanked at each end by 4-6 bp repeats. As a provirus, the retroviral genome is replicated with cellular DNA and transcribed as a cellular gene. Provirus transcription is controlled by promoter/enhancer sequences located in the U3 region of the 5' LTR. Polyadenylated transcripts initiate at the junction between U3 and R (cap site) in the 5' LTR and terminate in R of the 3' LTR, which contains the signal for polyadenylation. RNA is synthesized by cellular RNA polymerase II and processed by cellular enzymes. Full-length (genomic) RNA is transported from the nucleus to the cytoplasm and is either packaged into virus particles that bud from the cell or translated to yield gag and pol proteins. A fraction of the RNA is spliced to yield mRNA encoding env.
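The sequence-level outcome of integration described above (two nucleotides lost from each end of the viral DNA, and 4-6 nt of cellular DNA duplicated on either side of the provirus) can be sketched as a simple string operation. This is an illustrative model only: the function `integrate`, its parameters, and the example sequences are hypothetical, and a single strand is used to stand in for double-stranded DNA.

```python
def integrate(host, viral_dna, site, dup_len=4):
    """Model provirus insertion into a host sequence.

    host      : cellular DNA (one strand, as a string)
    viral_dna : unintegrated linear viral DNA
    site      : index in host where the duplicated target begins
    dup_len   : length of the target duplication (4-6 nt)
    """
    if not 4 <= dup_len <= 6:
        raise ValueError("target duplication is modeled as 4-6 nt")
    if site < 0 or site + dup_len > len(host):
        raise ValueError("target site lies outside the host sequence")
    if len(viral_dna) <= 4:
        raise ValueError("viral DNA too short to trim 2 nt from each end")

    provirus = viral_dna[2:-2]            # 2 nt are lost from each end
    target = host[site:site + dup_len]    # cellular sequence that is duplicated
    # The provirus ends up flanked by identical copies of the target.
    return host[:site] + target + provirus + target + host[site + dup_len:]


# Host in upper case, viral DNA in lower case so the junctions are visible.
print(integrate("AAAACCGGTTTT", "aatgtagtcatt", site=4, dup_len=4))
# prints AAAACCGGtgtagtcaCCGGTTTT
```

In the printed result the host sequence is otherwise unchanged; only the 4 nt target `CCGG` appears twice, once on each side of the trimmed viral sequence.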
It is possible to adapt retroviruses to transduce genes into mammalian genomes. Provided that certain control sequences within the LTRs remain unaltered [Murphy, 1989; Dougherty, 1987], portions of the retroviral genome can be deleted without impairing its ability to replicate in cells that express the proteins necessary for reverse transcription, integration and particle formation. For this, vector DNA is transfected into cell lines that contain complete retroviral genomes or helper viruses. The helper viruses are constructed so that their transcripts cannot be packaged into particles, due to a small deletion encompassing a sequence (ψ) between U5 and gag. Since the vector DNA retains ψ, recombinant transcripts are packaged and expelled from the cells as virus particles. In addition to ψ, gag sequences also enhance the ability of the vectors to be packaged.
Retroviruses appear to integrate randomly throughout the genome, although about one fifth of all integrations have been reported to involve highly preferred sites. Integration sometimes results in mutations that either inactivate or augment expression of genes in the vicinity of the provirus. Gene inactivation may be caused by insertions into exons that interrupt open reading frames or into introns that alter normal splicing patterns. Activation of genes adjacent to the provirus involves transcriptional enhancement either by upstream U3 promoters or by nearby U3 enhancers.
Retroviruses have been used both as probes for transcriptionally active chromosomal regions and as insertional mutagens. However, several factors have undermined the practical use of retroviruses as genetic tools to study mammalian organisms. First, large genomes (3×10⁹ nucleotides) necessitate screening large numbers of integrants in order to detect mutations in any specific gene. Second, mutations resulting from provirus integration are generally recessive, since most mammalian genomes are diploid. Third, enhancers in the LTRs may influence the expression of adjacent genes, and thus interfere with detecting cellular sequences that regulate transcription in a tissue-specific manner. Finally, 3' RNA processing signals and AUG codons within the left-hand LTR interfere with activation of proviral genes by nearby cellular promoters. As a consequence, retroviruses have been used only to a limited extent, for example: (i) as enhancer traps, by using cell lines in which the viral enhancer is inactive or by using viruses in which the viral enhancer has been deleted, or (ii) as gene traps, which rely on RNA splicing to remove intervening viral sequences.
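The first limitation, the number of integrants that must be screened, can be made concrete with a rough calculation. The sketch below assumes that integration is uniformly random over a genome of 3×10⁹ nucleotides and that each integrant carries a single provirus; both are simplifications, since a fraction of integrations has been reported to involve preferred sites. The 30,000 nt target size is a hypothetical value chosen only for illustration, as are the function names.

```python
import math

GENOME_SIZE_NT = 3e9  # genome size stated in the text


def hit_probability(n_integrants, target_nt, genome_nt=GENOME_SIZE_NT):
    """Probability that at least one of n independent integrations
    lands within a target of target_nt nucleotides."""
    p = target_nt / genome_nt  # chance that a single integration hits
    # 1 - (1 - p)**n, written with log1p/expm1 for accuracy at small p
    return -math.expm1(n_integrants * math.log1p(-p))


def integrants_needed(target_nt, confidence=0.95, genome_nt=GENOME_SIZE_NT):
    """Smallest number of integrants giving the requested hit probability."""
    p = target_nt / genome_nt
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


# Hypothetical 30,000 nt gene (assumption, not a figure from the text).
n = integrants_needed(target_nt=30_000)
print(n, round(hit_probability(n, 30_000), 3))
```

Under these assumptions the required number of independent integrants is on the order of hundreds of thousands for a single gene of that size, which illustrates why unselected screening is impractical.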