The invention herein is exemplified by the cloning of a deoxynucleotide sequence possessing regulatory signals which are recognized preferentially by human cell factors involved in transcription by RNA polymerase II. The subject DNA sequence, or enhancer, provides a promoter-regulatory region to enhance the expression of an adjacent gene in eucaryotic cells of human origin. Enhancers have been found in various animal viruses such as polyoma virus, BK virus, adenovirus, simian virus 40, Moloney sarcoma virus, bovine papilloma virus, and Rous sarcoma virus (deVilliers and Schaffner, Nucl. Acids Res. 9:6251-6264, 1981; Rosenthal et al., Science 222:749-755, 1983; Hearing and Shenk, Cell 33:695-703, 1983; Weeks and Jones, Mol. Cell. Biol. 3:1222-1234, 1983; Levinson et al., Nature 295:568-572, 1982; Luciw et al., Cell 33:705-716, 1983). Enhancer elements have also been found to occur in immunoglobulin genes (Banerji et al., Cell 33:729-749, 1983; Gillies et al., Cell 33:717-728, 1983; Queen and Baltimore, Cell 33:741-748, 1983).
Gene activation can be mediated in a host-specific and/or tissue-specific manner (deVilliers and Schaffner, Nucl. Acids Res. 9:6251-6264, 1981; Queen and Baltimore, Cell 33:741-748, 1983). It is assumed that host cell factors involved in transcription interact with the DNA sequence containing the enhancing element by an unknown mechanism. By definition, an enhancer can influence downstream gene expression while present in either orientation and at various distances up to 4 kilobases from the enhanced gene (Chambon and Breathnach, Annual Rev. Biochem. 50:349-383, 1981).
Human cytomegalovirus (HCMV), a member of the herpes virus classification group, has a large double-stranded DNA genome of 240 kilobases (kb). The viral genome consists of a long and a short unique region, each flanked by different repeat sequences that are inverted relative to each other. Four genome arrangements, resulting from the possible combinations of inversions of the two sections of the genome, are present in DNA preparations in approximately equal amounts.
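The origin of the four arrangements can be illustrated by a minimal sketch. It assumes that each of the two sections inverts independently of the other, and it represents each section as an ordered string of hypothetical marker labels (not nucleotides), so that inversion is modeled simply as reversal of marker order. The function name and the markers are illustrative only.

```python
from itertools import product


def genome_isomers(long_section, short_section):
    """List the arrangements obtained by independently inverting each section.

    Sections are strings of ordered marker labels (hypothetical landmarks),
    so inverting a section is modeled as reversing its marker order.
    """
    isomers = []
    for invert_long, invert_short in product((False, True), repeat=2):
        long_part = long_section[::-1] if invert_long else long_section
        short_part = short_section[::-1] if invert_short else short_section
        isomers.append(long_part + "|" + short_part)
    return isomers


# Hypothetical markers: A-F along the long section, x-z along the short one.
ISOMERS = genome_isomers("ABCDEF", "xyz")
for arrangement in ISOMERS:
    print(arrangement)
```

Two sections with two orientations each give 2 x 2 = 4 distinct arrangements, consistent with the four forms described above.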
At immediate early (IE) times after infection, i.e., in the absence of de novo viral protein synthesis, 88% or more of the viral RNA originates from a region in the long unique component of the viral genome between 0.660 and 0.751 map units for the Towne strain of HCMV. One or more of the IE viral genes presumably codes for a viral regulatory protein that stimulates transcription from other regions of the viral genome.
Based on the high steady-state levels of viral mRNA and the abundance of its translation product in the infected cell, the IE gene between 0.739 and 0.751 map units is highly expressed and has been designated IE region 1 or the major IE gene. Adjacent IE genes from 0.732 to 0.739 map units (region 2) and from 0.709 to 0.728 map units (region 3) are expressed at relatively low levels and, consequently, are considered minor IE genes. Transcription under IE conditions is also detectable from another adjacent region at approximately 0.660 to 0.685 map units.
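For orientation, the map-unit coordinates given above can be converted to approximate physical positions. The sketch below assumes that map units express fractional genome length on a scale of 0 to 1 and uses the 240 kb genome size stated earlier; because that size is a round figure, the results are approximations only. The function and dictionary names are illustrative.

```python
GENOME_KB = 240.0  # genome size stated in the text; a round figure

# Map-unit boundaries of the IE regions as given in the text.
IE_REGIONS = {
    "region 1 (major IE gene)": (0.739, 0.751),
    "region 2": (0.732, 0.739),
    "region 3": (0.709, 0.728),
}


def map_units_to_kb(start_mu, end_mu, genome_kb=GENOME_KB):
    """Return (start_kb, end_kb, length_kb), each rounded to 0.01 kb.

    Assumes map units are fractions of total genome length (0 to 1).
    """
    start_kb = start_mu * genome_kb
    end_kb = end_mu * genome_kb
    return round(start_kb, 2), round(end_kb, 2), round(end_kb - start_kb, 2)


for name, (start_mu, end_mu) in IE_REGIONS.items():
    print(name, map_units_to_kb(start_mu, end_mu))
```

Under these assumptions, region 1 spans roughly 2.9 kb, region 2 roughly 1.7 kb, and region 3 roughly 4.6 kb.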
Developments in recombinant DNA technology have made it possible to isolate specific genes or portions thereof from higher organisms, such as man and other mammals, and to transfer the genes or fragments to a microorganism species, such as bacteria or yeast. The transferred gene is replicated and propagated as the transformed microorganism replicates. As a result, the transformed microorganism may become endowed with the capacity 1) to make whatever protein the gene or fragment encodes, whether it be an enzyme, a hormone, an antigen or an antibody, or a portion thereof, or 2) to enhance or otherwise regulate such protein synthesis. The microorganism passes on this capability to its progeny, so that in effect, the transfer has resulted in a new strain, having the described capability. See, for example, Ullrich, A., et al., Science 196, 1313 (1977), and Seeburg, P. H., et al., Nature 270, 486 (1977).
A basic fact underlying the application of this technology for practical purposes is that DNA of all living organisms, from microbes to man, is chemically similar, being composed of the same four nucleotides. The significant differences lie in the sequences of the nucleotides in the polymeric DNA molecule. The nucleotide sequences are used to specify the amino acid sequences of proteins that comprise the organism. Although most of the proteins of different organisms differ from each other, the coding relationship between nucleotide sequence and amino acid sequence is fundamentally the same for all organisms. For example, the same nucleotide sequence which codes for the amino acid sequence of insulin in human pancreas cells will, when transferred to a microorganism, be recognized as coding for the same amino acid sequence.
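The shared coding relationship can be illustrated by a minimal sketch in which a single codon lookup table is applied to a DNA sequence regardless of the organism in which it resides. The table below contains only the few standard-code codons needed for the example, and the sequence is an illustrative one chosen for the sketch, not a sequence taken from this disclosure. The function name is hypothetical.

```python
# Subset of the standard genetic code; "*" marks a stop codon.
# Only the codons needed for the illustrative sequence are listed.
CODON_TABLE = {
    "ATG": "M", "TTT": "F", "GTG": "V", "AAC": "N",
    "CAA": "Q", "CAC": "H", "CTG": "L", "TGA": "*",
}


def translate(coding_dna):
    """Translate a coding sequence codon by codon until a stop codon.

    Raises KeyError for any codon missing from the partial table above.
    """
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative sequence (an assumption of this sketch).
SEQUENCE = "ATGTTTGTGAACCAACACCTGTGA"
print(translate(SEQUENCE))
```

Because the same table is consulted in every case, the same sequence yields the same amino acid sequence whether it is read in a human cell or in a transformed microorganism, which is the point made above.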
Abbreviations used herein are given in Table 1.
TABLE 1
______________________________________
DNA  = deoxyribonucleic acid
RNA  = ribonucleic acid
mRNA = messenger RNA
A    = Adenine
T    = Thymine
G    = Guanine
C    = Cytosine
______________________________________
Many recombinant DNA techniques employ two classes of compounds, transfer vectors and restriction enzymes, to be discussed in turn. A transfer vector is a DNA molecule which contains, inter alia, genetic information which ensures its own replication when transferred to a host microorganism strain. Examples of transfer vectors commonly used in bacterial genetics are plasmids and the DNA of certain bacteriophages. Although plasmids have been used as the transfer vectors for the work described herein, it will be understood that other types of transfer vector may be employed.
Plasmid is the term applied to any autonomously replicating DNA unit which might be found in a microbial cell, other than the genome of the host cell itself. A plasmid is not genetically linked to the chromosome of the host cell. Plasmid DNAs exist as double-stranded ring structures generally on the order of a few million daltons molecular weight, although some are greater than 10^8 daltons in molecular weight. They usually represent only a small percent of the total DNA of the cell.
Transfer vector DNA is usually separable from host cell DNA by virtue of the great difference in size between them. Transfer vectors carry genetic information enabling them to replicate within the host cell, in some cases independently of the rate of host cell division. Some plasmids have the property that their replication rate can be controlled by the investigator by variations in the growth conditions.
Plasmid DNA exists as a closed ring. However, by appropriate techniques, the ring may be opened, a fragment of heterologous DNA inserted, and the ring reclosed, forming an enlarged molecule comprising the inserted DNA segment. Bacteriophage DNA may carry a segment of heterologous DNA inserted in place of certain nonessential phage genes. Either way, the transfer vector serves as a carrier or vector for an inserted fragment of heterologous DNA.
Transfer is accomplished by a process known as transformation. During transformation, bacterial cells mixed with plasmid DNA incorporate entire plasmid molecules into the cells. It is possible to maximize the proportion of bacterial cells capable of taking up plasmid DNA and hence of being transformed, by certain empirically determined treatments. Once a cell has incorporated a plasmid, the latter is replicated within the cell and the plasmid replicas are distributed to the daughter cells when the cell divides. Any genetic information contained in the nucleotide sequence of the plasmid DNA can, in principle, be expressed in the host cell.
Typically, a transformed host cell is recognized by its acquisition of traits carried on the plasmid, such as resistance to certain antibiotics. Different plasmids are recognizable by the different capabilities or combination of capabilities which they confer upon the host cell containing them. Any given plasmid may be made in quantity by growing a pure culture of cells containing the plasmid and isolating the plasmid DNA therefrom.
Restriction endonucleases are hydrolytic enzymes capable of catalyzing site-specific cleavage of DNA molecules. The locus of restriction endonuclease action is determined by the existence of a specific nucleotide sequence. Such a sequence is termed the recognition site for the restriction endonuclease. Restriction endonucleases from a variety of sources have been isolated and characterized in terms of the nucleotide sequence of their recognition sites. Some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated by a few nucleotides from each other, producing free single-stranded regions at each end of the cleaved molecule. Such single-stranded ends are self-complementary, hence cohesive or "sticky", and may be used to rejoin the hydrolyzed DNA.
Since any DNA susceptible to cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterologous sequences of DNA which have been treated with restriction endonuclease to other sequences similarly treated. While restriction sites are relatively rare, the general utility of restriction endonucleases has been greatly amplified by the chemical synthesis of double-stranded oligonucleotides bearing the restriction site sequence. Therefore, virtually any segment of DNA can be coupled to any other segment simply by attaching the appropriate restriction oligonucleotide to the ends of the molecule, and subjecting the product to the hydrolytic action of the appropriate restriction endonuclease, thereby producing the requisite cohesive ends.
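The cleavage and rejoining steps described above can be sketched as follows. The sketch is a simplification: it tracks only the top strand, treats a circular plasmid as a linear string containing a single site, and assumes the insert itself contains no internal site. EcoRI (recognition site GAATTC, cut after the G, leaving an AATT overhang) and SmaI (CCCGGG, cut at the center, leaving blunt ends) serve as examples of the two kinds of enzyme. The function names, the vector, and the insert sequence are hypothetical, as is the use of an 8-base linker GGAATTCC.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def digest(top_strand, site, cut_offset):
    """Cut the top strand at every occurrence of site, cut_offset bases in."""
    fragments, start = [], 0
    pos = top_strand.find(site)
    while pos != -1:
        fragments.append(top_strand[start:pos + cut_offset])
        start = pos + cut_offset
        pos = top_strand.find(site, pos + 1)
    fragments.append(top_strand[start:])
    return fragments


def overhang(site, cut_offset):
    """Single-stranded stretch left by staggered cuts in a palindromic site."""
    low, high = sorted((cut_offset, len(site) - cut_offset))
    return site[low:high]  # empty string means blunt ends


ECORI = ("GAATTC", 1)   # staggered cut, cohesive ends
SMAI = ("CCCGGG", 3)    # central cut, blunt ends
LINKER = "GGAATTCC"     # assumed synthetic oligonucleotide bearing the EcoRI site

# Attach linkers to a hypothetical insert, then digest to expose cohesive ends.
INSERT = "ATGCATGC"
_, sticky_insert, _ = digest(LINKER + INSERT + LINKER, *ECORI)

# Open a hypothetical vector at its single EcoRI site and join the pieces.
left_arm, right_arm = digest("TTTTGAATTCAAAA", *ECORI)
RECOMBINANT = left_arm + sticky_insert + right_arm
print(overhang(*ECORI), repr(overhang(*SMAI)), RECOMBINANT)
```

In this model, joining regenerates the recognition site at both junctions, so the inserted segment could in principle be released again by the same enzyme.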
The term "expression" is used in recognition of the fact that an organism seldom if ever makes use of all its genetically endowed capabilities at any given time. Even in relatively simple organisms such as bacteria, many proteins which the cell is capable of synthesizing are not synthesized, although they may be synthesized under appropriate environmental conditions. When the protein product, coded by a given gene, is synthesized by the organism, the gene is said to be expressed. If the protein product is not made, the gene is not expressed. Normally, the expression of bacterial genes is regulated in such a manner that proteins whose function is not useful in a given environment are not synthesized and metabolic energy is conserved.
The means by which gene expression is controlled in E. coli are well understood, as the result of extensive studies over the past twenty years. See, generally, Hayes, W., The Genetics of Bacteria And Their Viruses, 2d edition, John Wiley and Sons, Inc., New York (1968), and Watson, J. D., The Molecular Biology of the Gene, 3d edition, Benjamin, Menlo Park, Calif. (1976). These studies have revealed that several genes, usually those coding for proteins carrying out related functions in the cell, are found clustered together in continuous sequence. The cluster is called an operon. All genes in the operon are transcribed in the same direction, beginning with the codons coding for the N-terminal amino acid of the first protein in the sequence and continuing through to the C-terminal end of the last protein in the operon. At the beginning of the operon, proximal to the N-terminal amino acid codon, there exists a region of the DNA, termed the control region, which includes a variety of controlling elements including the operator, promoter and sequences for the ribosomal binding sites as well as enhancers. The function of these sites is to permit the expression of those genes under their control to be responsive to the needs of the organism. For example, those genes coding for enzymes required exclusively for utilization of lactose are not expressed unless lactose or an analog thereof is actually present in the medium. The control region functions that must be present for expression to occur are the initiation of transcription and the initiation of translation. Expression of the first gene in the sequence is initiated by the initiation of transcription and translation at the position coding for the N-terminal amino acid of the first protein of the operon. The expression of each gene downstream from that point is also initiated in turn, at least until a termination signal or another operon is encountered with its own control region, keyed to respond to a different set of environmental cues.
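The behavior of an operon described above can be summarized in a deliberately simplified sketch. It assumes a single on/off control region that responds only to the presence of an inducer, and it reads the genes in order until a termination signal is met. The lactose operon genes lacZ, lacY and lacA are used as the example. The function name and the "TERMINATOR" marker are hypothetical, and other layers of regulation are ignored.

```python
TERMINATOR = "TERMINATOR"  # hypothetical marker for a termination signal


def expressed_genes(operon, inducer_present):
    """Return the genes expressed from one operon.

    operon: ordered list of gene names, optionally containing TERMINATOR.
    inducer_present: True when the inducer (e.g. lactose or an analog)
    is present in the medium, which is taken here to switch the
    control region on.
    """
    if not inducer_present:
        return []  # control region off: no gene in the operon is expressed
    expressed = []
    for element in operon:
        if element == TERMINATOR:
            break  # expression of downstream genes stops here
        expressed.append(element)
    return expressed


# Genes after the terminator stand for a neighboring, separately controlled unit.
LAC_OPERON = ["lacZ", "lacY", "lacA", TERMINATOR, "otherGene"]
print(expressed_genes(LAC_OPERON, inducer_present=False))
print(expressed_genes(LAC_OPERON, inducer_present=True))
```

The sketch reflects the two points made above: the genes of an operon are switched on together in response to an environmental cue, and each downstream gene is expressed in turn only until a termination signal is encountered.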
Once a given gene has been isolated, purified and inserted in a transfer vector, the over-all result of which is termed the cloning of the gene, its availability in substantial quantity is assured. The cloned gene is transferred to a suitable microorganism, wherein the gene replicates as the microorganism proliferates and from which the gene may be reisolated by conventional means. Thus is provided a continuously renewable source of the gene for further manipulations, modifications and transfers to other vectors or other loci within the same vector.