The present invention relates to recombinant DNA technology and more specifically to the structure and function of DNA vectors for the transfer, replication and expression of heterologous DNA inserted within the DNA vector.
The structure and operating principles of DNA vectors, which are of primary importance in recombinant DNA technology, have been set forth in the prior art, for example, in Cohen et al., U.S. Pat. No. 4,237,224. A DNA vector is a self-replicating entity, usually of plasmid or phage origin, having certain unique restriction sites permitting the insertion of heterologous DNA at a locus which is non-essential for vector replication. Preferably, insertion results in the alteration of some non-essential but detectable function as a result of splitting the gene coding for that function on the vector. Insertion of heterologous DNA at the desired site results in loss of the detectable function, thereby enabling the investigator to distinguish vectors having inserts from those lacking inserts by the characteristics conferred upon cells containing such vectors. Drug resistance genes have been widely used for such selection purposes. Unmodified vectors may confer upon host cells that carry them resistance to an antibiotic, such as ampicillin. Insertion of heterologous DNA within the ampicillin-resistance gene results in loss of the ampicillin resistance of host cells. Similarly, insertion of heterologous DNA within the region coding for the enzyme beta-galactosidase has permitted selection for bacterial colonies which lack beta-galactosidase activity. Screening for colonies having an active beta-galactosidase (Lac⁺ phenotype) is carried out on agar plates containing the substrate analog 5-bromo-4-chloro-3-indolyl-beta-D-galactoside (hereinafter XG). Colonies expressing a Lac⁺ phenotype are stained blue as a result of enzyme activity hydrolyzing the substrate analog and releasing a blue dye, while colonies which lack enzyme activity (Lac⁻ phenotype) remain white. Transformation of a Lac⁻ host strain by a vector carrying an intact and expressible lacZ gene (coding for beta-galactosidase) is manifested by blue colonies transformed to the Lac⁺ phenotype.
The presence of a heterologous insert disrupting the beta-galactosidase coding sequence on the vector (in a Lac⁻ host) is detected by "white selection", the appearance of white colonies against a background of predominantly blue colonies. (In this system, the background of untransformed Lac⁻ host cells is eliminated by incorporating a drug resistance marker in the vector, and adding the drug to the growth medium on the plates so that only vector-transformed cells grow at all.)
Additional constraints on vector structure are required if the inserted heterologous DNA is to be expressed. Expression is a general term denoting the synthesis of a protein, a peptide or an RNA transcript by a cell. One manifestation is the synthesis of messenger RNA using the heterologous DNA as template. The fundamental requirements for expression in the form of protein or polypeptide coded by the inserted DNA are that the insert be adjacent to a promoter-translation start region, that the orientation of the insert be the same as that of the promoter-translation start region, that the insert be preceded by a translation start codon, and, if the insert lies within an existing coding region, that the existing and inserted coding regions be in the same reading frame phase.
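The colony screening scheme described above (drug selection combined with blue/white detection on XG plates) can be summarized as a small decision function. The following is only an illustrative sketch: the function name and its parameters are hypothetical, and it assumes that the drug-resistance marker lies outside the beta-galactosidase coding region, so that it remains intact whether or not an insert is present.

```python
def colony_phenotype(has_vector: bool, lacz_disrupted: bool,
                     drug_in_medium: bool = True) -> str:
    """Predict the appearance of a Lac- host cell plated on XG agar.

    Assumption (not stated in the text): the drug-resistance marker is
    separate from lacZ and is never split by the insert.
    """
    if not has_vector:
        # Untransformed Lac- cells carry no resistance marker: with drug in
        # the medium they do not grow; without it they grow as white colonies.
        return "no growth" if drug_in_medium else "white"
    # Vector present: an intact lacZ yields active beta-galactosidase (blue),
    # while an insert splitting lacZ abolishes the activity (white).
    return "white" if lacz_disrupted else "blue"


print(colony_phenotype(has_vector=True, lacz_disrupted=False))
print(colony_phenotype(has_vector=True, lacz_disrupted=True))
```

Under these assumptions, a white colony on a drug-containing plate indicates a vector carrying an insert, which is the basis of "white selection".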
Terms used herein are intended to have the meaning generally understood in the art. Thus, a promoter-translation start region is a segment of DNA which is normally untranslated but which functions in the initiation of transcription (messenger RNA synthesis) and in the initiation of translation of mRNA, for example, by providing a ribosomal binding site. A promoter-translation start region has a defined direction of action with respect to the vector as a whole, such that coding regions lying to one side are under its functional influence, whereas adjacent coding regions on the other side are unaffected by it. Coding regions affected by the promoter are said to be in a direction "downstream" of the promoter. The coding region itself is also directional: the part which codes for the amino terminal end of the protein must lie nearest the promoter-translation start region, downstream therefrom, while the part coding for the carboxyl (-COOH) terminal region of the protein must lie farthest from the promoter-translation start region. When the promoter-translation start region and coding region are properly disposed with respect to one another such that translation can occur as just described, they are said to be in proper orientation. Improper orientation occurs if the coding region is inserted backwards, such that the -COOH coding end is closer to the promoter-translation start region, or if the coding region is inserted upstream from the promoter. Reading frame refers to the manner in which adjacent nucleotides are clustered in groups of three, each group of three, or triplet, coding for an amino acid of a protein. The reading frame is established by the first ATG triplet (coding for methionine) which lies a few nucleotides, typically 11 to 17, downstream from the ribosomal binding site sequence of the promoter-translation start region. The reading frame must be maintained without interruption throughout the coding region.
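The way a reading frame follows from the first ATG triplet can be illustrated with a short sketch. The function name and the example sequence are hypothetical. As a simplification, the sketch takes the first ATG found anywhere in the sequence rather than locating a ribosomal binding site and checking the spacing from it.

```python
def codons_from_first_atg(dna: str) -> list[str]:
    """Split a DNA string into triplets, starting at the first ATG.

    Simplifying assumption: the first ATG in the string is the translation
    start; no ribosomal binding site is searched for.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return []  # no start codon, hence no reading frame established
    # Step through the sequence three nucleotides at a time, keeping only
    # complete triplets.
    return [dna[i:i + 3] for i in range(start, len(dna) - 2, 3)]


# Made-up example: a binding-site-like stretch, a short spacer, then a
# four-codon coding region.
example = "AGGAGG" + "TTTACAT" + "ATGGCTAAATAA"
print(codons_from_first_atg(example))  # ['ATG', 'GCT', 'AAA', 'TAA']
```

Every triplet after the ATG is fixed by the position of that ATG, which is why the frame must be maintained without interruption through the coding region.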
If an inserted DNA is to be expressed by joining it to an existing coding region, it must be joined in such a manner that the desired reading frame of the insert is the same as that of the coding sequence to which it is joined. The two reading frames are then said to be in phase. If the reading frames are out of phase, the integrity of the sequence of nucleotide triplets is interrupted in going from one coding region to the next, by the interposing of one or two extra nucleotides. Translation then continues, using nucleotide triplets established by the reading frame of the pre-existing coding region, until a stop codon is encountered. It has been observed that stop codons are frequently encountered in coding sequences which are read incorrectly in either of the two alternative reading frames. In fact, the existence of an open reading frame, i.e., a reading frame in which no stop codon is encountered over a length sufficient to code for a polypeptide, protein or fragment thereof, is considered presumptive evidence that the segment is in fact a coding segment in that reading frame.
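The effect of joining two coding regions in or out of phase can be sketched as follows. The sequences and function names are hypothetical and chosen only for illustration. The stop codons used are those of the standard genetic code (TAA, TAG, TGA).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codons_before_stop(dna: str, offset: int = 0) -> tuple[int, bool]:
    """Read triplets from `offset`; return (codons read, stop encountered)."""
    dna = dna.upper()
    count = 0
    for i in range(offset, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return count, True
        count += 1
    return count, False


def in_phase(extra_nucleotides: int) -> bool:
    """Frames stay in phase only if the junction adds a multiple of three."""
    return extra_nucleotides % 3 == 0


existing = "ATGGCT"          # pre-existing coding region (made up)
insert = "GCTGACCTGAAAGGC"   # insert, open in its intended frame (made up)

# Joined in phase: the insert is read in its intended frame, no stop occurs.
print(codons_before_stop(existing + insert))        # (7, False)
# One extra nucleotide at the junction: the insert is read out of phase and
# a stop codon (TGA) appears after three codons.
print(codons_before_stop(existing + "A" + insert))  # (3, True)
```

In this made-up case a single interposed nucleotide is enough to terminate translation almost immediately, consistent with the observation that incorrectly read frames frequently contain stop codons.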
The synthesis of proteins by genetically altered microorganisms is a major aspect of the multi-million dollar recombinant DNA industry. Virtually any useful protein, such as an enzyme, hormone or antigen, can be synthesized by such organisms, usually in amounts far exceeding those obtainable by conventional isolation. In every case, however, it is necessary first to clone a gene or DNA segment coding for the desired protein. The cloning is frequently the most difficult and problematic step.
A basic problem of cloning technology relates to the identification of clones containing a specific desired sequence against a background of clones containing other sequences. Most of the initial successes in cloning were obtained in systems where it could reasonably be expected that the desired cDNA was present in high proportion. Pre-purification of mRNA to select for mRNA in a desired size range was frequently employed. Where homologous sequences had previously been cloned, it was possible to identify the desired clones by hybridization, using the known sequence as a probe. The brute force technique of sequence analysis has also been employed, although chiefly in situations where only a few clones needed to be screened. At the present time, a need exists for techniques of general utility that would enable investigators to identify a clone containing the desired sequence under conditions where it may be necessary to screen a large number of clones in order to obtain a positive result, e.g., where the desired sequence is present in low proportion.