The industrial applications of genetic engineering are becoming evident in the production of pharmaceuticals, of foods having improved properties, and of chemical products (including enzymes) that facilitate manufacturing processes. The process of genetic engineering may begin by cloning a gene of interest that encodes a protein with the properties desired for the particular industrial application. Typically, a gene is cloned either by breaking up a genome into fragments of manageable size or by generating cDNA fragments from isolated mRNA, then cloning those genomic or cDNA fragments into a vector and introducing the resulting recombinant vectors into a competent host cell. Commonly used methods for screening transformants, to identify a transformant that contains a recombinant vector with a nucleic acid molecule inserted therein, include marker inactivation systems, which utilize indicator or reporter genes such as lacZ or lacZα, galK, the gene for chloramphenicol acetyltransferase, the gene for the green fluorescent protein (GFP) and mutant forms thereof (see Cubitt et al., 1995, Trends Biochem. Sci. 20:448-455), the gene for luciferase and the like; and positive selection systems, which utilize lethal genes such as ccdB (Bernard et al., 1994, Gene 148:71-74), the gene for mouse transcription factor GATA-1 (Trudel et al., 1996, BioTechniques 20:684-693), the gene for thymidine kinase, the gene for β-lactamase and the like.
The lac operon marker inactivation system is employed in one of the most widely used color selection systems for plasmid and single-stranded DNA (ssDNA) vectors (see, e.g., Messing et al., 1977, Proc. Natl. Acad. Sci. USA 74:3642-3646; Messing et al., 1981, Nucl. Acids Res. 9:309-321; Messing, 1983, Methods Enzymol. 101:20-78; and Yanisch-Perron et al., 1985, Gene 33:103-119). Essentially, the lac operon marker inactivation system functions by intracistronic complementation between the α-peptide encoded by the lacZα gene fragment and a β-galactosidase molecule that most commonly carries a deletion of amino acids 12 through 42.
lacZα is a gene fragment, comprising the proximal portion of the Escherichia coli lacZ gene, that encodes approximately the first 60 amino-terminal amino acids of the β-galactosidase polypeptide chain. The encoded product, the "α-peptide", complements the defective activity of the gene product of lacZΔM15, an allele that carries a spontaneous deletion of the codons for amino acids 12 through 42 of β-galactosidase. Accordingly, to identify a transformant that contains a recombinant vector with a nucleic acid molecule inserted therein, a vector having a cloning site in the lacZα gene fragment is introduced into a host cell expressing a β-galactosidase that lacks amino acids 12 through 42. Transformants presumably containing vector with an intact lacZα gene fragment produce blue colonies or plaques when plated on media containing a chromogenic β-galactosidase substrate. This is because complementation between the α-peptide and the deleted β-galactosidase molecule restores functional β-galactosidase activity, and the reconstituted enzyme cleaves a chromogenic substrate such as 5-bromo-4-chloro-3-indolyl-β-D-galactoside ("X-gal") to produce the deep blue dibromodichloroindigo. In contrast, transformants containing vector in which the lacZα gene fragment carries an insertion produce colorless (white) colonies or plaques when similarly plated. Colorless colonies result when the inserted nucleic acid molecule interrupts expression of the lacZα gene fragment, so that the complementing α-peptide is not produced.
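To make the blue/white logic described here concrete, the following Python sketch reduces it to a boolean rule. It is a simplified illustration, not a model of enzyme kinetics: the `Transformant` class and `colony_color` function are hypothetical names, and the sketch assumes the host has no other source of β-galactosidase activity (it ignores, for example, wild-type lacZ and background enzymes).

```python
from dataclasses import dataclass


@dataclass
class Transformant:
    """Hypothetical, simplified description of a single transformant."""
    host_has_dM15: bool            # host expresses the deleted (lacZ-dM15) beta-galactosidase
    alpha_peptide_expressed: bool  # vector still produces the complementing alpha-peptide
    xgal_in_medium: bool           # chromogenic substrate X-gal is present on the plate


def colony_color(t: Transformant) -> str:
    """Idealized blue/white call: blue only when alpha-complementation restores
    beta-galactosidase activity and X-gal is available to be cleaved."""
    complemented = t.host_has_dM15 and t.alpha_peptide_expressed
    return "blue" if complemented and t.xgal_in_medium else "white"


# Vector without an insert: lacZ-alpha is intact, complementation occurs.
print(colony_color(Transformant(True, True, True)))   # blue
# Insert interrupts lacZ-alpha expression: no alpha-peptide is made.
print(colony_color(Transformant(True, False, True)))  # white
```

This idealized rule is exactly what the false-negative mechanisms discussed in the following paragraphs break: an insert can still yield an active (fused) α-peptide, so `alpha_peptide_expressed` is true even though an insert is present.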
Currently, all lacZα-based vectors (e.g., Messing et al., 1977, supra; Yanisch-Perron et al., 1985, supra; Guan et al., 1987, Gene 67:21-30; Short et al., 1988, Nucl. Acids Res. 16:7583-7600; Alting-Mees and Short, 1989, Nucl. Acids Res. 17:9494; Evans et al., 1995, BioTechniques 19:130-135; and U.S. Pat. No. 4,766,072) employ the same mechanism for color selection. This mechanism places the restriction sites for insertion of a nucleic acid molecule upstream of the codon for amino acid 7 of β-galactosidase, so that the inserted nucleic acid molecule ("insert") interferes with the expression, but not the activity, of the lacZ α-peptide. The disadvantage of this marker inactivation configuration is that it makes recombinant molecules difficult to detect reliably. More specifically, it may generate false positives (white colonies or plaques containing vector without an insert) and false negatives (colored colonies or plaques containing vector with an insert) (see, e.g., Messing, 1983, supra; unpublished observations; and Table 2 herein).
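The definitions of false positives and false negatives amount to comparing the observed color with the vector's true insert status. The sketch below is illustrative only: `classify_call` and the sample screen data are hypothetical, and any non-white color (including bluish colonies) is treated as a "non-recombinant" call.

```python
from collections import Counter


def classify_call(has_insert: bool, color: str) -> str:
    """Compare a blue/white call with the vector's true insert status.

    A white colony or plaque is read as 'recombinant'; any color (including
    bluish) is read as 'non-recombinant'.
    """
    called_recombinant = color == "white"
    if called_recombinant and has_insert:
        return "true positive"
    if called_recombinant:
        return "false positive"   # white, but the vector carries no insert
    if has_insert:
        return "false negative"   # colored, but the vector carries an insert
    return "true negative"


# Hypothetical screen: (true insert status, observed color) for each clone.
screen = [(True, "white"), (False, "blue"), (True, "blue"), (False, "white")]
print(Counter(classify_call(has, color) for has, color in screen))
```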
Although false positives are difficult to eliminate, because they arise largely from factors extraneous to the selection system itself, they do not generally constitute a problem: they are selected alongside actual positives and are subjected to further scrutiny before their fate is decided. External factors responsible for generating false positives include (i) contamination of restriction and modification enzymes with exonucleases, polymerases or other restriction enzymes; (ii) spontaneous mutations; and (iii) loss of the F' episome carrying the lacZΔM15 allele.
False negatives, on the other hand, are a problem because they are rarely carried forward for further examination and, as a result, are responsible for numerous erroneous conclusions. Such erroneous conclusions underlie, at least in part, the general phenomenon referred to as "non-clonable sequences" and the presence of an excessive number of gaps in shotgun DNA sequencing results. False negatives are caused both by extrinsic factors and by factors intrinsic to the architecture of the color selection mechanism itself. In the currently available lacZα-based vectors, there are two principal causes of false negatives: (i) in-frame insertion of DNA fragments containing one or more open reading frames; and (ii) reinitiation of translation within the mRNA transcribed from the inserted DNA fragment at any in-frame AUG, GUG or even UUG and CUG codon preceded by a pseudo Shine-Dalgarno box. Either event results in the synthesis of α-peptides bearing amino-terminal fusions. Because amino- and carboxy-terminal fusions to the α-peptide usually do not impair its activity (see, e.g., Slilaty et al., 1990, Eur. J. Biochem. 194:103-108), such vectors form blue colonies or blue plaques indistinguishable from those produced by vectors not carrying an insert. The number of false negatives arising in this way is further increased because even the less frequent fusions that have diminished α-peptide activity still produce blue colonies or blue plaques, owing to the hypersensitivity of the X-gal assay system. This hypersensitivity reflects the fact that very little β-galactosidase activity is needed for a complete color-producing reaction to take place.
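The two intrinsic causes of false negatives can be screened for on an insert sequence. The following Python sketch is a rough heuristic under stated assumptions, not a validated predictor: it assumes the insert's first base sits at codon position 1 of the lacZα reading frame, so that an insert whose length is a multiple of 3 keeps the downstream α-peptide in frame. The pseudo Shine-Dalgarno cores (`SD_CORES`, taken from the consensus AGGAGG) and the 15-nt search window (`SD_WINDOW`) are hypothetical parameters, and all function names are illustrative.

```python
from __future__ import annotations

# Codons written in DNA (ATG = AUG in the mRNA, and so on).
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG", "CTG"}
# Hypothetical pseudo Shine-Dalgarno cores: 4-mers of the consensus AGGAGG.
SD_CORES = ("AGGA", "GGAG", "GAGG")
SD_WINDOW = 15  # hypothetical: nt upstream of a start codon searched for a core


def open_in_frame(seq: str, start: int) -> bool:
    """True if seq[start:] is a whole number of codons with no stop codon,
    i.e. translation from `start` runs on into the downstream lacZ-alpha."""
    tail = seq[start:]
    if len(tail) % 3 != 0:
        return False
    return all(tail[i:i + 3] not in STOPS for i in range(0, len(tail), 3))


def in_frame_readthrough(insert: str) -> bool:
    """Cause (i): the insert is open in the lacZ-alpha frame, so the
    alpha-peptide is made with an amino-terminal fusion."""
    return open_in_frame(insert.upper(), 0)


def reinitiation_sites(insert: str) -> list[int]:
    """Cause (ii): start codons in frame with the downstream lacZ-alpha,
    open to the end of the insert, and preceded by an SD-like core."""
    s = insert.upper()
    sites = []
    for p in range(len(s) - 2):
        if s[p:p + 3] not in STARTS or not open_in_frame(s, p):
            continue
        upstream = s[max(0, p - SD_WINDOW):p]
        if any(core in upstream for core in SD_CORES):
            sites.append(p)
    return sites


def likely_false_negative(insert: str) -> bool:
    """Heuristic flag: either intrinsic cause predicts a blue colony/plaque."""
    return in_frame_readthrough(insert) or bool(reinitiation_sites(insert))


print(likely_false_negative("GCTGCAGCA"))          # True: open in-frame insert
print(reinitiation_sites("TAAAGGAGGTAAATATGGCA"))  # [14]
```

A real screen would also have to consider the junction codons contributed by the polylinker and both insert orientations; the sketch checks only the sequence exactly as given.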
Hypersensitivity of the X-gal assay system is also responsible for another source of false negatives: β-galactosidase-like activity produced by the ebg locus of the host cell. The ebg (evolved β-galactosidase) operon is located directly across the chromosome from lacZ and codes for an enzyme with low-level β-galactosidase-like activity (Hall et al., 1989, Genetics 123:635-648). In wild-type strains, this enzyme does not have enough activity to allow growth on lactose. In typical screening protocols, however, host cells suspected of being transformants are grown in the presence of an inducer of lacZα gene expression. Under these circumstances, the normally weak ebg enzyme has enough activity in the presence of such inducers (e.g., isopropyl thiogalactoside or "IPTG") to cleave the chromogenic substrate X-gal, yielding bluish colonies or, more frequently, white colonies with blue centers (unpublished observations). The effect of the ebg locus on blue color formation in colonies that would otherwise be white may be minimized by keeping the incubation of plated cells short (less than 18 hours), or eliminated completely by using hosts carrying a defective ebg locus.
Thus, there is a need for a cloning vector that utilizes the lacZα marker inactivation system but is based on a configuration that minimizes the generation of false negatives. Such a novel cloning vector allows improved accuracy and reliability in detecting inactivation of the lacZα gene fragment by insertion of a nucleic acid molecule. The novel cloning vector may be used for general cloning purposes as well as for gap-free shotgun sequencing, facilitating industrial applications of gene isolation, genetic engineering and the development of ordered genomic libraries.