Specific protein-DNA and protein-protein interactions are fundamental to most cellular functions. Protein-DNA interactions, for example, form the basis of important mechanisms by which the cell activates or represses gene expression and regulates DNA replication. Polypeptide interactions are involved in, inter alia, formation of functional transcription complexes, repression of certain genes, signal transduction pathways, cytoskeletal organization (e.g., microtubule polymerization), polypeptide hormone receptor-ligand binding, organization of multi-subunit enzyme complexes, and the like.
Investigation of protein-DNA and protein-protein interactions under physiological conditions has been problematic. Considerable effort has been made to identify proteins that bind to proteins of interest. Typically, these interactions have been detected by using co-precipitation experiments in which an antibody to a known protein is mixed with a cell extract and used to precipitate the known protein and any proteins that are stably associated with it. This method has several disadvantages, such as: (1) it only detects proteins which are associated in cell extract conditions rather than under physiological, intracellular conditions, (2) it only detects proteins which bind to the known protein with sufficient strength and stability for efficient co-immunoprecipitation, (3) it may not be able to detect oligomers of the target, and (4) it fails to detect associated proteins which are displaced from the known protein upon antibody binding. Additionally, precipitation techniques at best provide a molecular weight as the main identifying characteristic. Similar difficulties exist in the analysis of physiologically relevant protein-DNA interactions. For these reasons and others, improved methods for identifying proteins that interact with a known protein have been developed.
One approach to these problems has been to use a so-called interaction trap system or “ITS” (also referred to as the “two-hybrid assay”) to identify polypeptide sequences which bind to a predetermined polypeptide sequence present in a fusion protein (Fields and Song (1989) Nature 340:245). This approach identifies protein-protein interactions in vivo through reconstitution of a eukaryotic transcriptional activator. The system has also been adapted for studying protein-DNA interactions.
The interaction trap systems of the prior art are based on the finding that most eukaryotic transcription activators are modular. Brent and Ptashne showed that the activation domain of yeast GAL4, a yeast transcription factor, could be fused to the DNA binding domain of E. coli LexA to create a functional transcription activator in yeast (Brent et al. (1985) Cell 43:729-736). There is evidence that transcription can be activated through the use of two functional domains of a transcription factor: a domain that recognizes and binds to a specific site on the DNA and a domain that is necessary for activation. The transcriptional activation domain is thought to function by contacting other proteins involved in transcription. The DNA-binding domain appears to function to position the transcriptional activation domain on the target gene that is to be transcribed. These and similar experiments (Keegan et al. (1986) Science 231:699-704) formally define activation domains as portions of proteins that activate transcription when brought to DNA by DNA-binding domains. Moreover, it was discovered that the DNA binding domain does not have to be physically on the same polypeptide as the activation domain, so long as the two separate polypeptides interact with one another (Ma et al. (1988) Cell 55:443-446).
Fields and his coworkers made the seminal suggestion that protein interactions could be detected if two potentially interacting proteins were expressed as chimeras. They devised a method based on the properties of the yeast Gal4 protein, which consists of separable domains responsible for DNA-binding and transcriptional activation. Polynucleotides encoding two hybrid proteins, one consisting of the yeast Gal4 DNA-binding domain fused to a polypeptide sequence of a known protein and the other consisting of the Gal4 activation domain fused to a polypeptide sequence of a second protein, are constructed and introduced into a yeast host cell. Intermolecular binding between the two fusion proteins reconstitutes the Gal4 DNA-binding domain with the Gal4 activation domain, which leads to the transcriptional activation of a reporter gene (e.g., lacZ, HIS3) which is operably linked to a Gal4 binding site.
All yeast-based interaction trap systems in the art share common elements (Chien et al. (1991) PNAS 88:9578-82; Durfee et al. (1993) Genes & Development 7:555-69; Gyuris et al. (1993) Cell 75:791-803; and Vojtek et al. (1993) Cell 74:205-14). All use (1) a plasmid that directs the synthesis of a “bait”: a known protein which is brought to DNA by being fused to a DNA binding domain, (2) one or more reporter genes (“reporters”) with upstream binding sites for the bait fusion, and (3) a plasmid that directs the synthesis of proteins fused to activation domains and other useful moieties (“prey”). All current systems direct the synthesis of proteins that carry the activation domain at the amino terminus of the fusion, facilitating the expression of open reading frames encoded by, for example, cDNAs.
Due to an upper limit on the transformation efficiency of yeast cells of ~10^6, the yeast-based one-hybrid and two-hybrid systems are not practical for use in the analysis of libraries larger than 10^7 in size. For the analysis of most cDNA libraries, the ability to cover libraries 10^6 to 10^7 in size is adequate. However, there are a number of situations in which the inability to search a library larger than 10^7 in size is problematic. One example is the challenge of searching libraries containing randomized sequences. For example, a strategy for randomizing at just six different residues in a test polypeptide can produce a library of variants which exceeds the practical capacity of the yeast interaction trap systems. To illustrate, if one employs a strategy using 24 different codons (encoding 19 different amino acids) at each of the six positions, the resulting library will have a potential DNA sequence space of 24^6 or ~2×10^8 and an amino acid sequence space of 19^6 or ~5×10^7. To ensure nearly complete coverage of such a library, one needs to oversample by a factor of at least three-fold (i.e., one must sample 3 × 2×10^8 = 6×10^8 candidates). The difficulty with library size becomes exponentially more problematic with each additional residue that is randomized.
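The library-size arithmetic above can be checked with a short calculation. The following is a minimal sketch, not part of the original disclosure: the function names are illustrative, and the coverage estimate assumes that clones are drawn uniformly and independently from the library (a Poisson approximation), which real libraries only approximate.

```python
import math


def sequence_space(options_per_position: int, positions: int) -> int:
    """Number of distinct sequences when each position is randomized independently."""
    return options_per_position ** positions


def expected_coverage(oversampling: float) -> float:
    """Expected fraction of library members seen at least once.

    Assumption: uniform, independent sampling (Poisson approximation).
    """
    return 1.0 - math.exp(-oversampling)


# Six randomized positions, 24 codons encoding 19 amino acids at each position.
dna_space = sequence_space(24, 6)        # 191,102,976, i.e. roughly 2 x 10^8
protein_space = sequence_space(19, 6)    # 47,045,881, i.e. roughly 5 x 10^7

# Three-fold oversampling of the DNA sequence space.
candidates_to_sample = 3 * dna_space     # 573,308,928, i.e. roughly 6 x 10^8
coverage = expected_coverage(3.0)        # about 0.95 under the stated assumption

print(dna_space, protein_space, candidates_to_sample, round(coverage, 3))
```

Under the stated sampling assumption, three-fold oversampling corresponds to seeing about 95% of the library members at least once, which is consistent with the "nearly complete coverage" described above.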
Another approach used to study protein-DNA and protein-protein interactions is the method of phage display. In this system, proteins are displayed on the surface of filamentous bacteriophage (e.g., M13) that harbor the DNA encoding the displayed protein. Target proteins or DNA sequences of interest are immobilized on a solid support (typically plates or beads) and used to affinity-enrich libraries of phage-displayed proteins for candidates that bind to the target. Because these phage libraries are constructed in E. coli, this system can create libraries larger than 10^7 (and as large as 10^11) in size. This method has been used successfully to identify and characterize both protein-DNA and protein-protein interactions. See, for example, Allen et al. (1995) Trends Biochem. Sci. 20:511-516; Phizicky et al. (1995) Microbiol. Rev. 59:94-123; Rebar et al. (1996) Methods Enzymol. 267:129-149; and Smith et al. (1997) Chem. Rev. 97:391-410. However, phage display does have certain significant limitations. Unlike direct, single-step selection methods (e.g., the yeast one- and two-hybrid systems), phage display is an enrichment process that requires multiple cycles to obtain desired candidates from a library. In addition, phage display enrichments are performed in vitro (and not in vivo as in yeast one- and two-hybrid methods). Finally, because proteins must be exported to the bacterial cell membrane in order to be displayed on the phage surface, certain proteins (particularly larger ones) are not well suited for analysis by phage display. This last limitation can be particularly significant if this biological phenomenon artifactually removes certain candidates from a library.
More recently, a prokaryote-based interaction trap assay has been developed. See, for example, U.S. Pat. No. 5,925,523. The prokaryotic ITS derives in part from the unexpected finding that the natural interaction between a transcriptional activator and subunit(s) of an RNA polymerase complex can be replaced by a heterologous protein-protein interaction which is capable of activating transcription. Because bacteria (E. coli in particular) have a much higher relative transformation efficiency (typically 10^9 or greater) than yeast, the description of prokaryotic-based one- and two-hybrid systems would appear to address the library size restrictions of the yeast systems. However, although higher transformation efficiencies are possible in E. coli, a significant deficiency of the prior art is that it does not make clear which, if any, reporter gene(s) have the characteristics required for use in the analysis of libraries larger than 10^7 in size. Desirable reporter genes should have one or more of the following characteristics: 1) The reporter gene should readily facilitate the rapid analysis of very large numbers of candidates. Thus, reporter genes (e.g., the lacZ gene encoding beta-galactosidase) that must be screened by a visual colony phenotype (e.g., color) are not useful because no more than 10^3 to 10^4 colonies can be screened on a single agar plate and it is not practical to manually plate and assess 10^3 or more plates for each experiment. 2) The reporter gene system must be sufficiently stringent or selective so that spurious, randomly arising background mutations do not complicate the analysis. For example, a selection based on expression of the spectinomycin resistance gene (aadA) would not be suitable for the analysis of large libraries because randomly occurring mutations that result in spectinomycin resistance arise at a frequency of approximately 10^-4 to 10^-5 (Sera and Schultz, PNAS, 93: 2920-2925 (1996); Huang et al., PNAS, 91: 3969-3973 (1994)).
Thus, if one were to examine a library of 10^8 members using the aadA system, one should expect to receive 10^3 or more false positives due solely to spontaneous spectinomycin resistance. This can pose a significant problem, particularly if true positives occur with low frequency in the 10^8 member library. 3) Expression of the reporter gene should be quantifiable and should easily facilitate the selection of candidates based on any specific criteria. For example, an ideal reporter system would allow one to isolate library members that meet specific quantitative cutoffs (e.g., expression of reporter >50 or <50) and/or windows (e.g., expression of reporter >25 AND <75, or <25 OR >75).
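The false-positive estimate and the quantitative cutoffs and windows described above can be expressed as simple calculations and predicates. The following is a minimal sketch under stated assumptions: the function names and the example readings are hypothetical, and reporter expression is treated as a single number on an arbitrary scale, as in the examples in the text.

```python
from typing import Optional


def expected_false_positives(library_size: int, mutation_frequency: float) -> float:
    """Expected number of survivors caused solely by spontaneous resistance."""
    return library_size * mutation_frequency


def passes_cutoff(expression: float,
                  minimum: Optional[float] = None,
                  maximum: Optional[float] = None) -> bool:
    """True if expression is strictly above `minimum` and strictly below `maximum`."""
    if minimum is not None and expression <= minimum:
        return False
    if maximum is not None and expression >= maximum:
        return False
    return True


def inside_window(expression: float, low: float, high: float) -> bool:
    """Window of the form: expression > low AND expression < high."""
    return low < expression < high


def outside_window(expression: float, low: float, high: float) -> bool:
    """Window of the form: expression < low OR expression > high."""
    return expression < low or expression > high


# A 10^8-member library with resistance arising at 10^-5 to 10^-4 per cell.
low_estimate = expected_false_positives(10**8, 1e-5)    # about 10^3
high_estimate = expected_false_positives(10**8, 1e-4)   # about 10^4

# Hypothetical reporter readings, used only to exercise the predicates.
readings = [10.0, 30.0, 60.0, 80.0]
kept_inside = [r for r in readings if inside_window(r, 25, 75)]
kept_outside = [r for r in readings if outside_window(r, 25, 75)]

print(round(low_estimate), round(high_estimate), kept_inside, kept_outside)
```

The predicates correspond directly to the ">50 or <50" cutoffs and the ">25 AND <75" and "<25 OR >75" windows given as examples; how expression would actually be measured is not specified here.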
There are at least two additional deficiencies in the prior art describing the prokaryotic ITS:
A) The ability to simultaneously monitor the expression of multiple reporter genes in a single cell.
U.S. Pat. Nos. 5,925,523 and 5,580,736 and others (PCT applications WO 99/14319; WO 99/28745; WO 99/31509 and WO 99/28744; and Grossle et al., Nature Biotechnology 17: 1232-1233 (1999)) have noted the usefulness of having the interaction between the bait and prey constructs activate more than one reporter gene in a single cell to reduce the occurrence of false positives. Additionally, Grossle et al., Nature Biotechnology 17: 1232-1233 (1999) and Serebriiskii et al., J. Biol. Chem. 274: 17,080-17,087 (1999) demonstrate a “dual bait” version of the yeast two hybrid system capable of monitoring the interaction of two different bait proteins with a single prey protein. This system can be used to screen for cells which have a desired combination of interactions between a single prey protein and two bait proteins by utilizing a combination of growth selection screens and visual lacZ screens. However, in contrast to the present invention, those references do not teach or suggest simultaneous and independent monitoring of the expression of multiple reporter genes in a single cell where the expression of each reporter gene is regulated by the interaction of a single protein of interest with different partners. For example, one may wish to select a protein (from a large library) that interacts with Target Protein A but does NOT interact with Target Protein B. In this case, if the system were set up such that binding of the interactor protein with Target Protein A increased the expression of Reporter Gene A and the binding of the interactor protein with Target Protein B increased the expression of Reporter Gene B, one would want to select those cells that had very high expression of Reporter Gene A AND very low expression of Reporter Gene B.
Selections of this type (based on the strengths of multiple interactions) would also be especially useful for selecting very specific DNA-binding proteins that bind well to the desired target site but do NOT bind well to even closely related sites. We note that U.S. Pat. No. 5,925,523 does not teach how one could easily monitor multiple reporters in a single cell and that, to our knowledge, no reference describes how to simultaneously monitor the differential expression of multiple reporters in a single cell.
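The selection logic described above (high expression of one reporter AND low expression of another) can be sketched as a two-reporter filter. This is an illustration only: the `Cell` record, the threshold values, and the example readings are hypothetical and do not come from any cited reference.

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    """Hypothetical per-cell record of two independently measured reporters."""
    clone_id: str
    reporter_a: float  # driven by interaction with the desired target
    reporter_b: float  # driven by interaction with the undesired target


def select_specific_binders(cells: List[Cell],
                            min_a: float,
                            max_b: float) -> List[Cell]:
    """Keep cells with Reporter A at or above min_a AND Reporter B at or below max_b."""
    return [c for c in cells if c.reporter_a >= min_a and c.reporter_b <= max_b]


# Illustrative readings on an arbitrary 0-100 scale.
population = [
    Cell("clone-1", reporter_a=92.0, reporter_b=5.0),   # binds A only
    Cell("clone-2", reporter_a=88.0, reporter_b=81.0),  # binds both A and B
    Cell("clone-3", reporter_a=7.0, reporter_b=3.0),    # binds neither
    Cell("clone-4", reporter_a=4.0, reporter_b=90.0),   # binds B only
]

selected = select_specific_binders(population, min_a=75.0, max_b=25.0)
print([c.clone_id for c in selected])
```

A selection based on a single reporter could not distinguish clone-1 from clone-2 in this example; the distinction requires both reporters to be read in the same cell.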
B) Methods for practicing library vs. library screening.
With the wealth of genomic information currently becoming available, a number of groups have begun to address the challenges in library vs. library screening of large collections of coding sequences. Ideally, a method for performing such a comprehensive library vs. library search should: 1) provide an efficient method for crossing two large libraries and 2) be amenable to partial or complete automation. The use of transformation as a method to effect the simultaneous (or sequential) introduction of two libraries into either yeast or bacterial cells fails to meet either of these criteria. Even in bacteria where very high transformation efficiencies are possible, examination of 10^9 combinations would only allow one to examine two libraries each comprising only approximately 33,000 candidates. In addition, since transformation requires pre-treatment of cells (e.g., washing and resuspension in divalent cation solutions) and multiple protocol steps (e.g., heat shock, addition of medium, recovery), it is not easily adaptable for automation. For library vs. library experiments conducted in yeast, investigators have exploited the fact that yeast can exist as one of two mating types (a and α) in haploid form. Mating of a and α cells leads to the formation of a diploid a/α cell harboring the DNA from both the starting haploid cells. Thus, a cells harboring a library of prey hybrids can be easily mated with α cells harboring a test bait hybrid(s) simply by mixing the cells together and selecting for diploid cells. In this way, a large number of combinations can be simply and rapidly tested, bypassing the need for labor-intensive transformation experiments when crossing the libraries. See Uetz et al. (2000) Nature 403:623-627 and Walhout et al. (2000) Science 287:116-122. Prokaryotes (and E. coli in particular) replicate asexually, and U.S. Pat. No. 5,925,523 and the existing literature do not teach how to perform analogous library mating experiments in the prokaryotic ITS.
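The constraint that transformation places on library vs. library experiments follows from the fact that the number of pairwise combinations is the product of the two library sizes. The following is a minimal sketch of that arithmetic; the function names are illustrative, and the calculation assumes two libraries of equal size with every pairing counted once.

```python
import math


def pairwise_combinations(size_a: int, size_b: int) -> int:
    """Number of bait/prey pairings when every member of A is crossed with every member of B."""
    return size_a * size_b


def max_equal_library_size(combination_budget: int) -> int:
    """Largest size n such that two libraries of n members give at most `combination_budget` pairings."""
    return math.isqrt(combination_budget)


budget = 10**9  # combinations that can be examined at high bacterial transformation efficiency
per_library = max_equal_library_size(budget)

print(per_library)                                      # 31622
print(pairwise_combinations(per_library, per_library))  # 999950884
```

The exact square root of 10^9 is about 31,600, so the figure of 33,000 candidates per library given above is a rounded approximation (33,000 × 33,000 is about 1.09×10^9). Either way, the square-root relationship means that a thousand-fold gain in transformation efficiency raises the size of each crossed library only about thirty-fold.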
It is an object of the present invention to describe the following improvements to the ITS: 1) reporter genes (and methods for detecting their expression) that readily permit the analysis of large libraries (>10^7 in size) and whose selectivity can be easily “tuned,” modified, and/or monitored, 2) methods for the simultaneous and independent measurement of multiple interactions (as judged by expression of different reporter genes), and 3) construction of libraries using a phagemid-based system that provides a) an efficient, automatable method for performing library vs. library experiments and b) a method to simplify the analysis of positive candidates from ANY screen/selection performed in the prokaryotic ITS.