1. INTRODUCTION
2. BACKGROUND OF THE INVENTION
3. SUMMARY OF THE INVENTION
4. DESCRIPTION OF THE FIGURES
5. DETAILED DESCRIPTION OF THE INVENTION
5.1. DETECTING INTERACTING PROTEINS
5.2. CHARACTERIZATION OF INTERACTIVE POPULATIONS THAT ARE DIFFERENTIALLY EXPRESSED BY A PARTICULAR TISSUE TYPE, DISEASE STATE OR STAGE OF DEVELOPMENT, AND CREATION OF “PROTEIN INTERACTION MAPS”
5.2.1. DETERMINATION OF ALL THE DETECTABLE PROTEIN-PROTEIN INTERACTIONS
5.2.2. CLASSIFICATION OF THE ARRAYED POOLS OF INTERACTANTS BY THE QEA™ METHOD AND THE SEQ-QEA™ METHOD
5.2.3. ARRAYING AND CODING STRATEGIES FOR AN INTERACTIVE POPULATION
5.2.4. MAINTAINING LINKAGE BETWEEN PAIRS OF INTERACTING PROTEINS
5.2.5. POOLING STRATEGIES
5.2.6. ALTERNATIVE STRATEGIES TO CHARACTERIZE INTERACTIVE POPULATIONS
5.2.6.1. SEQUENCE-BASED STRATEGIES TO IDENTIFY PAIRS OF INTERACTING PROTEINS
5.2.6.2. CREATION OF INTERACTIVE-GRIDS
5.2.7. STATISTICAL CONSIDERATIONS FOR DETECTING ALL POSSIBLE INTERACTIONS AMONG GENES THAT ARE EXPRESSED AT DIFFERENT LEVELS
5.2.8. ALTERNATIVE PREFERRED EMBODIMENTS
5.2.9. INFORMATION PROCESSING ASPECTS OF DETECTING PROTEIN-PROTEIN INTERACTIONS
5.2.9.1. IDENTIFICATION DATABASE AND PROCESSING
5.2.9.2. INTERACTION DATABASE
5.2.9.3. INTERACTION DATABASE FUNCTIONS
5.3. INTEGRATED ISOLATION OF INHIBITORS OF AN INTERACTIVE POPULATION
5.4. THE QEA™ METHOD
5.4.1. QUANTITATIVE EXPRESSION ANALYSIS METHOD, GENERALLY
5.4.2. DETAILS OF A QUANTITATIVE EXPRESSION ANALYSIS METHOD
5.4.3. RE EMBODIMENTS OF A QEA™ METHOD
5.4.3.1. FIRST ALTERNATIVE RE EMBODIMENT
5.4.3.2. SECOND ALTERNATIVE RE EMBODIMENT
5.4.4. A SEQ-QEA™ EMBODIMENT OF A QEA™ METHOD
5.4.5. QEA™ ANALYSIS AND DESIGN METHODS
5.4.5.1. QEA™ EXPERIMENTAL ANALYSIS METHODS
5.4.5.2. QEA™ EXPERIMENTAL DESIGN METHODS
5.4.5.3. THE QEA™ METHOD AMBIGUITY RESOLUTION
5.4.6. APPARATUS FOR PERFORMING THE QEA™ METHODS
6. EXAMPLES
6.1. DESCRIPTION OF PROTOCOLS
6.1.1. MATING PROTOCOL
6.1.2. TRANSFORMATION PROTOCOL
6.1.3. RNA EXTRACTION
6.1.4. DNASE TREATMENT
6.1.5. MESSENGER RNA PURIFICATION
6.1.6. cDNA SYNTHESIS AND CONSTRUCTION OF FUSION-LIBRARIES
6.1.7. TRANSFORMATION OF THE REPORTER STRAINS WITH THE BINDING DOMAIN FUSION cDNA LIBRARY AND ACTIVATION DOMAIN cDNA LIBRARY TO CREATE “M” AND “N” POPULATIONS
6.1.8. WHOLE CELL PCR
6.1.9. RECOVERY OF COLONIES POSITIVE FOR PROTEIN-PROTEIN INTERACTION
6.1.10. PRODUCTION OF PCR POOLS FOR CREATION OF PROTEIN INTERACTION MAPS
6.1.11. β-GALACTOSIDASE ASSAYS
6.1.12. PROTOCOLS FOR QEA™ METHODS AND SEQ-QEA™ METHODS
6.1.12.1. PREFERRED QEA™ RE METHOD
6.1.12.1.1. cDNA PREPARATION
6.1.12.1.2. PREFERRED RE/LIGASE AND AMPLIFICATION REACTIONS
6.1.12.1.3. PREFERRED AUTOMATED RE/LIGASE REACTIONS
6.1.12.1.4. ALTERNATIVE RE/LIGASE AND AMPLIFICATION REACTIONS
6.1.12.1.5. OPTIONAL POST-AMPLIFICATION STEPS
6.1.12.2. PREFERRED METHODS OF A SEQ-QEA™ EMBODIMENT
6.1.12.2.1. QEA™ METHOD PREFERRED FOR USE IN A SEQ-QEA™ METHOD
6.1.12.2.2. SEQ-QEA™ METHOD STEPS
6.1.12.3. PREFERRED QEA™ METHOD ADAPTERS AND RE PAIRS
6.1.12.4. FLUORESCENT LABELS FOR QEA™ METHODS
6.1.12.5. PREFERRED REACTANTS FOR SEQ-QEA™ METHODS
6.1.13. POST MATING VERIFICATION PROTOCOLS
6.1.13.1. PLASMID DROP-OUT PROTOCOL
6.1.13.2. YEAST MATRIX MATING PROTOCOL
6.2. LIBRARIES
6.3. CONSTRUCTION OF YEAST STRAINS
6.3.1. CONSTRUCTION OF STRAINS N105 AND N106
6.3.2. CONSTRUCTION OF THE REPORTER STRAIN N106′
6.3.3. CONSTRUCTION OF THE REPORTER STRAIN N105′
6.3.4. CONSTRUCTION OF THE REPORTER STRAIN YULH
6.3.5. CONSTRUCTION OF THE YEAST STRAIN N203
6.4. CONSTRUCTION OF FUSION GENES
6.5. CONSTRUCTION OF cDNA LIBRARIES IN pASSfiI (GDB)
6.6. TRANSFORMATION OF THE REPORTER STRAINS WITH THE pASSfiI AND pACT cDNA LIBRARIES TO CREATE “M” AND “N” POPULATIONS
6.7. CONSTRUCTION OF YEAST STRAINS WITH INTEGRATED COPIES OF RAF-GAD
6.8. CONSTRUCTION OF PEPTIDE EXPRESSION VECTORS (PEVs)
6.9. SELECTION OF PROTEIN-PROTEIN INTERACTIONS FROM A NON-INTERACTING BACKGROUND
6.10. SELECTION OF SPECIFIC PROTEIN-PROTEIN INTERACTIONS FROM A BACKGROUND OF OTHER INTERACTING PROTEINS
6.11. SELECTION OF INTERACTING PROTEINS FROM AN M×N SCREEN
6.11.1. MATING ASSAY
6.11.2. WHOLE CELL PCR OF THE POSITIVE COLONIES
6.11.3. QEA™ METHOD OF THE PCR PRODUCTS
6.11.4. CREATION OF TWO-DIMENSIONAL POOLS
6.11.5. WHOLE CELL PCR OF THE POOLED CELLS
6.11.6. QEA™ METHOD OF THE PCR DERIVED FROM POOLED CULTURES
6.11.7. THE SEQ-QEA™ METHOD OF THE PCR DERIVED FROM POOLED CULTURES
6.12. IDENTIFICATION OF SPECIFIC PAIRS OF INTERACTING PROTEINS FROM A QEA™ METHOD OF THE INTERACTIVE POPULATION AND BY THE USE OF GENE-SPECIFIC PRIMERS
6.13. CREATION OF INTERACTIVE GRIDS
6.14. ISOLATION OF STAGE-SPECIFIC PAIRS OF INTERACTING PROTEINS
6.15. EXPRESSION OF PEPTIDE INHIBITORS IN PEV AND INHIBITION OF PROTEIN-PROTEIN INTERACTIONS
6.16. IDENTIFICATION OF CELLS CONTAINING AN INHIBITOR OF PROTEIN-PROTEIN
The present method relates to the identification of protein-protein interactions and inhibitors of these interactions that, preferably, are specific to a cell type, tissue type, stage of development, or disease state or stage.
Proteins and protein-protein interactions play a central role in various essential biochemical processes. For example, these interactions are evident in the interaction of hormones with their respective receptors, in the intracellular and extracellular signaling events mediated by proteins, in enzyme-substrate interactions, in intracellular protein trafficking, in the formation of complex structures like ribosomes, viral coat proteins, and filaments, and in antigen-antibody interactions. These interactions are usually facilitated by the interaction of small regions within the proteins that can fold independently of the rest of the protein. These independent units are called protein domains. Abnormal or disease states can be the direct result of aberrant protein-protein interactions. For example, oncoproteins can cause cancer by interacting with and activating proteins responsible for cell division. Protein-protein interactions are also central to the mechanism of a virus recognizing its receptor on the cell surface as a prelude to infection. Identification of domains that interact with each other not only leads to a broader understanding of protein-protein interactions, but also aids in the design of inhibitors of these interactions.
Protein-protein interactions have been studied by both biochemical and genetic methods. The biochemical methods are laborious and slow, often involving painstaking isolation, purification, sequencing and further biochemical characterization of the proteins being tested for interaction. As an alternative to the biochemical approaches, genetic approaches to detect protein-protein interactions have gained in popularity as these methods allow the rapid detection of the domains involved in protein-protein interactions.
An example of a genetic system to detect protein-protein interactions is the “Two-Hybrid” system in the yeast Saccharomyces cerevisiae (Fields and Song, 1989, Nature 340:245-246; U.S. Pat. No. 5,283,173 by Fields and Song). This assay utilizes the reconstitution of a transcriptional activator like GAL4 (Johnston, 1987, Microbiol. Rev. 51:458-476) through the interaction of two protein domains that have been fused to the two functional units of the transcriptional activator: the DNA-binding domain and the activation domain. This is possible due to the bipartite nature of certain transcription factors like GAL4. Being characterized as bipartite signifies that the DNA-binding and activation functions reside in separate domains and can function in trans (Keegan et al., 1986, Science 231:699-704). The reconstitution of the transcriptional activator is monitored by the activation of a reporter gene like the lacZ gene that is under the influence of a promoter that contains a binding site (Upstream Activating Sequence or UAS) for the DNA-binding domain of the transcriptional activator. This method is most commonly used either to detect an interaction between two known proteins (Fields and Song, 1989, Nature 340:245-246) or to identify interacting proteins from a population that would bind to a known protein (Durfee et al., 1993, Genes Dev. 7:555-569; Gyuris et al., 1993, Cell 75:791-803; Harper et al., 1993, Cell 75:805-816; Vojtek et al., 1993, Cell 74:205-214).
Another system similar to the Two-Hybrid system is the “Interaction-Trap system” devised by Brent and colleagues (Gyuris et al., 1993, Cell 75:791-803). It differs from the Two-Hybrid system in that it uses both a LEU2 reporter gene and a lacZ reporter gene. Thus protein-protein interactions leading to the reconstitution of the transcriptional activator also allow cells to grow in media lacking leucine and enable them to express β-galactosidase. The DNA-binding domain used in this system is the LexA DNA-binding domain, while the activator sequence is obtained from the B42 transcriptional activation domain (Ma and Ptashne, 1987, Cell 51:113-119). The promoters of the reporter genes contain LexA binding sequences and hence will be activated by the reconstitution of the transcriptional activator. Another feature of this system is that the gene encoding the DNA-binding domain fusion protein is under the influence of an inducible GAL promoter so that confirmatory tests can be performed under inducing and non-inducing conditions.
In yet another version of this system developed by Elledge and colleagues, the reporter genes HIS3 and lacZ (Durfee et al., 1993, Genes Dev. 7:555-569) are used. The transcriptional activator that is reconstituted in this case is GAL4 and protein-protein interactions allow cells to grow in media lacking histidine and containing 3-aminotriazole (3-AT) and to express β-galactosidase. 3-AT inhibits the growth of his3 auxotrophs in media lacking histidine (Kishore and Shah, 1988, Annu. Rev. Biochem. 57:627-663).
In a different two-hybrid assay, a URA3 reporter gene under the control of Estrogen Response Elements (ERE) has been used to monitor protein-protein interactions. Here, the DNA-binding domain is derived from the human estrogen receptor. The authors of the ERE assay propose that inhibition of the protein-protein interactions can be identified by negative selection on 5-FOA medium (Le Douarin et al., 1995, Nucleic Acids Res. 23:876-878), but do not provide any details.
A version of the two-hybrid approach called the “Contingent Replication Assay” that is applicable in mammalian cells has also been reported (Nallur et al., 1993, Nucleic Acids Res. 21:3867-3873; Vasavada et al., 1991, Proc. Natl. Acad. Sci. USA 88:10686-10690). In this case, the reconstitution of the transcription factor in mammalian cells due to the interaction of the two fusion proteins leads to the activation of the SV40 T antigen. This antigen allows the replication of the activation domain fusion plasmids. Another modification of the two-hybrid approach using mammalian cells is the “Karyoplasmic Interaction Selection Strategy” that also uses the reconstitution of a transcriptional activator (Fearon et al., 1992, Proc. Natl. Acad. Sci. USA 89:7958-7962). Reporter genes used in this case have included the gene encoding the bacterial chloramphenicol acetyl transferase, the gene for cell-surface antigen CD4, and the gene encoding resistance to Hygromycin B. In both of the mammalian systems, the transcription factor that is reconstituted is a hybrid transcriptional activator in which the DNA-binding domain is from GAL4 and the activation domain is from VP16.
In all of the assays described above, the identity of one (or both) of the proteins being tested for interaction is known. All of the assays mentioned above can be used to identify novel proteins that interact with a known protein of interest. In a variation of the “Interaction Trap” system, a “mating-grid” strategy has been used to characterize interactions between proteins that are thought to be involved in the Drosophila cell cycle (Finley and Brent, 1994, Proc. Natl. Acad. Sci. USA 91:12980-12984). This strategy is based on a technique first established by Rothstein and colleagues (Bendixen et al., 1994, Nucleic Acids Res. 22:1778-1779), who used a yeast-mating assay to detect protein-protein interactions. Here, the DNA-binding and activation domain fusion proteins were expressed in two different haploid yeast strains, a and α, and the two were brought together by mating. Thus, interactions between proteins can be studied in this method. However, even in this method, the identity of at least one of the proteins in each interacting pair was known before the interactions between pairs of proteins were analyzed.
Stanley Fields and coworkers have recently performed an analysis of all possible protein-protein interactions that can take place in the E. coli bacteriophage T7 (Bartel et al., 1996, Nature Genet. 12:72-77). Randomly sheared fragments of T7 DNA were used to make libraries in both the DNA-binding domain and the activation domain plasmids and a genome-wide two-hybrid assay was performed by use of a mating strategy. The DNA-binding and the activation domain fusions were transformed into separate yeast strains of opposite mating type. The yeast transformants containing DNA-binding domain hybrids were then divided into groups of 10. The groups were screened (by the mating strategy outlined above) against a library of activation domain hybrids numbering around 10⁵ transformants. By this method, interactions were characterized among the proteins of T7. While this study provides a method to screen more than one DNA-binding domain hybrid against more than one activation domain hybrid, it does not address the issues involved in screening complex libraries against each other. This is an important limitation, given the value of detecting and isolating interactants from cDNA libraries prepared from complex organisms such as humans. Indeed, the prior art has taught away from using complex populations of proteins as hybrids to the DNA-binding domain, since random hybrids to the DNA-binding domain produce a large percentage of false positives (hybrids that have transcriptional activity in the absence of an interacting protein) (Bartel et al., 1993, “Using the two hybrid system to detect protein-protein interactions,” in Cellular Transduction in Development, Ch. 7, Hartley, D. A. (ed.), Practical Approach Series xviii, IRL Press at Oxford University Press, New York, N.Y., pp. 154-179 at 171; Ma and Ptashne, 1987, Cell 51:113).
None of the prior art systems provides a method that not only isolates and catalogues all possible protein-protein interactions within a population, be it a tissue/cell-type, disease state, or stage of development, but also allows the comparison of such interactions between two such populations thereby allowing the identification of protein-protein interactions unique to any particular tissue/cell-type, disease state, or stage of development. In contrast, such a method is provided by the present invention.
Accordingly, it is one of the objectives of this invention to devise a genetic method to identify and isolate preferably all possible protein-protein interactions within a population of proteins, or between two different populations of proteins, be it a tissue/cell-type, disease state or stage of development.
It is another objective of the present invention to perform a comparative analysis of the protein-protein interactions that occur in two or more different tissue/cell-types, disease states, or stages of development.
It is also an objective of this invention to identify and isolate in a rapid manner the genes encoding the proteins involved in interactions that are specific to a tissue/cell-type, disease state, or stage of development.
It is yet another objective of this invention to provide a method for the concurrent identification of inhibitors of the protein-protein interactions that characterize a given population, be it a tissue/cell type, disease state, or stage of development. These inhibitors may have therapeutic value.
Citation of a reference herein shall not be construed as an admission that such is prior art to the present invention.
The present invention provides methods and means to detect and isolate the genes encoding the proteins that interact with each other between two populations of proteins, using the reconstitution of a selectable event. This selectable event is the formation of a transcription factor. In contrast to the prior art, in which problems with false positives and low throughput limited the complexity of the populations that could be analyzed, each of the two populations of proteins has a complexity of greater than 10, and preferably has a complexity of at least 1,000. The reconstitution of a transcription factor occurs by interaction of fusion proteins expressed by chimeric genes. In a preferred embodiment, the types of fusion proteins used are DNA-binding domain hybrids and activation domain hybrids of transcriptional activators. Libraries of genes encoding hybrid proteins are preferably constructed in both a DNA-binding domain hybrid plasmid vector and in an activation domain hybrid plasmid vector. In a preferred embodiment, two types of haploid yeast strains, a and α, are each transformed with a different one of the two libraries to create two yeast libraries. The two yeast libraries are then mated together to create a diploid yeast strain that contains both kinds of fusion genes encoding the hybrid proteins. If the two hybrid proteins can interact (bind) with each other, the transcriptional activator is reconstituted due to the proximity of the DNA-binding and the activation domains of the transcriptional activator. This reconstitution causes transcription of reporter genes that, by way of example, enable the yeast to grow in selective media. In a preferred aspect, the activity of a reporter gene is monitored enzymatically. The isolation of the plasmids that encode these fusion genes leads to the identification of the genes that encode proteins that interact with each other.
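The selection logic of this paragraph can be illustrated with a toy model. This is only a sketch under stated assumptions: the function `mate_libraries`, the insert names, and the table of binding pairs are hypothetical and do not come from the specification. The model simply treats the reporter as active in a diploid when, and only when, its two hybrid proteins are assumed to bind.

```python
from itertools import product

def mate_libraries(bd_library, ad_library, binding_pairs):
    """Return the diploids (BD insert, AD insert) in which the reporter is on.

    bd_library, ad_library: insert names carried by the two haploid yeast
        libraries (hypothetical toy data).
    binding_pairs: set of (bd_insert, ad_insert) tuples assumed to bind.
    """
    positives = []
    for bd, ad in product(bd_library, ad_library):
        # The reporter is transcribed only when the two hybrids bind and
        # bring the DNA-binding and activation domains into proximity.
        if (bd, ad) in binding_pairs:
            positives.append((bd, ad))
    return positives

# Hypothetical example: 3 DNA-binding domain hybrids mated to 2 activation
# domain hybrids, with a single assumed interaction.
bd_library = ["bd_1", "bd_2", "bd_3"]
ad_library = ["ad_1", "ad_2"]
binding_pairs = {("bd_1", "ad_2")}

positives = mate_libraries(bd_library, ad_library, binding_pairs)
combinations_screened = len(bd_library) * len(ad_library)
```

The number of combinations grows as the product of the two library complexities, so two populations of complexity 1,000 each give 1,000,000 possible pairings, which is why selection for reporter activity, rather than pair-by-pair testing, is needed.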
Thus, in a specific embodiment, the invention is directed to a method of detecting one or more protein-protein interactions comprising (a) recombinantly expressing within a population of host cells (i) a first population of first fusion proteins, each said first fusion protein comprising a first protein sequence and a DNA binding domain in which the DNA binding domain is the same in each said first fusion protein, and in which said first population of first fusion proteins has a complexity of at least 1,000; and (ii) a second population of second fusion proteins, each said second fusion protein comprising a second protein sequence and a transcriptional regulatory domain of a transcriptional regulator, in which the transcriptional regulatory domain is the same in each said second fusion protein, such that a first fusion protein is co-expressed with a second fusion protein in host cells, and wherein said host cells contain at least one nucleotide sequence operably linked to a promoter driven by one or more DNA binding sites recognized by said DNA binding domain such that interaction of a first fusion protein with a second fusion protein results in regulation of transcription of said at least one nucleotide sequence by said regulatory domain, and in which said second population of second fusion proteins has a complexity of at least 1,000; and (b) detecting said regulation of transcription of said at least one nucleotide sequence, thereby detecting an interaction between a first fusion protein and a second fusion protein.
In further specific embodiments, this invention provides for detecting experimentally significant protein-protein interactions between highly complex libraries of proteins. In particular, the invention provides protocols which achieve highly effective screening of the DNA binding domain or activation domain hybrids to eliminate those hybrids that produce false positive indications of protein-protein interactions. Additional screening protocols eliminate those hybrids which, due to non-specific association with many proteins, produce less experimentally significant or specific indications of protein-protein interactions. Further protocols provide for the efficient mating of large numbers of yeast cells useful for handling complex libraries.
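The two screening steps named above can be sketched as a simple filter. This is an illustration only: the data layout, the `EMPTY` marker for a partner with no insert, and the `max_partners` cutoff are assumptions made for the example, and the specification's actual protocols are described elsewhere in the document.

```python
def filter_hybrids(reporter_hits, empty_partner="EMPTY", max_partners=3):
    """Drop hybrids that look like false positives or non-specific binders.

    reporter_hits: dict mapping each hybrid to the set of partners with which
        it activated the reporter (hypothetical toy data).
    empty_partner: assumed marker for a partner plasmid carrying no insert.
    max_partners: assumed cutoff above which a hybrid is called non-specific.
    """
    kept = {}
    for hybrid, partners in reporter_hits.items():
        if empty_partner in partners:
            # Reporter activity without an interacting protein: false positive.
            continue
        if len(partners) > max_partners:
            # Associates with many proteins: less specific, so excluded.
            continue
        kept[hybrid] = set(partners)
    return kept

hits = {
    "bd_1": {"ad_1"},
    "bd_2": {"EMPTY", "ad_1"},
    "bd_3": {"ad_1", "ad_2", "ad_3", "ad_4"},
}
kept = filter_hybrids(hits)
```

In this toy data only `bd_1` survives: `bd_2` activates the reporter on its own and `bd_3` exceeds the assumed partner cutoff.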
The present invention also provides a method to isolate concurrently inhibitors of such protein-protein interactions that occur in, are characteristic of or are specific to a given population of proteins. By way of example, preferably all the yeast diploids that harbor fusion proteins that interact with each other are pooled together and exposed to candidate inhibitors. Exemplary candidate inhibitors include chemically synthesized molecules and genetically encoded peptides. After treatment with candidate inhibitors, the yeast cells harboring interacting hybrid proteins are selected for the inactivation of the reporter gene, preferably by transfer to appropriate selective media. Preferably, the same media also selects for the presence of the plasmids that encode the interacting proteins and, in the case of screening for peptide inhibitors, the expression plasmids that encode the peptides. Successful inhibition events are thus monitored by the inactivation of the reporter gene.
The major advantages of these methods are as follows. From a population of proteins characteristic of a particular tissue or cell-type, all possible detectable protein-protein interactions that occur can be identified and the genes encoding these proteins can be isolated. Thus, parallel analyses of two cell types enumerate the protein-protein interactions that are common to both and those that are specific to one (detected in one cell type and not the other). Such an analysis has value since protein-protein interactions specific to a disease state can serve as therapeutic points of intervention.
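The comparison of two cell types described here amounts to set operations on the detected interaction pairs. The following is a minimal sketch, assuming each interaction is recorded as a (DNA-binding domain insert, activation domain insert) tuple; the function name and the gene names are hypothetical.

```python
def compare_interaction_maps(map_a, map_b):
    """Split two sets of interaction pairs into common and specific subsets.

    map_a, map_b: sets of (bd_insert, ad_insert) tuples detected in two
        populations (hypothetical toy data).
    Returns (common, only_in_a, only_in_b).
    """
    common = map_a & map_b      # interactions detected in both populations
    only_in_a = map_a - map_b   # interactions specific to population A
    only_in_b = map_b - map_a   # interactions specific to population B
    return common, only_in_a, only_in_b

# Hypothetical interaction maps for a normal and a diseased cell type.
normal = {("geneA", "geneB"), ("geneC", "geneD")}
diseased = {("geneA", "geneB"), ("geneE", "geneF")}

common, normal_only, diseased_only = compare_interaction_maps(normal, diseased)
```

Here `diseased_only` would hold the candidate disease-specific interactions in this toy data set.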
Furthermore, inhibitors of such protein-protein interactions can be isolated in a rapid fashion. Such inhibitors can be of therapeutic value or serve as lead compounds for the synthesis of therapeutic compounds. This system can also be used to identify novel peptide inhibitors of protein-protein interactions. One advantage of this method over existing methods is that peptides or chemicals are identified by an ability to block protein-protein interactions. In many existing methods, molecules are identified by an ability only to bind to one of a pair of interacting proteins; such binding does not necessarily imply that the protein-protein interaction will be blocked by the same agent. Another advantage of the method is that multiple protein-protein interactions can be screened against a prospective inhibitor in a single assay.
This invention also provides information-processing methods and systems. One aspect of these methods provides methods for interpreting detected protein-protein interactions by providing for identification of the genes that code for the library inserts in the activation domain and fusion domain hybrids. Another aspect of these methods provides for assembling protein-protein interaction data detected from one or more pairs of libraries into a unified database. Further aspects of these methods provide for use of this unified database to assemble individual, pair-wise protein-protein interactions into putative pathways and networks of protein interaction, providing a more general view of cellular functioning. Also provided for is the use of this unified database to delimit or determine the protein domains responsible for particular protein-protein interactions.
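One way to picture the assembly of pair-wise interactions into putative networks is to treat each protein as a node and each detected interaction as an undirected edge, then group the nodes into connected components. This is only an illustrative sketch with hypothetical names and data; it does not represent the database design or processing described later in the specification.

```python
from collections import defaultdict, deque

def assemble_networks(pairs):
    """Group pair-wise interactions into connected networks of proteins.

    pairs: iterable of (protein_1, protein_2) tuples (hypothetical toy data).
    Returns a list of sets, one per connected network.
    """
    # Build an undirected adjacency map from the interaction pairs.
    graph = defaultdict(set)
    for first, second in pairs:
        graph[first].add(second)
        graph[second].add(first)

    seen = set()
    networks = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Breadth-first search collects every protein reachable from start.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        networks.append(component)
    return networks

pairs = [("A", "B"), ("B", "C"), ("D", "E")]
networks = assemble_networks(pairs)
```

In this example the interactions A-B and B-C join into one putative network, while D-E forms a separate one.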