Protein-protein interactions (PPIs) underlie virtually every process in a living cell, including signal transduction, cytoskeletal organization, virus-host cell recognition, and the assembly of multiprotein complexes. Consequently, the exploration of entire PPI networks is a formidable goal in systems biology. The establishment of high-confidence PPIs is crucial for the understanding of diseases and can provide the basis for new therapeutic approaches.
Yeast Two-Hybrid (Y2H) is a sensitive and widely applied method to screen for protein-protein interactions (PPIs) and was instrumental in the elucidation of many biological processes and disease mechanisms (Ratushny and Golemis 2008; Hamdi and Colas 2012). Exploiting the power of yeast genetics, reconstituted PPIs can be selected out of a large number of potential combinations by selective growth of yeast cells. Specifically, reconstituted PPIs drive reporter gene activation, which allows the selection of a few yeast clones that express interacting proteins against a large background of cells with noninteracting proteins. In the conventional scheme of Y2H, a transcription factor is split into its DNA binding and activation domains (DBD and AD), and functionally reconstituted via the physical interaction of fused bait and prey proteins (FIG. 1). While the transcription factor in the original implementation of the system was the Gal4 protein, lexA-based systems are applied as an alternative (Fields and Song; Paroush, Finley et al. 1994; Golemis, Serebriiskii et al. 2009). The reconstituted transcription factor drives the expression of reporter genes (typically HIS3 and lacZ); HIS3 expression is scored by growth, lacZ expression by beta-galactosidase activity. Bait and prey vectors contain promoters that regulate transcription of the fusion genes, and carry selectable marker genes (typically TRP1 and LEU2). Y2H assays therefore often use selective medium lacking leucine, tryptophan, and histidine. Selected bait-prey combinations grow as colonies on the selective plates and are identified by DNA sequencing. With conventional (Sanger) sequencing technology, the identification of the screening results is a major limitation for the scale and throughput of Y2H.
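The growth-based selection described above can be sketched in a few lines of Python. This is a minimal illustration and not part of the described method: the function name `grows_on` and the set-based model of a yeast cell are hypothetical, and the sketch assumes the usual pairing of each omitted nutrient with the gene that restores growth (LEU2 for leucine, TRP1 for tryptophan, HIS3 for histidine).

```python
# Gene that restores growth when the given nutrient is omitted from the medium.
MARKER_FOR_DROPOUT = {
    "leucine": "LEU2",
    "tryptophan": "TRP1",
    "histidine": "HIS3",
}


def grows_on(dropouts, functional_genes):
    """Return True if a cell with these functional genes grows on medium lacking `dropouts`.

    Hypothetical helper, for illustration only.
    """
    return all(MARKER_FOR_DROPOUT[nutrient] in functional_genes for nutrient in dropouts)


triple_dropout = {"leucine", "tryptophan", "histidine"}
double_dropout = {"leucine", "tryptophan"}

# Both vectors present and the HIS3 reporter activated by an interacting bait-prey pair.
interacting = {"TRP1", "LEU2", "HIS3"}
# Both vectors present, but HIS3 stays silent because bait and prey do not interact.
noninteracting = {"TRP1", "LEU2"}

print(grows_on(triple_dropout, interacting))     # True
print(grows_on(triple_dropout, noninteracting))  # False
print(grows_on(double_dropout, noninteracting))  # True
```

The last call shows why medium lacking only leucine and tryptophan selects for the presence of both vectors, while the additional omission of histidine selects for the interaction itself.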
In the traditional setup of a cDNA library-based Y2H screen, a specific bait construct of interest is combined with a cDNA library of prey fusions. This conventional procedure is laborious, requires a considerable amount of consumables, and is usually performed in only a single replicate. An important alternative to cDNA-based Y2H screens is the matrix-based Y2H approach. In these experiments, annotated and assembled open reading frames (ORFs) in bait and prey strains are combined one-by-one using automated procedures (Uetz, Giot et al. 2000; Stelzl, Worm et al. 2005). Notably, such approaches with preassembled ORF libraries were applied for comprehensive PPI analysis in eukaryotic model organisms, such as yeast (Uetz, Giot et al. 2000; Ito, Chiba et al. 2001; Giot, Bader et al. 2003; Yu, Braun et al. 2008), and importantly also for a first overview of the human interactome (Rual, Venkatesan et al. 2005; Stelzl, Worm et al. 2005). While such matrix-based systems eliminate the need for sequencing reactions to identify interacting bait-prey pairs, their use is restricted to annotated and cloned collections of open reading frames.
Despite their popularity, Y2H interaction screening assays generate false positive results that are caused by artifacts and erroneous reporter gene activation (LaCount, Vignali et al. 2005). Activation of reporter gene transcription in the absence of a functional bait or prey (self- or autoactivation) is common in Y2H screens, and in certain screening setups the background of unspecific and erroneous results is as high as 90%. In conventional library screens, Y2H interactions are usually confirmed by isolating the interacting prey clone and retransforming it into fresh cells, followed by retesting the interaction with fresh original bait and proper controls (Walhout and Vidal 2001). Reporter activation should be observed only with the correct bait-prey combination, whereas reporter activation with the prey and a control (e.g. empty bait vector, irrelevant protein) indicates a false positive or unspecific interaction. False positives that can be addressed in this way are also termed technical false positives; they are caused by the inherent properties of the underlying yeast screening system (Vidalain, Boxem et al. 2004). Typically, unspecific activation by preys results from a genuine interaction of the prey protein with the DNA binding domain encoded by the bait vector or with some component of the yeast transcription machinery. Unspecific activation by baits often occurs when the bait protein is a transcription factor or has transcription factor-like properties. Notably, false activation (autoactivation) is more frequent with bait proteins (ca. 20%) than with prey proteins (expected ca. 5%). In a conventional screening setup, however, in which individual baits are screened against complex libraries of prey proteins, bait autoactivators pose a lesser problem since they can be tested prior to the screen. Prey libraries, on the other hand, are very complex, with >3×10^6 prey clones per library. Hence the occurrence of false positives in prey cDNA libraries is often very difficult to predict and control, especially when relatively few assays are undertaken.
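The retest logic used to confirm a candidate prey clone can be written as a small decision function. The following Python sketch is only an illustration under stated assumptions: the function name `classify_retest` and the three labels are hypothetical, and the "not reproduced" outcome is an added convenience category rather than a term from the cited literature.

```python
def classify_retest(active_with_bait, active_with_control):
    """Classify a retested prey clone from two reporter readouts.

    active_with_bait:    reporter activation with the original bait.
    active_with_control: reporter activation with a control
                         (e.g. empty bait vector or an irrelevant protein).
    The returned labels are hypothetical names chosen for this sketch.
    """
    if active_with_control:
        # The reporter fires without the specific bait: unspecific activation.
        return "false positive"
    if active_with_bait:
        # Activation only with the correct bait-prey combination.
        return "confirmed"
    # Assumed extra category: the original signal did not reappear on retest.
    return "not reproduced"


print(classify_retest(True, False))   # confirmed
print(classify_retest(True, True))    # false positive
print(classify_retest(False, False))  # not reproduced
```

Checking the control first reflects the reasoning in the text: activation with a control disqualifies the clone regardless of what is observed with the bait.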
Four criteria have been established for the proper execution of high-throughput Y2H (Yu, Braun et al. 2008; Simonis, Rual et al. 2009; Venkatesan, Rual et al. 2009). The first is completeness: the number of physical protein pairs (bait-prey) that are tested in a given search space. More specifically, this is the total of all possible combinations between the bait being tested in the screen and the preys that are physically present in the particular cDNA library. The second criterion is assay sensitivity, which describes which potential interactions can and cannot be detected in a particular setup. Assay sensitivity in the Y2H setup can be restricted by the physical inaccessibility of a particular domain for the interaction. Systematic false negatives caused by steric exclusion or domain folding have to be addressed by using different bait fragments or by swapping the orientation of the bait fusion construct (Rajagopala, Hughes et al. 2009). The third criterion is sampling sensitivity: the fraction of all detectable interactions found by a single implementation of the assay. The fourth is precision: the proportion of true positives relative to false positives among the interactions detected in the assay.
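To make the four criteria concrete, the following Python sketch expresses each of them as a simple ratio. It is an illustration under assumptions, not a definition taken from the cited works: all function and parameter names are hypothetical, completeness is expressed here as the fraction of the search space that was tested, precision is computed in the common form TP / (TP + FP), and the numbers in the usage lines are invented for the example and are not experimental data.

```python
def _fraction(part, whole):
    """Return part / whole, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return part / whole


def completeness(pairs_tested, pairs_in_search_space):
    # Share of all possible bait-prey pairs that were physically tested (assumed form).
    return _fraction(pairs_tested, pairs_in_search_space)


def assay_sensitivity(detectable_interactions, all_true_interactions):
    # Share of true interactions that the assay setup is able to detect at all.
    return _fraction(detectable_interactions, all_true_interactions)


def sampling_sensitivity(found_in_single_screen, detectable_interactions):
    # Share of the detectable interactions found by one implementation of the assay.
    return _fraction(found_in_single_screen, detectable_interactions)


def precision(true_positives, false_positives):
    # Share of reported interactions that are true positives (assumed TP / (TP + FP)).
    return _fraction(true_positives, true_positives + false_positives)


# Invented numbers, for illustration only.
print(completeness(750, 1000))       # 0.75
print(assay_sensitivity(20, 100))    # 0.2
print(sampling_sensitivity(12, 20))  # 0.6
print(precision(8, 2))               # 0.8
```

Keeping the four quantities separate makes it visible that repeating a screen can raise sampling sensitivity but cannot raise assay sensitivity, which depends on the constructs used.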
So far, Y2H results, including those generated in high-throughput experiments, are not based on truly quantitative measurements. This contrasts with the analysis of gene expression and protein-DNA interactions, for which DNA microarrays were instrumental. The application of this technology vastly increased throughput, and the quantitative analysis of expression and interaction profiles generated a vast amount of biological insight (Allison, Cui et al. 2006). In recent years, however, next-generation sequencing (NGS) technologies have become the dominant methodologies for many applications in genomics and systems biology, replacing DNA microarrays in applications such as transcriptome analysis (RNA-Seq), chromatin immunoprecipitation (ChIP-Seq), and diverse other assays (Metzker 2005; Johnson, Mortazavi et al. 2007; Morozova and Marra 2008; Fox, Filichkin et al. 2009; Metzker 2010). Importantly, NGS approaches allow the inexpensive production of very large volumes of sequence data. For transcriptome analysis, sequencing of the total cellular RNA content has two major advantages over DNA microarray-based analysis: a more precise readout over a larger dynamic range, and the possibility to identify novel and previously unannotated transcripts. The most widely used NGS platform is by now the Illumina system, which has been used for transcriptome studies by many researchers (Marioni, Mason et al. 2008; Lazarevic, Whiteson et al. 2009; Filichkin, Priest et al. 2010; Levin, Yassour et al. 2010; Alsford, Turner et al. 2011; Zhang, Cheranova et al. 2012).
Lewis and Wan recently reported Quantitative Interactor Screen Sequencing (QIS-Seq), a global pooling and screening scheme followed by a selective readout on the Illumina next-generation sequencing system (Lewis, Wan et al. 2012). This method applies amplification from selected pools of a conventional cDNA prey library. Additionally, it requires transformation of the entire prey library for each individual bait. In effect, a new pool is generated for each screen. Because de novo transformation of a library for each individual bait and control is laborious and time-consuming, this method is inefficient for commercial purposes. Therefore, a method is needed to analyze the prey library pool efficiently. Only with increased analysis efficiency will it be possible to run the analysis in multiple replicates and thereby obtain statistically improved accuracy.