Cellular architecture is defined by its complexes, the molecular machines that actually make up a cell. Cell biology traditionally identifies proteins based on their individual actions as catalysts, signalling molecules, or building blocks of cells and microorganisms. A post-genomic view is now emerging that expands the protein's role, regarding it also as an element in a network of protein-protein interactions, with a ‘contextual’ or ‘cellular’ function within functional modules.
The qualitative and quantitative characterization of complex protein-protein networks and the identification of the major cell-type-specific interacting proteins are paramount to understanding physiological processes and the alterations of protein-protein interactions that occur in a multitude of human diseases, such as cancer, autoimmune diseases and other disorders. Detailed insights into protein-protein networks and the identification of disease-associated differences may lead to new ways for the rational design and development of specific drugs. The pattern of protein-protein interactions in a cell or tissue may also be used as a tool for molecular diagnostics.
Proteins participate in complex interactions that represent the mechanistic foundation for much of the physiology and function of the cell. These protein-protein interactions are organized into exquisitely complex networks. The architecture of protein-protein interaction networks has been proposed to be scale-free, with most proteins having only one or two connections and relatively few ‘hubs’ possessing tens, hundreds or more links. The interaction networks are highly dynamic, allowing for rapid changes in the interactome, for example in response to external stimuli or even during developmental processes. Interactions between core proteins and between two or more module proteins are likely to be mediated by domain-domain interactions. Interactions within and between attachment proteins are less likely to occur in this manner. Despite the contribution of protein complexes and interactions to the regulation and execution of biological processes, relatively few complexes are well understood in terms of structure and function.
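The hub-dominated architecture described above can be illustrated with a minimal sketch, which is not part of the described methods. It assumes that an interaction dataset is available as a list of protein pairs. The function names and the `min_links` cut-off of ten links are hypothetical choices made only for illustration.

```python
from collections import Counter


def degree_table(edges):
    """Return the number of distinct interaction partners for each protein."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # assumption: self-interactions are ignored
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(p) for protein, p in partners.items()}


def find_hubs(degrees, min_links=10):
    """List proteins with at least min_links partners (hypothetical cut-off)."""
    return sorted(p for p, d in degrees.items() if d >= min_links)


def degree_histogram(degrees):
    """Map each degree value to the number of proteins having that degree."""
    return dict(sorted(Counter(degrees.values()).items()))


if __name__ == "__main__":
    # Toy network with invented identifiers: one hub plus one peripheral link.
    toy_edges = [("HUB1", f"P{i}") for i in range(1, 13)] + [("P1", "P2")]
    degrees = degree_table(toy_edges)
    print(find_hubs(degrees))
    print(degree_histogram(degrees))
```

In a scale-free network the histogram would be expected to be dominated by proteins of degree one or two, with a long tail of highly connected hubs.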
Experimental determinations of kinetic constants for cellular interactions remain sparse. These quantitative parameters will enable the development of differential equation-based kinetic models of cellular processes. Such models are necessary for the understanding of drug action and will promote the discovery of new drugs for many complex diseases. The development of quantitative multi-scale models can provide a theoretical understanding of the therapeutic action and adverse effects of drugs at a cellular level.
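As an illustration of what a differential equation-based kinetic model looks like in its simplest form, the following sketch integrates a single reversible binding reaction, A + B ⇌ AB, governed by d[AB]/dt = k_on[A][B] − k_off[AB]. The reaction scheme, the function names and all numerical values are assumptions chosen for illustration and are not measured constants.

```python
import math


def simulate_binding(a0, b0, k_on, k_off, t_end, dt=1e-3):
    """Integrate d[AB]/dt = k_on*[A]*[B] - k_off*[AB] with explicit Euler steps.

    a0 and b0 are total concentrations; free A and B follow from mass balance.
    """
    ab = 0.0
    for _ in range(int(round(t_end / dt))):
        free_a = a0 - ab
        free_b = b0 - ab
        ab += dt * (k_on * free_a * free_b - k_off * ab)
    return ab


def equilibrium_complex(a0, b0, k_on, k_off):
    """Closed-form equilibrium [AB], the smaller root of (a0 - x)(b0 - x) = Kd * x."""
    kd = k_off / k_on
    s = a0 + b0 + kd
    return (s - math.sqrt(s * s - 4.0 * a0 * b0)) / 2.0


if __name__ == "__main__":
    # Illustrative values only (arbitrary but consistent units).
    simulated = simulate_binding(a0=1.0, b0=1.0, k_on=1.0, k_off=0.1, t_end=50.0)
    analytic = equilibrium_complex(a0=1.0, b0=1.0, k_on=1.0, k_off=0.1)
    print(simulated, analytic)
```

Models of cellular processes couple many such equations, which is why experimentally determined rate constants for the individual interactions are needed.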
The term ‘sampling’ is used for experimental designs where only a subset of the population is interrogated. Representative sampling is not common in the generation of protein interaction datasets, where sampling has often been guided by biological priorities. The ‘coverage’ summarizes which part of the total set of possible interactions has actually been tested. In light of current technologies, it is not valid to make inferences about the ‘interactome’, i.e. the set of all physical interactions that take place in a cell under the conditions being studied.
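The notion of coverage can be made concrete with a short sketch. It assumes that interactions are unordered pairs of distinct proteins, so that n proteins give n(n − 1)/2 possible pairs. The function name and the exclusion of self-pairs are illustrative assumptions.

```python
def interaction_coverage(n_proteins, tested_pairs):
    """Fraction of all possible unordered protein pairs that were actually tested.

    Assumption: self-pairs are excluded and (A, B) is the same test as (B, A).
    """
    if n_proteins < 2:
        raise ValueError("at least two proteins are needed to form a pair")
    possible = n_proteins * (n_proteins - 1) // 2
    distinct = {frozenset(pair) for pair in tested_pairs if pair[0] != pair[1]}
    return len(distinct) / possible


if __name__ == "__main__":
    # Five proteins give ten possible pairs; two distinct pairs were tested.
    print(interaction_coverage(5, [("A", "B"), ("B", "A"), ("A", "C")]))
```

Even a screen with many tested pairs covers only a small fraction once the number of proteins is large, because the number of possible pairs grows quadratically.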
Several methods have been devised to study protein-protein interactions. These include physical methods to select and detect proteins that bind another protein, such as protein affinity chromatography, affinity blotting, immunoprecipitation (including 2D gel electrophoresis and mass spectrometry) and cross-linking; library-based methods, such as protein probing, phage display, the two-hybrid system and other library-based methods; and genetic methods, such as extragenic suppressors, synthetic lethal effects, overproduction phenotypes, overproduction of wild-type proteins, overproduction of mutant proteins and unlinked non-complementation.
Many of these methods are not suited to high-throughput protein-interaction analysis. The most promising high-throughput technologies have become available through the development of peptide- and protein-library screening techniques. These include the yeast two-hybrid strategy, a method to identify and clone genes for proteins that interact with a protein of interest; two-hybrid arrays, in which large-scale experiments are carried out in a colony-array format, where each yeast colony expresses a defined pair of ‘bait’ and ‘prey’ proteins that can be scored in an automated manner for reporter gene activity, which indicates interaction; phage display, in which a library of proteins is panned against a ‘bait’ protein; affinity-purification/mass-spectrometry (AP-MS), used especially to define all complexes in the cell (the ‘complexome’) and their constituent proteins; and tandem affinity purification (TAP). TAP reveals interacting proteins as core, module, or attachment proteins, according to the frequency of their appearance in the various forms of a given complex.
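The frequency-based classification into core, module and attachment proteins can be sketched as follows. This is a simplified illustration under stated assumptions, not the published TAP analysis. A protein is called core if it appears in at least a hypothetical fraction `core_fraction` of the observed forms of a complex. A non-core protein is called a module protein if at least one other non-core protein shares exactly the same presence pattern, and an attachment protein otherwise.

```python
def classify_complex_members(isoforms, core_fraction=0.7):
    """Label each protein as 'core', 'module' or 'attachment'.

    isoforms: list of sets, each holding the proteins seen in one form of the complex.
    core_fraction: hypothetical threshold on the fraction of forms containing a protein.
    """
    n_forms = len(isoforms)
    proteins = set().union(*isoforms)
    # Presence/absence pattern of each protein across all observed forms.
    pattern = {p: tuple(p in form for form in isoforms) for p in proteins}

    labels = {}
    non_core = []
    for p in proteins:
        if sum(pattern[p]) / n_forms >= core_fraction:
            labels[p] = "core"
        else:
            non_core.append(p)

    # Assumption: non-core proteins that always co-occur form a module.
    for p in non_core:
        shared = any(q != p and pattern[q] == pattern[p] for q in non_core)
        labels[p] = "module" if shared else "attachment"
    return labels


if __name__ == "__main__":
    # Invented example: four purified forms of one complex.
    forms = [{"A", "B", "C", "D"}, {"A", "B", "C", "D"}, {"A", "B", "E"}, {"A", "B"}]
    print(classify_complex_members(forms))
```

The threshold and the co-occurrence rule are design choices; a real analysis would need to account for purification noise before comparing presence patterns.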
All of these methods have advantages and disadvantages related to the reliability, completeness and ease of obtaining the information gained by using these techniques. The ideal method captures interactome information in a time- and cost-effective manner, enabling random sampling and high sampling redundancy. It provides dynamic data based on the original cellular context and on native protein-protein interactions, with comprehensive, sufficiently large coverage of quantitative interaction data, even for large, multi-unit protein complexes. It suppresses the effects of random variables, such as the detection of non-specific, accidentally interacting proteins. It also diminishes the effect of binding-related variables, that is, variables arising from any binding event involved in the detection principle other than the original protein-protein interaction.
Two-hybrid screens, especially the array-based techniques, enable large-scale generation of interactome information. However, they have major disadvantages arising from their binary, pairwise detection, the lack of dynamic information based on the original context, the artificial binding agents (hybrid proteins) and their restriction to the yeast cellular context (e.g. skewed post-translational modification compared to the original host). Almost all of these disadvantages have been partly addressed in various ways. However, a method that combines all of the required features has not been devised.
Affinity-based methods, especially those using mass spectrometry as the detection principle, generate a large amount of semi-quantitative interactome data, partly in the correct cellular context. However, they are influenced by random and binding (affinity) related variables, and they detect accidental, non-specific binding events. Generating a randomly sampled, high-coverage, comprehensive dataset would require a significant amount of time and expense, which compromises the potential of these methods to detect the dynamic nature of the interactome. Some of these issues have been addressed, especially by tandem affinity purification (TAP), in which accidental, non-specific binding events are reduced to a minimum, although at the expense of less reliable recovery of protein-protein complexes.
These techniques have accelerated the generation of protein-protein interaction (PPI) data on a large scale. After the pioneering study on the interactome, several large-scale studies have been carried out, resulting in some high-quality datasets of pairwise protein-protein interactions. For instance, the filtered yeast interactome (FYI) is an intersection of different datasets, including yeast two-hybrid (Y2H) data, AP-MS data, in silico predictions, Munich Information Centre for Protein Sequences (MIPS) physical interactions, and protein complexes reported in the literature.
As the existing methodological approaches do not fully meet the needs of protein-protein interaction and interactome studies, new methods for the analysis and characterization of complex protein-protein networks are needed.
The present invention provides methods and kits for detecting binding interactions, in particular protein-protein interactions at the cellular level. The methods and kits can be used for simultaneously detecting all, or a subset of, interacting proteins in complex protein networks, preferably in the original context of cells. The methods and kits provide dynamic data based on the original cellular context and on native protein-protein interactions, with comprehensive, sufficiently large coverage of quantitative and potentially kinetic interaction data, even for large, multi-unit protein complexes.
The invention can be used for detecting protein-protein interactions using antibody display technology, with a plurality of antibody phages as the binding agents. The invention can also be used for detecting protein-protein interactions using aptamer technology, with a plurality of aptamers as the binding agents. The complexity of the plurality of binding agents can be varied over a wide range, from a few binding agents to tens of thousands, hundreds of thousands, millions, tens of millions or hundreds of millions of binding agents. To obtain low-complexity binding agents suitable for the invented method from high-complexity binding agents, a complexity reduction method (enrichment) is devised.
More detailed interactions between target molecules can be identified and monitored. For example, protein-protein interactions can be detected. The presence of two or more binding agents within a binding agent/target complex may indicate that two or more targets are present within the complex, which in turn indicates that the two or more targets may be interacting with, or bound to, each other. If an identifiable part of the specific binding agent is known, for example the protein or nucleic acid sequence, then the targets can be identified. This method can be carried out using highly parallel PCR amplification, by linking the identifiable nucleic acid sequences of bound displayed antibody phages, i.e. those with predetermined binding characteristics, e.g. with known epitope sequences or known to bind to a specific molecule. This is preferably done by emulsion PCR, and may be carried out at low protein complex concentrations, preferably in compartments. The interactions between targets, e.g. protein-protein interactions, can be detected by highly parallel PCR amplification, preferably using reduced-complexity binding detection agents. The target-target, e.g. protein-protein, interaction information is gained by sequencing the linked identifiable sequences, preferably by highly parallel DNA sequencing or by other sequence detection means. Varying the amount of input material, e.g. the target, can be used to collect ligand binding kinetics data. In addition, the method can be carried out in the presence and absence of compounds to determine whether the compounds have any effect on the target interaction, and whether this effect is agonistic or antagonistic.
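The read-out step described above, in which linked identifiable sequences are sequenced and interpreted as target-target interactions, can be sketched in a few lines. The sketch assumes that each sequencing read has already been decoded into a pair of binding agent identifiers and that a lookup table maps each identifier to its known target. The data layout, the function names and the `min_reads` threshold are hypothetical and are not specified by the description.

```python
from collections import Counter


def count_target_pairs(linked_reads, agent_to_target):
    """Count how often two different targets are reported by linked binding agents.

    linked_reads: iterable of (agent_id, agent_id) pairs decoded from linked sequences.
    agent_to_target: dict mapping a binding agent identifier to its known target.
    """
    counts = Counter()
    for agent_a, agent_b in linked_reads:
        target_a = agent_to_target.get(agent_a)
        target_b = agent_to_target.get(agent_b)
        # Assumption: reads with an unknown agent, or with two agents against the
        # same target, are not counted as evidence for a target-target interaction.
        if target_a is None or target_b is None or target_a == target_b:
            continue
        counts[tuple(sorted((target_a, target_b)))] += 1
    return counts


def call_interactions(counts, min_reads=2):
    """Keep target pairs supported by at least min_reads linked reads."""
    return sorted(pair for pair, n in counts.items() if n >= min_reads)


if __name__ == "__main__":
    # Invented identifiers for illustration only.
    lookup = {"ab1": "ProtA", "ab2": "ProtB", "ab3": "ProtC", "ab4": "ProtA"}
    reads = [("ab1", "ab2"), ("ab2", "ab1"), ("ab1", "ab3"), ("ab1", "ab9"), ("ab1", "ab4")]
    pair_counts = count_target_pairs(reads, lookup)
    print(call_interactions(pair_counts))
```

Requiring several independent linked reads per pair is one simple way to reduce the weight of accidental co-compartmentalization; the appropriate threshold would depend on sequencing depth.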
The invention can also use protein display technology, displaying protein fragments of an organism and determining the binding characteristics of a multitude of displayed antibodies, each antibody having unique identifiable sequence information and each displayed protein fragment having identifiable sequence information. Preferably, the identifiable sequence information for a displayed protein fragment is the sequence encoding the displayed amino acid sequence. The identity of the bound antibodies can be determined from the identifiable sequence information for each antibody-protein complex. Optionally, the identity of the bound protein fragment within each antibody-protein complex can be determined. Optionally, the identity of the bound antibodies and the identity of the bound protein fragment can be determined from the linked identifiable sequence information for each antibody-protein complex. The binding and kinetic characteristics can also be determined using different amounts of the target, e.g. proteins, and of the binding agents, such as displayed proteins or displayed antibodies.
The methods and compositions of the invention may also be used to identify compounds which may agonize or antagonize such protein-protein interactions. The present invention provides methods and kits for detecting binding interactions with antagonistic (disrupting) or agonistic (promoting) compounds. The invention provides methods and kits for simultaneously detecting the binding interactions of antagonistic and/or agonistic compounds in complex protein networks, preferably in the original context of cells. The methods and kits provide original cellular context based, native protein-protein interaction based data, which is comprehensive, and has sufficiently large coverage of both quantitative and, potentially, kinetic interaction data, even for large, multi-unit protein complexes.
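One way such a comparison could be evaluated is sketched below, assuming that interaction read counts per target pair are available from a run without the compound and a run with it. The normalization to total reads, the pseudocount, the `min_log2_change` threshold and the function name are illustrative assumptions rather than features of the described methods.

```python
import math


def compound_effect(counts_without, counts_with, pseudocount=1.0, min_log2_change=1.0):
    """Label each target pair by how its read fraction changes when a compound is added.

    Returns a dict mapping pair -> (log2 change, label).
    """
    total_without = sum(counts_without.values()) or 1
    total_with = sum(counts_with.values()) or 1
    effects = {}
    for pair in set(counts_without) | set(counts_with):
        # The pseudocount avoids division by zero for pairs seen in one run only.
        before = (counts_without.get(pair, 0) + pseudocount) / total_without
        after = (counts_with.get(pair, 0) + pseudocount) / total_with
        change = math.log2(after / before)
        if change <= -min_log2_change:
            label = "antagonistic"  # interaction signal reduced (disrupting)
        elif change >= min_log2_change:
            label = "agonistic"  # interaction signal increased (promoting)
        else:
            label = "no clear effect"
        effects[pair] = (change, label)
    return effects


if __name__ == "__main__":
    # Invented counts for illustration only.
    without = {("A", "B"): 40, ("A", "C"): 40, ("B", "D"): 20}
    with_compound = {("A", "B"): 9, ("A", "C"): 41, ("B", "D"): 50}
    for pair, (change, label) in sorted(compound_effect(without, with_compound).items()):
        print(pair, round(change, 2), label)
```

Because the counts are expressed as fractions of the total, a strong change in one pair shifts the fractions of the others, so a practical analysis would likely need replicate runs and a more careful normalization.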