To identify molecular networks, information concerning at least two molecular species must be gathered, since there are at least two molecules involved in any interaction. Current high throughput techniques generally resolve one of the interaction partners in a high throughput fashion while the other interaction partner is limited to far fewer species. There is a need for methods which permit high throughput screening and identification of all partners of an interaction.
The human genome contains approximately 30,000 genes. Disregarding splice variants and post-translational modifications, this corresponds to approximately 30,000 proteins and 4.5 × 10⁸ possible protein pairs. In general, if there are n molecules of interest in the library, then there exist (n² + n)/2 possible interaction pairs in the network. The capability of investigating vast libraries and networks like these would be greatly enhanced by the development of molecular interaction-based methods wherein high throughput information may be generated for both partners of the interaction.
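The pair count stated above can be checked with a short calculation. The following is a minimal sketch only; the function name `interaction_pairs` is a hypothetical helper introduced for illustration, and the count is assumed to include pairs of a molecule with itself, which is what the formula (n² + n)/2 implies.

```python
def interaction_pairs(n: int) -> int:
    """Number of unordered pairs among n molecules, self-pairs included.

    Implements (n^2 + n) / 2. The product n * (n + 1) is always even,
    so integer division is exact.
    """
    return (n * n + n) // 2


# Approximate protein count taken from the text above.
human_pairs = interaction_pairs(30_000)
print(human_pairs)  # 450015000, i.e. roughly 4.5 x 10^8
```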
To analyze complex libraries, a readout platform which enables data acquisition in a high throughput fashion is required. Protein analysis via microarray techniques can be achieved by converting information about protein interactions into nucleic acid-based information, which is highly amenable to analysis with microarrays. Microarray analysis is a conventional platform which enables cost effective data generation in a high throughput manner. A method of protein interaction analysis which is successfully adapted for microarray analysis is highly desirable. The present inventors have developed such an adaptation wherein molecular interactions are detected by detecting a combination of at least two nucleic acid tags, one from each molecule, wherein at least one such combination is required for signal output in one microarray feature.
Several other approaches exist in the art which can be utilized for interaction screening, but which differ from the present inventive methods in significant ways.
Protein microarrays use micro-scale spatial localization for identification of different molecules. The format allows detection of molecules in high throughput if target molecules in solution are investigated by probes organized on a solid phase. Targets can be labelled with a detectable function, and the presence of the detectable function in one array feature reflects the presence of the target in the sample, assuming that no cross reactivity between probes is present. Examples of this approach are expression arrays and antibody arrays. The assumption when using these arrays is that each target in solution only displays affinity to one defined probe on the microarray. When investigating molecular interactions, the affinities are by definition not known. To identify molecular interactions with regular microarray analysis, one detectable function is required for each target in solution.
The number of detectable functions which can be resolved then becomes limiting. Generally only one or two detectable functions are used; although there are approaches which could potentially resolve more, they fall far short of the number of molecules which can be identified by position in the array. This limits the capacity of information extraction per experiment to the number of detectable functions multiplied by the library size. For example, if a library has 500 members and the approach can detect two detectable functions, 250 arrays need to be run to extract all possible interaction information. Furthermore, the microarray platform suffers from the disadvantage that one partner of the protein pair needs to be immobilized on a solid phase, which can potentially disturb the interaction and/or the protein conformation. While protein microarrays permit analysis of one member of an interaction in high throughput and can provide information about interactions, the data extraction is cost and labour intensive. The analysis is performed on a solid phase, and interaction studies are limited by the number of resolvable detectable functions, which also limits the possibility of inter- or intra-library interaction screens.
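The 500-member example above can be reproduced with a short calculation. This is an illustrative sketch under the assumption that every array carries the full library as immobilized probes and that each array can interrogate only as many labelled targets as there are resolvable detectable functions; the function name `arrays_needed` is hypothetical.

```python
import math


def arrays_needed(library_size: int, detectable_functions: int) -> int:
    """Arrays required so that every library member is used once as a labelled target.

    Assumes each array resolves at most `detectable_functions` labelled targets,
    so the targets are split into groups of that size (rounded up).
    """
    return math.ceil(library_size / detectable_functions)


print(arrays_needed(500, 2))  # 250, matching the example in the text
```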
Another known method is the yeast two hybrid system, which utilizes transcription factors to investigate protein interactions. Many transcription factors contain two distinct functional regions, one which binds a specific DNA sequence and one which recruits the transcription machinery to activate a proximal gene. Yeast two hybrid systems utilize these features to investigate protein affinity interactions. One protein or protein library is fused with the DNA binding domain, forming the “bait”, and a second protein or protein library is joined to the activating domain, forming the “prey”. The fusions are constructed on the DNA level and the two constructs are then co-expressed in yeast. If two proteins which have affinity for each other are expressed in the same cell, they will activate transcription. Affinity interaction can then be detected by reporter gene activation due to reconstitution of the transcription factor. The two major approaches for yeast two hybrid interaction analysis are the “array approach” and the “exhaustive screening” approach. In the array approach, yeast clones expressing different baits are ordered in an array so that the identity of the protein in each clone can be deduced from its position. The array is then mated with yeast expressing one prey protein of interest. The mated clones are then cultured on selective media, and protein interactions can be identified from the position of positive clones on the array, without cloning and sequencing. The exhaustive screening approach mates one bait clone with a library of prey proteins; the mated library is then cultured on selective media and positive clones are sequenced to identify which prey proteins have interacted with the bait. The major drawback of the exhaustive screening approach is the time and cost of sequencing all the positive clones, which, combined with the high intrinsic level of false positives, increases the time and cost dramatically compared to the array approach.
Hence, library-library screens are generally not considered.
While the yeast two hybrid systems are advantageous in that they are genetic systems and thus do not require protein synthesis, and they can resolve one of the interaction partners in high throughput, sequencing of positive clones is necessary for high throughput library screening approaches, which, combined with an intrinsic high frequency of false positives, becomes problematic.
Display techniques such as phage display, ribosome display, RNA display, SELEX, cis display and covalent display are used to isolate affinity binders from libraries. These techniques are based on a large library of potential interaction partners (binders) which are allowed to interact with one partner (target). The target is typically immobilized on a solid phase and then allowed to interact with the library of binders. Members of the library which do not bind to the target are discarded and bound members are regenerated. The procedure is then iterated until only one or a few library members remain. These are then analysed for affinity towards the target. The different display techniques primarily differ in what molecule type is used to create the library and how the binders are regenerated. Display techniques can be used to identify affinity interactions, but these are generally limited to a low number of target molecules. Some display techniques, such as phage display, have the potential to identify interactions between libraries, but the analysis of the interactions is done by cloning and sequencing, which is very time consuming and labour intensive. A majority of the display techniques, such as phage display, are genetic systems and do not easily adapt to molecules other than proteins.
To enable selection of binders based on libraries other than peptide, protein or nucleic acid libraries, DNA templated synthesis (DTS) approaches have been developed. In this technique, nucleic acids can be attached to low molecular weight compounds in order to enable identification of the compound after selection of a library for a desired property. If two compounds are attached to nucleic acids which are complementary, hybridization of these nucleic acids will force the two compounds into proximity and elevate the relative local concentration of the compounds. Thus, the two compounds will be more prone to react than if they were not associated. This approach has also been utilized for multi-step reactions, where libraries of compounds joined to nucleic acid tags are directed to serially react with a compound attached to a template nucleic acid. The synthesized library can then be screened with respect to a desired property and selected compounds can then be identified. Furthermore, compounds synthesized by DTS can be identified by microarray hybridization of the nucleic acid template which directed the synthesis and to which the product is also attached.
In DTS, two molecules are forced together by using nucleic acids to allow the molecules to interact to a higher degree than if they were not brought into proximity. For example, in the DTS-based method taught by Kanan et al. (Kanan et al., Nature 2004), one of the molecules forced together is attached to a “template” nucleic acid which specifically directs the second compound by direct hybridization of the nucleic acid attached to the second molecule. Thus the “template” nucleic acid contains both tags used to identify the compounds which are forced together. To create interactions between all members of a library with n members with this approach, the number of “template” oligonucleotides which have to be synthesized is in the range of n²/2, since one specific oligonucleotide has to be synthesized for each individual interaction in the library.
However, the present invention provides a method whereby the nucleic acid encoding the tags to identify both members of the interaction is formed upon joining of the two molecules. Hence, to investigate all interactions in a library of n members, the number of oligonucleotides required is in the order of n. The combination of molecules in DTS is not combinatorial in the sense that all library members potentially may interact. It becomes combinatorial due to the presence of DNA templates for all specific pairs, which in turn direct the interactions. If the specificity of these interactions fails, the link between the compound phenotype and the nucleic acid genotype also fails. Thus, in the DTS approach, the link between the nucleic acid identification tag and the molecules in the interaction depends on correct hybridization of one tag molecule to the template nucleic acid. If any cross hybridization occurs, then the link between the tag and the identity of the interaction is unreliable.
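The difference in scaling between the two approaches can be illustrated numerically. The sketch below is illustrative only: the function names are hypothetical, and the counts are the order-of-magnitude estimates given in the text (about n²/2 templates for the DTS approach, about n tagged oligonucleotides for the combinatorial approach), not exact synthesis requirements.

```python
def dts_template_count(n: int) -> int:
    """Approximate number of pair-specific templates for a DTS-style screen (~n^2 / 2)."""
    return (n * n) // 2


def joined_tag_count(n: int) -> int:
    """Approximate number of tagged oligonucleotides when one tag is made per member (~n)."""
    return n


# Compare how the two estimates grow with library size.
for n in (10, 500, 30_000):
    print(n, dts_template_count(n), joined_tag_count(n))
```

For a library of 500 members, these estimates give on the order of 125,000 templates versus 500 tags.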
In the current invention the nucleic acid formed by the NAM association encodes the combination of identification sequences. Therefore the link between the identification sequences in the associated nucleic acid molecule and the phenotype of the molecule is more effectively maintained.
The utilization of combinatorial association of different nucleic acid tags instead of pre-synthesis of all different combinations also provides further advantages. The amplification of the nucleic acid can be performed so that only nucleic acids which have been joined to create a pair of molecules are amplified by, e.g., introducing one primer site on each arm. Nucleic acids which remain unassociated will thus not be amplified. In the DTS approach all template nucleic acids can serve as PCR templates even though they have no interaction partner.
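The selective amplification described above can be pictured with a toy model. The following sketch is a simplification under stated assumptions: each arm is assumed to carry exactly one primer site, amplification is assumed to require both sites on the same molecule, and the names `Molecule`, `join`, `amplifiable`, `P1` and `P2` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    tags: tuple              # identification tags carried by this nucleic acid
    primer_sites: frozenset  # primer sites present on this nucleic acid


def join(a: Molecule, b: Molecule) -> Molecule:
    """Model the association of two arms into one nucleic acid carrying both tags."""
    return Molecule(a.tags + b.tags, a.primer_sites | b.primer_sites)


def amplifiable(m: Molecule, forward: str = "P1", reverse: str = "P2") -> bool:
    """A molecule is amplified only if it carries both primer sites."""
    return {forward, reverse} <= m.primer_sites


arm_a = Molecule(("tagA",), frozenset({"P1"}))  # hypothetical arm with the forward site
arm_b = Molecule(("tagB",), frozenset({"P2"}))  # hypothetical arm with the reverse site
joined = join(arm_a, arm_b)

print(amplifiable(arm_a), amplifiable(arm_b), amplifiable(joined))
```

In this model an unassociated arm never carries both sites, whereas a pre-synthesized template that already contains both tags and both sites would be amplifiable regardless of whether an interaction took place.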
A method referred to as “proximity ligation” has been recently described (International Patent Application Ser. No. WO0161037, U.S. Patent Application Ser. No. US2002051986). According to this method, targets are detected by utilizing two or more binders, e.g. antibodies, with known affinity to the target. The method is based upon the co-localization of the binder pair in the presence of a target. This target brings the binders into proximity, enabling ligation of nucleic acids located on the binder pair. Thus the joining of the nucleic acid becomes elevated in the presence of the target molecule. The nucleic acid can subsequently be quantified and the amount of nucleic acid corresponds to the amount of target.
The primary embodiment of WO0161037 and US2002051986 aims at detection of a defined target. Thus all the affinities in the system are known and there is always at least one molecule, the target, which does not have a nucleic acid attached to it. Moreover, the invention utilizes two or more binders which bind their unlabelled analyte pair-wise in a predefined manner. Several pairs might be used in the same reaction but they are always analyzed in a target specific pair wise fashion, not in a combinatorial fashion.
The current inventive method differs from proximity ligation in that it does not detect or quantify a predefined target in a sample. The present inventive methods do not utilize molecules with predefined affinities but rather the reverse; the inter-molecular affinities are retrieved as a result of the inventive method.
In the current invention the association of the nucleic acid is combinatorial and the novel nucleic acid produced by this association is then identified to yield information concerning the molecular interactions in the library. This enables interrogation of all intra- or inter-library member interactions. The proximity ligation methods always include at least one molecule which is not attached to a nucleic acid label and this “target” is the molecule which is detected. Thus the co-localization/proximity of the binders and thereby the nucleic acid in the assay arises from the presence of a target molecule.
Finally the information gained from the inventions differs significantly. In the present inventive methods, combinations of interacting molecules may be detected and quantified. The proximity ligation art, on the other hand, teaches high throughput screening of inhibitors for a defined binding event. For example, a protein affinity interaction is identified first by some other method and proximity ligation is employed to find an inhibitor for the known affinity interaction. The proximity ligation screening approach involves attaching nucleic acids to the two proteins which participate in the predefined interaction. Then different potential inhibitors are added to the reaction. These potential inhibitors are not labelled with nucleic acids and their action is monitored by observation of a reduced signal from the two labelled proteins. The pre-defined affinity reagents are labelled with nucleic acids and a pre-defined pair of nucleic acids is analyzed.
There is a need in the art for methods capable of investigating affinity and/or functional interactions within libraries of molecules wherein such interactions are not previously established to exist, and methods for quantification of any detected interactions.