Nucleic acid binding proteins, namely DNA-binding proteins and RNA-binding proteins, are proteins that bind to either deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). The binding can be non-specific, specific for a particular recognition site, or specific for a plurality of recognition sites, with each recognition site consisting of a specific recognition sequence of DNA or RNA. Examples of DNA-binding proteins are transcription factors, polymerases, nucleases, and histones. These proteins perform such functions as regulating transcription, cleaving DNA, and packing DNA into nucleosomes. Variation in these functions, such as the regulation of transcription by transcription factors, is believed to be responsible for many genetic differences between individuals that lead to phenotypic differences. (See, e.g., Kasowski et al., “Variation in Transcription Factor Binding Among Humans,” Science, 328: 232-235 (2010); and Zheng et al., “Genetic analysis of variation in transcription factor binding in yeast,” Nature, 464: 1187-1191 (2010)). Examples of RNA-binding proteins are translation initiation factors that bind with messenger RNA (mRNA), small nuclear ribonucleoproteins (snRNPs), and RNA editing proteins such as RNA-specific adenosine deaminase. These RNA-binding proteins perform such functions as regulating translation and RNA splicing and editing. Additionally, studies have shown that mutations such as single-nucleotide polymorphisms (SNPs), insertions, and deletions within either the recognition sequences or the genes that encode binding proteins can cause significant phenotypic changes. (See, e.g., Kasowski et al., Science, 328: 232-235 (2010); Zheng et al., Nature, 464: 1187-1191 (2010); and Grant et al., “Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes,” Nature Genetics, 38(3): 320-323 (2006)).
Previous assay methods to measure the binding affinity between a binding protein and its corresponding recognition site include chromatin immunoprecipitation with subsequent analysis by microarrays or sequencing, protein binding microarrays, and related techniques using surface plasmon resonance. These methods followed earlier techniques for studying DNA-protein interactions, such as DNA footprinting assays. (See, e.g., Galas and Schmitz, “DNAase footprinting: a simple method for the detection of protein-DNA binding specificity,” Nucleic Acids Research, 5(9): 3157-3170 (1978)). Approaches using chromatin immunoprecipitation with microarrays generally follow a protocol of fixing protein-nucleic acid complexes in vivo, such as with formaldehyde, lysing the cells, fragmenting the DNA, such as through sonication, immunoprecipitating the binding proteins of interest, extracting and purifying the associated nucleic acid fragments, and detecting these fragments with the array. (See, e.g., Horak and Snyder, “ChIP-chip: A Genomic Approach for Identifying Transcription Factor Binding Sites,” Methods in Enzymology, 350: 469-483 (2002)). Disadvantages of this technique include the requirement of specific antibodies for each binding protein of interest. In addition to the added complexity and cost of such a requirement, there are binding proteins for which an antibody may not be available, or for which the conditions and time points enabling the antibody's expression and activity are unknown. (See, e.g., Mukherjee et al., “Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays,” Nature Genetics, 36(12): 1331-1339 (2004)). Alternative chromatin immunoprecipitation techniques utilize subsequent sequencing in place of microarrays to identify the sequences that are bound by the binding proteins. (See, e.g., Robertson et al., “Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing,” Nature Methods, 4(8): 651-657 (2007)). However, specific antibodies must still be procured regardless of the change in the subsequent mode of analysis, and both techniques also depend upon the in vivo step of fixing binding protein-nucleic acid complexes, which complicates the analysis of particular binding proteins and/or recognition sites of interest to a researcher or clinician.
Protein binding microarrays allow the entire assay to be performed in vitro, and require the production of a double-stranded nucleic acid array, such as a spotted double-stranded DNA array. (See, Mukherjee et al., Nature Genetics, 36(12): 1331-1339 (2004)). Binding proteins of interest are then introduced to the array with subsequent detection of the bound binding proteins. Such arrays, however, are limited to using the exact recognition sites within the double-stranded sequences spotted or otherwise produced upon the array. Mutations within recognition sequences such as SNPs, insertions, deletions, inversions, or a combination thereof, can drastically affect the binding affinity that a particular binding protein will have with that mutated recognition site. This can be especially important with regard to binding proteins that bind to multiple sequences, as these binding proteins will not be specific to only one recognition site, and additional changes to the recognition sequence of a possible recognition site through one or more mutations can substantially alter, among other processes, regulation mechanisms employing competitive binding among multiple nucleic acid binding proteins. (See, e.g., Wang, “Finding Primary Targets of Transcriptional Regulators,” Cell Cycle, 4(3): 356-357 (2005); and Bulyk, “Protein Binding Microarrays for the Characterization of Protein-DNA Interactions,” Advances in Biochemical Engineering Biotechnology, 104: 65-85 (2007)). Furthermore, while related assays utilizing surface plasmon resonance can provide quantitative kinetic data, such assays are not easily scalable. (See, e.g., Bulyk, Advances in Biochemical Engineering Biotechnology, 104: 65-85 (2007); and Mukherjee et al., Nature Genetics, 36(12): 1331-1339 (2004)).
Additionally, some binding proteins operate in association with other molecules within their overall binding mechanism in various conditions. For example, transcription elongation factors GreA and GreB bind with and induce nucleolytic activity in RNA polymerase. (See, Laptenko et al., “Transcript cleavage factors GreA and GreB act as transient catalytic components of RNA polymerase,” The EMBO Journal, 22: 6322-6334 (2003)). Many DNA-binding proteins function in concert with cofactors as well, such as Mcm1 of the MADS box family of transcription factors, which bind with high specificity and affinity to their corresponding recognition sites but require interaction with different cofactors such as α1 or Ste12. (See, Mead et al., “Interactions of the Mcm1 MADS Box Protein with Cofactors That Regulate Mating in Yeast,” Molecular and Cellular Biology, 22(13): 4607-4621 (2002)). Non-protein molecules may also affect the interaction of a binding protein with a recognition sequence, such as miRNAs or siRNAs and their effect on the binding affinities of RNA-binding proteins. (See, Jacobsen et al., “Signatures of RNA binding proteins globally coupled to effective microRNA target sites,” Genome Research, 20: 1010-1019 (2010)). Thus, mutations that affect the interaction between a binding protein and its accessory molecules, such as cofactor proteins or miRNAs, can directly affect binding affinities through, for instance, changes in certain residues that are crucial for proper interaction of a binding protein with its cofactors. (See, Mead et al., Molecular and Cellular Biology, 22(13): 4607-4621 (2002)).
Therefore, these previous methods fail to meet the ongoing need to personalize diagnostic and treatment options for individual patients in a straightforward and cost-effective manner, and also fail to enable research of binding affinities of interest that accounts for possible mutations within a recognition sequence, including mutations that are rare and/or previously unknown. In addition to the continuing need for improved methods to measure the binding affinity of binding proteins for various recognition sites, there is also a need for improved methods to measure the differences effected in binding affinities when either the sequence of the gene encoding the binding protein, or the recognition sequence of the recognition site, or both, possess one or more mutations. As discussed above, mutations that affect the binding affinity of a binding protein can cause significant phenotypic changes. For example, the presence of SNPs can alter binding affinities sufficiently to cause corresponding differences in gene expression, thus effecting a functional genetic variation. (See, e.g., Kasowski et al., Science, 328: 232-235 (2010); Zheng et al., Nature, 464: 1187-1191 (2010); and Grant et al., Nature Genetics, 38(3): 320-323 (2006)). Assays to detect and measure these binding affinity changes are useful in diagnosing and treating conditions; for example, SNPs within transcription factor 7-like 2 (TCF7L2) are correlated with an increased risk for type 2 diabetes. (See, e.g., Grant et al., Nature Genetics, 38(3): 320-323 (2006)). Likewise, mutations in the recognition sequence of binding proteins have also been shown to be associated with diseases and disorders, such as a SNP within the promoter of human coagulation factor VII leading to an inability of Specificity Protein 1 (Sp1) to bind, which results in a severe bleeding disorder. (See, Carew et al., “Severe Factor VII Deficiency Due to a Mutation Disrupting an Sp1 Binding Site in the Factor VII Promoter,” Blood, 92: 1639-1645 (1998)). Thus, in the continuing quest to personalize medical diagnostics and therapies to the specific individual being treated, there is a need for improved methods to measure the binding affinities of binding proteins based upon the individual's personal genome so that diagnoses and therapies can be adjusted accordingly. Analysis of an individual's particular binding affinities between various binding proteins and their relevant recognition sites can further explain the genetic contribution to a variety of medical conditions when knowledge of the mutation alone is insufficient to determine and implement a therapy that is personalized to the individual.