The field of this invention is molecular biology, particularly protein/nucleic acid binding interactions and protocols for the identification thereof.
Identification of protein-nucleic acid interactions is paramount to understanding the molecular mechanisms underlying cellular processes such as replication, transcription, and signaling. One important component in the characterization of DNA/RNA binding proteins is the analysis of sequence-specific interactions using “footprinting” techniques, in which the sequence of the protein binding domain of a nucleic acid is identified.
One footprinting protocol that finds use is based on ligation-mediated polymerase chain reaction (LMPCR) (Mueller, P. R. and Wold, B. (1989) Science 246: 780-786). Agents that are commonly employed in this protocol include DNase I, DMS (dimethylsulfate) and UV light. In these footprinting protocols, a given nucleic acid, typically of known sequence, is screened for the presence of protein binding sequences by contacting the nucleic acid with one or more test nucleic acid binding proteins. Specific sequences along the nucleic acid that are bound to the protein(s) are protected from nucleophilic attack or cross-linking by the agents, thus creating a “footprint” across this region or regions in the nucleic acid. The protected region is then identified by first cleaving the DNA at the lesion and annealing a gene-specific primer to the region of interest. This primer is extended to the cleavage site using a processive DNA polymerase, creating a blunt end. A unidirectional (staggered) linker is then attached to the blunt-ended molecule using DNA ligase. The 3′ end of the longer strand of the linker is ligated to the 5′ end of the genomic DNA. The shorter strand of the linker lacks a 5′ phosphate and therefore is not ligated to the extension product. A second gene-specific primer and a linker-specific primer are annealed to this product, which is now a suitable substrate for PCR. Only molecules that have both sequences (the primer 2 sequence and the linker sequence) are amplified. A third, labeled gene-specific primer is then used to sequence the products, which can subsequently be visualized on a sequencing gel. In this manner, the protein binding sequence of the nucleic acid is identified.
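The way a footprint is read from such data can be illustrated with a minimal sketch. It assumes, purely for illustration, that per-nucleotide cleavage signal has already been quantified for a protein-free control and for a protein-bound sample; the function name `protected_regions`, the 0.5 ratio cutoff, and the minimum run length of 3 are hypothetical choices and are not part of the LMPCR protocol described above.

```python
from typing import List, Tuple


def protected_regions(
    control: List[float],
    bound: List[float],
    max_ratio: float = 0.5,  # assumed cutoff, not taken from the protocol
    min_len: int = 3,        # assumed minimum footprint length
) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end inclusive, where the cleavage
    signal in the protein-bound sample is at most max_ratio of the control."""
    if len(control) != len(bound):
        raise ValueError("control and bound must cover the same positions")

    regions: List[Tuple[int, int]] = []
    start = None  # index where the current protected run began, if any
    for i, (c, b) in enumerate(zip(control, bound)):
        # Positions with no control signal cannot be scored as protected.
        is_protected = c > 0 and (b / c) <= max_ratio
        if is_protected:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the last position.
    if start is not None and len(control) - start >= min_len:
        regions.append((start, len(control) - 1))
    return regions


if __name__ == "__main__":
    control = [10.0] * 10
    bound = [10.0, 9.0, 2.0, 1.0, 2.0, 3.0, 10.0, 11.0, 4.0, 10.0]
    print(protected_regions(control, bound))
```

In this sketch an isolated position with reduced signal is ignored, on the assumption that a bound protein protects a contiguous stretch of nucleotides rather than a single base.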
Terminal transferase-dependent PCR (TDPCR) is a modified LMPCR methodology that has been devised for studying protein-RNA interactions (Tornaletti, S. and Pfeifer, G. (1995) J. Mol. Biol. 249: 714-728; Chen, H-H, et al. (2000) Nucl. Acids Res. 28: 1656-1664). It uses UV light as the primary means of creating appropriate lesions (intra-strand pyrimidine dimer formation, primarily between thymidines) within the RNA, which inhibit progression of DNA polymerases.
Although LMPCR and TDPCR are very powerful techniques for mapping protein-nucleic acid interaction or binding sites, they suffer from several disadvantages, which are summarized below. First, in studying protein-nucleic acid interactions using LMPCR/TDPCR, one needs prior knowledge of the gene sequence (or transcript) in question in order to design appropriate gene-specific primers for amplification. Second, the LMPCR/TDPCR protocols are labor intensive and pose considerable challenges to those not well versed in the art. Third, although both LMPCR and TDPCR allow analysis of protein-nucleic acid interactions at nucleotide resolution by revealing the footprint that the protein leaves behind on the nucleic acid, they are not useful for determining the identity of the protein(s) responsible for such a footprint. To identify the proteins per se, one has to resort to monoclonal antibody protocols, which suffer from the drawback that a priori knowledge about the identity of the proteins is needed. Because of the above limitations, none of the currently employed techniques for identifying protein/nucleic acid binding pairs can be adopted for high throughput mapping of site-specific protein binding sequences.
As such, there is a continued interest in the development of new protocols for identifying protein/nucleic acid binding pairs, where the development of a protocol that could be adapted to a high throughput format is of particular interest.
Relevant Literature
U.S. Patents of interest include: U.S. Pat. Nos. 5,925,517; 6,150,097; 6,355,421. Also of interest is: Tyagi and Kramer, Nat Biotechnol (1996 Mar) 14(3): 303-8.
Methods and compositions for identifying protein/nucleic acid binding pairs are provided. In the subject methods, a nucleic acid probe array is first contacted with a target nucleic acid population to produce a hybridized array. The resultant hybridized array is then contacted with a population of proteins to produce a protein bound array. Protein/nucleic acid binding pairs are then detected on the array surface. In certain embodiments, the protein and/or nucleic acid members of the identified protein/nucleic acid binding pairs are further characterized.
In many embodiments, the array employed is a molecular beacon array having a plurality of distinct molecular beacon probes all labeled with the same first fluorescent label. In these embodiments, the molecular beacon array is first contacted with a target nucleic acid population to produce a hybridized array. The resultant hybridized array is then contacted with a population of proteins all labeled with the same second fluorescent label to produce a protein bound array. A feature of the methods of this embodiment is that the first and second fluorescent labels make up a FRET pair. Any FRET-generated signals are then detected from the resultant protein bound array to identify protein/nucleic acid binding pairs.
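The detection step of the molecular beacon embodiment can be illustrated with a minimal sketch of how array features might be scored. It assumes, for illustration only, that a FRET signal has been measured for each probe feature and that some features serve as negative controls; the function name `call_fret_hits`, the probe identifiers, and the mean-plus-three-standard-deviations cutoff are hypothetical and are not specified by the subject methods.

```python
from statistics import mean, stdev
from typing import Dict, List


def call_fret_hits(
    fret_signal: Dict[str, float],
    negative_controls: List[str],
    n_sd: float = 3.0,  # assumed stringency, not taken from the text
) -> List[str]:
    """Return probe IDs whose FRET signal exceeds the background cutoff.

    The cutoff is the mean of the negative-control features plus n_sd sample
    standard deviations; at least two negative controls are required.
    """
    background = [fret_signal[probe] for probe in negative_controls]
    cutoff = mean(background) + n_sd * stdev(background)
    controls = set(negative_controls)
    return sorted(
        probe
        for probe, signal in fret_signal.items()
        if probe not in controls and signal > cutoff
    )


if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units.
    signals = {
        "neg1": 100.0,
        "neg2": 110.0,
        "neg3": 90.0,
        "probeA": 105.0,
        "probeB": 400.0,
    }
    print(call_fret_hits(signals, ["neg1", "neg2", "neg3"]))
```

Each probe returned by such a scoring step would correspond to an array location at which a labeled protein is bound close enough to the hybridized beacon for energy transfer to occur, and would therefore mark a candidate protein/nucleic acid binding pair.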
Also provided are systems and kits for use in practicing the subject methods. The subject invention finds use in a variety of different applications.