Identification of protein-nucleic acid interactions is paramount to understanding the molecular mechanisms underlying cellular processes such as replication, transcription, and signaling. One important component in the characterization of DNA/RNA binding proteins is the analysis of sequence-specific interactions using “footprinting” techniques, in which the sequence of the protein binding domain of a nucleic acid is identified.
One footprinting protocol that finds use is based on ligation-mediated polymerase chain reaction (LMPCR) (Mueller, P. R. and Wold, B. (1989) Science 246: 780–786). Reagents that are commonly employed in this protocol include DNase I, DMS (dimethylsulfate) and UV light. In these footprinting protocols, a given nucleic acid, typically of known sequence, is screened for the presence of protein binding sequences by contacting the nucleic acid with one or more test nucleic acid binding proteins. Specific sequences along the nucleic acid that are bound to the protein(s) are protected from nucleophilic attack or cross-linking by the reagents, thus creating a “footprint” across the bound region(s) of the nucleic acid. The protected region is then identified by first cleaving the DNA at the lesion and annealing a gene-specific primer to the region of interest. This primer is extended to the cleavage site using a processive DNA polymerase, creating a blunt end. A unidirectional (staggered) linker is then attached to the blunt-ended molecule using DNA ligase. The 3′ end of the longer strand of the linker is ligated to the 5′ end of the genomic DNA. The shorter strand of the linker lacks a 5′ phosphate and therefore is not ligated to the extension product. A second gene-specific primer and a linker-specific primer are annealed to this product, which is now a suitable substrate for PCR. Only molecules that have both sequences (the primer 2 sequence and the linker sequence) are amplified. A third, labeled gene-specific primer is then used to sequence the products, which can subsequently be visualized on a sequencing gel. In this manner, the protein binding sequence of the nucleic acid is identified.
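The final read-out step described above, locating the protected region by comparing a protein-free control with the protein-bound sample, can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the cited protocol: the function name `footprint_regions`, the representation of each lane as a collection of cleaved positions, and the `max_gap` and `min_sites` thresholds are all assumptions made for the example.

```python
def footprint_regions(control_cuts, bound_cuts, max_gap=2, min_sites=2):
    """Return (first, last) position pairs for candidate footprints.

    control_cuts: positions cleaved/modified in the protein-free control.
    bound_cuts:   positions cleaved/modified in the protein-bound sample.
    A position present in the control but missing from the bound sample
    is treated as protected. Protected positions separated by at most
    max_gap are grouped, and groups with fewer than min_sites protected
    positions are discarded. Both thresholds are hypothetical.
    """
    lost = sorted(set(control_cuts) - set(bound_cuts))
    regions = []
    run = []
    for pos in lost:
        # Close the current run when the next protected site is too far away.
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_sites:
                regions.append((run[0], run[-1]))
            run = []
        run.append(pos)
    # Flush the final run.
    if len(run) >= min_sites:
        regions.append((run[0], run[-1]))
    return regions


if __name__ == "__main__":
    control = range(20)                              # every position cleaved
    bound = list(range(6)) + list(range(12, 20))     # positions 6..11 missing
    print(footprint_regions(control, bound))
```

In practice band intensities rather than presence or absence would be compared, so a real analysis would replace the set difference with a quantitative threshold.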
Terminal transferase-dependent PCR (TDPCR) is a modified LMPCR methodology that has been devised for studying protein-RNA interactions (Tornaletti, S. and Pfeifer, G. (1995) J. Mol. Biol. 249: 714–728; Chen, H.-H., et al. (2000) Nucl. Acids Res. 28: 1656–1664). It uses UV light as the primary means of creating appropriate lesions (intra-strand pyrimidine dimer formation, primarily between thymidines) within the RNA, which inhibit progression of DNA polymerases.
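Because the UV lesions described above form between adjacent pyrimidines on the same strand, the positions at which such lesions could arise can be listed directly from the sequence. The following Python sketch illustrates this; the function name `dipyrimidine_sites` is hypothetical, and treating C, T and U alike as pyrimidines (so that both DNA and RNA input is accepted) is an assumption of the example rather than something specified by the cited methods.

```python
def dipyrimidine_sites(seq):
    """List (index, dinucleotide) for every pair of adjacent pyrimidines.

    index is the 0-based position of the 5' base of the pair.
    C, T and U are all treated as pyrimidines (an assumption made so
    that the sketch accepts either DNA or RNA sequences).
    """
    pyrimidines = set("CTU")
    s = seq.upper()
    return [
        (i, s[i:i + 2])
        for i in range(len(s) - 1)
        if s[i] in pyrimidines and s[i + 1] in pyrimidines
    ]


if __name__ == "__main__":
    print(dipyrimidine_sites("ACUUCG"))
```

Sites returned by such a scan are only candidates: whether a lesion actually forms, and whether a bound protein suppresses it, is what the footprinting experiment itself determines.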
Although LMPCR and TDPCR are very powerful techniques for mapping protein-nucleic acid interaction or binding sites, they suffer from several disadvantages, which are summarized below. First, in studying protein-nucleic acid interactions using LMPCR/TDPCR, one needs to have prior knowledge of the gene sequence (or transcript) in question in order to be able to design appropriate gene-specific primers for amplification. Second, the LMPCR/TDPCR protocols are labor intensive and offer considerable challenges to those not well versed in the art. Third, both LMPCR and TDPCR allow analysis of protein-nucleic acid interactions at nucleotide resolution by revealing the footprint that the protein leaves behind on the nucleic acid. However, they are not useful for determining the identity of the protein(s) responsible for such a footprint. To identify the proteins per se, one has to resort to the use of monoclonal antibody protocols, which suffer from the drawback that a priori knowledge about the identity of the proteins is needed. Because of the above limitations, none of the currently employed techniques for identifying protein/nucleic acid binding pairs can be adapted for high throughput mapping of site-specific protein binding sequences.
As such, there is a continued interest in the development of new protocols for identifying protein/nucleic acid binding pairs, where the development of a protocol that could be adapted to a high throughput format is of particular interest.
Relevant Literature
U.S. Patents of interest include: U.S. Pat. Nos. 5,925,517; 6,150,097; 6,355,421. Also of interest is: Tyagi & Kramer, Nat Biotechnol (March 1996) 14(3): 303–8.