How eukaryotic organisms regulate gene expression is a fundamental question in biology. It is increasingly clear that regulation of mRNA levels and their translation to corresponding proteins reflects highly coordinated, multi-layered processes. Central to all of these post-transcriptional processes, from the initial processing of a transcript in the nucleus to its final translation and decay in the cytoplasm, are a multitude of complex interactions of each transcript with numerous trans-acting RNA-binding proteins.
RNA-binding proteins interact with mRNAs to form dynamic multi-component ribonucleoprotein complexes (RNPs). These RNPs constitute the functional forms of mRNAs, and it is only through their proper formation that transcripts are correctly regulated to produce precisely the required amount of protein in a cell. The importance of RNPs to human biology, in particular, has been demonstrated by studies showing that mutations disrupting the assembly of RNPs, or affecting specific RNAs or proteins contained in RNPs, can be deleterious to cells and lead to human diseases. For example, defects in the RNA-binding proteins TAR DNA-binding protein 43 (TDP-43) and Fused in sarcoma/Translocated in liposarcoma (FUS/TLS) have been identified in Amyotrophic Lateral Sclerosis (ALS) patients, as have mutations in FMR1 in those affected with Fragile X syndrome and in TERT in Dyskeratosis congenita patients.
The lack of a comprehensive picture of RNP structures in the human transcriptome is not due to a lack of effort: numerous biochemical, genetic, and computational approaches have been utilized, alone and in combination, to identify and validate RNA-binding protein (RBP) interaction sites (RISs) and their interacting RBPs. What has substantially hindered progress in this field is that RISs are often short, highly degenerate, and dependent on secondary structure, making definitive mapping difficult by conventional means. Biochemical techniques to identify precise in vivo occupancy of these sequences remain laborious, and the fact that most of the hundreds of predicted RNA-binding proteins have no known function makes it challenging to determine the true prevalence and regulatory importance of most RNA-protein interactions (Lebedeva et al., 2011; Mukherjee et al., 2011, which are incorporated by reference as if fully set forth).
Currently utilized methods for identifying RNA-protein interactions involve purification of a specific RNA-binding protein followed by interrogation of the RNAs bound thereto using sequencing or microarray hybridization. These currently employed approaches include RNA immunopurification followed by microarray hybridization (RIP-chip), high-throughput sequencing of RNAs isolated by crosslinking immunoprecipitation (HITS-CLIP), photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation followed by high-throughput sequencing (PAR-CLIP), etc. (Lebedeva et al., 2011; Mukherjee et al., 2011, which are incorporated by reference as if fully set forth). Although these approaches have proven useful in characterizing a few RNPs (for examples see Lebedeva et al., 2011; Mukherjee et al., 2011, which are incorporated by reference as if fully set forth), several drawbacks limit their applicability to global studies of RBP-RNA interactions. First, the protein of interest must already be identified and have a previously characterized ability to bind RNA. Second, an antibody targeted to this single protein of interest must be used to purify the RBP and its associated RNAs. Third, because these methods rely on prior characterization of a known protein, they cannot be used with as-yet-unidentified alternative RNA-binding domains.
The above observations point to a major gap between the importance of RNA-RBP interactions in the cell and the difficulty of establishing a comprehensive accounting of these interactions.