Gene expression reflects many important aspects of cellular physiology, including developmental changes and disease states. Microarrays provide quantitative, genome-wide measurements of expression by monitoring mRNA abundance. Deep sequencing has recently emerged as an alternative to microarrays, with potential advantages for characterizing and quantifying the full pool of cellular mRNA. However, mRNA abundance is an imperfect proxy for protein production, which is the ultimate molecular expression of a protein-coding gene. Quantifying the translation of mRNA into protein is therefore of broad interest in biology. For instance, microRNAs can repress target genes translationally, so their direct effects may be visible only in measurements of translation, not of mRNA abundance. Translational regulation also plays an important role in development and in learning and memory.
Measuring translation, especially on a genome-wide scale, has proven more technically challenging than measuring mRNA abundance. Typically, transcripts are fractionated by ribosome occupancy (polysome fractionation), and the fractions are then analyzed by microarray to determine the translational status of different messages. However, this approach requires analyzing many fractions in parallel, and even then it achieves only limited quantitative resolution.
Furthermore, polysome fractionation gives no information about the position of the ribosome on the mRNA. Although conceptual translation usually identifies the correct protein-coding sequence, there are exceptions, such as programmed ribosomal frameshifting. Upstream open reading frames (uORFs), short translated sequences in the 5′ UTR of many genes, pose a particularly prominent difficulty. In a few well-studied cases these uORFs are clearly translated, often with consequences for translation of the downstream protein-coding gene, and many more uORFs are highly conserved. However, uORF translation is difficult to demonstrate directly, and polysome profiling cannot distinguish ribosomes occupying a uORF from ribosomes occupying the coding sequence (CDS) of the same transcript.
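Conceptual translation of a 5′ UTR can be illustrated with a short scan for candidate uORFs. The Python sketch below is an illustration only, not part of the described method: it assumes ATG is the sole start codon and TAA, TAG and TGA are the stop codons, it ignores uORFs that run into the CDS, and the function name find_candidate_uorfs is hypothetical. It reports sequence-level candidates only; whether any candidate is actually translated cannot be decided from sequence alone.

```python
# Minimal sketch: list candidate upstream ORFs (uORFs) in a 5' UTR by
# conceptual translation. Assumes ATG is the only start codon and
# TAA/TAG/TGA are the stop codons; near-cognate starts are ignored.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_candidate_uorfs(utr5: str) -> list[tuple[int, int]]:
    """Return (start, end) pairs, 0-based and end-exclusive (stop codon
    included), for ATG-initiated ORFs that terminate inside the 5' UTR.

    ORFs with no in-frame stop before the end of the UTR (i.e. ones that
    would overlap the CDS) are not reported by this sketch.
    """
    seq = utr5.upper()
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the same reading frame until a stop appears.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs
```

For example, the toy UTR "CCATGAAATAGCC" contains one candidate, ATG AAA TAG, spanning positions 2 to 11.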
Translating ribosomes occupy a discrete footprint on their mRNA template. Steitz first demonstrated the ribosomal footprint in vitro, using nuclease digestion to remove unprotected mRNA and leave behind a ribosome-protected fragment. However, the technology available to characterize these RNA fragments has been quite limited. The accumulation of ribosome footprints derived from a specific position in an mRNA can reveal ribosomal pausing during in vitro eukaryotic translation. Until now, however, no technique has quantified translation by combining this classical ribosome footprinting approach with recent advances in deep sequencing.
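Detecting pausing from footprint accumulation can be sketched as a scan over per-position footprint counts for a single transcript. In this minimal Python sketch, the function name find_pause_sites and both thresholds (a tenfold enrichment over the transcript mean and at least five reads) are hypothetical choices made for illustration; the source does not specify a rule for calling pauses.

```python
from statistics import mean


def find_pause_sites(counts: list[int], fold: float = 10.0, min_reads: int = 5) -> list[int]:
    """Return positions whose footprint count is at least `fold` times the
    mean count across the transcript and at least `min_reads`.

    `fold` and `min_reads` are illustrative thresholds, not values from the
    source. Positions are indices into `counts` (e.g. nucleotides or codons).
    """
    if not counts:
        return []
    avg = mean(counts)
    if avg == 0:
        return []  # no footprints on this transcript, so no pauses to call
    return [i for i, c in enumerate(counts) if c >= min_reads and c >= fold * avg]
```

A real analysis would also need to account for sequencing depth and for transcripts with few reads, where a fixed fold threshold is unstable.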
Embodiments of the present disclosure are based, at least in part, on the surprising observation that capturing and characterizing the footprints of in vivo ribosomes can reveal the full translational profile of the cell. The eukaryotic ribosome protects roughly 30 nucleotides of mRNA from digestion, a length that corresponds well to the read length of the highest-capacity deep sequencing platforms. The sequence of a ribosome footprint identifies the mRNA position from which it derives, and thus the position of one ribosome, and deep sequencing can analyze tens of millions of such reads in parallel. In a particular embodiment, quantitative and highly reproducible measurements of translation can be obtained for budding yeast by counting ribosome footprint sequences. Because ribosome footprints show the exact location of each ribosome, not just which mRNA it is translating, they can be used to measure variations in ribosome occupancy within genes and to detect ribosomes on uORFs as opposed to coding sequences.
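The idea that a footprint's sequence identifies a ribosome position, and that counting footprints per region quantifies translation, can be sketched in Python as follows. This is a toy illustration under stated assumptions: exact substring search stands in for a real short-read aligner, reads matching more than one location are left unassigned, and the 12-nucleotide offset from the footprint 5′ end to the ribosomal P site, the region labels, and the names locate_footprint and count_by_region are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical offset (nt) from a footprint's 5' end to the ribosomal P site;
# illustrative only, not a value given in the source.
P_SITE_OFFSET = 12


def locate_footprint(read: str, transcripts: dict[str, str]) -> Optional[tuple[str, int]]:
    """Return (transcript_id, 0-based 5' position) if the read matches exactly
    one location across all transcripts; otherwise return None."""
    hits = []
    for tx_id, seq in transcripts.items():
        start = seq.find(read)
        while start != -1:
            hits.append((tx_id, start))
            start = seq.find(read, start + 1)
    return hits[0] if len(hits) == 1 else None


def count_by_region(
    reads: list[str],
    transcripts: dict[str, str],
    regions: dict[str, list[tuple[str, int, int]]],
    offset: int = P_SITE_OFFSET,
) -> Counter:
    """Count footprints per (transcript_id, region label).

    `regions` maps a transcript ID to (label, start, end) intervals, 0-based
    and end-exclusive, e.g. a uORF and the CDS. Each uniquely placed footprint
    is assigned by its estimated P site; unplaced or multi-mapping reads are
    tallied under (None, "unassigned").
    """
    counts = Counter()
    for read in reads:
        hit = locate_footprint(read, transcripts)
        if hit is None:
            counts[(None, "unassigned")] += 1
            continue
        tx_id, pos = hit
        p_site = pos + offset
        # First annotated region containing the P site, else "other".
        label = next(
            (lab for lab, start, end in regions.get(tx_id, []) if start <= p_site < end),
            "other",
        )
        counts[(tx_id, label)] += 1
    return counts
```

Discarding multi-mapping reads keeps each count tied to a single ribosome position, at the cost of losing footprints from repeated sequences. The linear scan over every transcript is only for clarity; tens of millions of reads would require an indexed aligner.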