This invention relates generally to post-transcriptional regulation and methods of profiling gene expression.
Many diseases are genetically based, and the genetic background of each individual can have a profound effect on his or her susceptibility to disease. The relatively new field of functional genomics has provided researchers with the ability to determine the functions of proteins based upon knowledge of the genes that encode the proteins. A major goal of functional genomics is to identify gene products that are suitable targets for drug discovery. Such knowledge can lead to a basis for target validation if it is demonstrated that the target of interest has an essential function in a disease. Accordingly, a need exists to develop methods that allow profiling of the gene expression state of cells and tissues in order to understand the consequences of genetics on growth and development.
Understanding global gene expression at the level of the whole cell requires detailed knowledge of the contributions of transcription, pre-mRNA processing, mRNA turnover and translation. Although the sum total of these regulatory processes in each cell accounts for its unique expression profile, few methods are available to independently assess each process en masse.
The expression state of genes in a complex tissue or tumor is generally determined by extracting messenger RNAs from samples (e.g., whole tissues) and analyzing the expressed genes using cDNA libraries, microarrays or serial analysis of gene expression (SAGE) methodologies. See, e.g., Duggan, et al., (1999) Nature Genetics 21, 10-14; Gerhold, et al., (1999) Trends in Biochemical Sciences 24, 168-173; Brown, et al., (1999) Nature Genetics 21, 38-41; Velculescu, et al., (1995) Science 270, 484-487; Velculescu, et al. (1997) Cell 88, 243-251. In order to determine the gene expression profile of any single cell type within a tissue or tumor or to recover those messenger RNAs, the tissue must first be subjected to microdissection. This is very laborious, as only a small amount of cellular material is recovered and the purity as well as the quality of the cellular material is compromised.
Post-transcriptional events influence the outcome of protein expression as significantly as transcriptional events. Transcriptional and post-transcriptional regulation are generally linked. Altering the expression of transcriptional activators or repressors has important consequences for the development of a cell. Therefore, feedback loops following translational activation of specific mRNAs may change the program of transcription in response to growth or differentiation signals. DNA arrays are well-suited for profiling the steady-state levels of mRNA globally (i.e., total mRNA or the "transcriptome"). However, because of post-transcriptional events affecting mRNA stability and translation, the expression levels of many cellular proteins do not directly correlate with steady-state levels of mRNAs (Gygi et al. (1999) Mol. Cell Biol. 19, 1720-1730; Futcher et al. (1999) Mol. Cell Biol. 19, 7357-7368).
Many mRNAs contain sequences that regulate their post-transcriptional expression and localization (Richter (1996) in Translational Control, eds. J. W. B. Hershey, et al., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, pp. 481-504). These regulatory elements reside in both introns and exons of pre-mRNAs, as well as in both coding and noncoding regions of mature transcripts (Jacobson and Peltz (1996) Annu. Rev. Biochem. 65, 693-739; Wickens et al. (1997) Curr. Opin. Genet. Dev. 7, 220-232). One example of a sequence-specific regulatory motif is the AU-rich instability element (ARE) present in the 3′-untranslated regions (UTRs) of early-response gene (ERG) mRNAs, many of which encode proteins essential for growth and differentiation (Caput et al. (1986) Proc. Natl. Acad. Sci. USA 83, 1670-1674; Shaw and Kamen (1986) Cell 46, 659-667; Schiavi et al. (1992) Biochim. Biophys. Acta 1114, 95-106; Chen and Shyu (1995) Trends Biochem. Sci. 20, 465-470). Regulation via the ARE is poorly understood, but the mammalian ELAV/Hu proteins have been shown to bind to ARE sequence elements in vitro and to affect post-transcriptional mRNA stability and translation in vivo (Jain et al. (1997) Mol. Cell Biol. 17, 954-962; Levy et al. (1998) J. Biol. Chem. 273, 6417-6423; Fan and Steitz (1998) EMBO J. 17, 3448-3460; Peng et al. (1998) EMBO J. 17, 3461-3470; Keene (1999) Proc. Natl. Acad. Sci. USA 96, 5-7).
In vitro RNA selection methods based upon cellular sequences are reported in Gao et al., Proc. Natl. Acad. Sci. USA 90, 11207-11211 (1994) and U.S. Pat. Nos. 5,773,246, 5,525,495 and 5,444,149, all to Keene et al., the disclosures of which are incorporated herein in their entirety. Generally, these methods were intended to identify large numbers of mRNAs present in messenger RNP (mRNP) complexes, and utilized in vitro binding and amplification of mRNA sequences from large pools of naturally-occurring mRNAs. These studies used proteins (referred to as ELAV or Hu proteins) known to bind to AU-rich sequence elements present in the untranslated regions of cellular mRNAs. These experiments led to the discovery that mRNAs which are structurally or functionally related may be revealed using multi-targeted RNA binding proteins (i.e., RNA binding proteins that specifically bind more than one target). See Levine, et al., (1994) Molecular and Cellular Biology 13, 3494-3504; and King, et al., (1993) Journal of Neuroscience 14, 1943-1952; reviewed in Antic and Keene (1997) American Journal of Human Genetics 61, 273-278 and Keene (1999) Proceedings of the National Academy of Sciences (USA) 96, 5-7. However, these reports are limited to in vitro applications, and do not describe in vivo methods for partitioning RNA into structural or functional subsets using RNA binding proteins. Although in vitro methods have been used to determine protein-RNA interactions, their use has certain limitations. Biochemical methods are generally reliable when carefully controlled, but RNA binding can be problematic because many interactions may be of low affinity, low specificity or even artifactual. In order to understand RNA-protein interactions and their functional implications on a global systems level, it is necessary to find reliable methods to monitor messenger RNP complexes in vivo.
The successful immunoprecipitation of epitope-tagged ELAV/Hu protein which has been transfected into pre-neuronal cells has been reported. See Antic et al., Genes and Development 13, 449-461 (1999). This immunoprecipitation was followed by nucleic acid amplification that allowed for the identification of a messenger RNA encoding neurofilament M protein (NF-M).
The present invention relates to a new, in vivo approach for the determination of gene expression that utilizes the flow of genetic information through messenger RNA clusters or subsets. Recently, the practice of examining multiple macromolecular events simultaneously and in parallel with the goal of organizing such information computationally has taken the designation "-ome." Thus, the genome identifies all of the genes of a cell, while the transcriptome is defined as the messenger RNA complement of the genome and the proteome is defined as the protein complement of the genome (see FIG. 1). The present inventors have defined several physically organized subsets of the transcriptome and defined them as dynamic units of the "ribonome". As described herein, the ribonome consists of a plurality of distinct subsets of messenger RNAs (mRNAs) that are clustered in the cell due to their association with RNA-binding proteins (e.g., regulatory RNA-binding proteins). By identifying the mRNA components of a cellular ribonome, the cellular transcriptome can be broken down into a series of subprofiles that together can be used to define the gene expression state of a cell or tissue (see FIG. 2). In combination with, for example, high throughput approaches and by multiplexing RNA processing assays, the present inventive methods provide the ability to determine the changes that occur in multiple gene transcripts simultaneously.
Accordingly, one aspect of the invention is an in vivo method of partitioning endogenous, cellular mRNA-binding protein (mRNP) complexes. The method, in one embodiment, comprises contacting a biological sample that comprises at least one mRNP complex with a ligand that specifically binds a component of the mRNP complex. The biological sample may be, for example, a tissue sample, whole tissue, a whole organ, a cell culture, or a cell extract or lysate. The component of the mRNP complex may be an RNA-binding protein, an RNA-associated protein, a nucleic acid associated with the mRNP complex including the mRNA itself, or another molecule or compound (e.g., carbohydrate, lipid, vitamin, etc.) that associates with the mRNP complex. The ligand may be, for example, an antibody that specifically binds the component, a nucleic acid that binds the component (e.g., an antisense molecule, an RNA molecule that binds the component), or any other compound or molecule that binds the component of the complex. The mRNP complex is then separated by binding the ligand (now bound to the mRNP complex) to a binding molecule that binds the ligand. The binding molecule may bind the ligand directly (e.g., may be an antibody specific for the ligand), or may bind the ligand indirectly (e.g., may be an antibody or binding partner for a tag on the ligand). The binding molecule will be attached to a solid support, such as a bead or plate or column, as known in the art. Accordingly, the mRNP complex will be attached to the solid support via the ligand and binding molecule. The mRNP complex is then collected by removing it from the solid support (i.e., the complex is washed off the solid support using suitable conditions and solvents).
The identity of the mRNA bound within the mRNP complex may then be determined, for example, by separating the mRNA from the complex, reverse transcribing the mRNA into cDNA, and sequencing the cDNA.
In embodiments of the invention, therefore, the mRNP complex may be isolated by direct immunoprecipitation of the mRNP complexes, either with or without epitope tags, or by other biochemical partitioning methods. For example, other proteins bound to or associated with the mRNP complex may be immunoprecipitated in order to recover the mRNP complex and subsequently the mRNAs bound within the complex. The skilled artisan will appreciate that embodiments of the inventive method allow for the identification of a plurality of mRNA complexes simultaneously (i.e., concurrently), sequentially, or in batch-wise fashion. Alternatively, the method may be carried out on one biological sample (or portion thereof) numerous times, the steps of the method being performed in a sequential fashion, with each iteration of the method utilizing a different ligand. In any of the described embodiments, cDNA or genomic microarray grids, for example, may be used to identify mRNAs isolated by the inventive method en masse.
A "subset" of mRNA is defined as a plurality of mRNA transcripts or messages that specifically bind or associate with an mRNP complex. In other words, subsets are defined by their ability to bind within or to a particular mRNP complex. The subset will preferably be a quantitative or qualitative fraction of the total mRNA population of the cell. Furthermore, subsets within subsets of mRNAs may be identified using the invention. The collection of mRNA subsets for any particular cell or tissue sample is an expression profile, also referred to herein as a "ribonomic profile," for that cell or tissue. It will be appreciated that expression profiles will differ from cell sample to cell sample, depending on the type of cell in the sample (e.g., what species or tissue type the cell is), the differentiation status of the cell, the pathogenicity of the cell (i.e., if the cell is infected or if it is expressing a deleterious gene, such as an oncogene, or if the cell is lacking a particular gene), the specific ligand used to isolate the mRNP complex, etc. Thus, the expression profile of a cell may be used as an identifier for the cell, enabling the artisan to compare and distinguish profiles of different cells.
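By way of illustration only, the definitions above can be modeled as a simple data structure. The following Python sketch is not part of the disclosed method. It assumes that a ribonomic profile can be represented as a mapping from the mRNP component targeted by the ligand to the subset of mRNAs recovered with it; the component names (protein_X, protein_Y) and transcript names (mRNA_A, etc.) are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, Set

# Assumption for illustration: a ribonomic profile is a mapping from the
# targeted mRNP component to the subset of mRNAs recovered with it.
RibonomicProfile = Dict[str, FrozenSet[str]]


def build_profile(recovered: Dict[str, Set[str]]) -> RibonomicProfile:
    """Freeze each recovered mRNA subset so the profile can act as an identifier."""
    return {component: frozenset(mrnas) for component, mrnas in recovered.items()}


def covered_transcripts(profile: RibonomicProfile) -> Set[str]:
    """Union of all subsets: the part of the transcriptome the profile accounts for."""
    covered: Set[str] = set()
    for subset in profile.values():
        covered |= subset
    return covered


# Hypothetical example data; all names are placeholders.
transcriptome = {"mRNA_A", "mRNA_B", "mRNA_C", "mRNA_D", "mRNA_E"}
profile = build_profile({
    "protein_X": {"mRNA_A", "mRNA_B"},
    "protein_Y": {"mRNA_B", "mRNA_C", "mRNA_D"},
})

# Each subset is a fraction of the total mRNA population of the cell.
fractions = {c: len(s) / len(transcriptome) for c, s in profile.items()}
```

In this toy representation, a transcript may appear in more than one subset (mRNA_B above), and a transcript that is recovered with no component (mRNA_E) is simply absent from the profile.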
Stated otherwise, the ribonomic profile provides a pattern recognition subset of the global mRNA profile of the cell. When the growth state of the cells changes (e.g., tumorigenesis) or the cell is perturbed by a pathogen (e.g., a viral infection), the profile will change, and a perturbation of the ribonome can be detected. If cells are treated with compounds (e.g., drugs), the ribonomic patterns will show desirable or undesirable alteration. Accordingly, the inventive method provides a means for evaluating the effect of numerous factors on a cell, including toxicity, aging, apoptosis, pathogenesis and cell differentiation.
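As an illustrative sketch only, detecting a perturbation of the ribonome can be thought of as comparing two profiles taken under different conditions. The Python example below assumes a profile is a mapping from an mRNP component to the set of mRNAs recovered with it. The function name profile_changes and all component and transcript names are hypothetical and do not come from the specification.

```python
from typing import Dict, Set, Tuple

# Assumption for illustration: component name -> set of recovered mRNAs.
Profile = Dict[str, Set[str]]


def profile_changes(before: Profile, after: Profile) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """For each mRNP component, report (mRNAs gained, mRNAs lost) between two profiles."""
    changes: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for component in sorted(set(before) | set(after)):
        old = before.get(component, set())
        new = after.get(component, set())
        gained, lost = new - old, old - new
        if gained or lost:  # keep only components whose subset actually changed
            changes[component] = (gained, lost)
    return changes


# Hypothetical profiles of the same cells before and after treatment with a compound.
untreated = {"protein_X": {"mRNA_A", "mRNA_B"}, "protein_Y": {"mRNA_C"}}
treated = {"protein_X": {"mRNA_A", "mRNA_D"}, "protein_Y": {"mRNA_C"}}

changes = profile_changes(untreated, treated)
```

This set-based comparison captures only qualitative changes (presence or absence of a transcript in a subset). A quantitative comparison, such as one based on hybridization signal intensities, would need numeric values per transcript rather than sets.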
The new invention has several advantages over previous methods of partitioning RNA. First, partitioning of mRNP complexes may be carried out in vivo, while previous methods were limited to in vitro applications. Second, the new method is robust enough that amplification (e.g., by PCR, or alternatively according to the method of Antic et al. (1999) Genes Dev. 13, 449-461) is not necessary to identify cDNAs of interest once they are reverse transcribed from the isolated subset of mRNAs. Third, the present invention does not require the use of iterative processes, such as those set forth in Gao et al. supra. Finally, quantitative determinations are possible with the present invention if, for example, hybridization is used to analyze the expression profile of the cell (e.g., in microarray assays or RNase protection assays (RPA)).
In certain embodiments, therefore, the present invention advantageously allows the artisan to identify, monitor, and quantitate mature gene transcripts en masse in order to determine their localization, activity, stability, and translation into protein components of living cells. The methods described herein advantageously provide a novel approach to functional genomics by providing methods of isolating endogenous messenger-RNA binding proteins, and methods of identifying the subset of cellular mRNAs contained in mRNP-complexes, using microarrays or other known procedures. In preferred embodiments, the inventive method provides a basis for investigating and determining functional mRNA networks during growth and differentiation cycles by using mRNA-binding proteins and other mRNP-associated factors to define mRNA subsets.
It will be appreciated that patterns of mRNA subsets (i.e., expression profiles) may be altered in the presence of certain compounds (e.g., drugs) or under various disease conditions. Accordingly, in certain embodiments the inventive methods are useful for screening compounds that may be of therapeutic use, and for finding appropriate gene targets for the compounds. In other embodiments, the inventive method is useful for determining the disease state of a cell, thus providing means for classifying or diagnosing the presence of, or predisposition to, disease (e.g., cancer).
Gene expression profiles will also vary between differing cell types present in a complex tissue, such as a tumor. Some mRNA binding proteins are present only in certain tumor cells, and a tumor may comprise more than one cell type. Gene expression profiling for each cell type within a tumor or tissue may be carried out by making an extract of the tissue and immunoprecipitating cell-type specific components of mRNP complexes (e.g., RNA-binding proteins that are attached to mRNA) directly from the extract (i.e., in vivo). The immunoprecipitated pellets will contain mRNAs that are only present in the same cells that contain the attached or associated component. Thus, in certain embodiments, the inventive methods may be used to characterize and distinguish the gene expression profiles of a plurality of cell types, which cell types may co-exist in the same complex tissue. This can allow the tumor cells to be profiled in whole tumor extracts without having to analyze mRNA in, for example, the non-tumor stromal cells and blood cells that surround tumor cells. The results of such characterization may be useful in determining, for example, the proper course of treatment for a patient suffering from a tumor, when the choice of treatment depends on the kind of tissue (e.g., endothelial vascular tissue) present in a tumor.
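The logic of cell-type specific profiling from a mixed extract can be illustrated with a toy model. The Python sketch below is a simplified assumption, not the disclosed procedure: each cell is modeled as a record of which mRNAs are bound to which mRNP component, the "extract" is a pooled list of such cells, and the names Cell, immunoprecipitate, protein_T, protein_S and mRNA_1 through mRNA_4 are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Cell(NamedTuple):
    cell_type: str
    # mRNP component -> mRNAs bound to that component in this cell
    bound_mrnas: Dict[str, Set[str]]


def immunoprecipitate(extract: List[Cell], target: str) -> Set[str]:
    """Recover the mRNAs bound to `target` from a pooled extract of all cells."""
    recovered: Set[str] = set()
    for cell in extract:
        # Cells that do not contain the target component contribute nothing.
        recovered |= cell.bound_mrnas.get(target, set())
    return recovered


# Hypothetical mixed tissue: protein_T occurs only in tumor cells,
# protein_S only in the surrounding stromal cells.
extract = [
    Cell("tumor", {"protein_T": {"mRNA_1", "mRNA_2"}}),
    Cell("stromal", {"protein_S": {"mRNA_3"}}),
    Cell("tumor", {"protein_T": {"mRNA_2", "mRNA_4"}}),
]

tumor_subset = immunoprecipitate(extract, "protein_T")
stromal_subset = immunoprecipitate(extract, "protein_S")
```

Because the target component is present only in one cell type, the recovered subset reflects that cell type alone even though the extract was prepared from the whole tissue, which is the point the paragraph above makes about avoiding microdissection.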
In another embodiment, the present invention provides methods for isolating and optionally identifying proteins that bind or associate with an mRNP complex.
Alternatively, and in another embodiment, the inventive method may be used to screen test compounds for their ability to modulate gene expression in a cell. Such methods are useful for screening putative drugs that may be used in the treatment and/or prevention of disorders associated with irregularities in gene expression, including but not limited to cancer.
The foregoing and other aspects of the present invention are explained in detail in the specification set forth below.