The present disclosure relates to the fields of pharmacogenomics, signal transduction, bioinformatics, gene regulation, gene regulatory sequences, gene regulatory proteins and methods of determining gene regulatory pathways.
Worldwide genome sequencing efforts are providing a wealth of information on the sequence and structure of various genomes, and on the locations of thousands of genes. In addition, genome research is yielding a considerable amount of information on gene products and their functions. The next challenges will be in the understanding and interpretation of genomic information. A major limitation in the analysis of genome sequence information to date is the lack of information that has been extracted from genome sequences on the location, extent, nature and function of sequences that regulate gene expression, i.e., gene regulatory sequences.
The cis-acting sequence elements that participate in the regulation of a single metazoan gene can be distributed over 100 kilobase pairs or more. Combinatorial utilization of regulatory elements allows considerable flexibility in the timing, extent and location of gene expression. The separation of regulatory elements by large linear distances of DNA sequence facilitates separation of functions, allowing each element to act individually or in combination with other regulatory elements. Non-contiguous regulatory elements can act in concert by, for example, looping out of intervening chromatin, to bring them into proximity, or by recruitment of enzymatic complexes that translocate along chromatin from one element to another. Determining the sequence content of these cis-acting regulatory elements offers tremendous insight into the nature and actions of the trans-acting factors which control gene expression, but is made difficult by the large distances by which they are separated from each other and from the genes which they regulate.
In order to address the problems associated with collecting, processing and analyzing the vast amounts of sequence data being generated by, e.g., genome sequencing projects, various bioinformatic techniques have been developed. In general, bioinformatics refers to the systematic development and application of information technologies and data processing techniques for collecting, searching, analyzing and displaying data obtained by experiments to make observations concerning biological processes.
One example of such an analysis involves the determination of sequences corresponding to expressed genes (expressed sequence tags, or ESTs) and computerized analysis of genome sequences by comparison to databases of expressed sequence tags. However, this type of analysis provides information on coding regions only and thus does not assist in the identification of regulatory sequences. Mapping of a particular EST onto a genome sequence and searching the region upstream of the EST for potential regulatory sequences is also ineffective, for several reasons. First, large introns and/or 5′ untranslated regions can separate an EST sequence from its upstream regulatory regions; therefore the genomic region to be searched for regulatory sequences is not clearly defined. Second, searches of a given region of a genome for sequences homologous to transcription factor binding sites will yield numerous “hits” (representing potential regulatory sequences), some of which are functional in a given cell and some of which are not. Thus, such searches will fail to provide unambiguous information as to which of several potential regulatory sequences are active in the regulation of expression of a given gene in a particular cell. Furthermore, it is likely that, with respect to a particular gene, different regulatory regions are functional in different cell types. Therefore, the problem of identifying regulatory sequences for a gene is specific to each cell type in which the gene is (or is not) expressed. Indeed, different regulatory sequences will often be responsible for regulating the expression of a particular gene in different cells.
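The scale of the "numerous hits" problem can be illustrated with a rough estimate. The sketch below is not part of the disclosed methods; it assumes a random sequence with independent, equiprobable bases (an idealization that real genomes only approximate) and counts the expected number of chance matches to a single fixed motif. The function name and parameters are hypothetical.

```python
def expected_chance_hits(region_len: int, motif_len: int,
                         both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed motif in random sequence.

    Assumption (illustrative only): each base is independent and occurs
    with probability 1/4. Palindromic motifs are counted once per strand,
    so they are double-counted when both_strands is True.
    """
    # Number of positions at which a motif of this length can start.
    windows = max(region_len - motif_len + 1, 0)
    per_strand = windows / 4 ** motif_len
    return 2 * per_strand if both_strands else per_strand


if __name__ == "__main__":
    # A 100 kb search window, matching the span over which regulatory
    # elements of a single metazoan gene can be distributed.
    for k in (6, 8, 10):
        print(k, round(expected_chance_hits(100_000, k), 2))
```

Under these assumptions, a 6 bp site is expected about 49 times by chance in 100 kb when both strands are counted, which is why sequence matching alone cannot indicate which candidate sites are functional in a given cell.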
Thus, the informational content of a gene does not depend solely on its coding sequence (a portion of which is represented in an EST), but also on cis-acting regulatory elements, present both within and flanking the coding sequences. These include promoters, enhancers, silencers, locus control regions, boundary elements and matrix attachment regions, all of which contribute to the quantitative level of expression, as well as the tissue- and developmental-specificity of expression of a gene. Furthermore, the aforementioned regulatory elements can also influence selection of transcription start sites, splice sites and termination sites.
Identification of cis-acting regulatory elements has traditionally been carried out by identifying a gene of interest, then conducting an analysis of the gene and its flanking sequences. Typically, one obtains a clone of the gene and its flanking regions, and performs assays for production of a gene product (either the natural product or the product of a reporter gene whose expression is presumably under the control of the regulatory sequences of the gene of interest). Here again, one encounters the problem that the extent of sequences to be analyzed for regulatory content is not concretely defined, since sequences involved in the regulation of metazoan genes can occupy up to 100 kb of DNA. Furthermore, assays for gene products are often tedious, and reporter gene assays are often unable to distinguish transcriptional from translational regulation and can therefore be misleading.
Pelling et al. (2000) Genome Res. 10:874-886 disclose a library of transcriptionally active sequences, derived by cloning chromosomal sequences that are immunoprecipitated by antibodies to hyperacetylated histone H4. This library comprises primarily coding sequences and sequences proximal to the transcription start site. Pelling et al. do not disclose methods for identifying regulatory sequences, databases of regulatory sequences or uses for databases of regulatory sequences.
It can thus be seen that a major limitation of current comparative genomics and bioinformatic analyses is that they are unable to identify cell-specific regulatory sequences. In light of these limitations, methods for identifying regulatory DNA sequences (particularly in a high-throughput fashion), libraries of regulatory sequences, and databases of regulatory sequences would considerably advance the fields of genomics and bioinformatics.
Disclosed herein are compositions and methods useful in a wide variety of applications, including, but not limited to, (1) identifying a drug that affects accessible regions of cellular chromatin; (2) elucidating signal transduction pathways; (3) facilitating modulation of signal transduction pathways and/or facilitating modulation of the gene(s) associated with the signal transduction pathway; and (4) pharmacogenomically selecting an appropriate drug therapy. Thus, these compositions and methods are useful for facilitating drug discovery and testing.
Accordingly, in one aspect, provided herein are methods for identifying a drug that affects accessible regions of cellular chromatin relative to a gene of interest, the method comprising: (a) providing cellular chromatin having known accessible regions; (b) exposing the cellular chromatin of step (a) to the drug under conditions that allow the drug to affect cellular chromatin; and (c) comparing the nature of accessible regions of the cellular chromatin of step (b) with the accessible regions of the cellular chromatin of step (a) to determine the effect, if any, of the drug on the accessible regions of the cellular chromatin. In certain embodiments, step (c) comprises reacting the cellular chromatin with an antibody against acetylated or phosphorylated histones. In other embodiments, step (c) comprises mapping accessible regions of cellular chromatin, for example by (a) reacting cellular chromatin with a chemical or an enzymatic probe to generate chromatin-associated DNA fragments, wherein the DNA fragments comprise, at their termini, sites of probe reaction and identify accessible regions of cellular chromatin; (b) optionally treating the DNA fragments to obtain blunt ends; (c) ligating an adapter polynucleotide to at least one end of the DNA fragments to generate adapter-ligated fragments; and (d) amplifying the adapter-ligated fragments in the presence of a first primer that is complementary to the adapter and a second primer that is complementary to a segment of the gene of interest to form an amplified product, wherein the size of the amplified product is a measure of the distance between the segment of the gene to which the second primer binds and the site of probe reaction. The probe can be, for example, an enzymatic probe (e.g., a nuclease such as DNase I or micrococcal nuclease) or a chemical probe. The drug can be, for example, a protein or a small molecule.
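The relationship between amplified product size and probe site location, and the comparison of accessible sites before and after drug exposure, can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed procedure: it assumes that product size equals the primer-to-site distance plus the adapter length, that all sites lie on one side of the gene-specific primer, and that two sites within a chosen tolerance are the same site. All names, coordinates and the tolerance value are hypothetical.

```python
from typing import Dict, Iterable, List


def site_positions(primer_pos: int, product_sizes: Iterable[int],
                   adapter_len: int, upstream: bool = True) -> List[int]:
    """Estimate probe-site coordinates from amplified product sizes.

    Assumption: product size = (distance from the gene-specific primer
    to the probe site) + (adapter length).
    """
    sign = -1 if upstream else 1
    return sorted(primer_pos + sign * (size - adapter_len)
                  for size in product_sizes)


def compare_sites(before: List[int], after: List[int],
                  tolerance: int = 50) -> Dict[str, List[int]]:
    """Report sites lost or gained after drug exposure.

    Two sites closer than `tolerance` bp are treated as the same site
    (a hypothetical resolution limit for sizing products).
    """
    def unmatched(xs: List[int], ys: List[int]) -> List[int]:
        return [x for x in xs if all(abs(x - y) > tolerance for y in ys)]

    return {"lost": unmatched(before, after),
            "gained": unmatched(after, before)}


if __name__ == "__main__":
    # Hypothetical product sizes (bp) from untreated and drug-treated cells.
    untreated = site_positions(10_000, [525, 1225], adapter_len=25)
    treated = site_positions(10_000, [530, 2025], adapter_len=25)
    print(untreated)                          # [8800, 9500]
    print(treated)                            # [8000, 9495]
    print(compare_sites(untreated, treated))  # {'lost': [8800], 'gained': [8000]}
```

In this made-up example, the site near position 9500 is unchanged by the drug, the site at 8800 is no longer detected, and a new site appears at 8000.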
In certain embodiments, the accessible regions are provided as a collection of polynucleotides (e.g., a library).
In other aspects, methods for identifying a drug target are provided. The methods comprise (a) identifying an accessible region in cellular chromatin related to a gene of interest; (b) examining the sequence of the accessible region to identify one or more binding sites for a transcriptional regulatory molecule; and (c) identifying one or more members of a signal transduction pathway that regulate the activity of the transcriptional regulatory molecule; wherein the member of the signal transduction pathway identified in step (c) comprises the drug target. The accessible region can be identified by any method, for example, by DNase hypersensitivity. The signal transduction pathway can include, for example, a G-protein-coupled receptor. The cellular chromatin may be derived from any suitable cell, for example, a human cell, an animal cell, a plant cell or a microorganism (e.g., a human, animal or plant pathogen) and may be provided, for example, as a collection of polynucleotides representing accessible regions. The drug target can comprise a nucleotide sequence or a protein or amino acid sequence.
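Step (b) of the drug target method, examining an accessible region for transcription factor binding sites, could be approached computationally along the following lines. This is only a sketch: the motif table uses simplified textbook consensus sequences written in IUPAC codes, chosen here for illustration and not taken from the disclosure, and the example sequence is invented. Step (c), linking a factor to upstream pathway members, would require a curated knowledge base and is not shown.

```python
import re
from typing import Dict, List, Tuple

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")

# Illustrative, simplified consensus motifs (assumptions for this sketch).
MOTIFS = {"CRE": "TGACGTCA", "AP-1": "TGASTCA", "GATA": "WGATAR"}


def revcomp(motif: str) -> str:
    """Reverse complement of a motif written in IUPAC codes."""
    return motif.translate(COMPLEMENT)[::-1]


def to_regex(motif: str) -> str:
    return "".join(IUPAC[base] for base in motif)


def find_sites(seq: str, motifs: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return sorted (start, motif name, strand) hits, 0-based, on both strands."""
    seq = seq.upper()
    hits = set()
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif), ("-", revcomp(motif))):
            # A lookahead lets overlapping occurrences be reported.
            for m in re.finditer(f"(?=({to_regex(pattern)}))", seq):
                hits.add((m.start(), name, strand))
    return sorted(hits)


if __name__ == "__main__":
    accessible_region = "CCTGACGTCAGGAGATAAGC"  # invented example sequence
    for hit in find_sites(accessible_region, MOTIFS):
        print(hit)
```

Restricting the scan to regions shown experimentally to be accessible in the cell of interest, rather than to the whole locus, is what narrows the candidate sites to those likely to be functional in that cell.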
In still further embodiments, the methods comprise adding a molecule (e.g., a protein or small molecule) that affects the expression of the gene via affecting the member of the signal transduction pathway identified by the methods described herein. The gene may be involved in a disease state or condition, for example cancer, osteoporosis or cardiovascular disease. In some embodiments, the molecule can augment expression of the gene, for example by increasing production of a protein product; increasing expression of genes involved in cell growth; or increasing expression of a gene involved in resisting or clearing pathogens in a host (e.g., animal, human or plant host). In yet other embodiments, the molecule inhibits expression of the gene, for example by inhibiting a gene involved in cell growth; inhibiting genes expressed by diseased (e.g., cancerous) cells; or inhibiting a gene in a pathogenic organism involved in toxicity, infectivity or pathogenicity.
Also provided herein are methods for modulating a signal transduction pathway involved in regulation of a gene of interest. The methods comprise (a) identifying a member of the signal transduction pathway involved in regulation of a gene by the methods described herein; and (b) adding a molecule which modulates the activity of the member of the signal transduction pathway identified in step (a). In certain embodiments, the modulation of the selected member of the signal transduction pathway facilitates processes selected from the group consisting of tissue engineering, transplantation, response to a pathogen, inhibition of cell growth, activation of cell growth and apoptosis. In some embodiments, the signal transduction pathway is present in human or animal cells, and is involved, for example, in growth, production of chemicals or biochemicals, response to pathogens, disease states or conditions such as cancer, cardiovascular disease, neurodegenerative diseases or osteoporosis. In other embodiments the signal transduction pathway is present in plant cells and is involved, for example, in herbicide response, pathogen response, growth, yield, biochemical properties or composition (e.g., oil composition); and/or production of chemicals or biochemicals. In still further embodiments the signal transduction pathway is present in microorganisms, and is involved, for example, in production of a chemical, biochemical, protein or pharmaceutical or in pathogenicity (e.g., replication or toxicity).
In still further aspects, methods for pharmacogenomically selecting an appropriate drug therapy to administer to a first individual having a disease or condition are provided. The methods comprise (a) determining the location and nature of the regulatory accessible regions associated with drug response in the first individual; and (b) correlating the location and nature of the regulatory accessible regions associated with drug response in the first individual to known locations and natures of regulatory accessible regions associated with drug response, thereby selecting an appropriate drug therapy to be administered to the first individual. In certain embodiments, the regulatory accessible regions examined are associated with genes involved in drug metabolism, for example cytochrome P450, N-acetyltransferase, NAD(P)H quinone oxidoreductase, thiopurine methyltransferase, beta2-adrenergic receptor, dopamine D3 receptor, MDR-1 (Multiple drug resistance-1 gene), and MRPs (Multiple drug resistance proteins). Disease states or conditions that will benefit from these methods include, but are not limited to, chronic illnesses in which individuals will need medication for an extended period (e.g., cardiovascular disease, cystic fibrosis, etc.); conditions such as osteoporosis, neurodegenerative diseases, and cancer in which inappropriate therapy can cause irreversible changes; and conditions for which known drug treatments are sometimes associated with allergy or sensitivity.
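The correlation in step (b) of the pharmacogenomic method could, for illustration, be reduced to a similarity score between an individual's accessible-region profile and reference profiles. The sketch below is one possible approach under stated assumptions and is not described in the disclosure: regions are half-open (chromosome, start, end) intervals, overlap is measured as a Jaccard index over fixed-size bins, and the profile labels, coordinates and bin size are all hypothetical. It captures only the location of regions, not their nature.

```python
from typing import Dict, List, Set, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), end exclusive


def to_bins(regions: List[Region], bin_size: int = 100) -> Set[Tuple[str, int]]:
    """Convert intervals to the set of fixed-size bins they touch."""
    bins = set()
    for chrom, start, end in regions:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bins.add((chrom, b))
    return bins


def jaccard(a: set, b: set) -> float:
    """Shared bins divided by total distinct bins; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(individual: List[Region],
               references: Dict[str, List[Region]],
               bin_size: int = 100) -> Tuple[str, Dict[str, float]]:
    """Return the most similar reference label and all similarity scores."""
    ind = to_bins(individual, bin_size)
    scores = {label: jaccard(ind, to_bins(regs, bin_size))
              for label, regs in references.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Invented coordinates standing in for regions near a drug-metabolism gene.
    individual = [("chr1", 1000, 1400), ("chr1", 5000, 5200)]
    references = {
        "responder": [("chr1", 1000, 1400), ("chr1", 5000, 5300)],
        "non_responder": [("chr1", 1000, 1200), ("chr1", 8000, 8200)],
    }
    label, scores = best_match(individual, references)
    print(label, scores)
```

A similarity score of this kind is only a starting point; any actual therapy selection would depend on validated reference profiles and clinical judgment.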
These and other embodiments will readily occur to those of ordinary skill in the art in view of the disclosure herein.