The present disclosure is in the field of functional genomics and gene identification.
Determining the function of a gene of interest is important for identifying potential genomic targets for drug discovery. Genes associated with a particular function or phenotype can then be validated as targets for discovery of therapeutic compounds. Historically, the function of a particular gene has been identified by associating expression of the gene with a specific function or phenotype in a biological system such as a cell or a transgenic animal.
One known method used to validate the function of a gene is to genetically remove the gene from a cell or animal (i.e., create a “knockout”) and determine whether or not a phenotype (i.e., any change, e.g., morphological, functional, etc., observable by an assay) of the cell or animal has changed. This determination depends on whether the cell or organism survives without the gene and is not feasible if the gene is required for survival. Other genes are subject to counteracting mechanisms that are able to adapt to the disappearance of the gene and compensate for its function in other ways. This compensation may be so effective, in fact, that the true function of the deleted gene may go unnoticed. The technical process of creating a “knockout” is laborious and requires extensive sequence information, thus commanding immense monetary and technical resources if undertaken on a genome-wide scale.
In another example, antisense methods of gene regulation and methods that rely on targeted ribozymes are highly unpredictable. Another method for experimentally determining the function of a newly discovered gene is to clone its cDNA into an expression vector driven by a strong promoter and measure the physiological consequence of its over-expression in a transfected cell. This method is also labor intensive and does not address the physiological consequences of down-regulation of a target gene. Therefore, simple methods allowing the selective over- and under-expression of uncharacterized genes would be of great utility to the scientific community. Methods that permit the regulation of genes in cell model systems, transgenic animals and transgenic plants would find widespread use in academic laboratories, pharmaceutical companies, genomics companies and in the biotechnology industry.
An additional use of target validation is in the production of in vivo and in vitro assays for drug discovery. Once the gene causing a selected phenotype has been identified, cell lines, transgenic animals and transgenic plants could be engineered to express a useful protein product or repress a harmful one. These model systems are then used, e.g., with high throughput screening methodology, to identify lead therapeutic compounds that regulate expression of the gene of choice, thereby providing a desired phenotype, e.g., treatment of disease.
Methods currently exist in the art that allow one to alter the expression of a given gene, e.g., using ribozymes, antisense technology, small molecule regulators, over-expression of cDNA clones, and gene knockouts. As described above, these methods have to date proven to be generally insufficient for many applications and typically have not demonstrated either high target efficacy or high specificity in vivo. For useful experimental results and therapeutic treatments, these characteristics are desired.
Gene expression is normally controlled by sequence specific DNA binding proteins called transcription factors. These bind in the general proximity (although occasionally at great distances) of the point of transcription initiation of a gene and typically include both a DNA binding domain and a regulatory domain. They act to influence the efficiency of formation or function of a transcription initiation complex at the promoter. Transcription factors can act in a positive fashion (transactivation) or in a negative fashion (transrepression). Although transcription factors typically contain a regulatory domain, repression can also be achieved by steric hindrance via a DNA binding domain alone.
Transcription factor function can be constitutive (always “on”) or conditional. Conditional function can be imparted on a transcription factor by a variety of means, but the majority of these regulatory mechanisms depend on the sequestering of the factor in the cytoplasm and the inducible release and subsequent nuclear translocation, DNA binding and transactivation (or repression). Examples of transcription factors that function this way include progesterone receptors, sterol response element binding proteins (SREBPs) and NF-kappa B. There are examples of transcription factors that respond to phosphorylation or small molecule ligands by altering their ability to bind their cognate DNA recognition sequence (Hou et al., Science 256:1701 (1994); Gossen and Bujard, Proc. Natl. Acad. Sci. U.S.A. 89:5547 (1992); Oligino et al., Gene Ther. 5:491-496 (1998); Wang et al., Gene Ther. 4:432-441 (1997); Neering et al., Blood 88:1147-1155 (1996); and Rendahl et al., Nat. Biotechnol. 16:757-761 (1998)).
Zinc finger proteins (“ZFPs”) are proteins that can bind to DNA in a sequence-specific manner. Zinc fingers were first identified in the transcription factor TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. Zinc finger proteins are widespread in eukaryotic cells. An exemplary motif characterizing one class of these proteins (Cys2His2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (SEQ ID NO: 1) (where X is any amino acid). A single finger domain is about 30 amino acids in length and several structural studies have demonstrated that it contains an alpha helix containing the two invariant histidine residues co-ordinated through zinc with the two cysteines of a single beta turn. To date, over 10,000 zinc finger sequences have been identified in several thousand known or putative transcription factors. Zinc finger proteins are involved not only in DNA-recognition, but also in RNA binding and protein-protein binding. Current estimates are that this class of molecules will constitute the products of about 2% of all human genes.
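By way of illustration only, the exemplary Cys2His2 motif given above can be written as a regular expression over one-letter amino acid codes. The following Python sketch is not part of the disclosed methods; the function name and the test sequence are hypothetical, and the sequence is synthetic, constructed solely to satisfy the pattern.

```python
import re

# Cys2His2 consensus as stated in the text:
#   Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any amino acid.
# "." stands for X; the counted ranges map directly onto {m,n} quantifiers.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein_seq):
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group())
            for m in C2H2_MOTIF.finditer(protein_seq.upper())]


# Hypothetical synthetic sequence (not a real protein), built to fit the motif.
example = "MA" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "GG"
print(find_c2h2_motifs(example))
```

A match of this pattern indicates only that a sequence is consistent with the consensus spacing; it does not by itself establish that the sequence folds into a functional zinc finger.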
The X-ray crystal structure of Zif268, a three-finger domain from a murine transcription factor, has been solved in complex with its cognate DNA sequence and shows that each finger can be superimposed on the next by a periodic rotation and translation of the finger along the main DNA axis. The structure suggests that each finger interacts independently with DNA over 3 base-pair intervals, with side-chains at positions −1, 2, 3 and 6 on each recognition helix making contacts with the respective DNA triplet subsite. The amino terminus of Zif268 is situated at the 3′ end of its DNA recognition subsite. Recent results have indicated that some zinc fingers can bind to a fourth base in a target segment (Isalan et al., Proc. Natl. Acad. Sci. U.S.A. 94:5617-5621 (1997)). The fourth base is on the opposite strand from the other three bases recognized by the zinc finger and is complementary to the base immediately 3′ of the three-base subsite.
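The three-base-pair periodicity and orientation described above can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions: it treats each finger as contacting exactly one triplet, ignores the fourth-base contact, and uses a hypothetical function name and an arbitrary nine-base input written 5′ to 3′.

```python
def assign_fingers(target_5to3):
    """Map each finger to a 3-bp subsite of a target written 5' to 3'.

    Simplifying assumptions (illustrative only):
      - one finger contacts exactly one triplet (no fourth-base overlap);
      - the amino-terminal finger (finger 1) lies at the 3' end of the site,
        so fingers are numbered from the 3'-most triplet.
    """
    site = target_5to3.replace(" ", "").upper()
    if len(site) == 0 or len(site) % 3 != 0:
        raise ValueError("target length must be a non-zero multiple of 3")
    # Split the site into consecutive triplets, reading 5' to 3'.
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    # Number fingers starting from the 3' end of the target.
    return {"finger_%d" % n: t
            for n, t in enumerate(reversed(triplets), start=1)}


# Arbitrary nine-base target, chosen only to make the ordering visible.
print(assign_fingers("GCA GAA GCC"))
```

Under these assumptions a three-finger protein spans nine base pairs and a six-finger protein spans eighteen.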
The structure of the Zif268-DNA complex also suggested that the DNA sequence specificity of a zinc finger protein might be altered by making amino acid substitutions at the four helix positions (−1, 2, 3 and 6) on a zinc finger recognition helix. Phage display experiments using zinc finger combinatorial libraries to test this observation were published in a series of papers in 1994 (Rebar et al., Science 263:671-673 (1994); Jamieson et al., Biochemistry 33:5689-5695 (1994); Choo et al., Proc. Natl. Acad. Sci. U.S.A. 91:11163-11167 (1994)). Combinatorial libraries were constructed with randomized side-chains in either the first or middle finger of Zif268 and then isolated with an altered Zif268 binding site in which the appropriate DNA sub-site was replaced by an altered DNA triplet. Correlation between the nature of introduced mutations and the resulting alteration in binding specificity gave rise to a partial set of substitution rules for rational design of zinc finger proteins with altered binding specificity. Greisman and Pabo, Science 275:657-661 (1997) discuss an elaboration of a phage display method in which each finger of a zinc finger protein is successively subjected to randomization and selection. This paper reported selection of zinc finger proteins for a nuclear hormone response element, a p53 target site and a TATA box sequence.
Recombinant zinc finger proteins have been reported to have the ability to regulate gene expression of transiently expressed reporter genes in cultured cells (see, e.g., Pomerantz et al., Science 267:93-96 (1995); Liu et al., Proc. Natl. Acad. Sci. U.S.A. 94:5525-5530 (1997); and Beerli et al., Proc. Natl. Acad. Sci. U.S.A. 95:14628-14633 (1998)). For example, Pomerantz et al., Science 267:93-96 (1995) report an attempt to design a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1. The hybrid protein was then fused with either a transcriptional activator or repressor domain for expression as a chimeric protein. The chimeric protein was reported to bind a target site representing a hybrid of the subsites of its two components. The authors then constructed a reporter vector containing a luciferase gene operably linked to a promoter and a hybrid site for the chimeric DNA binding protein in proximity to the promoter. The authors reported that their chimeric DNA binding protein could activate or repress expression of the luciferase gene.
Liu et al., Proc. Natl. Acad. Sci. U.S.A. 94:5525-5530 (1997) report forming a composite zinc finger protein by using a peptide spacer to link two component zinc finger proteins, each having three fingers. The composite protein was then further linked to transcriptional activation or repression domains. It was reported that the resulting chimeric protein bound to a target site formed from the target segments bound by the two component zinc finger proteins. It was further reported that the chimeric zinc finger protein could activate or repress transcription of a reporter gene when its target site was inserted into a reporter plasmid in proximity of a promoter operably linked to the reporter.
Beerli et al., Proc. Natl. Acad. Sci. U.S.A. 95:14628-14633 (1998) report construction of a chimeric six finger zinc finger protein fused to either a KRAB, ERD, or SID transcriptional repressor domain, or the VP16 or VP64 transcriptional activation domain. This chimeric zinc finger protein was designed to recognize an 18 bp target site in the 5′ untranslated region of the human erbB-2 gene. Using this construct, the authors of this study report both activation and repression of a transiently expressed reporter luciferase construct linked to the erbB-2 promoter.
In addition, a recombinant zinc finger protein was reported to repress expression of an integrated plasmid construct encoding a bcr-abl oncogene (Choo et al., Nature 372:642-645 (1994)). The target segment to which the zinc finger proteins bound was a nine base sequence GCA GAA GCC chosen to overlap the junction created by a specific oncogenic translocation fusing the genes encoding bcr and abl. The intention was that a zinc finger protein specific to this target site would bind to the oncogene without binding to abl or bcr component genes. The authors used phage display to select a variant zinc finger protein that bound to this target segment. The variant zinc finger protein thus isolated was then reported to repress expression of a stably transfected bcr-abl construct in a cell line.
To date, these methods have focused on regulation of either transiently expressed, known genes, or on regulation of known exogenous genes that have been integrated into the genome. In contrast, specific regulation of a candidate gene or list of genes to identify the cause of a selected phenotype has not been demonstrated in the art. Therefore, a need exists for useful methods of identifying the biological function of a selected gene or genes and/or validating a gene or genes as a suitable target for drug discovery.
Furthermore, the determination of a draft nucleotide sequence of the human genome opens up the prospect of identifying all human genes. See, for example, Science 291:1177-1351 (2001) and Nature 409:813-958 (2001). Identification of, for example, disease-related genes could lead to the discovery of new therapeutics. Some genes have already been identified based on protein and/or RNA expression; while others have been and can be identified by homology to other human genes or to related genes in other organisms.
However, many problems in unambiguously identifying human genes still exist and as a result, a complete list of human genes is not currently available, nor is it likely to become available in the near future. For example, the use of expressed sequence tag (EST) sequences to predict the existence of a gene is subject to artifacts arising from unspliced RNA, non-gene-derived transcription and contamination of cDNA preparations, from which ESTs are derived, with genomic DNA. The use of sequence similarity to known genes as a criterion for identifying new genes rules out the possibility of identifying any new gene for which a homologous sequence is not already known. Various gene prediction algorithms have been devised, but their success rate in identifying new genes is unacceptably low. Thus, currently available methods for predicting the existence of a gene, based on analysis of genome sequence, are not particularly effective. See, in particular, Nature 409 supra p. 819 (“When is a predicted gene a gene?”) and pp. 892-907 (“Gene content of the human genome”); Galas (2001) Science 291:1257-1260; and Goodman (2001) Genome Technology July 2001:52-55.
Accordingly, there is a need for methods to confirm putative gene assignments that are based on gene prediction algorithms, sequence homology, ESTs and related techniques.
In one aspect, described herein is a method for identifying a gene. In certain embodiments, the method comprises: (a) obtaining a putative gene sequence (PGS); (b) contacting a cell with an exogenous molecule, wherein the cell comprises the putative gene sequence, and wherein the exogenous molecule binds to and modulates expression of the putative gene sequence; and (c) assaying the cell for at least one selected phenotype, wherein, if one or more of the selected phenotypes are observed, the putative gene sequence is identified as a gene. The putative gene sequence can be obtained, for example, from a gene prediction algorithm; by analysis of expressed sequence tags; and/or by homology. In any of the methods described herein, the gene can encode, for example, a protein or an RNA (e.g., structural RNA, regulatory RNA, enzymatic RNA, antisense RNA, ribozyme, ribosomal RNA or transfer RNA) and the cell can be, for example, an animal cell (e.g., a mammalian cell such as a human cell), a plant cell, a bacterial cell, a protozoal cell, or a fungal cell. The exogenous molecule can be, for example, a zinc finger protein.
In certain embodiments, the exogenous molecule binds near the putative transcription start site of the PGS. In other embodiments, the exogenous molecule binds in the putative transcribed region of the PGS (e.g., in the putative coding region of the PGS). In still further embodiments, the exogenous molecule binds in a putative nontranscribed regulatory region of the PGS.
In further embodiments, the exogenous molecule comprises an activation domain (e.g., VP16, p65 and functional fragments thereof); a repression domain (e.g., KRAB, v-erbA and functional fragments thereof); or a bifunctional domain (BFD), such as thyroid hormone receptor, retinoic acid receptor, estrogen receptor, glucocorticoid receptor and functional fragments thereof, in which the activity of the bifunctional domain is dependent upon interaction of the BFD with a second molecule (e.g., a protein or a small molecule such as 3,5,3′-triiodo-L-thyronine (T3), all-trans-retinoic acid, estradiol, tamoxifen, 4-hydroxy-tamoxifen, RU-486 or dexamethasone).
In further embodiments, the phenotype is a change in a property, for example, cell growth, cell cycle control, cellular physiology or cellular response to a pathogen. In other embodiments, the phenotype is expression of an RNA molecule. In yet other embodiments, the phenotype is an alteration in the transcriptional program of the cell.
In still further embodiments, the cell is infected with a virus and the gene is a viral gene.
These and other embodiments will be readily apparent to one of skill in the art upon reading the present disclosure.