The present invention provides methods of regulating gene expression using recombinant zinc finger proteins, for functional genomics and target validation applications.
Determining the function of a gene of interest is important for identifying potential genomic targets for drug discovery. Genes associated with a particular function or phenotype can then be validated as targets for discovery of therapeutic compounds. Historically, the function of a particular gene has been identified by associating expression of the gene with a specific function or phenotype in a biological system such as a cell or a transgenic animal.
One known method used to validate the function of a gene is to genetically remove the gene from a cell or animal (i.e., create a “knockout”) and determine whether or not a phenotype (i.e., any change, e.g., morphological, functional, etc., observable by an assay) of the cell or animal has changed. This determination depends on whether the cell or organism survives without the gene and is not feasible if the gene is required for survival. Other genes are subject to counteracting mechanisms that are able to adapt to the disappearance of the gene and compensate for its function in other ways. This compensation may be so effective, in fact, that the true function of the deleted gene may go unnoticed. The technical process of creating a “knockout” is laborious and requires extensive sequence information, thus commanding immense monetary and technical resources if undertaken on a genome-wide scale.
In another example, antisense methods of gene regulation and methods that rely on targeted ribozymes are highly unpredictable. Another method for experimentally determining the function of a newly discovered gene is to clone its cDNA into an expression vector driven by a strong promoter and measure the physiological consequence of its over-expression in a transfected cell. This method is also labor intensive and does not address the physiological consequences of down-regulation of a target gene. Therefore, simple methods allowing the selective over- and under-expression of uncharacterized genes would be of great utility to the scientific community. Methods that permit the regulation of genes in cell model systems, transgenic animals and transgenic plants would find widespread use in academic laboratories, pharmaceutical companies, genomics companies and in the biotechnology industry.
An additional use of target validation is in the production of in vivo and in vitro assays for drug discovery. Once the gene causing a selected phenotype has been identified, cell lines, transgenic animals and transgenic plants could be engineered to express a useful protein product or repress a harmful one. These model systems are then used, e.g., with high throughput screening methodology, to identify lead therapeutic compounds that regulate expression of the gene of choice, thereby providing a desired phenotype, e.g., treatment of disease.
Methods currently exist in the art that allow one to alter the expression of a given gene, e.g., using ribozymes, antisense technology, small molecule regulators, over-expression of cDNA clones, and gene knockouts. As described above, these methods have to date proven to be generally insufficient for many applications and typically have not demonstrated either high target efficacy or high specificity in vivo. For useful experimental results and therapeutic treatments, these characteristics are desired.
Gene expression is normally controlled by sequence specific DNA binding proteins called transcription factors. These bind in the general proximity (although occasionally at great distances) of the point of transcription initiation of a gene and typically include both a DNA binding domain and a regulatory domain. They act to influence the efficiency of formation or function of a transcription initiation complex at the promoter. Transcription factors can act in a positive fashion (transactivation) or in a negative fashion (transrepression). Although transcription factors typically contain a regulatory domain, repression can also be achieved by steric hindrance via a DNA binding domain alone.
Transcription factor function can be constitutive (always “on”) or conditional. Conditional function can be imparted on a transcription factor by a variety of means, but the majority of these regulatory mechanisms depend on the sequestering of the factor in the cytoplasm and the inducible release and subsequent nuclear translocation, DNA binding and transactivation (or repression). Examples of transcription factors that function this way include progesterone receptors, sterol response element binding proteins (SREBPs) and NF-kappa B. There are examples of transcription factors that respond to phosphorylation or small molecule ligands by altering their ability to bind their cognate DNA recognition sequence (Hou et al., Science 256:1701 (1994); Gossen and Bujard, Proc. Natl. Acad. Sci. U.S.A. 89:5547 (1992); Oligino et al., Gene Ther. 5:491-496 (1998); Wang et al., Gene Ther. 4:432-441 (1997); Neering et al., Blood 88:1147-1155 (1996); and Rendahl et al., Nat. Biotechnol. 16:757-761 (1998)).
Zinc finger proteins (“ZFPs”) are proteins that can bind to DNA in a sequence-specific manner. Zinc fingers were first identified in the transcription factor TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. Zinc finger proteins are widespread in eukaryotic cells. An exemplary motif characterizing one class of these proteins (Cys2His2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (SEQ ID NO: 1) (where X is any amino acid). A single finger domain is about 30 amino acids in length and several structural studies have demonstrated that it contains an alpha helix containing the two invariant histidine residues co-ordinated through zinc with the two cysteines of a single beta turn. To date, over 10,000 zinc finger sequences have been identified in several thousand known or putative transcription factors. Zinc finger proteins are involved not only in DNA recognition, but also in RNA binding and protein-protein binding. Current estimates are that this class of molecules will constitute the products of about 2% of all human genes.
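The Cys2His2 motif quoted above maps directly onto a regular expression over one-letter amino acid codes. The following is a minimal sketch, not a validated finger predictor: the function name and the toy sequence are hypothetical, and the sequence was constructed only so that it satisfies the pattern.

```python
import re
from typing import List, Tuple

# Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, with X = any amino acid.
# One-letter codes are assumed: C = cysteine, H = histidine.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (start_index, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group())
            for m in C2H2_PATTERN.finditer(protein.upper())]


# Hypothetical toy sequence: 2 flanking residues, then one motif instance.
toy = "MK" + "C" + "AA" + "C" + "G" * 12 + "H" + "TTT" + "H" + "EK"
hits = find_c2h2_motifs(toy)
print(hits)
```

A pattern match of this kind only flags candidate fingers; it says nothing about folding, zinc co-ordination or DNA binding.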
The X-ray crystal structure of Zif268, a three-finger domain from a murine transcription factor, has been solved in complex with its cognate DNA sequence and shows that each finger can be superimposed on the next by a periodic rotation and translation of the finger along the main DNA axis. The structure suggests that each finger interacts independently with DNA over 3 base-pair intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making contacts with the respective DNA triplet sub-site. The amino terminus of Zif268 is situated at the 3′ end of its DNA recognition subsite. Recent results have indicated that some zinc fingers can bind to a fourth base in a target segment (Isalan et al., Proc. Natl. Acad. Sci. U.S.A. 94:5617-5621 (1997)). The fourth base is on the opposite strand from the other three bases recognized by the zinc finger and is complementary to the base immediately 3′ of the three-base subsite.
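The geometry described above (one finger per 3 base-pair subsite, amino terminus at the 3′ end, optional fourth-base contact on the opposite strand) can be illustrated with a short sketch. The function names are hypothetical, and the nine-base string is sample input only; the sketch makes no claim about how any particular protein actually binds it.

```python
from typing import List, Optional, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def assign_fingers(target_5to3: str) -> List[Tuple[int, str]]:
    """Pair finger numbers (1 = amino-terminal finger) with 3-bp subsites.

    The target is written 5'->3'. Because the amino terminus sits at the
    3' end of the site, finger 1 is paired with the last triplet.
    """
    site = target_5to3.replace(" ", "").upper()
    if len(site) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    return list(enumerate(reversed(triplets), start=1))


def fourth_base_contact(site: str, triplet_start: int) -> Optional[str]:
    """Opposite-strand base paired with the base just 3' of a subsite.

    Returns None when the subsite is the 3'-most triplet of the given site,
    since the flanking base is then unknown.
    """
    nxt = triplet_start + 3
    if nxt >= len(site):
        return None
    return COMPLEMENT[site[nxt].upper()]


fingers = assign_fingers("GCA GAA GCC")
print(fingers)
print(fourth_base_contact("GCAGAAGCC", 0))
```

Treating each finger as independent is the simplification the structure suggests; the fourth-base contact is one reason real designs may need to consider neighboring subsites together.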
The structure of the Zif268-DNA complex also suggested that the DNA sequence specificity of a zinc finger protein might be altered by making amino acid substitutions at the four helix positions (-1, 2, 3 and 6) on a zinc finger recognition helix. Phage display experiments using zinc finger combinatorial libraries to test this observation were published in a series of papers in 1994 (Rebar et al., Science 263:671-673 (1994); Jamieson et al., Biochemistry 33:5689-5695 (1994); Choo et al., Proc. Natl. Acad. Sci. U.S.A. 91:11163-11167 (1994)). Combinatorial libraries were constructed with randomized side-chains in either the first or middle finger of Zif268, and variants were then isolated using an altered Zif268 binding site in which the appropriate DNA sub-site was replaced by an altered DNA triplet. Correlation between the nature of introduced mutations and the resulting alteration in binding specificity gave rise to a partial set of substitution rules for rational design of zinc finger proteins with altered binding specificity. Greisman and Pabo, Science 275:657-661 (1997) discuss an elaboration of a phage display method in which each finger of a zinc finger protein is successively subjected to randomization and selection. This paper reported selection of zinc finger proteins for a nuclear hormone response element, a p53 target site and a TATA box sequence.
Recombinant zinc finger proteins have been reported to have the ability to regulate gene expression of transiently expressed reporter genes in cultured cells (see, e.g., Pomerantz et al., Science 267:93-96 (1995); Liu et al., Proc. Natl. Acad. Sci. U.S.A. 94:5525-5530 (1997); and Beerli et al., Proc. Natl. Acad. Sci. U.S.A. 95:14628-14633 (1998)). For example, Pomerantz et al., Science 267:93-96 (1995) report an attempt to design a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1. The hybrid protein was then fused with either a transcriptional activator or repressor domain for expression as a chimeric protein. The chimeric protein was reported to bind a target site representing a hybrid of the subsites of its two components. The authors then constructed a reporter vector containing a luciferase gene operably linked to a promoter and a hybrid site for the chimeric DNA binding protein in proximity to the promoter. The authors reported that their chimeric DNA binding protein could activate or repress expression of the luciferase gene.
Liu et al., Proc. Natl. Acad. Sci. U.S.A. 94:5525-5530 (1997) report forming a composite zinc finger protein by using a peptide spacer to link two component zinc finger proteins, each having three fingers. The composite protein was then further linked to transcriptional activation or repression domains. It was reported that the resulting chimeric protein bound to a target site formed from the target segments bound by the two component zinc finger proteins. It was further reported that the chimeric zinc finger protein could activate or repress transcription of a reporter gene when its target site was inserted into a reporter plasmid in proximity of a promoter operably linked to the reporter.
Beerli et al., Proc. Natl. Acad. Sci. U.S.A. 95:14628-14633 (1998) report construction of a chimeric six finger zinc finger protein fused to either a KRAB, ERD, or SID transcriptional repressor domain, or the VP16 or VP64 transcriptional activation domain. This chimeric zinc finger protein was designed to recognize an 18 bp target site in the 5′ untranslated region of the human erbB-2 gene. Using this construct, the authors of this study report both activation and repression of a transiently expressed reporter luciferase construct linked to the erbB-2 promoter.
In addition, a recombinant zinc finger protein was reported to repress expression of an integrated plasmid construct encoding a bcr-abl oncogene (Choo et al., Nature 372:642-645 (1994)). The target segment to which the zinc finger proteins bound was a nine base sequence GCA GAA GCC chosen to overlap the junction created by a specific oncogenic translocation fusing the genes encoding bcr and abl. The intention was that a zinc finger protein specific to this target site would bind to the oncogene without binding to abl or bcr component genes. The authors used phage display to select a variant zinc finger protein that bound to this target segment. The variant zinc finger protein thus isolated was then reported to repress expression of a stably transfected bcr-abl construct in a cell line.
To date, these methods have focused on regulation of either transiently expressed, known genes, or on regulation of known exogenous genes that have been integrated into the genome. In contrast, specific regulation of a candidate gene or list of genes to identify the cause of a selected phenotype has not been demonstrated in the art. Therefore, a need exists for useful methods of identifying the biological function of a selected gene or genes and/or validating a gene or genes as a suitable target for drug discovery.
The present invention thus provides for the first time methods of identifying a gene or genes associated with a selected phenotype, e.g., for drug discovery, target validation, or functional genomics.
In one aspect, the present invention provides a method of identifying the biological function of a candidate gene, the method comprising the steps of: (i) selecting a first candidate gene; (ii) providing a first zinc finger protein that binds to a first target site of the first candidate gene and a second zinc finger protein that binds to a target site of a second gene; (iii) culturing a first cell under conditions where the first zinc finger protein contacts the first candidate gene and culturing a second cell under conditions where the second zinc finger protein contacts the second candidate gene, wherein the first and the second zinc finger proteins modulate expression of the first and second candidate genes; and (iv) assaying for a selected phenotype, thereby identifying whether or not the first candidate gene is associated with the selected phenotype.
In another aspect, the present invention provides a method of identifying the biological function of a candidate gene, the method comprising the steps of: (i) identifying a plurality of candidate genes; (ii) providing a first zinc finger protein that binds to a first target site of a first candidate gene; (iii) culturing a first cell under conditions where the first zinc finger protein contacts the first candidate gene, wherein the first zinc finger protein modulates expression of the first candidate gene; (iv) determining the expression pattern of the candidate genes and determining whether or not the first candidate gene is associated with the selected phenotype; and (v) repeating steps (ii)-(iv) for each candidate gene.
In another aspect, the present invention provides a method of identifying the biological function of a candidate gene, the method comprising the steps of: (i) selecting a first candidate gene; (ii) providing a first zinc finger protein that binds to a first target site of the first candidate gene and a second zinc finger protein that binds to a second target site of the first candidate gene; (iii) culturing a first cell under conditions where the first zinc finger protein contacts the first candidate gene, and culturing a second cell under conditions where the second zinc finger protein contacts the first candidate gene, wherein the first and the second zinc finger proteins modulate expression of the first candidate gene; and (iv) assaying for a selected phenotype, thereby identifying whether or not the first candidate gene is associated with the selected phenotype.
In another aspect, the present invention provides a method of identifying the biological function of a candidate gene, the method comprising the steps of: (i) selecting a first candidate gene; (ii) providing a first zinc finger protein that binds to a first target site of the first candidate gene; (iii) culturing a first cell under conditions where the first zinc finger protein contacts the first candidate gene, wherein the first zinc finger protein modulates expression of the first candidate gene; and (iv) assaying for a selected phenotype, thereby identifying whether or not the first candidate gene is associated with the selected phenotype.
In one embodiment, the method further comprises providing a third zinc finger protein that binds to a second target site of the first candidate gene. In one embodiment, the method further comprises providing a third zinc finger protein that binds to a target site of a second candidate gene. In another embodiment, the method further comprises selecting a plurality of candidate genes and providing a plurality of zinc finger proteins that bind to a target site of each candidate gene.
In one embodiment, the first candidate gene is partially encoded by an EST of at least about 200 nucleotides in length. In one embodiment, the first candidate gene and the second gene are both associated with the selected phenotype. In one embodiment, the second gene is a control gene. In one embodiment, the first and second cell are the same cell, wherein the cell comprises the first and second candidate genes. In one embodiment, the first and the second candidate genes are endogenous genes.
In one embodiment, expression of the candidate genes is inhibited by at least about 50%. In one embodiment, expression of the candidate genes is activated by at least about 150%. In one embodiment, the modulation of expression is activation of gene expression that prevents repression of gene expression. In one embodiment, the modulation of expression is inhibition of gene expression that prevents gene activation.
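The two thresholds above can be expressed as a simple comparison against an untreated control. The sketch below rests on a labeled assumption that is not stated in this passage: the control is assigned 100%, so "inhibited by at least about 50%" is read as expression at or below 50% of control, and "activated by at least about 150%" is read as expression at or above 150% of control. The function name, parameter names and return labels are hypothetical.

```python
def classify_modulation(control: float, treated: float,
                        inhibit_fraction: float = 0.50,
                        activate_percent: float = 150.0) -> str:
    """Classify expression in a ZFP-treated sample relative to a control.

    ASSUMPTION: control expression is defined as 100%. Inhibition means
    expression fell by at least `inhibit_fraction`; activation means
    expression reached at least `activate_percent` of control.
    """
    if control <= 0:
        raise ValueError("control expression must be positive")
    relative = 100.0 * treated / control  # percent of control
    if relative <= 100.0 * (1.0 - inhibit_fraction):
        return "inhibited"
    if relative >= activate_percent:
        return "activated"
    return "below threshold"


print(classify_modulation(control=100.0, treated=40.0))
print(classify_modulation(control=100.0, treated=160.0))
```

If activation is instead meant as an increase of 150% over control (that is, 250% of control), only the `activate_percent` default would need to change.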
In one embodiment, the zinc finger proteins are fusion proteins comprising one or more regulatory domains. In one embodiment, the regulatory domain is selected from the group consisting of a transcriptional repressor, a methyl transferase, a transcriptional activator, a histone acetyltransferase, and a histone deacetylase.
In one embodiment, the cell is selected from the group consisting of an animal cell, a plant cell, a bacterial cell, a protozoal cell, a fungal cell, a mammalian cell, and a human cell. In one embodiment, the cell comprises less than about 1.5×10^6 copies of each zinc finger protein.
In one embodiment, the first and second zinc finger proteins are encoded by an expression vector comprising a zinc finger protein nucleic acid operably linked to a promoter, and wherein the method further comprises the step of first administering the expression vector to the cell. In one embodiment, expression of the zinc finger proteins is induced by administration of an exogenous agent. In one embodiment, expression of the zinc finger proteins is under small molecule control. In one embodiment, expression of the first zinc finger protein and expression of the second zinc finger protein are under different small molecule control, wherein both the first and the second zinc finger protein are fusion proteins comprising a regulatory domain, and wherein the first and the second zinc finger proteins are expressed in the same cell. In one embodiment, both the first and second zinc finger proteins comprise regulatory domains that are repressors. In one embodiment, the first zinc finger protein comprises a regulatory domain that is an activator, and the second zinc finger protein comprises a regulatory domain that is a repressor.
In one embodiment, the expression vector is a viral vector. In another embodiment, the expression vector is a retroviral expression vector, an adenoviral expression vector, or an AAV expression vector. In one embodiment, the zinc finger proteins are encoded by a nucleic acid operably linked to an inducible promoter.
In one embodiment, the target site is upstream of a transcription initiation site of the candidate gene. In one embodiment, the target site is downstream of a transcription initiation site of the candidate gene. In one embodiment, the target site is adjacent to a transcription initiation site of the candidate gene. In another embodiment, the target site is adjacent to an RNA polymerase pause site downstream of a transcription initiation site of the candidate gene.