A. Using Libraries to Identify Genes Associated with a Selected Phenotype
Identification of gene function is a critical step in the selection of new molecular targets for drug discovery, gene therapy, clinical diagnostics, agrochemical discovery, engineering of transgenic plants, e.g., with novel resistance traits or enhanced nutritional characteristics, and genetic engineering of prokaryotes and higher organisms for the production of industrial chemicals, biochemicals, and chemical intermediates. Historically, library screening methods have been used to screen large numbers of uncharacterized genes to identify a gene or genes associated with a particular phenotype, e.g., hybridization screening of nucleic acid libraries, antibody screening of expression libraries, and phenotypic screening of libraries.
For example, molecular markers that co-segregate with a disease trait in a segment of patients can be used as nucleic acid probes to identify, in a library, the gene associated with the disease. In another method, differential gene expression in cells and nucleic acid subtraction can be used to identify and clone genes associated with a phenotype in the test cells, where the control cells do not display the phenotype. However, these methods are laborious because the screening step relies heavily on conventional nucleic acid cloning and sequencing techniques. Development of high throughput screening assays using these methods would therefore be cumbersome.
An example of phenotypic screening of libraries is discovery of transforming oncogenes (see, e.g., Goldfarb et al., Nature 296:404 (1982)). Oncogenic transformation can be observed in NIH 3T3 cells by assaying for loss of contact inhibition and foci formation. cDNA expression libraries from transformed cells are introduced into untransformed cells, and the cells are examined for foci formation. The gene associated with transformation is isolated by clonal propagation and rescue of the expression vector. Unfortunately, this method is limited by phenotype and can only be used to assay for transdominant genes.
Advances in the field of high throughput screening have increased the cell types and phenotypes that can be investigated using library screening methods. Viral vectors such as retroviral, adenoviral, and adeno-associated viral vectors have been developed for efficient nucleic acid delivery to cells (see, e.g., U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, Proc. Nat'l Acad. Sci. USA 81:6466-6470 (1984); Samulski et al., J. Virol. 63:3822-3828 (1989); Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); and PCT/US94/05700). Cells can be phenotypically analyzed either one at a time, using flow cytometry, or in arrayed clonal populations, using liquid handling robots. These techniques allow a sufficient number of library members to be tested for a wide range of potential phenotypes.
Currently, libraries of random molecules are being used with phenotypic screening for the discovery of genes associated with a particular phenotype. For example, random peptide or protein expression libraries are being used to block specific protein-protein interactions and produce a particular phenotype (see, e.g., Caponigro et al., Proc. Nat'l Acad. Sci. USA 95:7508-7513 (1998); WO 97/27213; and WO 97/27212). In another method, random antisense nucleic acids or ribozymes are used to inactivate a gene and produce a desired phenotype (see, e.g., WO 99/41371 and Hannon et al., Science 283:1125-1126 (1999)).
The main shortcoming of these methods is the inherent inefficiency of the random molecules, which vastly increases the size of the library to be screened. Even with a known target nucleic acid or protein, hundreds of antisense, ribozyme, or peptide molecules must be empirically tested before identifying one that will inhibit gene expression or protein-protein interactions. Since the random library must be enormous to produce sufficient numbers of active molecules, huge numbers of cells must be screened for phenotypic changes. For unknown gene and protein targets, the rarity of effective, bioactive peptides, antisense molecules, or ribozyme molecules imposes significant constraints on high throughput screening assays. Furthermore, these methods can be used only for inhibition of gene expression, and not for activation of gene expression. This limitation restricts identification of gene function to phenotypes that appear only in the absence of gene expression.
Therefore, efficient high throughput library screening methods allowing random inhibition or activation of uncharacterized genes would be of great utility to the scientific community. These methods would find widespread use in academic laboratories, pharmaceutical companies, genomics companies, agricultural companies, chemical companies, and in the biotechnology industry.
B. Zinc Finger Proteins as Transcriptional Regulators
Zinc finger proteins (“ZFPs”) are proteins that bind to DNA in a sequence-specific manner and are typically involved in transcription regulation. Zinc finger proteins are widespread in eukaryotic cells. An exemplary motif characterizing one class of these proteins (the Cys2His2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His (SEQ ID NO:1) (where X is any amino acid). A single finger domain is about 30 amino acids in length, and several structural studies have demonstrated that it contains an alpha helix containing the two invariant histidine residues co-ordinated through zinc with the two cysteines of a single beta turn. To date, over 10,000 zinc finger sequences have been identified in several thousand known or putative transcription factors. Zinc finger proteins are involved not only in DNA recognition, but also in RNA binding and protein-protein binding. Current estimates are that this class of molecules will constitute the products of about 2% of all human genes.
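As an illustration only, the exemplary Cys2His2 motif given above can be written as a regular expression and used to scan a one-letter amino acid sequence. The sketch below assumes that the spacings in the motif are read as ranges (2 to 4, exactly 12, and 3 to 5 residues of any amino acid); the function name find_c2h2_fingers is hypothetical and is not part of any published tool.

```python
import re

# Assumed reading of the motif in the text:
#   Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His, where X is any amino acid.
# "." stands in for X in one-letter amino acid code.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_fingers(protein_seq):
    """Return (start_index, matched_text) for each non-overlapping motif match.

    Hypothetical helper for illustration; start_index is 0-based.
    """
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(seq)]


# Synthetic sequence built only to satisfy the spacing rules; it is not a
# real zinc finger protein sequence.
example = "MK" + "C" + "GG" + "C" + "G" * 12 + "H" + "GGG" + "H" + "TG"
hits = find_c2h2_fingers(example)
```

A match of this pattern indicates only that the cysteine and histidine spacing is consistent with the motif; it does not establish that the sequence folds as a zinc finger or binds DNA.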
The X-ray crystal structure of Zif268, a three-finger domain from a murine transcription factor, has been solved in complex with its cognate DNA sequence and shows that each finger can be superimposed on the next by a periodic rotation and translation of the finger along the main DNA axis. The structure suggests that each finger interacts independently with DNA over 3 base-pair intervals, with side-chains at positions −1, 2, 3 and 6 on each recognition helix making contacts with the respective DNA triplet sub-site.
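The modular, one-finger-per-triplet arrangement described above can be sketched as a simple partition of a target site into 3 base-pair sub-sites. This is a minimal sketch, not a model of binding: the function name triplet_subsites is hypothetical, the example sequence is arbitrary, and the assignment of finger 1 to the 3'-most triplet is an assumption taken from the orientation generally reported for Zif268 rather than from the text above.

```python
def triplet_subsites(target_site):
    """Split a target site (written 5'->3') into 3-bp sub-sites, one per finger.

    Hypothetical helper for illustration only.
    """
    site = target_site.upper()
    if len(site) == 0 or len(site) % 3 != 0:
        raise ValueError("target site length must be a non-zero multiple of 3")
    triplets = [site[i:i + 3] for i in range(0, len(site), 3)]
    # Assumption: fingers are numbered from the N-terminus and finger 1
    # contacts the 3'-most triplet, so the triplet order is reversed.
    return {
        "finger %d" % n: triplet
        for n, triplet in enumerate(reversed(triplets), start=1)
    }


# Arbitrary 9-bp sequence for a three-finger protein (not a real target site).
assignment = triplet_subsites("AAATTTGGG")
```

Under these assumptions a three-finger protein covers 9 base pairs and a six-finger protein covers 18 base pairs.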
The structure of the Zif268-DNA complex also suggested that the DNA sequence specificity of a zinc finger protein could be altered by making amino acid substitutions at the four helix positions (−1, 2, 3 and 6) on a zinc finger recognition helix, using, e.g., phage display experiments (see, e.g., Rebar et al., Science 263:671-673 (1994); Jamieson et al., Biochemistry 33:5689-5695 (1994); Choo et al., Proc. Natl. Acad. Sci. U.S.A. 91:11163-11167 (1994); Greisman & Pabo, Science 275:657-661 (1997)). For example, combinatorial libraries were constructed with zinc finger proteins randomized in either the first or middle finger. The randomized zinc finger proteins were then isolated with altered target sites in which the appropriate DNA sub-site was replaced by an altered DNA triplet. Correlation between the nature of introduced mutations and the resulting alteration in binding specificity gave rise to a set of substitution rules for rational design of zinc finger proteins with altered binding specificity. These experiments thus demonstrated that randomized zinc finger proteins with altered target sequence specificity could be made.
Recombinant zinc finger proteins, often combined with a heterologous transcriptional activator or repressor domain, have also shown efficient transcriptional regulation of transiently expressed reporter genes in cultured cells (see, e.g., Pomerantz et al., Science 267:93-96 (1995); Liu et al., Proc. Natl. Acad. Sci. U.S.A. 94:5525-5530 (1997); and Beerli et al., Proc. Natl. Acad. Sci. U.S.A. 95:14628-14633 (1998)). For example, Pomerantz et al., Science 267:93-96 (1995) designed a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain from Oct-1. The hybrid protein was then fused with either a transcriptional activator or repressor domain for expression as a chimeric protein. The chimeric protein was reported to bind a target site representing a hybrid of the subsites of its two components. The chimeric DNA binding protein also activated or repressed expression of a reporter luciferase gene having a target site.
Liu et al., Proc. Natl. Acad. Sci. U.S.A. 94:5525-5530 (1997) constructed a composite zinc finger protein by using a peptide spacer to link two component zinc finger proteins, each having three fingers. The composite protein was then further linked to transcriptional activation or repression domains. The resulting chimeric protein bound to a target site formed from the target segments bound by the two component zinc finger proteins. The chimeric zinc finger protein activated or repressed transcription of a reporter gene having the target site.
Beerli et al., Proc. Natl. Acad. Sci. U.S.A. 95:14628-14633 (1998) constructed a chimeric six finger zinc finger protein fused to either a KRAB, ERD, or SID transcriptional repressor domain, or the VP16 or VP64 transcriptional activation domain. This chimeric zinc finger protein was designed to recognize an 18 bp target site in the 5′ untranslated region of the human erbB-2 gene. This construct both activated and repressed a transiently expressed reporter luciferase construct linked to the erbB-2 promoter.
In addition, a recombinant zinc finger protein was reported to repress expression of an integrated plasmid construct encoding a bcr-abl oncogene (Choo et al., Nature 372:642-645 (1994)). Phage display was used to select a variant zinc finger protein that bound to the selected target segment. The variant zinc finger protein thus isolated was then reported to repress expression of a stably transfected bcr-abl construct in a cell line. To date, these zinc finger protein methods have focused on regulation of either single, transiently expressed, known genes, or on regulation of single, known exogenous genes that have been integrated into the genome.