1. Field of the Invention
The invention is directed generally to genetic engineering methods and systems, and specifically to efficient methods and systems of gene targeting and integration of genes into a genome.
2. Description of the Related Art
Homologous recombination (HR) is at the heart of gene-targeting technologies. Technologies that increase the frequency of homologous recombination increase gene-targeting efficiency [1]. Applications of gene targeting technologies include gene knockouts and gene replacements in animals, cell model systems and gene therapies involving modification(s) of defective genes. Targeted homologous recombination (gene targeting) is widely used in murine embryonic stem (ES) cells as a method to ablate or introduce mutations into endogenous genes [2]. The subsequent transmission of the targeted alleles in ES cells into the mouse germ line provides a powerful method for studying gene function and has resulted in the creation of several critical mouse models for a variety of diseases [3, 4].
Development of efficient gene targeting methods for ES and somatic cells will greatly expand the use of gene replacement as a research tool. Replacement of gene segment(s) known to cause disease in humans, or gene ablation in human ES cells or adult somatic cells, will pave the way to understanding biological processes and help in identifying causes and cures of diseases. Gene targeting in human ES and somatic cells has important applications in areas where rodent models do not adequately recapitulate human biology or disease progression. However, those applications require targeting frequencies well above the 0.1% levels achievable with today's technology. Currently, gene therapy mostly uses viral mediated approaches, which, although successful, can also lead to serious complications [5]. While viral vectors provide efficient gene delivery, their main limitations are in areas of safety, in part because random integration of the vector may cause inactivation or activation of endogenous genes, leading to potentially serious side effects [5, 6]. For example, in a retroviral mediated gene therapy clinical trial of ten children with X-linked SCID (IL-2 receptor γ chain defect), three of the children developed acute lymphoblastic leukemia due to insertion of the vector close to the LMO2 proto-oncogene [7, 8]. Therefore, the best biological approach for gene therapy is HR.
When DNA is introduced into mammalian cells by transfection, the cell machinery integrates the transfected DNA into the genome by one of two routes: (1) an HR pathway in which the introduced DNA replaces the endogenous genomic sequences; and (2) a non-homologous pathway leading to random integration. The RAD51 gene of eukaryotes is a homologue of the E. coli recA gene and plays a crucial role in HR [12, 13]. The HR pathway entails pairing of homologous DNAs, strand exchange between them, and resolution of one or more Holliday junctions. A network of interacting proteins catalyzes each step [14]. RAD51 provides the enzymatic functions for recognition of homology and DNA strand exchange in HR. RAD51 binds and polymerizes onto the introduced DNA in a step called presynapsis. In the next step, the RAD51 nucleoprotein filament searches for homologous regions in chromosomal DNA, catalyzes pairing between introduced and endogenous DNA, and promotes strand exchange [13, 15].
The amino acids at the N-terminal region of RAD51 are highly conserved among species and form a domain involved in the oligomerization of RAD51 for nucleoprotein filament formation [16]. Besides RAD51, RAD52 and Replication Protein A (RPA), together known as “the catalytic triad” of proteins, are also involved in filament formation [17]. HR is mediated by double strand breaks in the recombining DNAs and RPA binds and protects the single strand ends until they are coated with RAD51. RAD51 then forms a helical nucleoprotein filament on the single strand DNA—a process facilitated by RAD52 displacement of RPA. Genetic and biochemical studies indicate that RAD51 interacts with both RAD52 and RPA [14, 18]. Another protein involved in RAD51 function is BRCA2. The conserved BRC domains within BRCA2 bind RAD51 and it is thought that this interaction recruits RAD51 to sites of DNA damage, thereby promoting repair and/or recombination [19]. Cells transiently over-expressing either recA or RAD51 (2-fold over-expression) show some 20-fold elevation in the frequencies of HR, suggesting that efficient nucleoprotein filament formation is a rate limiting step in HR [1, 20]. The novel approach taken in this proposal to increase the frequency of HR involves covalently attaching a RAD51 binding peptide to the targeting vector so as to promote nucleation and polymerization of RAD51 onto the transfected DNA. By analogy to the function of BRCA2, the recruitment of RAD51 to the targeting DNA is expected to increase nucleoprotein filament formation and thereby HR (FIG. 1).
Three proteins, RAD52, RPA and BRCA2, have been shown to interact physically with RAD51 [19, 21, 22]. Most of what is known about RAD52 function comes from studies done in yeast, where both genetic and molecular studies clearly demonstrate its binding to the N-terminal region of RAD51 and its role in HR and repair of double strand DNA breaks [23]. Interaction between the human RAD52 and RAD51 proteins has been demonstrated in the yeast two hybrid system, by their co-immunoprecipitation when expressed in either HeLa or insect cells, and by affinity column chromatography [21, 24, 25]. By deletion analysis, the domain within RAD52 that binds RAD51 has been mapped to amino acid residues 291-330 of the human RAD52 [21].
RPA is a heterotrimer consisting of 70, 32 and 14 kDa subunits and it is the 70 kDa subunit which binds to RAD51 [17, 22]. At sites of DNA damage, RPA binds to single strand DNA and protects the exposed DNA ends until they can be coated by RAD51. RPA also removes secondary structures that prevent extension of nucleoprotein filament formation by RAD51. NMR chemical shift mapping indicates that residues 1-93 at the N terminus of RAD51 interact with the DNA binding region of the 70 kDa subunit of RPA in a domain defined by residues 181-326 [26].
Human BRCA2 is a very large protein (3418 amino acids), but its RAD51 binding domains appear to reside within a set of eight conserved repeats (termed BRC1-8), each about 70 amino acids in length [27, 28]. Mutations within this eight-repeat region are associated with a predisposition to cancer and reduced DNA repair [29, 30]. Recent reports indicate that although the repeats are homologous, they appear to bind RAD51 at different regions of the protein. BRC3 (residues 1415-1483) binds to the N-terminal region of RAD51, whereas BRC4 (residues 1511-1579) binds to the nucleotide binding core of RAD51 located in the middle of that protein (residues 127-135: GEFRTGKT [SEQ ID NO:9] and 228-232: LLIVD [SEQ ID NO:10]) [27, 28]. That finding is important for this proposal because it suggests that a targeting vector containing both the BRC3 and BRC4 domains may recruit RAD51 in vivo much more efficiently due to cooperative binding. Furthermore, BRC5 (residues 1618-1670) does not appear to interact with RAD51, despite its homology to the other BRC repeats. The interaction of BRC repeats with RAD51 was shown using peptides (~69 amino acids in length) corresponding in sequence to a given BRC repeat. These studies also demonstrated that shorter peptides (~30 amino acids in length) corresponding to the central conserved motif of the BRC repeats were not effective for RAD51 binding [27]. This observation forms the rationale for coupling the full-length RAD51 binding domain to the targeting DNA as “bait” for the recruitment of RAD51.
Blocking DNA ends so as to prevent exonucleolytic degradation has been shown to reduce unwanted random integration, and in Dictyostelium discoideum, to increase site-specific targeting events [31]. According to the present invention, the ends of the targeting DNA are blocked by ligation of a hairpin oligonucleotide so as to eliminate free 3′ or 5′ ends. Another avenue for increased gene targeting is based on more effective gene delivery to the nucleus where HR takes place. Several attempts to improve the entry of plasmid DNA into the nucleus have been reported including “piggyback” techniques like electrostatic binding of DNA to cationic proteins containing a nuclear localization signal (NLS) [32, 33], NLS-containing peptides [34], lipids [35] and karyophilic proteins [36, 37]. A major drawback of this “piggyback” nuclear transport is that it relies upon the unpredictable stability of the complex in the cytoplasm [35]. In our case, we will covalently couple an NLS-containing peptide to one end of the targeting DNA.
Interestingly, DNAs tagged with a single NLS-peptide show enhanced delivery to the nucleus, but the presence of more than one NLS-peptide tag on a DNA molecule prevents gene delivery. This observation suggests that a DNA molecule with two NLS tags threads through two adjacent nuclear pores in a manner that leads to entrapment of the DNA at the nuclear membrane and consequent decreased gene delivery into the nucleus [35].
Targeting of a gene to the HPRT locus is desirable for gene therapy not only because it avoids the random insertional mutagenesis associated with viral mediated gene delivery, but also because extensive experience exists relevant to targeting genes to that locus. The HPRT gene is expressed in all cells and during all stages of development and is a locus that constitutively remains in an open chromatin configuration. Transgenic mice expressing human angiotensinogen from a gene targeted to the HPRT locus showed normal tissue expression and functionality at physiological levels [38].
The protein encoded by the HPRT gene (nine exons spread over a 33 Kb region of the X chromosome) is involved in the salvage pathway of nucleotide metabolism [9]. Cells with functional HPRT incorporate the nucleotide analogue 6-thioguanine (6-TG) into DNA, which leads to cell death. In the absence of HPRT, there is no incorporation of 6-TG into the nucleotide pool; thus, HPRT disrupted cells survive in the presence of 6-TG [9]. The HPRT locus has been used as a target for the study of various aspects of HR. In one study, the influence of the homology length in the targeting vector on targeting efficiency was examined using the HPRT locus. Deng and Capecchi [39] demonstrated similar targeting frequencies in vectors of different homology lengths. Zhang et al. [40] evaluated HR frequencies as a function of the size of the endogenous region deleted upon insertion of targeting vectors. Hatada et al. used targeting to the HPRT locus to show that HR frequencies are similar in ES cells versus hematopoietic progenitor cells [11].
Since the HPRT targeting vector contains a neomycin resistance marker, both random and targeted integration events will confer resistance to the antibiotic G418. However, when cells are grown in the presence of G418 plus 6-TG, only cells with a targeted disruption of the HPRT locus will survive. Therefore, there are two ways recombination frequencies can be expressed: (1) targeted cells (those that are G418 and 6-TG resistant) divided by the total number of cells subjected to selection provides the overall recombination frequency; and (2) targeted cells (those that are G418 and 6-TG resistant) divided by the number of G418 resistant cells (i.e., both random and targeted recombination events) provides the targeted recombination frequency. Both methods of expression of recombination frequency appear in the literature. For example, a study using murine ES-D3 cells for HPRT gene targeting had overall recombination frequencies of 0.4×10^-6 and 1.6×10^-6 in two different experiments, with targeted recombination frequencies of 1.3×10^-5 and 5.3×10^-5, respectively [41]. Whereas the targeted recombination frequency reports on the ratio of site-specific to total (random plus site-specific) integration events, the overall recombination frequency reports on the total number of targeted integration events in the cell population. The invention uses both measures to evaluate the effects of the proposed modification to the targeting vector on HR. So, for example, we expect that attaching an NLS peptide to the targeting vector will increase the overall recombination frequency but not necessarily site-specific targeting. On the other hand, vectors with an attached peptide bait that binds RAD51 should increase both site-specific targeting (i.e., the targeted recombination frequency) and the overall recombination frequency.
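The selection logic and the two frequency definitions above can be illustrated with a minimal Python sketch. It is an illustration only, not part of the described method: the names `Cell`, `survives`, `overall_frequency` and `targeted_frequency` are hypothetical, the survival model is deliberately simplified to the two markers discussed in the text, and the colony counts in the usage lines are invented for illustration rather than taken from any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Simplified cell state (hypothetical model for illustration)."""
    has_neo_marker: bool   # vector integrated somewhere (random or targeted)
    hprt_functional: bool  # False once the HPRT locus has been disrupted


def survives(cell: Cell, g418: bool, six_tg: bool) -> bool:
    """Return True if the cell survives the given selection conditions."""
    if g418 and not cell.has_neo_marker:
        return False  # no neomycin resistance marker: killed by G418
    if six_tg and cell.hprt_functional:
        return False  # functional HPRT incorporates 6-TG: cell death
    return True


def overall_frequency(targeted: int, total_cells: int) -> float:
    """Targeted (G418 + 6-TG resistant) cells / all cells put under selection."""
    return targeted / total_cells


def targeted_frequency(targeted: int, g418_resistant: int) -> float:
    """Targeted cells / all G418 resistant cells (random + targeted events)."""
    return targeted / g418_resistant


# Invented example counts (assumptions, not experimental data).
total_cells = 10_000_000
g418_resistant = 300_000
targeted = 4

random_integrant = Cell(has_neo_marker=True, hprt_functional=True)
targeted_integrant = Cell(has_neo_marker=True, hprt_functional=False)

print(survives(random_integrant, g418=True, six_tg=True))    # random event
print(survives(targeted_integrant, g418=True, six_tg=True))  # targeted event
print(overall_frequency(targeted, total_cells))
print(targeted_frequency(targeted, g418_resistant))
```

Under these definitions, dividing the overall frequency by the targeted frequency gives the fraction of cells that became G418 resistant, so the two measures differ only by how efficiently the vector integrated at all.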
Current gene therapy using viruses has advantages, such as efficient gene delivery, but its main limitations are in the areas of safety and random integration events causing inactivation or activation of endogenous genes [6, 8, 42, 43]. The best biological approach for gene therapy therefore is HR, where one either replaces the defective region of a gene with its normal counterpart or expresses a normal gene at a known locus, thereby avoiding random insertional mutagenesis. Development of a method for efficient gene targeting will allow for rapid advancements in the creation of cells or cell lines containing modified genes for the purpose of studying the biological function of specific genes. Drawbacks to the use of HR in mammalian cells for gene targeting purposes are its inherent inefficiency and relatively low frequency of targeted integration.