1. Field of the Invention
The present invention is in the fields of molecular biology and cellular biology. The invention is directed generally to activation of gene expression or causing over-expression of a gene by recombination methods in situ. More specifically, the invention is directed to activation of endogenous genes by non-targeted integration of specialized activation vectors, which are provided by the invention, into the genome of a host cell. The invention also is directed to methods for the identification, activation, and isolation of genes that were heretofore undiscoverable, and to host cells and vectors comprising such isolated genes. The invention also is directed to isolated genes, gene products, nucleic acid molecules, and compositions comprising such genes, gene products and nucleic acid molecules, that may be used in a variety of therapeutic and diagnostic applications. Thus, by the present invention, endogenous genes, including those associated with human disease and development, may be identified, activated, and isolated without prior knowledge of the sequence, structure, function, or expression profile of the genes.
2. Related Art
Identification and over-expression of novel genes associated with human disease is an important step towards developing new therapeutic drugs. Current approaches to creating libraries of cells for protein over-expression are based on the production and cloning of cDNA. Thus, in order to identify a new gene using this approach, the gene must be expressed in the cells that were used to make the library. The gene also must be expressed at sufficient levels to be adequately represented in the library. This is problematic because many genes are expressed only in very low quantities, in a rare population of cells, or during short developmental periods.
Furthermore, because of the large size of some mRNAs, it is difficult or impossible to produce full length cDNA molecules capable of expressing the biologically active protein. Lack of full-length cDNA molecules has also been observed for small mRNAs and is thought to be related to sequences in the message that are difficult to produce by reverse transcription or that are unstable during propagation in bacteria. As a result, even the most complete cDNA libraries express only a fraction of the entire set of possible genes.
Finally, many cDNA libraries are produced in bacterial vectors. Use of these vectors to express biologically active mammalian proteins is severely limited since most mammalian proteins do not fold correctly and/or are improperly glycosylated in bacteria.
Therefore, a method for creating a more representative library for protein expression, capable of facilitating faithful expression of biologically active proteins, would be extremely valuable.
Current methods for over-expressing proteins involve cloning the gene of interest and placing it, in a construct, next to a suitable promoter/enhancer, polyadenylation signal, and splice site, and introducing the construct into an appropriate host cell.
An alternative approach involves the use of homologous recombination to activate gene expression by targeting a strong promoter or other regulatory sequence to a previously identified gene.
WO 90/14092 describes in situ modification of genes, in mammalian cells, encoding proteins of interest. This application describes single-stranded oligonucleotides for site-directed modification of genes encoding proteins of interest. A marker may also be included. However, the methods are limited to providing an oligonucleotide sequence substantially homologous to a target site. Thus, the method requires knowledge of the site required for activation by site-directed modification and homologous recombination. Novel genes are not discoverable by such methods.
WO 91/06667 describes methods for expressing a mammalian gene in situ. With this method, an amplifiable gene is introduced next to a target gene by homologous recombination. When the cell is then grown in the appropriate medium, both the amplifiable gene and the target gene are amplified and there is enhanced expression of the target gene. As above, methods of introducing the amplifiable gene are limited to homologous recombination, and are not useful for activating novel genes whose sequence (or existence) is unknown.
WO 91/01140 describes the inactivation of endogenous genes by modification of cells by homologous recombination. By these methods, homologous recombination is used to modify and inactivate genes and to produce cells which can serve as donors in gene therapy.
WO 92/20808 describes methods for modifying genomic target sites in situ. The modifications are described as being small, for example, changing single bases in DNA. The method relies upon genomic modification using homologous DNA for targeting.
WO 92/19255 describes a method for enhancing the expression of a target gene, achieved by homologous recombination in which a DNA sequence is integrated into the genome or large genomic fragment. This modified sequence can then be transferred to a secondary host for expression. An amplifiable gene can be integrated next to the target gene so that the target region can be amplified for enhanced expression. Homologous recombination is necessary to this targeted approach.
WO 93/09222 describes methods of making proteins by activating an endogenous gene encoding a desired product. A regulatory region is targeted by homologous recombination and replacing or disabling the region normally associated with the gene whose expression is desired. This disabling or replacement causes the gene to be expressed at levels higher than normal.
WO 94/12650 describes a method for activating expression of and amplifying an endogenous gene in situ in a cell, which gene is not expressed or is not expressed at desired levels in the cell. The cell is transfected with exogenous DNA sequences which repair, alter, delete, or replace a sequence present in the cell or which are regulatory sequences not normally functionally linked to the endogenous gene in the cell. In order to do this, DNA sequences homologous to genomic DNA sequences at a preselected site are used to target the endogenous gene. In addition, amplifiable DNA encoding a selectable marker can be included. By culturing the homologously recombinant cells under conditions that select for amplification, both the endogenous gene and the amplifiable marker are co-amplified and expression of the gene increased.
WO 95/31560 describes DNA constructs for homologous recombination. The constructs include a targeting sequence, a regulatory sequence, an exon, and an unpaired splice donor site. The targeting is achieved by homologous recombination of the construct with genomic sequences in the cell and allows the production of a protein in vitro or in vivo.
WO 96/29411 describes methods using an exogenous regulatory sequence, an exogenous exon, either coding or non-coding, and a splice donor site introduced into a preselected site in the genome by homologous recombination. In this application, the introduced DNA is positioned so that the transcripts under control of the exogenous regulatory region include both the exogenous exon and endogenous exons present in either the thrombopoietin, DNase I, or β-interferon genes, resulting in transcripts in which the exogenous and endogenous exons are operably linked. The novel transcription units are produced by homologous recombination.
U.S. Pat. No. 5,272,071 describes the transcriptional activation of transcriptionally silent genes in a cell by inserting a DNA regulatory element capable of promoting the expression of a gene normally expressed in that cell. The regulatory element is inserted so that it is operably linked to the normally silent gene. The insertion is accomplished by means of homologous recombination by creating a DNA construct with a segment of the normally silent gene (the target DNA) and the DNA regulatory element used to induce the desired transcription.
U.S. Pat. No. 5,578,461 discusses activating expression of mammalian target genes by homologous recombination. A DNA sequence is integrated into the genome or a large genomic fragment to enhance the expression of the target gene. The modified construct can then be transferred to a secondary host. An amplifiable gene can be integrated adjacent to the target gene so that the target region is amplified for enhanced expression.
Both of the above approaches (construction of an over-expressing construct by cloning or by homologous recombination in vivo) require the gene to be cloned and sequenced before it can be over-expressed. Furthermore, using homologous recombination, the genomic sequence and structure must also be known.
Unfortunately, many genes have not yet been identified and/or sequenced. Thus, a method for over-expressing a gene of interest, whether or not it has been previously cloned, and whether or not its sequence and structure are known, would be useful.
The invention is, therefore, generally directed to methods for over-expressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence into the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination, and allowing over-expression of the endogenous gene in the cell. The method does not require previous knowledge of the sequence of the endogenous gene or even of the existence of the gene. Hence, the invention is directed to non-targeted gene activation, which as used herein means the activation of endogenous genes by non-targeted or non-homologous (as opposed to targeted or homologous) integration of specialized activation vectors into the genome of a host cell.
The invention also encompasses novel vector constructs for activating gene expression or over-expressing a gene through non-homologous recombination. The novel construct lacks homologous targeting sequences. That is, it does not contain nucleotide sequences that target host cell DNA and promote homologous recombination at the target site, causing over-expression of a cellular gene via the introduced transcriptional regulatory sequence.
Novel vector constructs include a vector containing a transcriptional regulatory sequence operably linked to an unpaired splice donor sequence and further containing one or more amplifiable markers.
Novel vector constructs include constructs with a transcriptional regulatory sequence operably linked to a translational start codon, a signal secretion sequence, and an unpaired splice donor site; constructs with a transcriptional regulatory sequence, operably linked to a translation start codon, an epitope tag, and an unpaired splice donor site; constructs containing a transcriptional regulatory sequence operably linked to a translational start codon, a signal sequence and an epitope tag, and an unpaired splice donor site; constructs containing a transcriptional regulatory sequence operably linked to a translation start codon, a signal secretion sequence, an epitope tag, and a sequence-specific protease site, and an unpaired splice donor site.
The vector construct can contain one or more selectable markers for recombinant host cell selection. Alternatively, selection can be effected by phenotypic selection for a trait provided by the activated endogenous gene product.
These vectors, and indeed any of the vectors disclosed herein, and variants of the vectors that will be readily recognized by one of ordinary skill in the art, can be used in any of the methods described herein to form any of the compositions producible by these methods.
The transcriptional regulatory sequence used in the vector constructs of the invention includes, but is not limited to, a promoter. In preferred embodiments, the promoter is a viral promoter. In highly preferred embodiments, the viral promoter is the cytomegalovirus immediate early promoter. In alternative embodiments, the promoter is a cellular, non-viral promoter or inducible promoter.
The transcriptional regulatory sequence used in the vector construct of the invention may also include, but is not limited to, an enhancer. In preferred embodiments, the enhancer is a viral enhancer. In highly preferred embodiments, the viral enhancer is the cytomegalovirus immediate early enhancer. In alternative embodiments, the enhancer is a cellular non-viral enhancer.
In preferred embodiments of the methods described herein, the vector construct may be, or may contain, linear RNA or DNA.
The cell containing the vector may be screened for expression of the gene.
The cell over-expressing the gene can be cultured in vitro under conditions favoring the production, by the cell, of desired amounts of the gene product (also referred to interchangeably herein as the “expression product”) of the endogenous gene that has been activated or whose expression has been increased. The expression product can then be isolated and purified to use, for example, in protein therapy or drug discovery.
Alternatively, the cell expressing the desired gene product can be allowed to express the gene product in vivo. In certain such aspects of the invention, the cell containing a vector construct of the invention integrated into its genome may be introduced into a eukaryote (such as a vertebrate, particularly a mammal, more particularly a human) under conditions favoring the overexpression or activation of the gene by the cell in vivo in the eukaryote. In related such aspects of the invention, the cell may be isolated and cloned prior to being introduced into the eukaryote.
The invention is also directed to methods for over-expressing an endogenous gene in a cell, comprising introducing a vector containing a transcriptional regulatory sequence and one or more amplifiable markers into the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination, and allowing over-expression of the endogenous gene in the cell.
The cell containing the vector may be screened for over-expression of the gene.
The cell over-expressing the gene is cultured such that amplification of the endogenous gene is obtained. The cell can then be cultured in vitro so as to produce desired amounts of the gene product of the amplified endogenous gene that has been activated or whose expression has been increased. The gene product can then be isolated and purified.
Alternatively, following amplification, the cell can be allowed to express the endogenous gene and produce desired amounts of the gene product in vivo.
It is to be understood, however, that any vector used in the methods described herein can include one or more amplifiable markers. Thereby, amplification of both the vector and the DNA of interest (i.e., containing the over-expressed gene) occurs in the cell, and further enhanced expression of the endogenous gene is obtained. Accordingly, methods can include a step in which the endogenous gene is amplified.
The invention is also directed to methods for over-expressing an endogenous gene in a cell comprising introducing a vector containing a transcriptional regulatory sequence and an unpaired splice donor sequence into the cell, allowing the vector to integrate into the genome of the cell by non-homologous recombination, and allowing over-expression of the endogenous gene in the cell.
The cell containing the vector may be screened for expression of the gene.
The cell over-expressing the gene can be cultured in vitro so as to produce desirable amounts of the gene product of the endogenous gene whose expression has been activated or increased. The gene product can then be isolated and purified.
Alternatively, the cell can be allowed to express the desired gene product in vivo.
The vector construct can consist essentially of the transcriptional regulatory sequence.
The vector construct can consist essentially of the transcriptional regulatory sequence and one or more amplifiable markers.
The vector construct can consist essentially of the transcriptional regulatory sequence and the splice donor sequence.
Any of the vector constructs of the invention can also include a secretion signal sequence. The secretion signal sequence is arranged in the construct so that it will be operably linked to the activated endogenous protein. Thereby, secretion of the protein of interest occurs in the cell, and purification of that protein is facilitated. Accordingly, methods can include a step in which the protein expression product is secreted from the cell.
The invention also encompasses cells made by any of the above methods. The invention encompasses cells containing the vector constructs, cells in which the vector constructs have integrated into the cellular genome, and cells which are over-expressing desired gene products from an endogenous gene, over-expression being driven by the introduced transcriptional regulatory sequence.
The cells can be isolated and cloned.
The methods can be carried out in any cell of eukaryotic origin, such as fungal, plant or animal. In preferred embodiments, the methods of the invention may be carried out in vertebrate cells, and particularly mammalian cells including but not limited to rat, mouse, bovine, porcine, sheep, goat and human cells, and more particularly in human cells.
A single cell made by the methods described above can over-express a single gene or more than one gene. More than one gene in a cell can be activated by the integration of a single type of construct into multiple locations in the genome. Similarly, more than one gene in a cell can be activated by the integration of multiple constructs (i.e., more than one type of construct) into multiple locations in the genome. Therefore, a cell can contain only one type of vector construct or different types of constructs, each capable of activating an endogenous gene.
The invention is also directed to methods for making the cells described above by one or more of the following: introducing one or more of the vector constructs of the invention into a cell; allowing the introduced construct(s) to integrate into the genome of the cell by non-homologous recombination; allowing over-expression of one or more endogenous genes in the cell; and isolating and cloning the cell. The invention is also directed to cells produced by such methods, which may be isolated cells.
The invention also encompasses methods for using the cells described above to over-express a gene, such as an endogenous cellular gene, that has been characterized (for example, sequenced), uncharacterized (for example, a gene whose function is known but which has not been cloned or sequenced), or a gene whose existence was, prior to over-expression, unknown. The cells can be used to produce desired amounts of an expression product in vitro or in vivo. If desired, this expression product can then be isolated and purified, for example by cell lysis or by isolation from the growth medium (as when the vector contains a secretion signal sequence).
The invention also encompasses libraries of cells made by the above described methods. A library can encompass all of the clones from a single transfection experiment or a subset of clones from a single transfection experiment. The subset can over-express the same gene or more than one gene, for example, a class of genes. The transfection can have been done with a single construct or with more than one construct.
A library can also be formed by combining all of the recombinant cells from two or more transfection experiments, by combining one or more subsets of cells from a single transfection experiment or by combining subsets of cells from separate transfection experiments. The resulting library can express the same gene, or more than one gene, for example, a class of genes. Again, in each of these individual transfections, a unique construct or more than one construct can be used.
Libraries can be formed from the same cell type or different cell types.
The invention is also directed to methods for making libraries by selecting various subsets of cells from the same or different transfection experiments.
The invention is also directed to methods of using the above-described cells or libraries of cells to over-express or activate endogenous genes, or to obtain the gene expression products of such over-expressed or activated genes. According to this aspect of the invention, the cell or library may be screened for the expression of the gene and cells that express the desired gene product may be selected. The cell can then be used to isolate or purify the gene product for subsequent use. Expression in the cell can occur by culturing the cell in vitro, under conditions favoring the production of the expression product of the endogenous gene by the cell, or by allowing the cell to express the gene in vivo.
In preferred embodiments of the invention, the methods include a process wherein the expression product is isolated or purified. In highly preferred embodiments, the cells expressing the endogenous gene product are cultured under conditions favoring production of sufficient amounts of gene product for commercial application, and especially for diagnostic, therapeutic and drug discovery uses.
Any of the methods can further comprise introducing double-strand breaks into the genomic DNA in the cell prior to or simultaneously with vector integration.
The invention also is directed to vector constructs that are useful for activating expression of endogenous genes and for isolating the mRNA and cDNA corresponding to the activated genes.
In one such embodiment, the vector construct may comprise (a) a first transcriptional regulatory sequence operably linked to a first unpaired splice donor sequence; (b) a second transcriptional regulatory sequence operably linked to a second unpaired splice donor sequence; and (c) a linearization site, which may be located between the first and second transcriptional regulatory sequences. According to the invention, when the vector construct is transformed into a host cell and then integrates into the genome of the host cell, the first transcriptional regulatory sequence is preferably in an inverted orientation relative to the orientation of the second transcriptional regulatory sequence. In certain preferred such embodiments, the vector may be rendered linear by cleavage at the linearization site.
In another embodiment, the invention provides a linear vector construct having a 3′ end and a 5′ end, comprising a transcriptional regulatory sequence operably linked to an unpaired splice donor site, wherein the transcriptional regulatory sequence is oriented in the linear vector construct in an orientation that directs transcription towards the 3′ end or the 5′ end of the linear vector construct.
In another embodiment, the invention provides a vector construct comprising, in sequential order, (a) a transcriptional regulatory sequence, (b) an unpaired splice donor site, (c) a rare cutting restriction site, and (d) a linearization site.
In another embodiment, the invention provides a vector construct comprising (a) a first transcriptional regulatory sequence operably linked to a selectable marker lacking a polyadenylation signal; and (b) a second transcriptional regulatory sequence operably linked to an exon-splice donor site complex, wherein the first transcriptional regulatory sequence is in the same orientation in the vector construct as is the second transcriptional regulatory sequence, and wherein the first transcriptional regulatory sequence is upstream of the second transcriptional regulatory sequence in the vector construct.
In additional embodiments, the invention provides vector constructs comprising a transcriptional regulatory sequence operably linked to a selectable marker lacking a polyadenylation signal, and further comprising an unpaired splice donor site.
In another embodiment, the invention provides vector constructs comprising a first transcriptional regulatory sequence operably linked to a selectable marker lacking a polyadenylation signal, and further comprising a second transcriptional regulatory sequence operably linked to an unpaired splice donor site.
According to the invention, the transcriptional regulatory sequence (or first or second transcriptional regulatory sequence, in vector constructs having more than one transcriptional regulatory sequence) may be a promoter, an enhancer, or a repressor, and is preferably a promoter, including an animal cell promoter, a plant cell promoter, or a fungal cell promoter, most preferably a promoter selected from the group consisting of a CMV immediate early gene promoter, an SV40 T antigen promoter, and a β-actin promoter. Other promoters of animal, plant, or fungal cell origin that may be used in accordance with the invention are known in the art and will be familiar to one of ordinary skill in view of the teachings herein.
The selectable marker used in the vector constructs of the invention may be any marker or marker gene that, upon integration of a vector containing the selectable marker into the host cell genome, permits the selection of a cell containing or expressing the marker gene. Suitable such selectable markers include, but are not limited to, a neomycin gene, a hypoxanthine phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, an adenosine deaminase gene, and a thymidine kinase gene.
In related embodiments, the invention provides vector constructs comprising a positive selectable marker, a negative selectable marker, and an unpaired splice donor site, wherein the positive and negative selectable markers and the splice donor site are oriented in the vector construct in an orientation that results in expression of the positive selectable marker in active form, and either non-expression of said negative selectable marker or expression of the negative selectable marker in inactive form, when the vector construct is integrated into the genome of a eukaryotic host cell and activates an endogenous gene in the genome. In certain preferred such embodiments, either the positive selection marker, the negative selection marker, or both, may lack a polyadenylation signal. The positive selection marker used in these aspects of the invention may be any selection marker that, upon expression, produces a protein capable of facilitating the isolation of cells expressing the marker, including but not limited to a neomycin gene, a hypoxanthine phosphoribosyl transferase gene, a puromycin gene, a dihydroorotase gene, a glutamine synthetase gene, a histidine D gene, a carbamyl phosphate synthase gene, a dihydrofolate reductase gene, a multidrug resistance 1 gene, an aspartate transcarbamylase gene, a xanthine-guanine phosphoribosyl transferase gene, or an adenosine deaminase gene. Analogously, the negative selection marker used in these aspects of the invention may be any selection marker that, upon expression, produces a protein capable of facilitating removal of cells expressing the marker, including but not limited to a hypoxanthine phosphoribosyl transferase gene, a thymidine kinase gene, or a diphtheria toxin gene.
The invention also is directed to eukaryotic host cells, which may be isolated host cells, comprising one or more of the vector constructs of the invention. Preferred such eukaryotic host cells include, but are not limited to, animal cells (including, but not limited to, mammalian (particularly human) cells, insect cells, avian cells, annelid cells, amphibian cells, reptilian cells, and fish cells), plant cells, and fungal (particularly yeast) cells. In certain such host cells, the vector construct may be integrated into the genome of the host cell.
The invention also is directed to primer molecules comprising a PCR-amplifiable sequence and a degenerate 3′ terminus. Primer molecules according to this aspect of the invention preferably have the general structure:
5′-(dT)a-X-Nb-TTTATT-3′,
wherein a is a whole number from 1 to 100 (preferably from 10 to 30), X is a PCR-amplifiable sequence consisting of a nucleic acid sequence of about 10-20 nucleotides in length, N is any nucleotide, and b is a whole number from 0 to 6. One preferred such primer has the nucleotide sequence 5′-TTTTTTTTTTTTCGTCAGCGGCCGCATCNNNNTTTATT-3′ (SEQ ID NO:10). In related embodiments, the primer molecules according to this aspect of the invention may be biotinylated.
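The general primer structure can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not part of the disclosed methods: the function names `build_primer` and `expand_degenerate` are hypothetical, and the example values (a = 12, b = 4, and the X sequence) are read off the preferred primer quoted above.

```python
from itertools import product

# Fixed 3' terminus taken from the general structure 5'-(dT)a-X-Nb-TTTATT-3'.
FIXED_3_PRIME_END = "TTTATT"


def build_primer(a: int, x: str, b: int) -> str:
    """Assemble a primer string: a T residues, the sequence X, b N positions, TTTATT.

    The bounds on a and b follow the ranges stated in the text; the length of X
    ("about 10-20 nucleotides") is deliberately not enforced.
    """
    if not 1 <= a <= 100:
        raise ValueError("a must be a whole number from 1 to 100")
    if not 0 <= b <= 6:
        raise ValueError("b must be a whole number from 0 to 6")
    x = x.upper()
    if not x or set(x) - set("ACGT"):
        raise ValueError("X must be a non-empty sequence of A, C, G, T")
    return "T" * a + x + "N" * b + FIXED_3_PRIME_END


def expand_degenerate(primer: str):
    """Yield every concrete sequence represented by a primer containing N positions."""
    choices = ["ACGT" if base == "N" else base for base in primer]
    for combination in product(*choices):
        yield "".join(combination)


if __name__ == "__main__":
    # Hypothetical example values modeled on the preferred primer in the text.
    primer = build_primer(12, "CGTCAGCGGCCGCATC", 4)
    print(primer)
    print(sum(1 for _ in expand_degenerate(primer)), "concrete sequences")
```

With b degenerate positions, the primer represents 4 to the power b distinct sequences, so b = 4 corresponds to 256 sequences.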
The invention also is directed to methods for first strand cDNA synthesis comprising (a) annealing a first primer of the invention (such as the primer described above) to an RNA template molecule to form a first primer-RNA complex, and (b) treating this first primer-RNA complex with reverse transcriptase and one or more deoxynucleoside triphosphate molecules under conditions favoring the reverse transcription of the first primer-RNA complex to synthesize a first strand cDNA.
The invention also is directed to methods for isolating activated genes, particularly from a host cell genome. These methods of the invention exploit the structure of the mRNA molecules produced using the non-targeted gene activation vectors of the invention. One such method of the invention comprises, for example, (a) introducing a vector construct comprising a transcriptional regulatory sequence and an unpaired splice donor site into a host cell (preferably one of the eukaryotic host cells described above), (b) allowing the vector construct to integrate into the genome of the host cell by non-homologous recombination, under conditions such that the vector activates an endogenous gene comprising an exon in the genome, (c) isolating RNA from the host cell, (d) synthesizing first strand cDNA according to the method of the invention described above, (e) annealing a second primer specific for the vector-encoded exon to the first strand cDNA to create a second primer-first strand cDNA complex, and (f) contacting the second primer-first strand cDNA complex with a DNA polymerase under conditions favoring the production of a second strand cDNA substantially complementary to the first strand cDNA. Methods according to this aspect of the invention may comprise one or more additional steps, such as treating the second strand cDNA with a restriction enzyme that cleaves at a restriction site located on the vector downstream of the unpaired splice donor site, or amplifying the second strand cDNA using a third primer specific for the vector-encoded exon and a fourth primer specific for the second primer. The invention also is directed to isolated genes produced according to these methods, and to vectors (which may be expression vectors) and host cells comprising these isolated genes. 
The invention also is directed to methods of producing a polypeptide, comprising cultivating a host cell comprising the isolated gene (or a vector, particularly an expression vector, comprising the isolated gene), and culturing the host cell under conditions favoring the expression by the host cell of a polypeptide encoded by the isolated gene. The invention also provides additional methods of producing a polypeptide, comprising introducing into a host cell a vector comprising a transcriptional regulatory sequence operably linked to an exonic region followed by an unpaired splice donor site, and culturing the host cell under conditions favoring the expression by said host cell of a polypeptide encoded by the exonic region, wherein the exon contains a translational start site positioned at any of the open reading frame positions relative to the 5′-most base of the unpaired splice donor site (e.g., the “A” in the ATG start codon may be at position -3 or at an increment of 3 bases upstream therefrom (e.g., -6, -9, -12, -15, -18, etc.), at position -2 or at an increment of 3 bases upstream therefrom (e.g., -5, -8, -11, -14, -17, -20, etc.), or at position -1 or at an increment of 3 bases upstream therefrom (e.g., -4, -7, -10, -13, -16, -19, etc.), relative to the 5′-most base of the splice donor site). In related embodiments, the methods of the invention may further comprise isolating the polypeptide. The invention also is directed to polypeptides, which may or may not be isolated polypeptides, produced according to these methods.
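The three families of start codon positions listed above differ only in their remainder modulo 3, which the following Python sketch makes explicit. It is a minimal illustration, not part of the disclosed methods: the helper names `frame_family` and `family_positions` are hypothetical, and positions are assumed to be negative integers counted upstream from the 5′-most base of the splice donor site, as in the text.

```python
def frame_family(position: int) -> int:
    """Return -3, -2, or -1: the family anchor for a start codon 'A' position.

    Positions are negative integers; positions that differ by a multiple of 3
    belong to the same family (same reading frame).
    """
    if position >= 0:
        raise ValueError("position must be a negative integer")
    return -(((-position - 1) % 3) + 1)


def family_positions(anchor: int, count: int) -> list[int]:
    """List the first `count` positions in a family, stepping 3 bases upstream."""
    if anchor not in (-3, -2, -1):
        raise ValueError("anchor must be -3, -2, or -1")
    return [anchor - 3 * step for step in range(count)]


if __name__ == "__main__":
    for anchor in (-3, -2, -1):
        print(anchor, family_positions(anchor, 6))
    print(frame_family(-14))  # member of the family anchored at -2
```

Each of the three anchors corresponds to one of the three possible reading frames, so a set of vectors covering all three families can place the vector-encoded start codon in frame with a downstream exon in whichever frame that exon is read.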
Other preferred embodiments of the present invention will be apparent to one of ordinary skill in light of the following drawings and description of the invention, and of the claims.