Recombinant DNA technology is currently the most valuable tool known for producing highly pure therapeutic proteins both in vitro and in vivo to treat clinical diseases. Accordingly, a vast number of genes encoding therapeutic proteins have been identified and cloned to date, providing valuable sources of protein. The value of these genes is, however, often limited by low expression levels.
This problem has traditionally been addressed using regulatory elements, such as optimal promoters and enhancers, which increase transcription/expression levels of genes. Additional techniques, particularly those which do not rely on foreign sequences (e.g., viral or other foreign regulatory elements) for increasing transcription efficiency of cloned genes, resulting in higher expression, would be of great value.
Accordingly, the present invention provides novel methods for increasing gene expression, and novel genes which exhibit such increased expression.
Gene expression begins with the process of transcription. Factors present in the cell nucleus bind to and transcribe DNA into RNA. This RNA (known as pre-mRNA) is then processed via splicing to remove non-coding regions, referred to as introns, prior to being exported out of the cell nucleus into the cytoplasm (where it is translated into protein). Thus, once spliced, pre-mRNA becomes mRNA which is free of introns and contains only coding sequences (i.e., exons) within its translated region.
Splicing of vertebrate pre-mRNAs occurs via a two-step process involving splice site selection and subsequent excision of introns. Splice site selection is governed by definition of exons (Berget et al. (1995) J. Biol. Chem. 270(6):2411-2414), and begins with recognition by splicing factors, such as small nuclear ribonucleoproteins (snRNPs), of consensus sequences located at the 3′ end of an intron (Green et al. (1986) Annu. Rev. Genet. 20:671-708). These sequences include a 3′ splice acceptor site, and associated branch and pyrimidine sequences located closely upstream of the 3′ splice acceptor site (Langford et al. (1983) Cell 33:519-527). Once bound to the 3′ splice acceptor site, splicing factors search downstream through the neighboring exon for a 5′ splice donor site. For internal introns, if a 5′ splice donor site is found within about 50 to 300 nucleotides downstream of the 3′ splice acceptor site, then the 5′ splice donor site will generally be selected to define the exon (Robberson et al. (1990) Mol. Cell. Biol. 10(1):84-94), beginning the process of spliceosome assembly.
Accordingly, splicing factors which bind to 3′ splice acceptor and 5′ splice donor sites communicate across exons to define these exons as the original units of spliceosome assembly, preceding excision of introns. Typically, stable exon complexes will only form and internal introns thereafter be defined if the exon is flanked by both a 3′ splice acceptor site and 5′ splice donor site, positioned in the correct orientation and within 50 to 300 nucleotides of one another.
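The exon definition rule described above can be illustrated with a minimal sketch. It assumes, for simplicity, that a 3′ splice acceptor site is represented only by its invariant AG and a 5′ splice donor site only by its invariant GT, and it ignores branch and pyrimidine sequences. The function name `candidate_exons` and its parameters are hypothetical and are not part of the invention as described.

```python
def candidate_exons(seq, min_len=50, max_len=300):
    """List (exon_start, exon_end) pairs permitted by the exon definition rule.

    Simplifying assumptions (not from the source): an acceptor is any AG,
    a donor is any GT, and the distance between them is the exon length.
    """
    seq = seq.upper()
    # The exon begins immediately after the AG of the acceptor site.
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # The exon ends immediately before the GT of the donor site.
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    return [(s, e) for s in starts for e in ends if min_len <= e - s <= max_len]


if __name__ == "__main__":
    pre_mrna = "CCAG" + "C" * 60 + "GTCC"
    print(candidate_exons(pre_mrna))
```

In this toy sequence the acceptor and donor are 60 nucleotides apart, so the pair is reported; shortening the spacer below 50 nucleotides yields no candidate.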
It has also been shown that the searching mechanism defining exons is not a strict 5′ to 3′ (i.e., downstream) scan, but instead operates to find the “best fit” to consensus sequence (Robberson et al., supra. at page 92). For example, if a near-consensus 5′ splice donor site is located between about 50 to 300 nucleotides downstream of a 3′ splice acceptor site, it may still be selected to define an exon, even if it is not consensus. This may explain the variety of different splicing patterns (referred to as “alternative splicing”) which is observed for many genes.
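The “best fit” selection described above can be sketched as a scoring exercise. The sketch assumes the commonly cited 5′ splice donor consensus MAG|GTRAGT (M = A or C, R = A or G), which is not stated in this text, and uses a simple count of matching positions as a stand-in for how closely a site fits consensus. The names `donor_score` and `best_fit_donor` are hypothetical.

```python
# IUPAC codes needed for the assumed donor consensus MAG|GTRAGT.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"  # last 3 exon bases followed by first 6 intron bases


def donor_score(site):
    """Count positions of a 9-nucleotide site that match the assumed consensus."""
    return sum(base in IUPAC[code] for base, code in zip(site.upper(), DONOR_CONSENSUS))


def best_fit_donor(seq, exon_start, min_len=50, max_len=300):
    """Return (exon_end, score) for the best-scoring GT within the allowed window.

    The choice is by score rather than by first occurrence, mirroring the
    "best fit" behavior; ties keep the upstream site. Returns None if no GT fits.
    """
    seq = seq.upper()
    best = None
    last_end = min(exon_start + max_len, len(seq) - 6)
    for end in range(exon_start + min_len, last_end + 1):
        site = seq[end - 3:end + 6]
        if site[3:5] != "GT":
            continue
        score = donor_score(site)
        if best is None or score > best[1]:
            best = (end, score)
    return best


if __name__ == "__main__":
    seq = "C" * 60 + "TTTGTCCCC" + "C" * 40 + "CAGGTAAGT" + "CC"
    print(best_fit_donor(seq, 0))
```

Here the weak upstream GT is passed over in favor of the downstream site that matches the assumed consensus at every position, which is the behavior a strict downstream scan would not show.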
The present invention provides novel DNAs which exhibit increased expression of a protein of interest. The novel DNAs also can be characterized by increased levels of cytoplasmic mRNA accumulation following transcription within a cell, and by novel splicing patterns. The present invention also provides expression vectors which provide high tissue-specific expression of DNAs, and compositions for delivering such vectors to cells. The invention further provides methods of increasing gene expression and/or modifying the transcription pattern of a gene. The invention still further provides methods of producing a protein by recombinant expression of a novel DNA of the invention.
In one embodiment, a novel DNA of the invention comprises an isolated DNA (e.g., gene clone or cDNA) containing one or more consensus or near consensus splice sites (3′ splice acceptor or 5′ splice donor) which have been corrected. Such consensus or near consensus splice sites can be corrected by, for example, mutation (e.g., substitution) of at least one consensus nucleotide with a different, preferably non-consensus, nucleotide. These consensus nucleotides can be located within a consensus or near consensus splice site, or within an associated branch sequence (e.g., located upstream of a 3′ splice acceptor site). Preferred consensus nucleotides for correction include invariant (i.e., conserved) nucleotides, including one or both of the invariant bases (AG) present in a 3′ splice acceptor site; one or both of the invariant bases (GT) present in a 5′ splice donor site; or the invariant A present in the branch sequence of a 3′ splice acceptor site.
If the consensus or near consensus splice site is located within the coding region of a gene, then the correction is preferably achieved by conservative mutation. In a particularly preferred embodiment, all possible conservative mutations are made within a given consensus or near consensus splice site, so that the consensus or near consensus splice site is as far from consensus as possible (i.e., has the least homology to consensus as is possible) without changing the coding sequence of the consensus or near consensus splice site.
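Correction by conservative mutation can be sketched as follows, assuming that "conservative" here means a silent (synonymous) codon substitution under the standard genetic code and that the coding sequence is supplied in frame. The sketch makes a single substitution that disrupts an invariant dinucleotide (e.g., the GT of a 5′ splice donor site), whereas the particularly preferred embodiment makes all possible conservative mutations within the site. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def silence_dinucleotide(cds, pos):
    """Return a synonymous variant of cds whose dinucleotide at pos is changed.

    Tries each codon overlapping positions pos and pos + 1 and swaps in the
    first synonymous codon that alters the dinucleotide. Returns None when no
    synonymous substitution can alter it (e.g., ATG and TGG have no synonyms).
    """
    cds = cds.upper()
    original = cds[pos:pos + 2]
    for start in sorted({pos // 3 * 3, (pos + 1) // 3 * 3}):
        codon = cds[start:start + 3]
        if len(codon) < 3:
            continue
        for alt, aa in CODON_TABLE.items():
            if alt != codon and aa == CODON_TABLE[codon]:
                variant = cds[:start] + alt + cds[start + 3:]
                if variant[pos:pos + 2] != original:
                    return variant
    return None


if __name__ == "__main__":
    cds = "ATGGGTAAA"  # Met-Gly-Lys, with a GT at positions 4-5
    fixed = silence_dinucleotide(cds, 4)
    print(fixed, translate(cds) == translate(fixed))
```

In the example the glycine codon GGT is replaced by the synonymous GGC, which removes the GT while leaving the encoded protein unchanged.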
In another embodiment, a novel DNA of the invention comprises at least one non-naturally occurring intron, either within a coding sequence or within a 5′ and/or 3′ non-coding sequence of the DNA. Novel DNAs comprising one or more non-naturally occurring introns may further comprise one or more consensus or near consensus splice sites which have been corrected as previously summarized.
In a particular embodiment of the invention, the present invention provides a novel gene encoding a human Factor VIII protein. This novel gene comprises one or more non-naturally occurring introns which serve to increase transcription of the gene, or to alter splicing of the gene. The gene may alternatively or additionally comprise one or more consensus splice sites or near consensus splice sites which have been corrected, also to increase transcription of the gene, or to alter splicing of the gene. In one embodiment, the Factor VIII gene comprises the coding region of the full-length human Factor VIII gene, except that the coding region has been modified to contain an intron spanning, overlapping or within the region of the gene encoding the β-domain. This novel gene is therefore expressed as a β-domain deleted human Factor VIII protein, since all or a portion of the β-domain coding sequence (defined by an intron) is spliced out during transcription.
A particular novel human Factor VIII gene of the invention comprises the nucleotide sequence shown in SEQ ID NO:1. Another particular novel human Factor VIII gene of the invention comprises the coding region of the nucleotide sequence shown in SEQ ID NO:3 (nucleotides 1006-8237). Particular novel expression vectors of the invention comprise the complete nucleotide sequences shown in SEQ ID NOS: 2, 3 and 4. These vectors include novel 5′ untranslated regulatory regions designed to provide high liver-specific expression of human Factor VIII protein.
In still other embodiments, the invention provides a method of increasing expression of a DNA sequence (e.g., a gene, such as a human Factor VIII gene), and a method of increasing the amount of mRNA which accumulates in the cytoplasm following transcription of a DNA sequence. In addition, the invention provides a method of altering the transcription pattern (e.g., splicing) of a DNA sequence. The methods of the present invention each involve correcting one or more consensus or near consensus splice sites within the nucleotide sequence of a DNA, and/or adding one or more non-naturally occurring introns into the nucleotide sequence of a DNA.
In a particular embodiment, the invention provides a method of simultaneously increasing expression of a gene encoding human Factor VIII protein, while also altering the gene's splicing pattern. The method involves inserting into the coding region of the gene an intron which spans, overlaps or is contained within the portion of the gene encoding the β-domain. The method may additionally or alternatively comprise correcting within either the coding sequence or the 5′ or 3′ untranslated regions of the novel Factor VIII gene, one or more consensus or near consensus splice sites.
In yet another embodiment, the invention provides a method of producing a human Factor VIII protein, such as a β-domain deleted Factor VIII protein, by introducing an expression vector containing a novel human Factor VIII gene of the invention into a host cell capable of expressing the vector, under conditions appropriate for expression, and allowing for expression of the vector to occur.