The present invention relates to an insect cell vector for the production of proteins from the DEAD protein family.
The modulation of RNA structure plays an essential role in cellular processes such as pre-mRNA splicing, RNA transport and protein translation, because cellular RNA is present in the cell in different secondary and tertiary structures and, in addition, a large number of RNA-binding proteins provide for further structuring of the RNA. Proteins from the family of the so-called DEAD box proteins, inter alia, are involved in these modulation processes. The members of this protein superfamily, which characteristically contain a number of homologous protein sequences, so-called “protein boxes”, are named after one of these motifs, the highly conserved tetrapeptide Asp-Glu-Ala-Asp (residues 21-24 of SEQ ID NO: 23), D-E-A-D in the single-letter code. This protein superfamily also includes a number of RNA and DNA helicases.
The characteristic protein sequences of the DEAD proteins are highly conserved in evolution. A schematic representation of the proteins from the DEAD superfamily and its subfamilies as in FIG. 1 shows the similarity between the individual family members (see also Schmid, S. R. and Linder, P. (1992) Molecular Microbiology, 6, 283, No. 3; Fuller-Pace F. V. (1994) Trends in Cell Biology, 4, 271). It is recognized that the DEAD superfamily is divided into various subfamilies, which according to their sequence motif are called DEAH, DEXH or DEAH* subfamily. All family members have an ATP-binding and RNA-binding function and also an ATP hydrolysis and RNA helicase function.
EP-A-0778347 now describes a novel ATP- and nucleic acid-binding protein having putative helicase and ATPase properties, which is assigned to the DEAH subfamily. In addition to the properties mentioned, the RNA helicase described is also connected with the tolerance of certain cells to leflunomide and related compounds and is thus suitable for the production of cell lines which are helpful in cancer, inflammation and apoptosis research and also in the elucidation of mechanisms of action of pharmaceuticals. A further possibility of application of this helicase is the identification of already known substances with respect to possible pharmaceutical properties such as, for example, an anticarcinogenic or antiviral action in a test or assay system. Sufficient amounts of the protein, however, are necessary for the desired types of use of the RNA helicase.
Interestingly, it has not been possible to date to express proteins from the DEAD protein superfamily, homologously or heterologously, in adequate amounts and in functional form. In addition to the size of the proteins (many representatives of this family have a molecular mass of 100 kD and above), certain structural motifs appear to inhibit expression in foreign organisms. In particular, the so-called RS domain, a region of between 50 and 200 amino acids in size which exhibits a greatly increased number of arginine-serine repetitions (single-letter code RS), is suspected of directly or indirectly complicating protein expression. A direct effect can be caused, for example, by incorrect phosphorylation of the serine residues in this region. Indirectly, overexpression of proteins with this domain can cause toxic effects in the cell, as specific protein-protein interactions are mediated via this protein domain. In the case of heterologous protein overexpression, the native interactions mediated via RS domains can thus be disturbed or inhibited.
The family of RS proteins is a “subfamily” of proteins which is defined by the possession of the RS domain. These proteins are involved in a wide variety of processes in pre-mRNA splicing. RS domains can mediate protein-protein interactions, influence RNA binding, modulate RNA-RNA annealing and function as subcellular localization signals. The relationship between the DEAD box proteins and the RS proteins is that both are involved in the modulation of RNA structure and function, and many proteins can therefore be assigned to both protein families.
The RS domain in human RNA helicase according to SEQ ID No. 7 is in the range from about 131 to about 253 and in particular in the range from about 175 to about 216 based on the amino acid position.
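The notion of a region with a greatly increased number of arginine-serine repetitions can be illustrated with a small sliding-window count. The following Python sketch is purely illustrative and is not part of the disclosed process; the function names, the window size and the toy sequence are assumptions introduced here, and counting the dipeptide "RS" is only one possible way of measuring RS richness.

```python
def rs_count(seq: str) -> int:
    """Count occurrences of the dipeptide RS (Arg-Ser) in a protein sequence."""
    return seq.upper().count("RS")


def richest_rs_window(seq: str, window: int = 50):
    """Return (start, end, count) for the window containing the most RS dipeptides.

    Positions are 1-based and inclusive, as is usual for amino acid positions.
    The window size is a hypothetical parameter, not a value from the text.
    """
    seq = seq.upper()
    if len(seq) <= window:
        return (1, len(seq), rs_count(seq))
    best = (1, window, rs_count(seq[:window]))
    for start in range(1, len(seq) - window + 1):
        count = rs_count(seq[start:start + window])
        if count > best[2]:  # keep the first window that reaches the maximum
            best = (start + 1, start + window, count)
    return best


if __name__ == "__main__":
    # Toy sequence (hypothetical): five residues, ten RS repeats, five residues.
    toy = "MAAAA" + "RS" * 10 + "GGGGG"
    print(richest_rs_window(toy, window=20))
```

Applied to a real sequence, such a scan would only suggest candidate regions; the positions given for SEQ ID No. 7 above are those stated in the description, not the output of this sketch.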
It was therefore the object of the present invention to make available a process which makes possible the production by genetic engineering of proteins from the DEAD protein superfamily in large amounts.
It has now surprisingly been found that, in contrast to expression in E. coli or yeast, expression in insect cells is possible in an advantageous manner.
One subject of the present invention is therefore an insect cell vector comprising a nucleic acid coding for a protein from the DEAD protein superfamily. The term “nucleic acid” is understood according to the present invention as meaning preferably single- or double-stranded DNA or RNA, in particular double-stranded DNA.
In a preferred embodiment, the coding nucleic acid at the 3′ end of the coding region additionally contains a native 3′-noncoding region, which in preferred embodiments is at least about 50, preferably about 50 to about 450, in particular about 50 to about 400, nucleotides long.
“Native” within the meaning of the present invention designates 3′-noncoding nucleic acid regions which originate from the same organism, preferably from the same gene, as the coding nucleic acid. If, for example, the nucleic acid codes for a human RNA helicase according to EP-A-0778347, the 3′-noncoding region according to the preferred embodiment likewise originates from human cells, in particular from the gene coding for the designated RNA helicase. The 3′-noncoding region according to SEQ ID No. 10 is preferred.
It is known that the 3′-noncoding region of genes can bind various regulatory proteins or factors. Thus, for example, the so-called “Cleavage and Polyadenylation Specificity Factor” (CPSF) binds to the noncoding RNA sequence AAUAAA. The CPSF protein consists of a complex of subunits with molecular weights of 160, 100, 73 and 30 kD. A further RNA binding protein is the so-called “Cleavage Stimulation Factor” (CstF). This protein is a heterotrimer of three subunits of 77, 64 and 50 kD. In addition, there are further RNA binding proteins such as the so-called “Cleavage Factors” CF I and CF II and also a poly(A) polymerase. The poly(A) polymerase is a polypeptide with a molecular mass of 83 kD. The polymerase is involved both in the poly(A) tail synthesis and in its cleavage. The extension of the poly(A) tail is strongly stimulated by the so-called “poly(A) binding protein II” (PABII). Further information and literature references are found in Wahle, E. (1995) Biochimica et Biophysica Acta, 1261, 183. Thus in addition to the AAUAAA binding sequence, Wahle, E. (1995), for example, also describes further consensus motifs such as a GU-rich region having the proposed consensus sequence YGUGUUYY and U-rich elements (see also Proudfoot, N. (1991) Cell, 64, 671-674).
In a preferred embodiment, the present invention therefore relates to 3′-noncoding regions which contain a binding site for the CPSF protein, the CstF protein, the CF I protein, the CF II protein, the poly(A) polymerase and/or the poly(A)-binding protein II (PABII), such as, for example, an AATAAA binding site, an ATTAAA binding site, a GT-rich element, in particular a YGTGTTYY element, and/or a T-rich element, each designated here in its cDNA form.
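By way of illustration, a 3′-noncoding region given in its cDNA form can be searched for the elements named above. The following Python sketch uses the standard `re` module; the function name, the example sequence and in particular the threshold of five consecutive T residues for a "T-rich" element are assumptions made here for demonstration, since the description does not define a minimum length. Y is read as the IUPAC code for a pyrimidine (C or T).

```python
import re

# Elements in cDNA form. Y (pyrimidine) is expanded to the character class [CT].
# The T-rich threshold (a run of >= 5 T) is an assumption for this sketch only.
ELEMENTS = {
    "AATAAA": r"AATAAA",
    "ATTAAA": r"ATTAAA",
    "YGTGTTYY": r"[CT]GTGTT[CT][CT]",
    "T-rich": r"(?<!T)T{5,}",  # anchored at the start of each run of T
}


def find_3prime_elements(utr: str) -> dict:
    """Return 1-based start positions of each element in a 3'-noncoding cDNA."""
    utr = utr.upper()
    hits = {}
    for name, pattern in ELEMENTS.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", utr)]
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from SEQ ID No. 10.
    example = "GC" + "AATAAA" + "GC" + "TGTGTTCC" + "GA" + "T" * 7 + "GC"
    for element, positions in find_3prime_elements(example).items():
        print(element, positions)
```

A match in such a scan indicates only that the sequence pattern is present; whether the respective factor actually binds is a separate experimental question.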
A protein from the DEAD protein superfamily is understood according to the present invention as meaning a protein which has conserved motifs, one of which contains the amino acid sequence DEAD, DEAH or DEXH. The proteins preferably contain sequence motifs which are responsible for a nucleic acid-binding activity, a helicase activity and/or an ATPase activity. The proteins in particular have an RNA helicase and ATPase activity. FIG. 1 and FIG. 2 show examples of the conserved motifs for the DEAD protein superfamily and the DEAH, DEXH or DEAH* subfamilies.
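The name-giving motifs can be located in an amino acid sequence by a simple pattern search, sketched below in Python. The function name and the example sequences are hypothetical, and X in DEXH is taken to mean any amino acid. The presence of such a tetrapeptide alone does not establish membership of the superfamily, which according to FIG. 1 and FIG. 2 rests on several conserved motifs.

```python
import re


def find_dead_box_motifs(protein: str) -> list:
    """Return (position, tetrapeptide, label) for each DEAD, DEAH or DEXH match.

    Positions are 1-based. DEAH is reported under its own label although it
    also satisfies the more general DEXH pattern.
    """
    protein = protein.upper()
    hits = []
    # Lookahead so that overlapping candidates are not skipped.
    for m in re.finditer(r"(?=(DE[A-Z][DH]))", protein):
        tetra = m.group(1)
        if tetra == "DEAD":
            label = "DEAD"
        elif tetra == "DEAH":
            label = "DEAH"
        elif tetra.endswith("H"):
            label = "DEXH"
        else:
            continue  # DE?D with a residue other than Ala is not a named motif
        hits.append((m.start() + 1, tetra, label))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequences for demonstration only.
    for seq in ("MKLDEADGG", "AADEIHKK", "GGDEAHLL"):
        print(seq, find_dead_box_motifs(seq))
```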
Within the meaning of the present invention, the term “DEAD protein superfamily” thus includes all proteins which fall within a group according to FIG. 1 or 2. Examples of proteins of this type are described in Fuller-Pace, F. V. (1994), supra, and Schmid, S. R. and Linder, P. (1992), supra. Further preferred proteins are those which impart to cells tolerance to isoxazole derivatives, such as, for example, leflunomide, and compounds related in action, such as, for example, brequinar. Human proteins are particularly preferred, in particular those from Table 1 and the RNA helicase from EP-A-0778347. Within the meaning of the present invention, proteins with a molecular mass of about 100 to about 150 kD, in particular with a molecular mass of about 130 kD, and those with a so-called RS domain, i.e. a region of about 50 to 200 amino acids in size with an increased number of arginine-serine repetitions, are preferably suitable. Within the meaning of the present invention, a nucleic acid coding for the human RNA helicase p135 according to EP-A-0778347 with the amino acid sequence as in FIG. 3 is particularly suitable.
A further preferred example of a nucleic acid coding for a protein from the DEAH protein subfamily with a native 3′-noncoding region is the cDNA of the human RNA helicase from EP-A-0778347 as in FIG. 5 of the present invention. The 3′-noncoding region of the RNA helicase mentioned according to SEQ ID No. 10 is generally suitable within the meaning of the present invention as a native 3′-noncoding region of human proteins from the DEAD protein superfamily and in particular from the DEAH protein subfamily.
In a preferred embodiment, the vector according to the invention contains regulatory sequences which control the expression of the nucleic acid coding for a protein from the DEAD protein superfamily. All the regulatory sequences known to the person skilled in the art are suitable for this. A promoter of a “long terminal repeat” (LTR), in particular of a retroviral LTR or of an LTR of a transposable element according to, for example, U.S. Pat. No. 5,004,687, is particularly suitable. Regulatory sequences from insect viruses, preferably baculoviruses, in particular the promoter of the polyhedrin gene or of the 10K protein (see, for example, EP-B1-0127839), are likewise particularly suitable. In a further preferred embodiment, the native ATG start codon of the nucleic acid coding for a protein from the DEAD protein superfamily is replaced by a polyhedrin-ATG translation initiation start site. The nucleic acid according to the present invention is thus a chimeric nucleic acid consisting of insect virus sequences at the 5′ end and heterologous sequences following downstream, the 3′-noncoding part preferably containing sequences native to the heterologous part. This construct according to the invention makes possible a further advantageous increase in expression in insect cells.
In another preferred embodiment, the nucleic acid according to the invention contains a nucleic acid coding for an oligopeptide of at least about 4, preferably of about 6, histidines between the ATG translation initiation start site and the region coding for the protein from the DEAD protein superfamily. After expression of the designated nucleic acid, a fusion protein is obtained from the chosen protein from the DEAD protein superfamily and an N-terminally fused peptide which contains the histidines mentioned. By this means, the protein can be purified in a particularly simple and effective manner, for example, by means of a metal ion-containing chromatography column, such as, for example, a nickel-containing chromatography column, such as an Ni-NTA resin-containing chromatography column. “NTA” stands for the chelator “nitrilotriacetic acid” (Qiagen GmbH, Hilden). Instead of or in addition to the nucleic acid coding for the histidines mentioned, a nucleic acid can also be used which codes for glutathione S-transferase (Smith, D. B. and Johnson, K. S. (1988) Gene, 67, 31-40). The fusion proteins thus obtained can likewise be purified in a simple manner by means of affinity chromatography and detected by means of a colorimetric test or by means of an immunoassay. A suitable system is, for example, the vector pGEX from Pharmacia, Freiburg, as a starting vector.
For the removal of the foreign protein component from the fusion protein mentioned, it is advantageous if the nucleic acid codes for a protease cleavage site. Suitable proteases are, for example, thrombin or factor Xa. The thrombin cleavage site contains, for example, the amino acid sequence Leu-Val-Pro-Arg-Gly-Ser (SEQ ID NO: 1) (see, for example, FIG. 3B). The factor Xa cleavage site contains, for example, the amino acid sequence Ile-Glu-Gly-Arg (SEQ ID NO: 2).
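The arrangement of histidine tag, protease cleavage site and target protein can be sketched at the amino acid level as follows. This Python example is a simplified illustration under stated assumptions: the function names and the toy target sequence are hypothetical, the tag is placed immediately after the initiator methionine (the actual 5′ region according to FIG. 3B may contain further residues), and the cut position after the arginine of each site reflects general biochemical knowledge about thrombin and factor Xa rather than a statement of the present description.

```python
THROMBIN_SITE = "LVPRGS"  # Leu-Val-Pro-Arg-Gly-Ser, SEQ ID NO: 1
FACTOR_XA_SITE = "IEGR"   # Ile-Glu-Gly-Arg, SEQ ID NO: 2


def build_fusion(protein: str, site: str = THROMBIN_SITE, n_his: int = 6) -> str:
    """Assemble Met + histidine tag + cleavage site + target protein."""
    return "M" + "H" * n_his + site + protein


def cleave(fusion: str, site: str):
    """Split a fusion protein after the Arg of the first cleavage site found.

    Returns (tag_fragment, released_protein). Assumes cleavage C-terminal to
    the arginine of the site, i.e. LVPR|GS for thrombin and IEGR| for factor Xa.
    """
    start = fusion.find(site)
    if start < 0:
        raise ValueError("cleavage site not present in fusion protein")
    cut = start + site.index("R") + 1
    return fusion[:cut], fusion[cut:]


if __name__ == "__main__":
    target = "MKTAY"  # hypothetical stand-in for the target protein
    fusion = build_fusion(target)
    print(fusion)
    print(cleave(fusion, THROMBIN_SITE))
```

Under these assumptions, thrombin cleavage leaves the two residues Gly-Ser at the N-terminus of the released protein, whereas a factor Xa site releases the protein without additional residues.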
A preferred 5′ region of the nucleic acid according to the present invention is, for example, a nucleic acid according to FIG. 3B, which begins with the start codon ATG and ends after the thrombin cleavage site at one of the designated restriction enzyme cleavage sites. The nucleic acid concerned can then be ligated according to generally known processes at the selected restriction enzyme cleavage site. A suitable nucleic acid according to the invention is a nucleic acid comprising the polyhedrin promoter, e.g. according to EP-B1-0 127 839, the nucleic acid p135-NT5C according to SEQ ID No. 12 comprising the polyhedrin-ATG translation initiation start site and a sequence coding for 6 histidines, and a nucleic acid according to SEQ ID No. 9 comprising a nucleic acid coding for the RNA helicase p135 and its native 3′-noncoding region.
In a further preferred embodiment, the 5′ region of the nucleic acid according to the invention contains a nucleic acid which codes for a signal sequence, for example an insulin signal sequence, e.g. according to SEQ ID No. 13, in the form of the construct p135-NT5S. This construct also has the advantage that the desired protein can be worked up and purified particularly easily: on account of the signal sequence, the protein is secreted directly into the culture medium, the signal sequence being removed in the course of this, instead of accumulating intracellularly in the insect cells. Further suitable signal sequences are the signal sequence of bombyxin of the silkworm (Congote, L. F. and Li, Q. (1994) Biochem. J., 299, 101-107), the signal sequence of the human placental alkaline phosphatase (Mroczkowski, B. S. et al. (1994) J. Biol. Chem., 269, 13522-28), the signal sequence of melittin from the honeybee (Mroczkowski, B. S. et al. (1994) J. Biol. Chem., 269, 13522-28; Chai, H. et al. (1993) Biotechnol. Appl. Biochem., 18, 259-73), the signal sequence of the human plasminogen activator (Jarvis, D. L. and Summers, M. D. (1989) Mol. Cell. Biol., 9, 214-23), signal sequences of certain insect cell proteins (WO90/05783) or leader sequences of prokaryotic genes (EP-A1-0 486 170).
Another subject of the present invention is a process for the production of recombinant insect viruses which code for a protein from the DEAD protein superfamily according to the present invention, in which a vector according to the invention is introduced into insect cells together with insect virus wild-type DNA and the resulting recombinant insect viruses are isolated.
A suitable insect virus is, for example, a baculovirus, in particular the Autographa californica virus. Suitable insect cells are, for example, those of Spodoptera frugiperda, Trichoplusia ni, Rachiplusia ou or Galleria mellonella. The Autographa californica strains E2, R9, S1 or S3, especially the Autographa californica strain S3, Spodoptera frugiperda strain 21 or Trichoplusia ni egg cells are particularly suitable. In addition to the insect cells, ovarian cells of the corresponding insects or their larvae are also suitable. The recombinant insect virus according to the invention is formed in the insect cells by homologous recombination of the vector according to the invention with the insect virus wild type concerned (see, for example, EP-B1-0127839 or U.S. Pat. No. 5,004,687). The recombinant insect virus can then be used for the production of the desired protein.
A further subject of the present invention therefore also relates to a process for the production of a protein from the DEAD protein superfamily, in which a vector according to the invention or a recombinant insect virus according to the invention is introduced into insect cells or insect larvae, the insect cells or larvae are cultured under suitable conditions and the expressed protein is isolated. Preferably, insect cells are infected with recombinant insect virus, the infection period preferably being about 40 to about 90, in particular about 70, hours. The production of a recombinant insect virus or the production of a desired protein in insect cells is carried out by processes generally known to the person skilled in the art, such as are described, for example, in EP-B1-0127839 or U.S. Pat. No. 5,004,687. However, commercially obtainable baculovirus expression systems such as, for example, the BaculoGold™ transfection kit from Pharmingen or the Bac-to-Bac™ baculovirus expression system from Gibco BRL are also suitable.
It is an advantage of the insect cell expression vectors according to the invention and the processes according to the invention that, surprisingly, relatively large amounts, in general about 300-400 mg per 10 cells, of proteins from the DEAD protein superfamily, in particular of proteins having a molecular mass of greater than about 100 kD and especially proteins having a so-called RS domain, can be produced.
A further subject of the present invention therefore relates to the use of an insect cell vector according to the invention for the production of a protein from the DEAD protein superfamily. The designated proteins are suitable, for example, for the production of appropriate test systems according to EP-A-0778347 or for the treatment of a disorder as described in EP-A-0778347 or in Ellis N. A. (1997), supra.
The following figures and examples are intended to illustrate the invention in greater detail without restricting it thereto.