Efficient and high-level recombinant production of heterologous proteins is an important alternative to chemical synthesis and the isolation of proteins from native sources. Recombinant protein production is especially useful when the native protein is normally produced in limited amounts or by sources which are impossible, expensive and/or dangerous to obtain or propagate. Although a number of recombinant expression systems have proven useful for production of various heterologous proteins, none of these systems is universally applicable for the production of all proteins. For instance, E. coli appears to lack the ability to provide many post-translational modifications to heterologous proteins. Yeast can provide only some post-translational modifications (e.g., glycosylation patterns), and rapid degradation of heterologous proteins in yeast is common. Additionally, heterologous proteins secreted by yeast may contain long, untrimmed oligosaccharide chains, which sometimes results in biologically inactive or antigenically altered proteins. Moreover, a replacement of the natural mammalian signal peptide with a yeast signal peptide is almost always required for efficient secretion of mammalian proteins by yeast. Expression of heterologous eukaryotic proteins in insect or mammalian cells can be more reliable but both require expensive media for cell propagation. Moreover, cultured insect cells and mammalian cells have a relatively long doubling time compared to conventional bacterial systems such as E. coli and certain protozoa such as Tetrahymena. 
Protozoa represent an alternative for the recombinant production of heterologous proteins; however, few protozoa have been characterized to the extent necessary for routine heterologous protein expression. Well-characterized pathogenic protozoa that have been genetically engineered to express heterologous proteins include Trypanosoma cruzi, Trypanosoma brucei, and Leishmania spp. A number of shuttle vectors designed for episomal replication and coding region expression in pathogenic protozoa have been developed. An inducible coding region expression system has been established for pathogenic T. brucei (Wirtz, E., et al., Science, 268, 1179-1183 (1995)). Vectors that allow efficient coding region expression in different hosts such as E. coli and mammalian cells have also been developed (Al-Qahtani, A., et al., Nucleic Acids Res., 24, 1173-1174 (1996)).
Protozoa are characterized by a glycosylphosphatidylinositol (GPI) anchoring system that allows targeted surface expression, or “display,” of various endogenous proteins. Recent experiments in the kinetoplastid Trypanosoma cruzi demonstrated that mammalian and protozoan signal peptides function in T. cruzi to target a heterologous protein to different cellular compartments, and further showed both secretion and GPI-anchored surface expression in T. cruzi of a heterologous protein (Garg et al., J. Immunol., 158: 3293-3302 (1997)). Surface display in T. cruzi of chicken ovalbumin (OVA) was achieved using a construct comprising the signal sequence of T. cruzi glycoprotein, gp-72, that targets the protein to the endoplasmic reticulum, followed by a coding region for OVA, followed by 45 amino acids of amastigote surface protein I of T. cruzi, which provided a C-terminal hydrophobic tail containing a GPI anchor cleavage/attachment site. The protein thus anchored to the surface of the protozoan via a GPI structure was found to be readily presented in association with class I MHC by parasite-infected host cells.
Heterologous proteins have also been expressed in the slime mold Dictyostelium discoideum. A number of proteins have been expressed in this system, including the malaria circumsporozoite antigen (CSP), which was expressed on the cell surface (Reymond et al., J. Biol. Chem. 1995, 270: 12941-12947; see Williams et al., Current Opin. Biotechnol., 1995, 6:538-542, for a review).
Bioactive cytokines (IL-2 and IFN-γ) have also been produced in both T. cruzi and Leishmania (La Flamme et al., Mol. Biochem. Parasitol., 75:25-31 (1995), and Tobin et al., J. Immunol., 150:5059-5069 (1993)) in experiments that suggest that mammalian signal peptides are recognized and processed by these protozoa. However, pathogenic protozoa have not been exploited as a general-purpose protein expression system, presumably because they are difficult or expensive to grow in large numbers and/or are infectious to human beings.
The nonpathogenic ciliate protozoan Tetrahymena has also been explored as a vehicle for expression of heterologous genes, but with limited success to date. T. thermophila has been successfully transformed using self-replicating palindromic ribosomal DNA (rDNA) purified from macronuclei (Tondravi et al., Proc. Natl. Acad. Sci. USA 83:4369-4373 (1986)). Selection of transformants relied on a dominant paromomycin-resistance mutation in the 17S rRNA. rDNA-based shuttle vectors capable of autonomously replicating in Tetrahymena as well as in E. coli have also been developed; these plasmids contained a replication origin (ori) from the T. thermophila rDNA minichromosome (Yu et al., Proc. Natl. Acad. Sci. USA 86:8487-8491 (1989)).
rDNA vectors are usually circular vectors containing both regulatory regions and “coding” regions for Tetrahymena rRNA. A typical somatic rDNA vector contains a 5′ nontranscribed sequence (5′-NTS), followed by a “coding” region for rRNA, followed by a 3′ nontranscribed sequence (3′-NTS). A transgene is inserted into the 3′-NTS. Somatic rDNA vectors contain the macronuclear version of rDNA and transform either by replacement of the macronuclear rDNA gene via homologous recombination or by autonomous replication as an extrachromosomal element. Processing rDNA vectors, on the other hand, contain additional processing signals upstream and downstream from the 5′-NTS and the 3′-NTS, respectively, obtained from the micronuclear version of rDNA. Processing rDNA vectors mimic what happens to the micronuclear rDNA in the newly developing macronucleus. After introduction of the vector into the developing new macronucleus during the sexual process of ciliates known as conjugation, the vector-borne micronuclear rDNA undergoes excision and is maintained as an rDNA minichromosome (Yao et al., Mol. Cell. Biol. 9:1092-1099 (1989)).
Both somatic and processing rDNA vectors have been used to insert a heterologous nucleic acid into a 3′ nontranscribed spacer region of rDNA. For example, M.-C. Yao et al. (Proc. Nat'l. Acad. Sci. USA 88:9493-9497 (1991)) expressed cycloheximide resistance in Tetrahymena using an rDNA vector having the rpL29 cycloheximide resistance gene from T. thermophila inserted into the 3′ nontranscribed spacer region (NTS) of the rDNA sequence. Similarly, P. Blomberg et al. expressed neomycin resistance in T. thermophila using an rDNA vector having the neo gene inserted into the 3′ NTS, under control of rpL29 flanking sequences (Mol. Cell. Biol., 17:7237-7247 (1997)).
Gaertig et al. described an rDNA-based shuttle vector, E. coli vector pH4T2, that contains two replication origin (ori) fragments, followed by a 300 base pair 5′ untranslated region obtained from the HHF1 gene of Tetrahymena, followed by the prokaryotic gene for neomycin resistance, neo, followed by a 3′ untranslated region from BTU2 from Tetrahymena (J. Gaertig et al., Nucleic Acids Res. 22:5391-5398 (1994)). Haddad et al. reported a small circular rDNA-based vector containing a repeat of the replication origin of rDNA (i.e., a 5′ NTS), a neo2 gene cassette (consisting of the neo gene under the control of histone HHF1 promoter and the BTU2 transcription terminator) as a selectable marker, and a green fluorescent protein (GFP) cassette (also under control of HHF1 promoter and BTU2 terminator) (A. Haddad et al., Proc. Nat'l. Acad. Sci. USA 94:10675-10680 (1997)). Rusconi et al. reported a circular vector containing the rDNA replication origin, neo2 cassette, and a tRNA gene (Genes Dev. 10:2870-2880 (1996)).
A typical rDNA-based vector is a circular bacterial vector that contains a 5′ NTS comprising two or more ori sequences from Tetrahymena rDNA, followed by a selectable marker cassette such as the neo2 cassette (Gaertig et al., Nucleic Acids Res. 22:5391-5398 (1994)). A nucleic acid fragment containing a heterologous coding region such as a transgene, flanked by a 5′ untranslated region of a Tetrahymena gene (most often the ~300 bp 5′ untranslated region of the HHF1 gene of Tetrahymena) and a 3′ untranslated region of a Tetrahymena gene (most often ~300 bp of the 3′ untranslated region of the Tetrahymena gene BTU2), is typically inserted downstream of the selectable marker.
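The element order of such an rDNA-based vector can be sketched as a simple ordered list. The following Python fragment is only an illustration of the layout described above; the class name `Element`, the function `build_rdna_based_vector`, and the role labels are hypothetical and are not taken from any cited reference, and no sequence data are implied.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    """One functional element of the vector (hypothetical representation)."""
    name: str
    role: str


def build_rdna_based_vector(transgene: str, n_ori: int = 2) -> List[Element]:
    """Return the elements in 5' to 3' order: ori repeats, marker, flanked transgene."""
    if n_ori < 2:
        # The text describes a 5' NTS comprising two or more ori sequences.
        raise ValueError("expected two or more ori sequences")
    elements = [
        Element(f"rDNA ori {i + 1}", "replication origin (5' NTS)")
        for i in range(n_ori)
    ]
    elements.append(Element("neo2 cassette", "selectable marker"))
    # The transgene fragment sits downstream of the selectable marker.
    elements.extend([
        Element("HHF1 5' UTR", "5' flanking region"),
        Element(transgene, "heterologous coding region"),
        Element("BTU2 3' UTR", "3' flanking region"),
    ])
    return elements


vector = build_rdna_based_vector("GFP")
layout = " - ".join(e.name for e in vector)
print(layout)
```

Here GFP merely stands in for any heterologous coding region; the circular bacterial backbone is omitted for brevity.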
An rDNA construct that contains relatively short 5′ and 3′ untranslated sequences from two different protein coding genes of Tetrahymena, such as HHF1 and BTU2, is unlikely to integrate into the Tetrahymena genome via homologous recombination at the corresponding protein-coding loci. It is more likely to insert into Tetrahymena rDNA as a result of a single crossover event which involves the replication origin fragment. In addition, an rDNA-based vector can be maintained as an extrachromosomal element; the ori from Tetrahymena rDNA is known to support extrachromosomal replication. The marker gene (e.g., neo), and the transgene, if present, are therefore most likely expressed from the transforming rDNA-based plasmid and/or as a result of insertion into genomic rDNA, and not by recombination with endogenous genes other than rDNA.
Due to frequent and unpredictable integration of sequences from rDNA vectors and rDNA-based vectors into the native rDNA, however, levels of expression of recombinant gene products are presumed to be highly variable. See J. Gaertig et al., Nucleic Acids Res. 22:5391-5398 (1994); R. W. Kahn et al., Proc. Natl. Acad. Sci. USA 90:9295-9299 (1993); W. J. Pan et al., Nucleic Acids Res. 23:1561-1569 (1995); and W. J. Pan et al., Mol. Cell. Biol., 15:3372-3381 (1995). When relying on rDNA vectors for transformation, there is no way to control the level of integration into the host chromosome, hence no way to control copy number and, as a result, the expression level of a heterologous protein. Tetrahymena contain about 45 copies of each protein coding gene in the macronucleus, whereas the rDNA is present at about 10,000 palindromic copies per macronucleus. Thus, using either of these types of vectors, it is possible for a transgene to integrate at a similar copy number (10,000+). Overexpression of a transgene can be toxic to the protozoan host cell. Moreover, the loss of transgenes using these vectors cannot be prevented since this recombinant method generally lacks a reliable and sustainable means for selection. For example, a vector can contain both a transgene and a selectable marker, and both may initially integrate into the protozoan host genome. However, subsequent cross-over events can eliminate the transgene while leaving the marker gene in the host genome, resulting in selection of cells that do not necessarily contain the transgene.
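The scale of the copy-number difference can be made concrete with a short calculation. The sketch below uses only the approximate figures given above (about 45 macronuclear copies of a protein coding gene and about 10,000 rDNA copies); the constant and function names are hypothetical, and the result is an order-of-magnitude illustration, not a measured expression ratio.

```python
# Approximate macronuclear copy numbers taken from the passage above.
PROTEIN_GENE_COPIES = 45
RDNA_COPIES = 10_000


def fold_difference(rdna_copies: int, gene_copies: int) -> float:
    """Ratio of rDNA copy number to protein coding gene copy number."""
    if gene_copies <= 0:
        raise ValueError("gene_copies must be positive")
    return rdna_copies / gene_copies


ratio = fold_difference(RDNA_COPIES, PROTEIN_GENE_COPIES)
print(f"A transgene carried on rDNA could reach roughly {ratio:.0f}-fold "
      f"the copy number of a typical protein coding gene.")
```

Because integration into the rDNA is not controlled, the actual transgene copy number may fall anywhere up to this upper range, which is one way to picture the variable expression levels noted above.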
A protein expression system that provides for the efficient expression and isolation of both prokaryotic and eukaryotic heterologous proteins in a nonpathogenic protozoan host is needed. In particular, a protein expression system that could provide surface expression of a heterologous prokaryotic or eukaryotic protein would constitute a much desired advance in the art.