CD4, a normal membrane component of the T4 lymphocyte, binds gp120, an envelope glycoprotein of the human immunodeficiency virus (HIV). This RNA virus, which is responsible for acquired immune deficiency syndrome (AIDS) in humans, uses CD4 as its receptor for infection (Klatzmann, D., et al. (1984) "Selective tropism of lymphadenopathy associated virus (LAV) for helper-inducer T lymphocytes." Science 225:59-63). CD4 has 4 extracellular domains (Maddon, P., et al. (1985) "The isolation and nucleotide sequence of a cDNA encoding the T cell surface protein T4: A new member of the immunoglobulin gene family." Cell 42:93-104). A soluble molecule including some or all of these domains is referred to as sCD4. The two N-terminal domains of CD4 appear to be the most important for gp120 binding, and proteins which incorporate this gp120 binding capability have been proposed as potential therapeutics for AIDS because they may target the protein to the virus, to HIV-infected cells, or to other species that might have exposed gp120 (Hussey, R., et al. (1988) "A soluble CD4 protein selectively inhibits HIV replication and syncytium formation." Nature 331:78-81; Deen, K., et al. (1988) "A soluble form of CD4 (T4) protein inhibits AIDS virus infection." Nature 331:82-84; Traunecker, A., et al. (1988) "Soluble CD4 molecules neutralize human immunodeficiency virus type 1." Nature 331:84-86; Berger, E., et al. (1988) "A soluble recombinant polypeptide comprising the amino-terminal half of the extracellular region of the CD4 molecule contains an active binding site for human immunodeficiency virus." Proc. Natl. Acad. Sci. U.S.A. 85:2357-2361). The determinants for high affinity binding of gp120 are in domain 1, residues 1-109 of CD4 (Arthos, J., et al. (1989) "Identification of the residues in human CD4 critical for the binding of HIV." Cell 57:469-481).
sCD4-PE40 is such a potential therapeutic agent for the treatment of AIDS (Chaudhary, V., et al. (1988) "Selective Killing of HIV-Infected Cells by Recombinant Human CD4-Pseudomonas Exotoxin Hybrid Protein." Nature 335:369-372). This hybrid protein consists of an N-terminal methionine (amino acid 1) followed by the first two domains of CD4 (178 amino acids), several linker amino acids, and the last two domains of Pseudomonas exotoxin A (amino acids 253-613 of the toxin). The resulting protein contains 545 amino acids and has a calculated molecular weight of approximately 59,200 daltons. Amino acids 2-110 of sCD4-PE40 (Chaudhary, supra) correspond to residues 3-111 of the cDNA sequence of Maddon, supra (except that residue 3 of the Maddon sequence should be lysine), and to residues 1-109 (domain 1) of Arthos, supra. The gene for sCD4-PE40 has the sequence reported by Chaudhary, supra, except that the codons that correspond to the N-terminal portion of the protein have been modified as described for sCD4-183 in PCT Application No. PCT/US90/01367, and codon 179, corresponding to Ala, is GCT rather than GCG.
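As an arithmetic check on the composition just described, the short sketch below tallies the stated segments of sCD4-PE40. The number of linker residues is not given in the text; the value computed here is only what the stated totals imply, and the variable names are illustrative.

```python
# Segment sizes as stated in the description of sCD4-PE40.
N_TERMINAL_MET = 1            # amino acid 1
CD4_DOMAINS_1_AND_2 = 178     # first two domains of CD4
PE_FIRST, PE_LAST = 253, 613  # exotoxin A residues retained in the hybrid
TOTAL_RESIDUES = 545          # stated length of the hybrid protein

# Inclusive residue range of the toxin portion.
pe40_residues = PE_LAST - PE_FIRST + 1

# Assumption: whatever the stated total leaves unaccounted for is linker.
implied_linker = TOTAL_RESIDUES - (N_TERMINAL_MET + CD4_DOMAINS_1_AND_2 + pe40_residues)

print("toxin residues:", pe40_residues)
print("implied linker residues:", implied_linker)
```

The implied linker length is consistent with the phrase "several linker amino acids", but it is an inference from the totals rather than a figure taken from the cited reference.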
Upon expression of sCD4-PE40 in E. coli, we have found a major contaminant which is immunologically related to sCD4-PE40 and has a molecular mass of approximately 50,000 daltons. This protein has the N-terminal sequence Met-Leu-Val-Phe-Gly-Thr-Ala-, which corresponds to the C-terminal 449 residues of sCD4-PE40, i.e., beginning with Leu97 (preceded by a methionine). The 50,000 dalton protein results from internal initiation within domain 1 of sCD4; a UUG codon downstream of potential Shine-Dalgarno sequences is read as an initiation codon by f-Met-tRNA. Since the contaminant is closely related to the full length sCD4-PE40 product, it has similar biochemical properties. Accordingly, it co-purifies with the desired product and may interfere with the oxidation and folding of sCD4-PE40 to its biologically active conformation.
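The residue count quoted for the contaminant can be checked from the numbers already given. The sketch below also makes a rough mass estimate by scaling the calculated molecular weight of the full length protein by residue count; that scaling assumes an average residue mass across the whole protein and is only an approximation, not a calculation from the actual sequence.

```python
FULL_LENGTH_RESIDUES = 545   # stated length of sCD4-PE40
FULL_LENGTH_MASS_DA = 59_200 # stated calculated molecular weight
FIRST_SHARED_RESIDUE = 97    # contaminant begins at Leu97 of sCD4-PE40

# Residues 97 through 545, inclusive.
c_terminal_residues = FULL_LENGTH_RESIDUES - FIRST_SHARED_RESIDUE + 1

# Assumption: use the average mass per residue of the full length protein,
# and count one extra residue for the initiating methionine.
average_residue_mass = FULL_LENGTH_MASS_DA / FULL_LENGTH_RESIDUES
estimated_mass_da = average_residue_mass * (c_terminal_residues + 1)

print("C-terminal residues shared with sCD4-PE40:", c_terminal_residues)
print("rough mass estimate (Da):", round(estimated_mass_da))
```

The estimate falls somewhat below 50,000 daltons, which is in reasonable agreement with the approximate mass reported for the contaminant.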
Among the potential causes investigated for the impurity is internal initiation. A gene including the above-described region of domain 1 with the potential for internal initiation may generate an impurity with an N-terminal Met-Leu-Val-Phe-Gly-Thr-Ala- sequence. Such internal initiation could result from translating an sCD4-containing gene in many prokaryotic organisms, including but not limited to E. coli. Since proteins including sCD4 components are potential human drugs, it is desirable to eliminate the cause of the contaminating protein.
Four sequence-related features appear to favor translation initiation in prokaryotes. First, the preferred initiation codon is AUG. GUG and UUG can function as initiation codons, although at only about 10 and 1 percent of the frequency of AUG, respectively. (Hershey, J. (1987) Protein Synthesis. In "Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology". F. C. Neidhardt, et al., eds. (American Society for Microbiology: Washington, DC) pp. 613-641.) These codons are recognized by f-Met-tRNA as the site where amino acid polymerization is to begin. (Gold, L. (1988) "Posttranscriptional Regulatory Mechanisms in E. coli." Ann. Rev. Biochem. 57:199-233.)
The second feature that favors prokaryotic initiation is the Shine-Dalgarno sequence, e.g. 5'-UAAGGAGGUGA-3', a sequence in the mRNA which is complementary to the 3' terminal sequence of 16S rRNA, such that base pairs can be formed to stabilize the initiation complex. (Shine, J., and Dalgarno, L. (1974) "The 3' terminal sequence of E. coli 16S ribosomal RNA: Complementarity to nonsense triplets and ribosome binding sites." Proc. Natl. Acad. Sci. USA 71:1342-1346; Steitz, J., and Jakes, K. (1975) "How ribosomes select initiator regions in mRNA: Base pair formation between the 3' terminus of 16S rRNA and the mRNA during initiation of protein synthesis in E. coli." Proc. Natl. Acad. Sci. USA 72:4734-4738.) A variety of sequences which retain complementarity to the 16S rRNA can function in this role. Shine-Dalgarno-like sequences usually include GGAG or GAGG and typically are located about 5-13 bases upstream of the initiation codon for most effective initiation. (Gold, L., supra.)
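The first two features can be combined into a simple sequence scan. The following is a minimal sketch, not a validated predictor: it reports every AUG, GUG, or UUG that has a GGAG or GAGG core a short distance upstream. The function name `find_candidate_starts` is hypothetical, and the spacing is interpreted here as the number of bases between the end of the core and the first base of the codon, which is an assumption about how the 5-13 base distance is measured.

```python
START_CODONS = ("AUG", "GUG", "UUG")
SD_CORES = ("GGAG", "GAGG")

def find_candidate_starts(mrna, min_gap=5, max_gap=13):
    """Return (codon_index, codon, core, gap) for each candidate start site.

    gap is the number of bases between the end of the Shine-Dalgarno-like
    core and the first base of the codon (an assumed convention).
    Indices are 0-based positions in the supplied sequence.
    """
    rna = mrna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for i in range(len(rna) - 2):
        codon = rna[i:i + 3]
        if codon not in START_CODONS:
            continue
        for core in SD_CORES:
            for gap in range(min_gap, max_gap + 1):
                j = i - gap - len(core)  # where the core would have to begin
                if j >= 0 and rna[j:j + len(core)] == core:
                    hits.append((i, codon, core, gap))
    return hits

# Toy sequence (not taken from any gene): GGAG, seven spacer bases, then AUG.
example_hits = find_candidate_starts("CCGGAGCCCCCCCAUGCCC")
print(example_hits)
```

A scan of this kind flags both authentic and intragenic sites, and it ignores the surrounding nucleotide context and mRNA secondary structure, so many of its hits would not be expected to function.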
The third feature is a region which facilitates ribosome binding and initiation. A preferred pattern of nucleotides spanning at least -20 to +13 bases about the initiation codon of many E. coli genes has been detected by in vitro analysis of ribosome protected sequences (Steitz, J., supra) and by statistical analysis (Stormo, G., et al. (1982) "Characterization of translational initiation sites in E. coli." Nucleic Acids Res. 10:2971-2996; Schneider, T., et al. (1986) "Information Content of Binding Sites on Nucleotide Sequences." J. Mol. Biol. 188:415-431). Additionally, translation reinitiation can occur if a translational start signal overlaps (Oppenheim, D., and Yanofsky, C. (1980) "Translational Coupling During the Expression of the Tryptophan Operon of Escherichia coli." Genetics 95:785-795) or follows one of the translational stop signals (Steitz, J. (1979) "Genetic signals and nucleotide sequences in messenger RNA." In "Biological Regulation and Development. 1. Gene Expression."). Such reinitiation does not require a Shine-Dalgarno sequence and differs from the intragenic initiation discussed herein.
The fourth feature is the absence of significant mRNA secondary structure in the initiation codon region that might block the necessary annealing events with the 16S rRNA or the initiator tRNA (Gold, L. (1988), supra).
The presence of potential translation initiation points can be identified in several ways. First, sequencing the N-terminus of immunoreactive peptides should yield methionine for peptides resulting from initiation, although in some cases methionine aminopeptidase can remove the methionine, leaving the adjacent residue in the sequence at the N-terminus (Waller, J. (1963) "The NH2-terminal residue of the proteins from cell-free extracts of E. coli." J. Mol. Biol. 7:483-496; Ben-Bassat, A., et al. (1987) "Processing of the initiation methionine from proteins: Properties of the E. coli methionine aminopeptidase and its gene structure." J. Bacteriol. 169:751-757). In that case, one must rely on the gene sequence to determine whether the codon adjacent to that of the terminal amino acid is capable of initiating translation. Codons which direct the insertion of the N-terminal Met can be AUG, GUG or UUG (Gold, L., supra). Second, one can analyze the gene for sequences approximating a good initiation region, although many of these sequences are not functional (Stormo, G., and Schneider, T., supra). Third, translation initiation points can be found through "footprinting" or "toeprinting" experiments in which regions of the mRNA to which ribosomes bind either are protected from nuclease digestion or block the elongation of a primed, reverse-transcribed DNA copy. (Gold, L., supra.)
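When the initiating methionine has been removed, the check against the gene sequence described above amounts to looking at the codon immediately preceding the observed N-terminal residue. The sketch below illustrates that lookup; the function name `preceding_codon_can_initiate` and the toy coding sequence are hypothetical, and the sequence is assumed to be supplied in frame, starting at codon 1.

```python
INITIATION_CODONS = {"AUG", "GUG", "UUG"}

def preceding_codon_can_initiate(coding_rna, residue_number):
    """Inspect the codon just upstream of an observed N-terminal residue.

    coding_rna     : in-frame coding sequence beginning at codon 1
    residue_number : 1-based position of the observed N-terminal residue
    Returns (codon, True/False).
    """
    if residue_number < 2:
        raise ValueError("residue 1 has no preceding codon")
    rna = coding_rna.upper().replace("T", "U")  # accept DNA or RNA input
    start = (residue_number - 2) * 3            # 0-based index of preceding codon
    codon = rna[start:start + 3]
    if len(codon) < 3:
        raise ValueError("residue_number lies beyond the supplied sequence")
    return codon, codon in INITIATION_CODONS

# Toy coding sequence (not a real gene): AUG AAA UUG CUG GUU
toy = "AUGAAAUUGCUGGUU"
print(preceding_codon_can_initiate(toy, 4))
print(preceding_codon_can_initiate(toy, 3))
```

A positive result only shows that initiation at that position is possible; it does not by itself establish that the observed peptide arose by internal initiation rather than by proteolysis.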
Intragenic ribosome initiation sites have been identified in a number of genes. Following expression in E. coli of poliovirus 3C protease, initiation at the AUG of codon 27 gave rise to significant levels of an unstable internal initiation product (Hanecak, R., et al. (1984) "Expression of a cloned gene segment of poliovirus in E. coli: Evidence for autocatalytic production of the viral proteinase." Cell 37:1063-1073; Ivanoff, L., et al. (1986) "Expression and site-specific mutagenesis of the poliovirus 3C protease in E. coli." Proc. Natl. Acad. Sci. USA 83:5392-5396). Furthermore, expression of xylanase in E. coli was accompanied by the production of a species apparently initiating at GUG, codon 47 1. (Grepinet, O., et al. (1988) "Nucleotide sequence and deletion analysis of the xylanase gene (xynZ) of Clostridium thermocellum." J. Bacteriol. 170:4582-4588.) Translation initiation within the porcine parvovirus structural protein B occurs at internal initiation sites, with at least two of these internal initiation peptides produced at higher levels than the full length recombinant protein. (Halling, S., and Smith, S. (1985) "Expression in E. coli of multiple products from a chimaeric gene fusion: Evidence for the presence of procaryotic translational control regions within eucaryotic genes." Bio/Technology 3:715-720.) Finally, expression of a simian rotavirus glycoprotein in E. coli generated an apparent product of internal initiation at a level similar to that of the full length molecule. (Arias, C., et al. (1986) "Synthesis of the outer-capsid glycoprotein of the simian rotavirus SA11 in E. coli." Gene 47:211-219.) It has been proposed that commercial production can be facilitated by removing internal initiation sites through mutagenesis (Halling, S., supra).
Once the cause of the impurity has been determined to be internal initiation, a method of eliminating the internal initiation needs to be developed.