The present invention primarily relates to a novel hGH-NV cDNA chimera as shown in SEQ ID NO: 3 encoding human growth hormone (hGH) and a process for the preparation of the said novel chimera. Further the invention relates to the use of hGH-NV cDNA chimera to obtain an expressible construct to produce mature human growth hormone.
DNA Isoform and Gene Cluster
The mammalian genomic locus for growth hormone (GH, somatotropin) expanded very recently in evolution (possibly 10 million years ago) into a cluster of five highly sequence-conserved genes (as in simians and humans). Whereas in rodents and ungulates the homologous locus contains solely the gene for the respective somatotropin, in humans this locus harbors an additional four genes (Fiddes et al., 1979; Seeburg, 1982). The structure of the human growth hormone gene cluster has been determined over a 78 kb region of DNA. The entire gene cluster is located on the long arm of human chromosome 17 at bands q22-q24 (George et al., 1981). The reported DNA sequence of 66,495 bp, which contains the sequences of five genes, each consisting of five exons and four introns, represents approximately 0.1% of the entire human chromosome 17. It was isolated on two overlapping recombinant cosmids (Barsh et al., 1983) and has been characterized by restriction analysis as well as with respect to the positions and transcriptional orientations of the genes.
There are two growth hormone genes interspersed with three chorionic somatomammotropin genes, all in the same transcriptional orientation. The genes of this cluster of the growth hormone superfamily (5′ to 3′: hGH-N, hCS-L, hCS-A, hGH-V, hCS-B) are separated by intergenic regions of 6 to 13 kb in length, which contain 48 interspersed middle repetitive sequence elements of the Alu type (Chen et al., 1989; Seeburg, 1982). Among these, three are truncated. The gene cluster map of the growth hormone superfamily shows the five genes in tandem, aligned in this order and in the same transcriptional orientation. The genes for hGH and hCS are clustered together at bands q22-q24 on chromosome 17, but the prolactin gene is located on chromosome 6. All five genes, including their immediate flanking regions, have been sequenced and are highly conserved throughout (91-99%).
The human chromosomal growth hormone locus spans approximately 66.5 kb and was sequenced in its entirety to provide a framework for the analysis of its biology and evolution (Chen et al., 1989; Seeburg et al., 1982; Lewis et al., 1980; Hirt et al., 1987). The hGH-N gene is transcribed exclusively in the pituitary, whereas the other four genes (hCS-L, hCS-A, hGH-V, hCS-B) are expressed only in placental tissue, at levels characteristic for each gene (Hennen et al., 1985; MacLeod et al., 1991; Frankenne et al., 1990). The extensive structural information allows a reconstruction of the major steps in the molecular evolution of the hGH locus, from a single ancestral growth hormone-like progenitor gene to the present five-gene arrangement on chromosome 17.
Differential Isoform Expression
Analysis of the sequences of the genes and identification of at least three different classes of duplication units interspersed throughout the five gene cluster suggests that the cluster evolved quite recently and that the mechanism of gene duplication involved homologous but unequal exchange between middle repetitive elements of the Alu family.
These genes are highly homologous throughout their 5′ flanking and coding sequences but diverge in their 3′ flanking regions, which raises the paradox of how genes so similar in structural and flanking sequences can be so differentially regulated. Despite the high sequence identity, these genes are expressed selectively in two different tissues under differential hormonal control (Parks et al., 1989). The hGH-N gene lies in an active chromatin conformation in the pituitary and is transcribed only in somatotrophic cells of the anterior pituitary, whereas at least one of the chorionic somatomammotropin genes lies in an active conformation in the placental trophoblasts.
The first step in the RNA splicing process is the selection of the 3′ splice site and its polypyrimidine tract, accompanied by the association of factor(s) with an intronic branch point site. The choice of the branch point site is mainly determined by its distance to the 3′ splice junction. The choice of the splice site therefore appears to be a key element both in the general mechanism of splicing and in the special case of alternative processing. This is in agreement with the observation that alternative RNA processing more often involves the use of an alternative acceptor site (as occurs with hGH mRNA) than of an alternative donor site. The cell-type-dependent splice variant and its stability define the locus of the transcript.
GH/PL gene transcription is controlled in an organ-specific manner during human ontogenesis: the GH-V, PL-A, PL-B and PL-L genes are transcriptionally active in the placenta (Seeburg et al., 1982), whereas after birth pituitary-derived hGH-N becomes the predominant endocrine-active GH (Selby et al., 1984). In contrast to hGH-N, placental hGH-V is synthesized in the syncytiotrophoblast during the first months of human life, that is, during pregnancy, but its functions for growth, development and metabolism are not clear (Hennen et al., 1985; Cooke et al., 1988). Of the gestational polypeptide hormones secreted by the placenta, hGH-V and placental lactogen, only hGH-V secretion is modulated by glucose, suggesting a key metabolic role for this hormone during pregnancy (Seeburg, 1982; Patel et al., 1995). The majority of total placental GH/PL mRNAs is derived from the PL-A and PL-B genes (95-99%), and only 1-4.2% encode GH-V gene products (Chen et al., 1989; MacLeod et al., 1991; Lytras et al., 1994).
Two alternatively processed mRNAs, omitting or including intron D of the GH-V gene (Untergasser, G. et al., 1998), are expressed by the syncytiotrophoblast, resulting in either a secreted (22K, 191 aa) or a presumably membrane-associated (26K, 230 aa) protein (Cooke et al., 1988). Secreted hGH-V, alternatively termed placental hGH, a somatogen, is detectable in the serum of pregnant women and becomes the predominant serum hGH by the third trimester of gestation (Hennen et al., 1985; Cooke et al., 1988). This information has helped us to take a novel cDNA walk from one isoform (hGH-V) to the other (hGH-N). The cDNA sequence similarity and the restriction enzyme map pattern between the two were taken as the platform for this novel invention.
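The comparison of restriction enzyme map patterns between two isoforms can be pictured with a minimal sketch. The enzymes listed (with their standard recognition sequences) and the two toy sequences are illustrative assumptions only; the text above does not name the enzymes used, and the sequences are not hGH-V or hGH-N cDNA. A site present at the same coordinate in both sequences is reported as shared.

```python
# Recognition sequences of four common enzymes (illustrative choice,
# not the enzymes of the described cDNA walk).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT", "PstI": "CTGCAG"}


def site_map(seq):
    """Map each enzyme name to the 0-based positions of its site in seq."""
    seq = seq.upper()
    result = {}
    for name, site in SITES.items():
        positions = []
        pos = seq.find(site)
        while pos != -1:
            positions.append(pos)
            pos = seq.find(site, pos + 1)
        result[name] = positions
    return result


def shared_sites(seq_a, seq_b):
    """Return enzymes whose sites occur at identical positions in both sequences."""
    map_a, map_b = site_map(seq_a), site_map(seq_b)
    shared = {}
    for name in SITES:
        common = sorted(set(map_a[name]) & set(map_b[name]))
        if common:
            shared[name] = common
    return shared


# HYPOTHETICAL toy sequences differing by one nucleotide.
isoform_a = "ATGGAATTCAAACTGCAGTTT"
isoform_b = "ATGGAATTCAAACTGCACTTT"

print(site_map(isoform_a))
print(shared_sites(isoform_a, isoform_b))
```

In this toy case the single-nucleotide difference removes the PstI site from the second sequence, so only the EcoRI site is common to both maps.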
Both the pituitary- and placental-derived growth hormones are produced in the form of processed mature GH protein with or without a secretion signal sequence, accumulating in the periplasmic or in the soluble fraction of E. coli, respectively (Hsiung et al., 1986; Chang et al., 1987; Martial et al., 1979; Gray et al., 1988 and Joly et al., 1998). hGH-V cDNA encodes a protein with a molecular weight of 20,000 Dalton (Ucida et al., 1997; Honjo et al., 1996; Igout et al., 1988 and 1993; Frankenne et al., 1990), whereas growth hormone is made up of 191 amino acids, has a molecular weight of 21,500 Dalton, and is used for the treatment of pituitary dwarfism and pediatric chronic renal failure (Mukhija et al., 1995; Grandi, 1991). hGH has recently been found to have remarkable activities such as immune-promoting and lipolysis-stimulating activity, in addition to its growth-promoting activity. Broader applications are expected in the future.
Alternative splicing of the primary hGH-N gene transcription product generates two mRNA species (DeNoto et al., 1981) that encode, respectively, hGH (22 kD form) and a somatotropin variant (20 kD form), which has 15 codons deleted from exon III of the hGH-N gene transcript (Singh et al., 1974; Lewis et al., 1978). No such alternative splicing is reported for the hCS gene RNAs. Perfect codon colinearity exists for all the mRNAs, with the exception of the hCS-L gene mRNA, which carries a different exon III structure.
All five mRNAs specify polypeptides of approximately 200 amino acids in length, of which the N-terminal 26 residues function as signal sequences. The hGH-V gene, which is positioned between the hCS-A and -B genes, encodes a polypeptide differing from hGH-N in 13 positions (Seeburg, 1982), while hGH differs from hCS in 28 residues. The hCS-L DNA sequence revealed a point mutation in the 5′ consensus splice donor site of its second intron (Hirt et al., 1987), suggesting that splicing might not occur at this position. Thus, the hCS-L gene may not yield a stable RNA or protein product. The hypothetical hCS-L protein is shorter by 18 amino acid residues than the closely related hCS, contains a stretch of 6 residues with no homology to other polypeptides, and has only 75% sequence identity with hGH. The spatial distribution of the transcripts of these five genes might be regulated at the level of transcription: both position, as reflected in the chromatin structure of the hGH locus, and sequence-specific regulatory elements around the individual genes could be responsible for the observed tissue-specific patterns of gene expression.
Eukaryotic Codon Usage vs. Differentiation
The DNA of a gene comprises several structurally distinct regions, i.e., regulatory regions as well as the open reading frame encoding the full-length protein. The enzyme RNA polymerase becomes activated, binds in the control region (called the ‘promoter’), slides along the structural gene and, with the help of several other functional accessories, transcribes the encoded information into a rough draft, the pre-messenger ribonucleic acid (pre-mRNA), which is then spliced into the finally processed messenger ribonucleic acid (mRNA). This mRNA message is then translated at the ribosomes, where it is read as triplet codes for each translatable amino acid; the translation of a protein begins with the start signal (most commonly ATG) and proceeds until a stop signal (commonly TAA, TAG or TGA).
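The start and stop signals described above can be illustrated with a minimal sketch that scans a DNA string for the first ATG and reads triplets until the first in-frame stop codon. The function name and the toy sequence are hypothetical and serve only as an illustration; they are not taken from any hGH sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(dna):
    """Return the stretch from the first ATG through the first in-frame stop.

    Returns an empty string when no ATG or no in-frame stop codon is found.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    # Step through the sequence one codon (three nucleotides) at a time.
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            return dna[start:i + 3]
    return ""


# HYPOTHETICAL toy sequence: CC | ATG GCT TTC TAA | GG
toy = "CCATGGCTTTCTAAGG"
print(find_orf(toy))
```

The sketch considers only the first ATG and a single reading frame, which is enough to show how the triplet frame set by the start signal determines where the stop signal is recognized.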
As mentioned above, the genetic code within DNA defines amino acids, each of which is specified by a triplet, or “codon” (three adjacent nucleotides from A, T, G, C), where the third nucleotide for a specific amino acid in any protein can often be varied. The preference for a particular third nucleotide for a specific amino acid varies among prokaryotes and eukaryotes; this ‘degeneracy’ underlies the differences in codon usage among various organisms (Table 1). The percentage of codon usage and the frequency of codon occurrence are shown in Ausubel, F. M. et al. (1987). Table 1 shows the variability in codon usage for the amino acids and the frequency of occurrence among organisms (Ausubel, F. M. et al., 1987). This knowledge helps greatly in expressing a protein of interest in a heterologous system in recombinant technology, where nucleotides can be changed within the triplets at the DNA level to improve the yield of a foreign protein. Our analysis of the chimeric hGH identifies those regions of the sequence that could possibly result in poor heterologous expression. We take advantage of differences in codon usage between the hGH-N sequence and its alleles to arrive at an expression clone that can be of great help in expressing human growth hormone in various organisms such as E. coli, yeast, insect cells, CHO cells, etc., by modification with appropriate codon usage.
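Changing nucleotides within triplets without changing the encoded protein can be sketched as follows. The standard genetic code is built from its usual compact representation; the `PREFERRED` table is a hypothetical placeholder for a host-specific preference and does not reproduce Table 1 or any measured codon frequencies, and the input sequence is a toy example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# HYPOTHETICAL host preference: one chosen codon per amino acid.
PREFERRED = {"L": "CTG", "R": "CGT"}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna, preferred):
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        out.append(preferred.get(amino_acid, codon))
    return "".join(out)


# HYPOTHETICAL toy sequence: ATG TTA AGA CCC TAA (Met Leu Arg Pro stop)
original = "ATGTTAAGACCCTAA"
optimized = recode(original, PREFERRED)

print(optimized)
print(translate(original) == translate(optimized))
```

Comparing the translations before and after recoding is a simple safeguard that only synonymous changes were introduced.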
The complete sequence of the human growth hormone (hGH) gene and the position of the mature 5′ end of the hGH mRNA within the sequence have been determined by DeNoto F. M. et al., 1981. Comparison of this sequence with that of a cloned hGH cDNA shows that the gene is interrupted by four intervening sequences. S1 mapping shows that one of these intervening sequences has two different 3′ splice sites. These alternate splicing pathways generate hGH peptides of different sizes, which are found in normal pituitaries. Comparison of sequences near the 5′ end of the hGH mRNA with a similar region of the alpha subunit of the human glycoprotein hormones reveals an unexpected region of homology between these otherwise unrelated peptide hormones.
Mechanisms of alternative RNA splicing are commonly implicated in the generation of protein diversity. Splicing at one alternative 3′ site located 45 bp downstream from the normal 3′ site of intervening sequence B is likely to generate the mRNA for one smaller hGH peptide found at low levels in normal pituitaries. The position of the 5′ end of the mature mRNA was located using S1 mapping. S1 nuclease analysis provides evidence for the existence of multiple RNAs. The hGH gene may be a good system in which to approach some unsolved questions about the accuracy of RNA splicing and the regulation of splice site selection in alternative pathways. The signal sequence TATAAA, thought to be involved in initiation of transcription, is found approximately 25 bp upstream from the 5′ end of the mature mRNA. Conserved sequences are also found in the gene near the sequence where poly A is added to the mRNA. Surprisingly, there is a region of homology in the 5′ untranslated regions of the mRNAs of hGH and the otherwise unrelated alpha subunit of the human glycoprotein hormones. Our work further aims to exploit the subtle differences in mRNA secondary structure of the chimeric hGH for better expression. Sononick (1985) has proposed that RNA secondary structure influences splice site selection and has shown that frequently used splice sites become optional when sequestered in a hairpin loop. According to this observation, there is a local secondary structure in the region of the alternative splicing site of the hGH precursor mRNA: a large hairpin structure trapping both the 22K and the 20K acceptor sites as well as the branch point sequence of the splice site. This stable structure, by looping out the 22K and 20K splice sites and by interfering with the lariat formation leading to the specific splice site, would favour the jump splice at the alternative acceptor site.
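The effect of using the alternative acceptor site 45 bp downstream can be checked with simple arithmetic, sketched here on a toy exon. The 45 bp offset and the 191-residue mature length come from the text; the exon sequence and the function name are hypothetical and do not represent the real hGH exon.

```python
ALT_ACCEPTOR_OFFSET_BP = 45   # from the text: 45 bp downstream of the normal 3' site
MATURE_22K_RESIDUES = 191     # from the text: mature 22K hGH


def spliced_exon(exon, use_alternative_acceptor):
    """Return the part of the exon kept in the mRNA for either acceptor choice."""
    if use_alternative_acceptor:
        return exon[ALT_ACCEPTOR_OFFSET_BP:]
    return exon


# HYPOTHETICAL 60-nucleotide exon, not the real hGH sequence.
toy_exon = "GCT" * 15 + "GAA" * 5

normal = spliced_exon(toy_exon, use_alternative_acceptor=False)
short = spliced_exon(toy_exon, use_alternative_acceptor=True)

removed_nt = len(normal) - len(short)
codons_lost = removed_nt // 3
frame_preserved = removed_nt % 3 == 0
residues_short_form = MATURE_22K_RESIDUES - codons_lost

print(removed_nt, codons_lost, frame_preserved, residues_short_form)
```

Because 45 is a multiple of three, the downstream reading frame is unchanged and exactly 15 codons are lost, which by this arithmetic leaves a peptide of 176 residues.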
In many introns, computer searches have shown a well-conserved region between coordinates −60 and −21 from the 3′ splicing cleavage site. This box is an additional element affecting acceptor splice site recognition and lariat formation (Leocomte et al., 1987). S1 mapping analysis of human pituitary RNA confirms the existence of at least four distinct hGH mRNAs originating from alternative acceptor sites at the second intron of the primary transcript. The hGH gene sequence was analyzed to explain the high frequency of alternative splicing, which occurs only at this location. Of the four introns of the hGH gene, three (introns A, B and D) contain a “CTTG” box in the region upstream from the acceptor splice site. However, the “CTTG” box was not found in intron C or upstream from the 20K alternative splice sites. For many introns, it has been reported that good complementarity between the region containing the “CTTG” box and the 5′ end of the first loop of U2 snRNA might play a role in branch point selection, specifying splice sites by RNA-RNA base pairing. The generation of protein diversity in hGH possibly implies that the variants, owing to their differences in RNA secondary structure as well as codon usage, have different levels of expression in a preferred host.
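A search for a “CTTG” box in the window between coordinates −60 and −21 can be sketched as follows. The coordinate convention (−1 is the last intron nucleotide before the cleavage site), the function name and the toy sequences are assumptions made for illustration; the original computer searches may have used different criteria.

```python
def find_box(seq, acceptor, motif="CTTG", far=60, near=21):
    """List motif starts, as negative coordinates, inside the -far..-near window.

    `acceptor` is the 0-based index of the first exon nucleotide, so the
    coordinate -k corresponds to index acceptor - k. The whole motif must
    lie inside the window to be reported.
    """
    lo = max(0, acceptor - far)
    hi = max(lo, acceptor - near + 1)  # exclusive end, keeps coordinate -near
    window = seq[lo:hi].upper()
    hits = []
    pos = window.find(motif)
    while pos != -1:
        hits.append(lo + pos - acceptor)
        pos = window.find(motif, pos + 1)
    return hits


# HYPOTHETICAL toy introns of 70 nucleotides followed by a short exon.
with_box = "A" * 30 + "CTTG" + "A" * 36 + "GGGGGG"     # box starts at -40
outside_box = "A" * 60 + "CTTG" + "A" * 6 + "GGGGGG"   # box starts at -10

print(find_box(with_box, acceptor=70))
print(find_box(outside_box, acceptor=70))
```

Only the first toy intron yields a hit, since the motif in the second lies closer to the acceptor site than coordinate −21.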
Construction of Chimera and Industrial Rationale
The construction of a recombinant vector to be used in cloning comprises the construction of hybrid plasmids containing the nucleotide sequence which codes for the mature protein of interest, e.g., mature hGH, fused at its 5′ terminus to a sequence which codes for a fusion tag peptide and a protease cleavage site. The hybrid plasmids are thus made heterologous by creating a chimeric DNA construct, an approach which has found application at lab scale as well as at large scale in recent years. The construction of a chimera with the gene of interest in recombinant technology facilitates both the yield of expression and the purification steps. Success in both aspects depends on the choice of the fusion peptide.
Immobilized metal ion affinity chromatography (IMAC) is one of the most widely used techniques for purification of recombinant proteins. Engineering a His-tag onto the amino or carboxyl terminus of a protein allows selective adsorption of the recombinant protein on immobilized metal ions, such as Ni2+, Co2+, Zn2+, Cu2+, and Fe2+. These techniques provide selective purification of the target protein (Mukhija et al., 1995; Shin et al., 1998; WO 9115589). Use of basic amino acid residues, namely histidine, lysine or arginine, allows the invention to be worked effectively.
Overall, the selection and application of the chimeric construct improves the product yield at pilot scale as well as at large scale in an industrial setting. A protease cleavage site (e.g., for thrombin, factor V, factor Xa, or serine proteases such as enterokinase) placed right after the affinity tag (either His-tag or GST-tag) is commonly used in chimera construction; the expressed heterologous protein is easily cleaved off from the tag either on the column or off the column, and the protein of interest, having the sequence of naturally occurring hGH, is isolated. The choice of affinity chromatographic technique helps in handling large-batch purification for industrial purposes with fewer purification steps and reduces the loss of the final product of interest, thus making the whole process cost effective.
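The layout of such a fusion protein, with the tag followed by a protease site and then the mature protein, can be sketched at the amino acid level. The six-histidine tag length, the initiator methionine, the helper names and the placeholder "mature" sequence are assumptions for illustration; the recognition sequences shown are the commonly cited ones for factor Xa (IEGR) and enterokinase (DDDDK), both of which cut after the last residue of the site.

```python
HIS_TAG = "HHHHHH"  # assumed tag length of six histidines

# Recognition sequences; cleavage occurs directly after the final residue.
PROTEASE_SITES = {"factor_xa": "IEGR", "enterokinase": "DDDDK"}


def build_fusion(mature, protease):
    """Assemble Met + His-tag + protease site + mature protein."""
    return "M" + HIS_TAG + PROTEASE_SITES[protease] + mature


def cleave(fusion, protease):
    """Split the fusion after the first occurrence of the protease site."""
    site = PROTEASE_SITES[protease]
    index = fusion.find(site)
    if index == -1:
        raise ValueError("protease site not found in fusion protein")
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


# HYPOTHETICAL placeholder, not the hGH amino acid sequence.
toy_mature = "ACDEFGHIKL"

fusion = build_fusion(toy_mature, "factor_xa")
tag_part, released = cleave(fusion, "factor_xa")

print(fusion)
print(tag_part, released)
```

Placing the site immediately upstream of the first residue of the mature protein is what allows the released product to carry no extra N-terminal residues, which is the property the text describes as having the sequence of the naturally occurring protein.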
History of Cloning of hGH
Human growth hormone (hGH) is secreted in the human pituitary. In its mature form it consists of 191 amino acids, has a molecular weight of about 21,500 and thus is more than three times as large as insulin. Until the advent of recombinant DNA technology, hGH could be obtained only by laborious extraction from a limited source, i.e., the pituitary glands of human cadavers. The consequent scarcity of the substance has limited its application to the treatment of hypopituitary dwarfism, even though it has been proposed to be effective in the treatment of burns, wound healing, dystrophy, bone knitting, diffuse gastric bleeding and pseudarthrosis. In fact, available estimates are that the amount of hGH available from tissue is adequate to serve only about 50% of the victims of hypopituitary dwarfism. Thus, no hGH is available for other applications. Hence, it is necessary to take advantage of recombinant DNA technology for the cloning of hGH and its production, specifically in E. coli. The use of E. coli as a microbial host for complex heterologous polypeptides is now well established. It was first shown that hGH can be produced in a recombinant host cell, specifically E. coli, in quantities adequate to treat hypopituitary dwarfism and the other conditions for which it is effective (for example, U.S. Pat. No. 4,342,832). The process of U.S. Pat. No. 4,342,832 leads to a product which can be used for therapeutic applications. Later, various modifications were made, generating vector constructs for hGH expression in E. coli (and also in Bacillus and Pseudomonas). So far, E. coli-generated recombinant hGH has been well accepted by patients, as it does not cause any immunogenic reaction. Since hGH is not a glycoprotein, E. coli is the most suitable host for expressing it.
Attempts have been made to make vector constructs with specific regulatory elements to control the site and yield of expression in the E. coli host system, as well as the secondary structure formation of the expressed protein during the cloning steps (Tokunaga et al., 1985). hGH has been expressed in three ways: within the cell as inclusion bodies, in the periplasmic space, or as a secretory protein exported out of the cell (U.S. Pat. No. 4,755,465).
The secretion of 22 kD hGH into the periplasm of E. coli has been reported (Gray et al. 1985, 247-254; U.S. Pat. No. 5,279,947). The periplasm is the space between the inner membrane and the outer membrane of gram-negative bacteria, including E. coli. In gram-negative bacteria, a precursor protein containing a signal sequence is necessary for membrane penetration, and upon passing the inner membrane the secretory signal is cleaved off. The protein is thus processed into the mature protein by losing its secretory signal. Several cloning patents have been reported so far (listed below), which have helped us to plan the hGH cloning strategy unique to our laboratory.
Recently, it has become possible to produce human growth hormone intracellularly, extracellularly or in the periplasm by means of recombinant DNA technology in which a gene for hGH is expressed in a microorganism as a host (U.S. Pat. Nos. 5,047,333, 4,755,465, 4,604,359 and 4,601,980).
In situ post-translational modification with various enzymes, for example Factor Xa, enterokinase or renin, or with specific chemicals such as CNBr, is another area that has been explored, where the mature hGH is released either from its chimeric protein partner or from its extended N-terminal part. This enables easy separation and purification of the protein from the rest of the host protein by means of appropriate chromatographic techniques. The impurity profile of purified hGH may depend directly on the choice of in situ post-translational modification.
There is no prior art close to the invention disclosed herein. The teachings of the patent prior art that are somewhat relevant are summarized below.
U.S. Pat. No. 3,853,832 claims a method of producing a synthetic human growth-promoting substance and is not related to the inventive aspect of the present invention. It deals with a polypeptide of 188 amino acids, whereas hGH has 191 amino acids. It teaches the introduction of sulphhydryl groups and their oxidation, and discusses their positions. The present invention relates to a novel approach of producing hGH by cloning of cDNA. U.S. Pat. No. 5,955,346 deals with a nucleotide sequence encoding a variant of human prolactin. The only similarity is that it deals with a protein.
U.S. Pat. No. 4,446,235 teaches a method of obtaining cDNA encoding a desired polypeptide and involves probing cloned genomic DNA to obtain the genomic sections and uses COS cell vectors. Therefore it doesn't teach what is taught by the present invention.
U.S. Pat. No. 4,665,160 claims the sequence of amino acids as present in hGH-V and does not teach the cloning of cDNA, the use of an isoform-derived chimera or the novel walk, which is the novelty of the present invention. The present invention relates to the production of hGH from hGH-NV.
U.S. Pat. No. 4,670,393 claims a replicable cloning vehicle comprising an inserted DNA sequence and lists the sequence of amino acids, whereas the present invention relates to production of hGH by a novel technique wherein the concept of a chimera is exploited for the first time. U.S. Pat. No. 5,849,535 claims a variant of hGH and therefore does not teach the present invention. U.S. Pat. No. 5,597,709 also claims a method for producing human growth hormone variants by recombinant DNA techniques from wild-type hGH; it focuses mainly on hGH-V and not on the pituitary-derived hGH-N, unlike the present invention. GB2055382A teaches the use of pituitary hormonal fragments in the synthesis of hGH. It teaches combining naturally occurring and synthetic portions to obtain hGH using a bacterial expression system. This is fundamentally different from cloning a chimera of two isoforms, which is the present invention. The cloning vehicle and cloning vectors it describes are also different.
U.S. Pat. No. 5,496,713 teaches a method to produce hGH in the periplasmic space, which amounts to a secretory production method. The present invention deals with a method of production within the cell, well inside the cell wall, and is therefore fundamentally different. U.S. Pat. No. 5,789,199 also teaches a technique to produce heterologous polypeptides in the bacterial periplasm. U.S. Pat. No. 5,047,333 teaches a method to produce naturally occurring hGH in Bacillus cells and is therefore different from the present invention: the technique used in the present invention is novel, the host cell used is not Bacillus, and the inventive concept differs from that of U.S. Pat. No. 5,047,333.
U.S. Pat. No. 4,859,600 claims a recombinant prokaryotic host cell containing hGH, the amino acid sequence of which consists of the amino acid sequence of naturally occurring hGH, which is free of other proteins associated with its native environment and is free of mature hGH having an extraneous N-terminal methionine, and which was produced by said recombinant prokaryotic host. U.S. Pat. No. 5,618,697 teaches a process of producing hGH where the construct contains an amino-terminal extension with a negatively charged amino acid sequence for easier downstream processing.
U.S. Pat. No. 4,601,980 mainly talks about the downstream process to obtain the polypeptide produced and therefore deals with the aspects of fermentation and purification which is not the subject of the present invention.
U.S. Pat. No. 4,755,465 and several other patents teach and describe aspects of the presence of methionine at the N-terminus and the advantages thereof, and do not teach the present invention. U.S. Pat. No. 5,633,352 shows the steps involved in the biosynthesis of hGH from pituitary-derived hGH. U.S. Pat. No. 5,688,666 shows a variant of native human growth hormone with altered binding properties, obtained by substituting a set of amino acids.
U.S. Pat. No. 5,635,604 shows a substantially pure amino-terminal extended hGH wherein X is a charged amino acid sequence having at least two amino acids and wherein the N-terminal amino acid of X is a negatively charged amino acid, other than Lys and Arg.
U.S. Pat. No. 6,436,674/EP0974654 teaches us about an efficient cloning method in E. coli and Salmonella where human growth hormone is targeted to the periplasmic space using a secretory signal peptide. Preferred method is secretory as per this invention because isolation and purification is rendered easy due to low level of impure proteins in periplasm as stated by the inventor.
In EP0587427 a process to produce 20K hGH is described using a novel neutral protease from Bacillus amyloliquefaciens. The process is secretory where this hGH is secreted into the bacterial periplasm. Another patent JP61224988 talks about a recombinant plasmid of E. coli for amplification of 20K hGH cDNA. In contrast JP61202689 describes an extraction process of growth hormone from human pituitary tissue and its subsequent processes.
EP0489711 describes the process to produce hGH having amino acid sequences of naturally occurring hGH. The process is characterized by the presence of Met at the N-terminus of the 1st polypeptide produced.
The citations referred to above deal either with secretory processes, with methionine-characterized larger polypeptides, or with the use of human pituitary and human placenta cDNA isoforms for producing hGH. However, none of these documents describes a technique that uses human placenta and pituitary hGH cDNA isoforms to derive the hGH-NV cDNA chimera and further to obtain an expressible construct of the chimera-derived SEQ ID NO:5, linked with a stretch of non-GC-rich sequences in duplicate, to subsequently produce 22K hGH which is identical in amino acid sequence to pituitary hGH, with a better transcript secondary structure than hGH-N (wild type). The unique construct and the better expression of the mature hGH constitute an inventive step of the present invention.