The numerous polypeptides which make up living organisms and their biochemical constituents are the expression products of the information contained in deoxyribonucleic acid (DNA). This information is coded by the order of the nucleic acid bases on the linear DNA sequence. The four bases, adenine (A), thymine (T), cytosine (C) and guanine (G), are arranged in a linear sequence as a single chain of DNA. Each triplet of bases, called a codon, codes for a single amino acid.
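The triplet reading described above can be illustrated with a minimal sketch. The standard genetic code table is assumed here (it is not spelled out in this text), and the function name `translate` and the example sequence are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, listed with each codon position running over T, C, A, G.
# '*' marks a stop codon. Use of this particular table is an assumption.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a coding-strand DNA sequence three bases at a time.

    Returns one-letter amino acid codes and stops at the first stop codon.
    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base sequence: ATG (Met), GCT (Ala), TGG (Trp), TAA (stop).
print(translate("ATGGCTTGGTAA"))
```

The sketch reads only one frame, starting at the first base, and raises a KeyError on any character other than A, T, C or G.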
With the advent of recombinant DNA technology, exemplified by the seminal work of Cohen and Boyer (U.S. Pat. No. 4,237,224), it became possible to introduce foreign genes into microorganisms and regulate the level of their expression. The cutting and splicing of DNA to prepare hybrid DNA sequences has been termed recombinant DNA (rDNA). This work relies on the discovery that restriction endonucleases (REN) recognize particular sites on a DNA sequence and cleave the DNA within these sites to produce predictable breaks in a sequence of DNA. REN sites have now been used in a variety of procedures to obtain expression of structural genes in foreign organisms.
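The predictable cleavage at recognition sites may be sketched as follows. The enzyme is an illustrative assumption: EcoRI, which recognizes GAATTC and cuts after the first G, is used as the default, although no particular enzyme is named in this text. The function name `digest` and the example sequence are hypothetical, and only one strand is modeled.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a single DNA strand at every occurrence of a recognition site.

    `site` and `cut_offset` default to the EcoRI pattern (G^AATTC), chosen
    here only as an example. `cut_offset` is the number of bases of the
    site that remain on the upstream fragment.
    """
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


# Hypothetical sequence containing two EcoRI sites.
for fragment in digest("AAGAATTCTTGAATTCGG"):
    print(fragment)
```

Because the cut positions depend only on the sequence, the same DNA always yields the same set of fragments, which is what makes the sites useful for splicing.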
To prepare recombinant DNA containing the appropriate elements to express foreign genes in a host cell, one normally purifies mRNA from tissues which express the desired polypeptide. The structural gene DNA may be reconstructed from the mRNA sequence by the enzyme reverse transcriptase, which has been isolated from an avian retrovirus. This complementary DNA (cDNA) can be digested with RENs, which cleave the cDNA at precisely defined sequences. The resulting cDNA fragment is then typically cloned into an autonomously replicating extra-chromosomal DNA sequence, called a plasmid. The known techniques which have been employed generally involve the use of naturally occurring restriction sites to construct the recombinant plasmids, or the introduction of short single-stranded oligonucleotide sequences followed by the completion of the double-stranded DNA using, e.g., DNA polymerase.
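The relationship between an mRNA and the first strand of its cDNA can be sketched with simple base pairing. This is only a model of the sequence relationship, not of the enzymatic reaction; the function name `first_strand_cdna` and the example sequence are hypothetical.

```python
# Base-pairing rules for copying RNA into DNA (U in the mRNA pairs with A in the DNA).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA of an mRNA, written 5' to 3'.

    The cDNA strand is antiparallel to the mRNA template, so the mRNA is
    read in reverse and each base is replaced by its DNA complement.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


# Hypothetical six-base mRNA fragment.
print(first_strand_cdna("AUGGCU"))
```

Completing the second strand with a DNA polymerase restores the coding sequence of the mRNA, with T in place of U.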
Many of the techniques using synthetic DNA oligonucleotides and restriction sites have been reviewed in R. Wetzel and D. V. Goeddel, "Synthesis of Polypeptides By Recombinant DNA Methods", from The Peptides, Academic Press, Inc., 5:1 (1983). Gene editing techniques have been used to alter the amino-terminal portion of interferon genes. The new genes have translational start codons immediately before the codon for the first amino acid of the mature protein, rather than at the beginning of the signal peptide coding region, as occurs in the native gene (Goeddel, D. V., et al., Nature 287:411 (1980); Goeddel, D. V., et al., Nucl. Acids Res., 8:4057 (1980)). A similar "semi-synthesis" approach has been used to construct a gene coding for human growth hormone (hGH). A synthetic DNA fragment containing an ATG codon and the sequence for the first 23 amino acids of hGH was ligated to the remainder of the gene, which had been produced by the cDNA method (Goeddel, D. V., et al., Nature 281:544 (1979)). This resulted in a gene that would direct the expression of mature hGH, instead of the pre-hormone.
Hybrid genes of interferon have also been constructed using REN sites common to two homologous genes (Weck, et al., Nucl. Acids Res., 9:6153 (1981)).
In addition to these synthetic and semi-synthetic procedures for changing DNA sequences, internal mutations have been achieved randomly by chemical agents or ultraviolet light, or in specific locations using single-stranded oligonucleotides (Wallace, R. B. et al., Nucl. Acids Res. 9:3647 (1981); Dalbadie-McFarland, G. et al., Proc. Natl. Acad. Sci. U.S.A. 79:6409 (1982)).
In addition to the above methods of creating synthetic or altered genes, altered proteins (polypeptides), termed analogs, have been created for numerous applications by chemical synthesis of the entire amino acid sequence of the analog. As an example of a series of protein analogs, there are numerous opioid analgesics based on the Leu-enkephalin pentapeptide, related to β-endorphin. These peptides, termed dynorphins, range from tridecapeptides to heptadecapeptides, as exemplified in U.S. Pat. No. 4,396,606.
Human pancreatic growth hormone-releasing factor (hpGRF) was first isolated, purified and sequenced as a 44 amino acid polypeptide which stimulated the secretion of immunoreactive growth hormone (Guillemin, R., et al., Science, 218:585-587 (1982)). Subsequently, a variant was isolated and purified from a pancreatic islet tumor. This variant, hpGRF(1-40)-OH, was found to terminate at amino acid residue 40 of the previously determined sequence. It retained essentially full biological activity, as did the variants hpGRF(1-40)-NH2 and hpGRF(1-29)-NH2 (Spiess, J. et al., Biochemistry, 21:6037-6040 (1982)).
The amino acid sequence of human Insulin-like Growth Factor I has been previously determined (Rinderknecht, E. and R. E. Humbel, J. Biol. Chem., 253:2769-2776 (1978)). This polypeptide, isolated from serum, is a single chain polypeptide of 70 amino acid residues which displays sequence homology to proinsulin. The chemical synthesis of a 70 amino acid residue polypeptide is inefficient and time-consuming using current techniques.
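One way to see why stepwise chemical synthesis of a long chain may be inefficient is to compound the yield over every coupling step. The sketch below assumes a uniform yield per coupling and n - 1 couplings for a chain of n residues; the function name `overall_yield` and the 99% figure are hypothetical assumptions for illustration and are not taken from the cited references.

```python
def overall_yield(n_residues, per_step_yield):
    """Fraction of full-length product after stepwise synthesis.

    Assumes one coupling per added residue (n_residues - 1 couplings in
    total) and the same fractional yield at every coupling.
    """
    couplings = n_residues - 1
    return per_step_yield ** couplings


# Illustrative assumption: 70 residues at 99% yield per coupling.
fraction = overall_yield(70, 0.99)
print(f"full-length product: {fraction:.1%}")
```

Under these assumed figures roughly half of the chains reach full length, and the fraction falls further as the per-step yield drops or the chain grows.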