The advent of modern molecular biology and immunology has made it possible to produce large quantities of biologically active materials in highly reproducible form and at low cost. Briefly, the gene sequence coding for a desired natural protein is isolated, replicated (cloned), and introduced, together with appropriate regulatory control signals, into a foreign host such as a bacterium, a yeast (or other fungus), or a mammalian cell line in culture. When the signals are activated, the gene is transcribed and translated, and the host expresses the desired protein. In this manner, genes for such useful biologically active materials as hormones, enzymes, and antibodies have been cloned and expressed in foreign hosts.
One problem with this approach is that it is limited by the "one gene, one polypeptide chain" principle of molecular biology: a genetic sequence codes for a single polypeptide chain. Many biologically active polypeptides, however, are aggregates of two or more chains. For example, antibodies are three-dimensional aggregates of two heavy and two light chains. Likewise, large enzymes such as aspartate transcarbamylase are aggregates of six catalytic and six regulatory chains, the two kinds of chain being different. To produce such complex materials by recombinant DNA technology in foreign hosts, a gene coding for each of the different kinds of polypeptide chain must be cloned and expressed. These genes can be expressed in separate hosts, in which case the polypeptide chains from each host must then be reaggregated and allowed to refold together in solution. Alternatively, the genes coding for the different polypeptide chains of the aggregate can be expressed simultaneously in the same host, so that refolding and reassociation into the native, biologically active structure occur after expression. Either approach, however, requires the expression of multiple genes and, in some cases, the use of multiple different hosts. These approaches have proven to be inefficient.
Even when two or more genes are expressed in the same organism, it is quite difficult to obtain expression of each of them in the required amounts.
A classic example of multigene expression to form multimeric polypeptides is the expression of antibodies by recombinant DNA technology. Genes for heavy and light chains have been introduced into appropriate hosts and expressed, and the individual chains have then been reaggregated into functional antibody molecules (see, for example, Munro, Nature 312:597 (1984); Morrison, S. L., Science 229:1202 (1985); Oi et al., BioTechniques 4:214 (1986); and Wood et al., Nature 314:446-449 (1985)).
Each of the heavy and light chains of an antibody molecule has two generally recognized regions: the so-called "variable" region, which is responsible for binding to the specific antigen in question, and the so-called "constant" region, which is responsible for biological effector responses such as complement binding. The constant regions are not necessary for antigen binding; they have been separated from the antibody molecule, yielding biologically active (i.e., antigen-binding) variable regions.
The antigen-binding variable region of an antibody is formed by a light chain variable region together with a heavy chain variable region. Light and heavy chain variable regions have been cloned and expressed in foreign hosts, and they retain their binding ability (Moore et al., European Patent Publication 0088994 (published Sep. 21, 1983)).
Further, it is now well established that all antibodies of a given class, and their Fab fragments, whose structures have been determined by X-ray crystallography show closely similar variable regions, even when they come from different species and despite large differences in their hypervariable segments. The immunoglobulin variable region thus appears to tolerate mutations in the combining loops. Consequently, apart from the hypervariable regions, most of the so-called "variable" region of an antibody, which is defined by both the heavy and the light chains, is in fact quite constant in its three-dimensional arrangement. See, for example, Huber, R., Science 233:702-703 (1986).
Although the art has discussed the three-dimensional structure of proteins and has suggested modifying their architecture (see, for example, Van Brunt, J., "Protein Architecture: Designing from the Ground Up," BioTechnology 4:277-283 (April 1986)), it has not satisfactorily addressed the problem of generating, from a multiple chain aggregate, a single chain structure that retains the three-dimensional architecture of that aggregate.
Because the preparation of genetic sequences, their replication, their linkage to expression control regions, the formation of vectors containing them, and the transformation of appropriate hosts are all well-understood techniques, it would be highly advantageous to be able to produce, by genetic engineering, single polypeptide chain binding proteins having the characteristics and binding ability of the multichain variable regions of antibody molecules.