The function of a protein is closely linked to its structural properties, which are organized into different levels: the primary structure (the amino acid sequence encoded by the gene), the secondary structure (the preferred relative backbone positions of sequentially consecutive residues), the tertiary structure (the relative positions of all atoms of a polypeptide chain), and the quaternary structure (the arrangement of protein subunits, each corresponding to a distinct polypeptide chain, into a complex).
In addition to these levels of organization, a polypeptide of interest may correspond to a protein domain, which can be defined in two ways. From a structural perspective, a domain is a compactly folded region of a polypeptide comprising one or more secondary and/or tertiary structural elements. Small proteins (<100 amino acids) often consist of a single structural domain, whereas larger proteins usually consist of multiple structural domains. In the three-dimensional structure of a protein, a structural domain appears as an independently folded unit that is clearly distinct from other parts of the protein. Complementing this structural definition, a protein domain can also be viewed as the smallest piece of a protein that retains a function, resulting from its interaction with one or more proteins, nucleic acids, sugars, lipids, or any other organic or inorganic compound. A functional domain is generally composed of 50-350 amino acids and may therefore consist of one or more structural domains.
In nature, proteins can contain one or more functional domains, linked sequentially and arranged in three dimensions. While a particular amino acid sequence can be assigned to a class of structural domains by NMR or X-ray crystallography, evidence that a protein sequence is associated with a function is mostly obtained by identifying sequence homology to known functional protein domains and/or by generating altered forms of the original protein and testing them in a relevant biochemical or biological assay, in so-called structure-activity or structure-function studies.
Many proteins share highly homologous functional and/or structural domains, probably as a consequence of a process called “exon shuffling”. According to this evolutionary theory, a number of genes were created through a series of duplications, intronic recombination events, combinatorial assembly, and mutation of existing exons coding for protein “modules” (Patthy L, Gene (1999), 238(1): 103-114).
The idea that protein evolution in eukaryotes is also mediated by homologous recombination within the introns of parental genomes during sexual reproduction, creating new genes from combinations of exons associated with specific protein domains, is now widely accepted. This is partly because several studies have demonstrated a relationship between the phase and position of introns and the structural boundaries between protein domains (de Souza S J et al., Proc Natl Acad Sci U S A (1996), 93(25): 14632-6). In many cases, protein modules clearly correspond to one or more exons bounded either by introns of phase zero (the intron does not interrupt a codon) or by introns that always have the same phase. The exons coding for such “mobile” domains can therefore be combined more easily into novel proteins composed of a mosaic of protein modules, without disrupting the reading frame (de Souza S J et al., Proc Natl Acad Sci U S A (1998), 95(9): 5094-9; Kolkman J A and Stemmer W P, Nat Biotechnol. (2001), 19(5): 423-8).
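The reading-frame argument above can be made concrete with a short calculation. The following Python sketch is illustrative only and is not taken from the cited works. It assumes that the coding (CDS) length of each exon is known, with the first exon counted from the start codon. The exon lengths and all function names (`intron_phases`, `flanking_phases`, `is_frame_neutral`, `can_insert`) are hypothetical. Each intron's phase is taken as the cumulative coding length upstream of it, modulo 3. A block of internal exons is treated as a frame-neutral module when the introns on both sides have the same phase. Such a block can be inserted into a recipient intron only if that intron has the same phase.

```python
from typing import List, Tuple


def intron_phases(exon_cds_lengths: List[int]) -> List[int]:
    """Return the phase of each intron; intron k lies between exon k and exon k + 1.

    Phase 0: the intron falls between two codons.
    Phase 1: the intron interrupts a codon after its first nucleotide.
    Phase 2: the intron interrupts a codon after its second nucleotide.
    """
    phases = []
    cumulative = 0
    for length in exon_cds_lengths[:-1]:  # no intron follows the last exon
        cumulative += length
        phases.append(cumulative % 3)
    return phases


def flanking_phases(exon_cds_lengths: List[int], first: int, last: int) -> Tuple[int, int]:
    """Phases of the introns just upstream and downstream of exons first..last (0-based, inclusive)."""
    n = len(exon_cds_lengths)
    if not 1 <= first <= last <= n - 2:
        raise ValueError("the block must consist of internal exons only")
    phases = intron_phases(exon_cds_lengths)
    return phases[first - 1], phases[last]


def is_frame_neutral(exon_cds_lengths: List[int], first: int, last: int) -> bool:
    """True if the block can be deleted or duplicated without shifting the downstream reading frame."""
    upstream, downstream = flanking_phases(exon_cds_lengths, first, last)
    return upstream == downstream


def can_insert(exon_cds_lengths: List[int], first: int, last: int, recipient_phase: int) -> bool:
    """True if the block fits into a recipient intron of the given phase, keeping both sides in frame."""
    upstream, downstream = flanking_phases(exon_cds_lengths, first, last)
    return upstream == downstream == recipient_phase


# Hypothetical CDS lengths (in nucleotides) of a five-exon gene, for illustration only.
lengths = [120, 90, 75, 100, 60]
print(intron_phases(lengths))           # [0, 0, 0, 1]
print(is_frame_neutral(lengths, 1, 2))  # True: both flanking introns are phase 0
print(is_frame_neutral(lengths, 3, 3))  # False: flanked by a phase 0 and a phase 1 intron
print(can_insert(lengths, 1, 2, 0))     # True
print(can_insert(lengths, 1, 2, 1))     # False
```

In this toy gene, exons 1 and 2 form a "0-0" module that could be shuffled into any phase-0 intron. Exon 3 could not be moved without shifting the frame. The same check accepts modules bounded by two phase-1 or two phase-2 introns, matching the "same phase" case described above.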
It has also been shown, however, that a functional protein domain physically separated from the rest of the primary translational product (that is, the protein obtained by directly translating the mRNA transcribed from a gene) can exert biological activities clearly distinct from those it exerts in the context of the complete protein. Such functional protein domains are generated in vivo by proteolytic cleavage. The cleavage can be carried out by an endopeptidase produced either by the cell expressing the primary translational product or by other cells, for example when the primary translational product is secreted or exposed on the cell membrane. Such events are being characterized with increasing frequency, and it is evident that the functional protein domains so produced may have important physiological activities (Kiessling L L and Gordon E J, Chem. Biol. (1998), 5(3): R49-R62; Halim N S, The Scientist (2000), 1 (16): 20; Blobel C P, Curr. Opin. Cell Biol. (2000), 12(5): 606-612).
A number of commercially valuable eukaryotic proteins correspond to such functional protein domains, which in some cases are encoded by a subset of the coding exons making up the full gene.
An example is endostatin, an endogenous inhibitor of angiogenesis and tumor growth, which corresponds to the C-terminal proteolytic fragment of the alpha1 chain of collagen XVIII, a collagenous protein of the extracellular matrix. Endostatin is encoded essentially by the three exons at the 3′ end of the collagen XVIII alpha1 (COL18A1) gene. However, it is fully functional only after being released by proteolysis from the collagen XVIII alpha1 chain, the primary translational product (O'Reilly M S et al., Cell (1997), 88(2): 277-285).
Another example is Tumor Necrosis Factor (TNF)-related activation-induced cytokine (also called TRANCE, RANKL, OPGL, or ODF), a type II transmembrane protein involved in a signaling pathway that rapidly induces genes triggering osteoclast development. TRANCE is made as a membrane-anchored primary translational product, which is cleaved by the metalloprotease-disintegrin TNF-alpha convertase (TACE). This cleavage generates soluble TRANCE, a fully functional protein with potent dendritic cell survival and osteoclastogenic activities. The sequence of this soluble protein corresponds to that encoded by the last three exons of the TRANCE gene (Lum L et al., J. Biol. Chem. (1999), 274(19): 13613-8).
For the commercial-scale production of a functional protein domain, there are two main alternatives, each with its own drawbacks.
The first is to emulate nature by initially producing the primary translational product and then processing it proteolytically to obtain the desired protein.
Conducting this whole process by recombinant DNA technology is technically demanding. Not only must the primary translational product be expressed, but the proteolytic enzyme specific for the desired functional protein domain must also be identified, expressed, and allowed to interact with the primary translational product, either in a cellular model or in an in vitro system, to ensure appropriate cleavage before further processing.
In the second approach, an expression construct is prepared that contains only the DNA coding sequence for the functional protein domain, isolated from the original mRNA or gene. Even this commonly used technology requires a series of operations that may considerably delay the development of a recombinant product (Makrides S C, Protein Expr. Purif. (1999), 17(2): 183-202; Kaufman R J, Mol. Biotechnol. (2001), 16(2): 151-60). The coding sequence of interest must be isolated from the full cDNA sequence and modified so that it can be subcloned into an expression vector containing all the transcriptional and translational control elements necessary for correct expression in the host cell. The construct is then used to transform the host cell. Finally, the transformants are screened to isolate clones that express the exogenous protein correctly and at a high level.
Isolating relevant clones can be time-consuming because ordinary expression vectors, in addition to the requirements listed above, must recombine with the genome of the host cell. Expression constructs maintained extrachromosomally are unstable and allow only transient expression of the protein, which is usually insufficient for production on a commercial scale.
A recombination event between the coding and non-coding exogenous DNA and the host cell genomic DNA is therefore necessary to transmit the expression construct to all the cells generated by subsequent cycles of DNA replication and mitotic division of the originally transformed cell. This process is also random and error-prone, since no specific feature of the expression vector can drive the complete incorporation of the exogenous sequence into the host cell genome. The cell can thus use any part of the construct, even one with very low homology to an endogenous sequence, for non-homologous recombination events. Such events are known to be orders of magnitude more frequent than homologous recombination and often result in incomplete integration of the necessary exogenous sequences.
These problems led to the development of genetic enrichment and selection procedures aimed at eliminating transformants in which the expression construct has been integrated in incomplete form. Since essential parts of the exogenous sequence may have been lost or altered during recombination, the coding sequence of interest may be mutated, truncated, or not expressed at all. In any case, the vast majority of transformants fail to produce the expected protein, whatever technique is used to introduce the DNA and select the cells.
Finally, it is well established in the literature that the correct expression of recombinant proteins in eukaryotic cells depends on many factors related to the specific host cell. Features such as the toxicity of the recombinant protein, mRNA processing and stability, and post-translational events are closely related to three things: the product itself, both the coding and non-coding sequences in the expression vector, and the interaction of the exogenous sequences with the genomic background of the host cells. In fact, the random insertion of a complete recombinant gene, which may span many kilobases of DNA, can severely perturb the host cell genome, compromising its stability and viability. Therefore, even when the exogenous sequences have been integrated in their entirety, some clones cannot be used to produce the protein because the insertion disrupts genomic sequences important for cell metabolism and/or replication. Such clones may be lost under the selective pressure of the cell culture. Alternatively, they may replicate so slowly that it is difficult to obtain enough cells stably and efficiently expressing the desired protein.
Alternative strategies have been developed to minimize the drawbacks of essentially uncontrolled exogenous gene integration. They are mostly based on homologous recombination, a technique that allows a specific exogenous sequence to be inserted into a predetermined genomic sequence of a mammalian cell. This technique has mostly been used to characterize genes or regulatory sequences in animal and cell models by modifying them to generate disrupted, non-functional, or chimeric genes, as recently reviewed (Muller U, Mech. Dev. (1999), 82(1-2): 3-21; Sedivy J M and Dutriaux A, Trends Genet. (1999), 15(3): 88-90). Various vectors and selectable marker genes have been introduced, for example, into the genome of mouse embryonic stem (ES) cells. The aim is to study the effects of genetic alterations on phenotypic features such as hormonal regulation, fertility, immune response, and organ development.
The feasibility of homologous recombination-based techniques for producing recombinant proteins has been demonstrated for entire primary translational products (WO 91/09955, WO 95/31560). These were obtained by placing a complete endogenous gene under the control of exogenous transcriptional regulatory sequences. Alternatively, truncated proteins may be obtained with antisense oligonucleotides that, once transfected into a cell, pair with the endogenous mRNA and prevent the ribosome from reading through the entire coding sequence (WO 97/23244). However, nothing in the literature suggests integrating exogenous regulatory sequences by homologous recombination in order to selectively express one or more exons of an endogenous target gene. Such exons would encode a functional protein domain comprised in a primary translational product.