The present invention relates, generally, to the field of expressed gene technology and expression cloning. More specifically, the present invention relates to the identification, characterization, and isolation of transcribed nucleic acid sequences encoding polypeptides having a predetermined property, e.g., cellular localization, structure, enzymatic function, or affinity to other molecules, and the production of the corresponding polypeptides.
General Background. Proteins are the most prominent biomolecules in living organisms; in addition to their role as structural components and catalysts, they play a crucial role in regulatory processes. Both regulation of cell proliferation and metabolic functions are largely controlled and effected by the cooperation of numerous cellular and extracellular proteins. Lehninger, A. L., 1975, “Biochemistry”, Worth Publishers Inc., New York, N.Y. For example, signal transduction pathways of many kinds that affect critical physiological responses operate through proteins by way of their intermolecular interactions. Metzler, D. E., 1977, “Biochemistry”, Academic Press Inc., London. Furthermore, the transcription of genes and the regulation of such transcription are dependent upon and controlled by the interdependence of numerous protein factors. Wainwright, S. D., “Control Mechanisms and Protein Synthesis”, Columbia University Press, New York and London.
Proper functioning of a multicellular organism depends not only on the interaction of biomolecules within the cell; individual cells must also communicate appropriately. Such intercellular communication, and the interaction of cells with the environment, is often realized by the actions of receptors on the extracellular surface and associated intracellular signal transduction mechanisms. Poste, G., Nicholson, G. L., 1976, “The Cell Surface”, Elsevier, Amsterdam. The information is communicated through the cell environment to regulate gene expression or protein activities in the cell. Secreted proteins in the extracellular environment thereby exert potent regulatory effects on certain cellular functions.
In view of the very simplified paradigm of cell function outlined above, particular properties of a protein, including cellular localization, structure, affinity to a binding partner, or enzymatic activity under physiological conditions, appear to be highly indicative of its type of function. With respect to a particular cellular localization, secreted proteins, for example, are likely to function as intercellular communicators of signals, while membrane associated receptors having an extracellular and intracellular domain most likely transmit an extracellular signal into the cell. Cytoplasmic proteins may function as intracellular signal transmitters and coordinators. Jeter, J. R., Cameron, I. L., Padilla, G. M., Zimmerman, A. M., 1978, “Cell Cycle Regulation”, London. Nuclear proteins are likely to be involved in certain aspects of gene regulation. Zawel et al., 1995, Annu. Rev. Biochem. 64:533-561. Mature proteins found in the Golgi or the ER may have regulatory roles in the post-translational processing of protein precursors, e.g., cleavage or addition of carbohydrates. Hirschberg, C., 1987, Annu. Rev. Biochem. 56:63-87.
Membrane-Associated Proteins. For many years, the paradigm of cell function has motivated numerous drug discovery programs to focus on identifying membrane-associated proteins, in particular new receptors, and their respective functions. Porter, R. and O'Connor, M., 1970, “Molecular Properties of Drug Receptors, Ciba Foundation Symposium”, J & A Churchill, London. Many examples in fact compel the conclusion that improper function of membrane receptors is a significant source of the development of serious metabolic and proliferative diseases such as cancer. For example, a certain form of diabetes mellitus, i.e., non-insulin-dependent diabetes mellitus (NIDDM), may be caused by mutations in the insulin receptor. Ullrich et al., 1985, Nature 313:756-761; Taira et al., 1989, Science 245:63-66. Furthermore, 30% of all mammary carcinomas are associated with amplification of the receptor tyrosine kinase HER2. Bargman et al., 1986, Cell 45:649-657; Slamon et al., 1989, Science 244:707-712. In addition to traditional drug discovery programs targeting receptors, it has become an ambitiously pursued objective to identify membrane-associated receptors as possible gene therapy targets using comparative genomics, which allows determination of changes in gene expression under, e.g., pathological conditions. Wels, et al., 1995, Gene 159(1):73-80.
Secreted Proteins. While receptors have mostly been considered as important potential therapeutic targets, secreted proteins are of particular interest as potential therapeutic agents. They often have a signalling or hormone function, and hence have a high and specific biological activity. Schoen, F. J., 1994, “Robbins Pathologic Basis of Disease”, W. B. Saunders Company, Philadelphia. For example, secreted proteins control physiological reactions such as differentiation and proliferation, blood clotting and thrombolysis, somatic growth and cell death, and immune response. Schoen, F. J., 1994, “Robbins Pathologic Basis of Disease”, W. B. Saunders Company, Philadelphia.
Significant resources and research efforts have been expended for the discovery and investigation of new secreted proteins controlling biological functions. Many of such secreted proteins, including cytokines and peptide hormones, are manufactured and used as therapeutic agents. Zavyalov et al., 1997, APMIS 105(3):161-186. However, of the several thousand expected secreted proteins, only a few are currently used as therapeutic compounds. It can be expected that many of the so far undiscovered secreted proteins of the human organism are effective in correcting physiological disorders and are thus promising candidates for new drugs.
In the past, novel cytokines and hormone proteins were identified by assaying a certain cell type for its response to protein fractions or purified proteins. Lauffenburger et al., 1996, Biotechnology and Bioengineering 52(1):61-80. Other investigators have used sequence similarities at the DNA level to clone novel interferons and interleukins. Nabori et al., 1992, Analyt. Biochem. 205(1):42-46. In yet another approach, differential display techniques were used to compare the expression patterns of stimulated versus unstimulated cells. Nagata et al., 1980, Nature 287:401-408. All these methods may lead to the identification and isolation of certain secreted polypeptides.
Recently, a screening method for the identification of cDNA encoding novel secreted mammalian proteins in yeast using the invertase gene as a selection marker has been described. See U.S. Pat. No. 5,536,637 (the “'637 Patent”). The disclosed technology relies on the concept that leader sequences of mammalian cDNAs are effective in exporting the invertase protein depleted of its leader sequence. This approach yields partial cDNAs which in turn can be used to screen a full-length cDNA library. The novel protein of interest can then be manufactured by standard, but laborious, techniques, including subcloning, transforming a recombinant host, expression, and development and implementation of a purification process. Furthermore, since the assays described in the '637 Patent are performed in yeast, the glycosylation pattern of the isolated products will differ significantly from the natural product produced in mammalian cells. This difference is a major impediment in view of the fact that an extremely important feature of secreted proteins (as is true for the extracellular domain of receptors) is their glycosylation pattern and carbohydrate composition. Rademacher et al., 1988, Annu. Rev. Biochem. 57:785-838.
Nuclear Proteins. Both replication of DNA and transcription of genes take place in the nucleus. Many nuclear proteins are directly involved in these processes as transcription factors, as cell cycle regulators, or both. Some nuclear proteins are responsible for turning on expression of certain metabolic proteins in response to environmental changes. Zawel et al., 1995, Annu. Rev. Biochem. 64:533-561. Many others are directly involved in the regulation of cell proliferation. Jeter, J. R., Cameron, I. L., Padilla, G. M., Zimmerman, A. M., 1978, “Cell Cycle Regulation”, London. Proteins in this latter class fall into two general categories: (1) dominant transforming genes, including oncogenes; and (2) recessive cell proliferation genes, including tumor suppressor genes and genes encoding products involved in programmed cell death (“apoptosis”).
Oncogenes generally encode proteins that are associated with the promotion of cell growth. Because cell division is a crucial part of normal tissue development and continues to play an important role in tissue regeneration, properly regulated oncogene activity is essential for the survival of the organism. However, inappropriate expression or improperly controlled activation of oncogenes may drive uncontrolled cell proliferation and result in the development of severe diseases, such as cancer. Weinberg, 1994, CA Cancer J. Clin. 44:160-170.
Tumor suppressor genes, on the other hand, normally act as “brakes” on cell proliferation, thus opposing the activity of oncogenes. Accordingly, inactivation of tumor suppressor genes, e.g., through mutations, or the removal of their growth inhibitory effects, may result in the loss of growth control, and cell proliferative diseases such as cancer may develop. Weinberg, 1994, CA Cancer J. Clin. 44:160-170.
Related to tumor suppressor genes are genes whose product is involved in the control of apoptosis; rather than regulating proliferation of cells, they influence the survival of cells in the body. In normal cells, surveillance systems are believed to ensure that the growth regulatory mechanisms are intact; if abnormalities are detected, the surveillance system switches on a suicide program that culminates in apoptosis.
Several genes that are involved in the process of apoptosis have been described. See, for example, Collins and Lopez Rivas, 1993, TIBS 18:307-308; Martin et al., 1994, TIBS 19:26-30. Cells that are resistant to apoptosis have an advantage over normal cells, and tend to outgrow their normal counterparts and dominate the tissue. As a consequence, inactivation of genes involved in apoptosis may result in the progression of tumors, and, in fact, is an important step in tumorigenesis.
Accordingly, the identification of nuclear proteins addresses two areas of interest. First, oncogenes are likely to be valuable targets for the development of highly specific drugs for the treatment of cancer. Second, tumor suppressors and apoptosis inducing proteins can be useful as agents for the treatment of cancer.
Other Cellular Localizations. Many of the remaining cellular localizations are also associated with particular functions. For example, most metabolic enzymes are located in the mitochondria. Lehninger, A. L., 1975, “Biochemistry”, Worth Publishers Inc., New York. Thus, mitochondrial proteins could reveal targets for the treatment of metabolic diseases. The Golgi apparatus and the ER are associated with post-translational processing of proteins; such processes are valuable targets for the treatment of diseases related to protein folding and glycosylation. Rademacher et al., 1988, Annu. Rev. Biochem. 57:785-838.
Enzymatic Activities. A particular enzymatic activity can be indicative of a protein's function. For example, kinases are frequently involved in signal transduction processes.
Structures. Protein structure is also indicative of protein function. Indeed, proteins with similar structures frequently share certain functional properties. Thus, identifying proteins having a structure related to that of a known protein having a particular function of interest can reveal additional proteins having such function.
Generally, a method would be desirable which allows one to pre-sort proteins according to a property of interest, e.g., localization in the cell, affinity to binding partners, enzymatic activity, structure and the like. Such a method would allow one to generate libraries of, e.g., secreted proteins, such as cytokines, membrane associated proteins, such as receptors, nuclear proteins, such as transcription factors, mitochondrial proteins, such as respiratory proteins, and so on. Further, it would be desirable to perform screening, isolation and production of any product in mammalian cells in order to achieve the proper glycosylation pattern. Finally, the most preferred method would additionally allow one to identify and isolate proteins of interest per se rather than a partial DNA sequence.
The present invention addresses this need. The present invention provides methods and expression systems for the generation of expression libraries encoding polypeptides of a predetermined property, including but not limited to cellular localization, structure, enzymatic function, or affinity to other molecules. The methods and expression systems of the present invention allow one to identify and isolate nucleic acids encoding novel proteins of interest. The methods and expression systems furthermore provide a powerful system for the identification of thus far unelucidated receptor/ligand relationships. Since the methods provided can be employed in a wide variety of host cell systems, including mammalian systems, they provide for expression products having an appropriate carbohydrate composition.
The present invention relates, generally, to the field of expressed gene technology and expression cloning. More specifically, the present invention relates to the identification, characterization, and isolation of transcribed nucleic acid sequences encoding polypeptides having a predetermined property, including, but not limited to, cellular localization, structure, enzymatic function, or affinity to other molecules and the production of the corresponding polypeptides.
More specifically, the invention is directed to a method for identifying nucleic acids encoding proteins with a predetermined property of interest. Such a property may include a particular cellular localization, structure, enzymatic function, or affinity to other molecules. In one embodiment of the invention, a plurality of eukaryotic host cells is provided, wherein each host cell has an expression system comprising a different member, each member comprising a recombinant nucleic acid encoding an exogenous protein operatively linked to a control element. In a second step, the eukaryotic host cells are cultured under conditions where the exogenous protein is expressed while expression of endogenous proteins of the eukaryotic host cell is suppressed. In this time window, the exogenous protein may optionally be labelled, or may be treated in a way that allows discrimination from the untreated exogenous proteins. Finally, the member or members of the expression system that encode the exogenous protein or proteins having the property of interest are identified.
In another aspect of the invention, a method for identifying a recombinant nucleic acid encoding an exogenous protein having a property of interest is achieved by providing a plurality of eukaryotic host cells, wherein each host cell has an expression system comprising a different member, and each member comprises a recombinant virus having a recombinant nucleic acid encoding an exogenous protein operatively linked to a control element. The eukaryotic host cells are cultured to express the exogenous proteins, and expression systems expressing recombinant nucleic acid encoding an exogenous protein having the property of interest are identified. Optionally, the expression systems are capable of expressing exogenous proteins while endogenous protein production of the eukaryotic host cell is suppressed. Although any kind of recombinant eukaryotic virus can be used, particularly advantageous viruses are alpha viruses. In addition, the exogenous proteins can be preferentially labelled or distinguished from endogenous proteins. The recombinant virus may be capable of directing the generation of viral particles and replicating, or the recombinant virus may lack functions required for propagation.
Also encompassed within the invention are methods for generating genetic expression libraries encoding proteins having a predetermined property of interest. In one aspect, such methods entail providing a plurality of eukaryotic host cells, wherein each host cell has an expression system comprising a different member, each member having a recombinant nucleic acid encoding an exogenous protein operatively linked to a control element, culturing the eukaryotic host cells under conditions where said exogenous protein is expressed while expression of endogenous proteins of said eukaryotic host cell is suppressed, and identifying the members that express recombinant nucleic acids encoding exogenous proteins having the property of interest. In another aspect, the methods entail providing a plurality of eukaryotic host cells, wherein each host cell has an expression system comprising a different member, each member being a recombinant virus having a recombinant nucleic acid encoding an exogenous protein operatively linked to a control element, culturing the eukaryotic host cells to express the exogenous proteins, and identifying the members that express recombinant nucleic acids encoding exogenous proteins having the property of interest. The invention further includes libraries of proteins identified using such methods. Additionally, the invention encompasses nucleic acid libraries having a population of eukaryotic expression systems with a plurality of members, each member having a recombinant nucleic acid encoding an exogenous protein operatively linked to a control element for expression in eukaryotic host cells. In one embodiment, the control element directs the expression of the exogenous proteins while expression of endogenous proteins in the eukaryotic host cells is suppressed. In yet another embodiment, the control element is derived from a eukaryotic virus.
FIG. 1 depicts an experiment in which expression systems containing either a nucleic acid encoding a secreted protein or a nucleic acid encoding an intracellular protein from a mixture of nucleic acids were identified using compartment screening. FIG. 1A is an autoradiogram of labelled proteins precipitated from the supernatants of the screened cell cultures, as described in more detail in Example 1. Lanes are as follows: lane 1-non-infected BHK 21 cells; lane 2-cells transfected with an expression system encoding an intracellular protein, pSinRep5 lacZ; lane 6-cells transfected with an expression system encoding a secreted protein (EPO), pSinRep 5 EPO; lanes 3 to 5 contain mixtures of the 2 expression systems in the ratios 90:10, 50:50, 10:90 (sinRep5 lacZ:SinRep5 EPO) showing increasing amounts of EPO. Protein mass standard is shown on the left side; the molecular weight of EPO is indicated by the arrow. FIG. 1B is an autoradiogram of the labelled proteins in the corresponding cell pellets from lanes 1 and 2 of FIG. 1A, showing the accumulation of lac Z protein in the cell pellet, and the shutdown of endogenous protein production, in cells infected with pSinRep5 lacZ.
FIG. 2 depicts separation of labelled viral particles from secreted protein, as described in more detail below in Example 2. Shut-down of endogenous protein synthesis is apparent in lanes 4 to 6 (infection with TE 5′2J CAT) and in lanes 7 to 9 (infection with TE 5′2J EPO) as compared to lanes 1 to 3 (non-infected BHK 21 cells). Removal of viral particles is demonstrated by the absence of the characteristic pattern of Sindbis structural proteins (capsid, E1 and E2). Lane 9 shows diffusion of a protein of the size of EPO through the agarose. In lanes 4 to 9 a soluble protein of viral origin can be seen. It is assumed that this protein is released by proteolytic cleavage. Labelled protein was collected at different time points: 2 h (lanes 1, 4, 7), 4 h (lanes 2, 5, 8) and 8 h (lanes 3, 6, 9). Protein mass standard is shown on the left side; the size of Sindbis viral glycoproteins is indicated by the upper arrow, and the molecular weight of EPO is indicated by the lower arrow.
FIG. 3 demonstrates that release of viral soluble proteins is reduced by protease inhibitors. As described below in Example 3, 10^6 BHK 21 cells in a 35 mm dish were infected with TE 5′2J CAT. Varying concentrations of the protease inhibitor cocktail (100, 20, 10, 5, 1 μl per ml, lanes 1 to 5) were applied in the 4 mm agarose overlay. The molecular mass standard is shown on the left; the arrow indicates the expected mass of Sindbis glycoprotein E1.
FIGS. 4A-4B show identification of an expression system containing a nucleic acid encoding a protein with a predetermined enzymatic activity in a semi-solid medium screening. Confluent 35 mm dishes of BHK 21 cells were infected with 200 pfu SinRep lacZ (FIG. 4A) or a mixture of 200 pfu SinRep lacZ/20 pfu of SinRep 5 SEAP (FIG. 4B). Enzymatic activity was detected on the filters with AP staining. Blot 1 (FIG. 4A) (containing only a lacZ expression system) is negative in SEAP activity whereas blot 2 (FIG. 4B) shows 2 distinct areas with SEAP activity (indicated by arrows).
FIGS. 5A-5D depict the pSIN vectors used to illustrate the invention. The source and construction of these vectors is described in Table I.
FIGS. 6A-6D are schematic representations of the pTE vectors used to illustrate the invention. Vectors shown are pTE (FIG. 6A); pTE CAT (FIG. 6B); pTE SEAP (FIG. 6C); and pTE EPO (FIG. 6D). The source and construction of these vectors is described in Table I.
FIGS. 7A and 7B depict an experiment in which approximately 20 pfu of pSinRep5 SEAP and 780 pfu of pSinRep5 LacZ were mixed in 1 ml Turbodoma HP-1 and a 60 mm dish with BHK 21 cells was infected for 2 hours. An agarose blot assay was performed as described in Example 10. FIG. 7A shows the AP stained nitrocellulose membrane, in which blotted SEAP protein is represented by the violet spots. FIG. 7B shows the developed X-ray film (the AP stained membrane exposed to the X-ray film), in which the black spots represent labeled secreted protein. The coordinates of the spots with SEAP activity (indicated by arrows x1, x2, x3 in the AP stained membrane and the corresponding arrows y1, y2, y3 in the developed X-ray film) can be superimposed with the spots of labeled secreted proteins (compare x1 with y1, x2 with y2, x3 with y3).
FIGS. 8A, 8B and 8C depict an experiment in which pSinRep 5 SEAP mRNA and pDHEB ts mutant mRNAs were co-electroporated into BHK 21 cells. The cell supernatant of the electroporated cells was analyzed 20 hours post-electroporation by spotting 4 μl of supernatant on a nitrocellulose strip, and AP staining was done as described before. All the electroporations were positive for SEAP secretion (violet spots on nitrocellulose filter) as shown in FIG. 8A. The first passage of the ts mutant viruses was tested for infectivity at 37° C. and at 30° C. The supernatants of the infected cells were tested by AP staining for the secreted product SEAP; the upper row in FIG. 8B represents the supernatant of cells incubated at 37° C., and the lower row of FIG. 8B represents the supernatant of cells incubated at 30° C. The double mutant pSinRep 5 SEAP/pDHEB ts2,20 produced a low amount of SEAP at 20 hours post-infection (see FIG. 8B), but virus particles were amplified and a high amount of product was detected at 48 hours post-infection, as shown in FIG. 8C.
FIGS. 9A, 9B, 9C and 9D depict an experiment in which pSinRep 5 hIL13 R alpha-infected BHK 21 cells (at an moi of approximately 0.1) were analyzed by immunofluorescence. Expressed hIL13 R alpha was detected with IL13-flag, M2 antibody and antiMouse-FITC. FIG. 9A: BHK 21 cells infected at an moi of 0.1 with pSinRep 5 hIL13 R alpha, analyzed and sorted with FACS; FIG. 9B: the same cells as in FIG. 9A with immunofluorescence microscopy; FIG. 9C: pSinRep 5 LacZ infected cells as negative control; FIG. 9D: 1:100 diluted pSinRep 5 hIL13 R alpha virus in pSinRep 5 LacZ virus, analyzed and sorted with FACS.
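As a purely illustrative aside on the moi of approximately 0.1 mentioned above, the following minimal sketch estimates what fraction of cells would be expected to receive zero, one, or more than one infectious particle. It assumes, as is conventional but is not stated in this description, that particles distribute among cells according to a Poisson distribution with mean equal to the moi; the function names are hypothetical and chosen only for this example.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k infectious particles.

    Assumption (not from the description): particles are distributed
    among cells according to a Poisson distribution with mean `moi`.
    """
    return math.exp(-moi) * moi**k / math.factorial(k)


def infection_fractions(moi: float) -> dict:
    """Expected fractions of uninfected, singly and multiply infected cells."""
    uninfected = poisson_pmf(0, moi)
    single = poisson_pmf(1, moi)
    multiple = 1.0 - uninfected - single
    return {"uninfected": uninfected, "single": single, "multiple": multiple}


# Illustrative value taken from the figure description: moi of about 0.1.
fractions = infection_fractions(0.1)
infected = 1.0 - fractions["uninfected"]
single_among_infected = fractions["single"] / infected

print(f"infected cells:                 {infected:.1%}")
print(f"singly infected among infected: {single_among_infected:.1%}")
```

Under this assumption, a low moi means that most cells remain uninfected and that the large majority of infected cells carry a single library member, which is the situation one would want when sorting individual cells for a property of interest.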
FIG. 10 depicts an experiment in which approximately 20 pfu of pSinRep 5 Epo/DHEB were mixed with 200 pfu of pSinRep 5 LacZ. The virus supernatant was incubated for 2 hours on 90% confluent BHK 21 cells in a 60 mm dish, before the supernatant was replaced with 3 ml of 0.8% agarose, warmed to 41° C., in 1×HP-1 medium. Two days later, a nitrocellulose membrane was applied on the agarose and diffusion blotting was carried out for 14 hours. Blotted EPO was detected by immuno-detection with anti-EPO antibody (rabbit) and AP-conjugated anti-rabbit antibody. The violet spots d1, d2, d3 (and the other dark spots) represent the blotted EPO derived from the underlying virus plaque representing pSinRep5 EPO virus.
FIGS. 11A-11M depict the polynucleotide sequences of pSinRep 5, pSinRep 5 EPO, pSinRep 5 hIL 13 Ralpha, and PDH-EB.
FIGS. 12A-12D depict the polynucleotide sequence of pTE5′2J (SEQ ID NO:1).
FIGS. 13A-13C depict the polynucleotide sequence of 987 BB neo (SEQ ID NO:2).
FIG. 14 depicts the polynucleotide sequence of CAT (SEQ ID NO:7).
FIG. 15 depicts the polynucleotide sequence of the XbaI/ApaI fragment of synthetic erythropoietin (SEQ ID NO:8).