The present invention relates to a novel method of identifying nucleic acid encoding secreted and membrane-bound proteins based upon the presence of signal sequences. The present invention also relates to a novel method for preparing cDNA libraries enriched for signal sequences.
Extracellular proteins are essential in the formation, differentiation and maintenance of multicellular organisms. Whether an individual cell lives, proliferates, migrates, differentiates, interacts with other cells or secretes is governed by information received from the cell's neighbors and its immediate environment. This information is often transmitted by secreted polypeptides (e.g., mitogenic factors, survival factors, cytotoxic factors, differentiation factors, neuropeptides and hormones), which are in turn received and interpreted by diverse cell receptors. These secreted polypeptides, or signaling molecules, normally pass through the cellular secretory pathway to reach their site of action in the extracellular environment.
The targeting of both secreted and transmembrane proteins to the secretory pathway is accomplished via a short amino-terminal sequence known as the signal peptide or signal sequence. von Heijne, G. (1985) J. Mol. Biol. 184, 99-105; Kaiser, C. A. & Botstein, D. (1986) Mol. Cell. Biol. 6, 2382-2391. The signal peptide contains several elements necessary for optimal function, the most important of which is a hydrophobic component. This hydrophobic sequence is often immediately preceded by one or more basic amino acids, while the carboxyl-terminal end of the signal peptide carries a pair of small, uncharged amino acids separated by a single intervening amino acid, which together define the signal peptidase cleavage site. Although the hydrophobic component, the basic amino acid(s) and the peptidase cleavage site can usually be identified in the signal peptides of known secreted proteins, the high degree of degeneracy within each of these elements makes it difficult to identify or isolate secreted or transmembrane proteins solely by searching DNA databases (e.g., GenBank, GenPept) for signal peptides, or by hybridization with DNA probes designed to recognize cDNAs encoding signal peptides.
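To illustrate why the degeneracy of these three elements frustrates purely sequence-based searching, the following is a deliberately naive sketch that scans the amino terminus of a protein sequence for the three features described above. It is not the method of the present invention and not a validated predictor (such as SignalP); the residue classes, the minimum hydrophobic-core length `min_core=7`, the maximum signal length `max_len=40` and all function names are hypothetical assumptions chosen for illustration, and the test sequence is synthetic.

```python
"""Naive signal-peptide heuristic (illustrative sketch only, not a validated predictor)."""

# ASSUMPTION: simplified residue classes chosen for illustration.
HYDROPHOBIC = set("AILMFVWC")
SMALL_UNCHARGED = set("AGSCT")  # rough approximation of the (-3, -1) small-residue rule
BASIC = set("KR")


def longest_hydrophobic_run(segment):
    """Return (start, length) of the longest run of consecutive hydrophobic residues."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, aa in enumerate(segment):
        if aa in HYDROPHOBIC:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def candidate_cleavage_sites(seq, min_core=7, max_len=40):
    """Return candidate signal-peptide lengths (cleavage after that residue).

    min_core and max_len are hypothetical thresholds, not values from the source.
    """
    seq = seq.upper()
    sites = []
    # A site needs room for a basic residue, the core, and the -3..-1 positions.
    for cut in range(min_core + 4, min(len(seq), max_len) + 1):
        # Rule 1: small, uncharged residues at positions -3 and -1 of the cut.
        if seq[cut - 1] not in SMALL_UNCHARGED or seq[cut - 3] not in SMALL_UNCHARGED:
            continue
        # Rule 2: a hydrophobic stretch of at least min_core residues upstream of -3.
        start, length = longest_hydrophobic_run(seq[: cut - 3])
        if length < min_core:
            continue
        # Rule 3: at least one basic residue before that hydrophobic stretch.
        if not any(aa in BASIC for aa in seq[:start]):
            continue
        sites.append(cut)
    return sites


if __name__ == "__main__":
    synthetic = "MKR" + "L" * 10 + "ASAQPEDWK"  # synthetic sequence, not a real protein
    print(candidate_cleavage_sites(synthetic))  # -> [16]
```

Rules this loose also match many sequences that are never secreted (for example, amino-terminal membrane anchors), while tightening them rejects genuine signal peptides; this trade-off is the degeneracy problem that limits database searching.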
Secreted and membrane-bound cellular proteins have wide applicability in various industrial fields, including pharmaceuticals, diagnostics, biosensors and bioreactors. For example, most protein drugs currently available commercially, such as thrombolytic agents, interferons, interleukins, erythropoietins, colony-stimulating factors and various other cytokines, are secretory proteins. Their receptors, which are membrane proteins, also have potential as therapeutic or diagnostic agents. Significant resources are presently being expended by both industry and academia to identify new native secreted proteins.
According to a screening method recently reported by Klein, R. D. et al. (1996) Proc. Natl. Acad. Sci. 93, 7108-7113 and by Jacobs (U.S. Pat. No. 5,536,637, issued Jul. 16, 1996), cDNAs encoding novel secreted and membrane-bound mammalian proteins are identified by detecting their secretory leader sequences, using the yeast invertase gene as a reporter system. The enzyme invertase catalyzes the hydrolysis of sucrose to glucose and fructose, as well as the hydrolysis of raffinose to fructose and melibiose. The secreted form of invertase is required for the utilization of sucrose by yeast (Saccharomyces cerevisiae), so yeast cells that are unable to produce secreted invertase grow poorly on media containing sucrose as the sole carbon and energy source. Both Klein et al., supra, and Jacobs, supra, take advantage of the known ability of mammalian signal sequences to functionally replace the native signal sequence of yeast invertase. A mammalian cDNA library is ligated to DNA encoding a nonsecreted yeast invertase, and the ligated DNA is isolated and transformed into yeast cells that lack an invertase gene. Recombinants carrying the nonsecreted yeast invertase gene ligated to a mammalian signal sequence are identified by their ability to grow on a medium containing only sucrose or only raffinose as the carbon source. The mammalian signal sequences so identified are then used to screen a second, full-length cDNA library to isolate full-length clones encoding the corresponding secreted proteins.
Given the great effort presently being expended to discover novel secreted and transmembrane proteins as potential therapeutic agents, there is a great need for an improved system that can simply and efficiently identify the coding sequences of such proteins in mammalian recombinant DNA libraries. Although effective, the yeast invertase selection process described above has several disadvantages. First, it requires special yeast strains in which the SUC2 gene encoding invertase has been deleted, or in which the coding sequence of the native invertase signal has been mutated so that the invertase is not secreted. Second, even invertase-deficient yeast can grow on sucrose or raffinose, albeit slowly; the selection may therefore need to be repeated several times to enrich for transformants containing the signal-less yeast invertase gene ligated to a mammalian secretory leader sequence. See Jacobs, supra. Third, the selection requires that a threshold level of enzyme activity be secreted before growth occurs. Although 0.6-1% of wild-type invertase secretion is sufficient for growth, certain mammalian signal sequences cannot support even this modest level of secretion. Kaiser, C. A. et al. (1987) Science 235, 312-317. There therefore remains a need for an improved, simplified technique for selecting genes encoding signal sequence-containing (secreted or membrane-bound) polypeptides.