Many biologically important molecules, in particular those used in therapy, are secreted proteins. For example, growth factors, interferons, erythropoietin, and insulin have been used successfully for treating various conditions and diseases.
Secreted proteins are characterized by the presence of a hydrophobic signal peptide at the amino terminus of the protein. The hydrophobic signal sequence is typically from about 16 to about 30 amino acids long and contains one or more positively charged amino acid residues near its N-terminus, followed by a continuous stretch of 6-12 hydrophobic residues. Signal peptides from various secreted proteins otherwise share no sequence homology. The presence of a hydrophobic signal peptide at the amino terminus of a protein mediates its association with the rough endoplasmic reticulum (ER), which in turn mediates its secretion from the cell.
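The features described above (a length of about 16 to 30 residues, a positively charged residue near the N-terminus, and a continuous hydrophobic stretch of at least 6 residues) can be expressed as a rough screening heuristic. The following is only a minimal sketch: the function name, the choice of hydrophobic and positively charged residue sets, and the 5-residue window used for "near the N-terminus" are assumptions made for illustration and do not come from any established prediction tool.

```python
# Illustrative heuristic only. Length and stretch thresholds follow the text;
# the residue sets and the N-terminal window size are assumptions.
HYDROPHOBIC = set("AVLIMFWC")  # assumed set of hydrophobic residues
POSITIVE = set("KR")           # assumed set of positively charged residues


def looks_like_signal_peptide(protein, max_len=30, n_window=5, min_run=6):
    """Return True if the N-terminal region shows the features described above."""
    region = protein.upper()[:max_len]

    # Find the first positively charged residue within the N-terminal window.
    pos_index = next(
        (i for i, aa in enumerate(region[:n_window]) if aa in POSITIVE), None
    )
    if pos_index is None:
        return False

    # Look for a continuous hydrophobic stretch after that residue.
    run = 0
    for aa in region[pos_index + 1:]:
        run = run + 1 if aa in HYDROPHOBIC else 0
        if run >= min_run:
            return True
    return False
```

A made-up sequence such as "MKLLLLLLLAAAGSQ" would pass this check, whereas one lacking either a positive residue near the N-terminus or a hydrophobic stretch would not. Because signal peptides otherwise share no sequence homology, a composition-based check of this kind can only suggest candidates and cannot confirm secretion.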
The mechanism by which peptides or proteins having a signal peptide associate with the endoplasmic reticulum and are secreted is as follows. Protein synthesis begins on free ribosomes. When the elongating peptide is about 70 amino acids long, the signal peptide is recognized by a particle, termed a "signal recognition particle" or "SRP", which in turn is capable of interacting with a receptor, termed "SRP receptor", located on the ER. Thus, growing peptides having a signal peptide are targeted to the ER, where peptide synthesis continues on the rough ER. At some point during protein synthesis or after protein synthesis is completed, the protein is translocated across the ER membrane into the ER lumen, where the signal peptide is cleaved off. There the protein can be posttranslationally modified, e.g., glycosylated. Whether posttranslationally modified or not, the protein can then be directed to the appropriate cellular compartment, e.g., secreted outside the cell.
Several systems have recently been developed to isolate nucleic acids encoding secreted proteins. One system which is used frequently and of which several variations exist is a system termed "Sequence Signal Trap". One such system (such as the Genetics Institute's DiscoverEase™ program) is yeast based and uses the yeast invertase gene, whose product cleaves the disaccharide sucrose into the monosaccharides glucose and fructose. According to the system, a library of cDNAs is cloned upstream of the gene encoding invertase and yeast cells are selected on sucrose. Since yeast cells cannot take up sucrose, but can take up fructose and glucose, only yeast cells secreting invertase are able to grow on sucrose. Thus, only yeast cells containing a cDNA with a signal sequence properly fused to the invertase gene will secrete invertase and survive on sucrose.
However, such systems have several drawbacks. For example, the sequence signal trap system requires that the signal sequence be fused properly, e.g., in frame, to the gene encoding the invertase. Even where the signal sequence is in frame, the result is a fusion protein, which may be unstable. Thus, only a fraction of the cDNAs containing a signal sequence will be fused properly to the invertase gene to permit secretion of invertase. Furthermore, this system requires that the protein containing the signal sequence be secreted. However, it is known that numerous proteins containing signal peptides are retained in the endoplasmic reticulum. Accordingly, the requirement that the fusion protein containing the signal peptide actually be secreted further reduces the efficiency of cloning secreted proteins.
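The in-frame requirement can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which a cDNA insert is read from its start codon up to the junction with the reporter gene, and the fusion counts as in frame only if the insert length is a multiple of three and no stop codon interrupts it. The function name and the example sequences are hypothetical, and the model ignores protein stability and retention in the ER.

```python
# Simplified model of an in-frame fusion; names and sequences are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuses_in_frame(insert, start=0):
    """True if the insert, read from `start`, reaches the junction in frame
    and without an intervening stop codon."""
    orf = insert.upper()[start:]
    # A length that is not a multiple of three shifts the reporter's frame.
    if len(orf) % 3 != 0:
        return False
    # Any in-frame stop codon would terminate translation before the reporter.
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return not any(codon in STOP_CODONS for codon in codons)


# Count how many junction positions along one stop-free sequence are in frame.
sequence = "ATGAAACTGCTGCTG"
in_frame = sum(fuses_in_frame(sequence[:n]) for n in range(1, len(sequence) + 1))
```

For this 15-nucleotide example, 5 of the 15 possible junction positions are in frame, which reflects why, under this simplified model, only about one third of randomly truncated inserts would be expected to fuse properly.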
Thus, it is highly desirable to have a system for isolating nucleic acids encoding secreted proteins which is more efficient and reliable than the existing sequence signal trap systems.