In view of the current initiative to characterize the genomes of humans and other organisms, there is a need for methods of producing gene expression libraries that are representative of classes of genes of interest, because non-specific libraries are associated with numerous inefficiencies. To date, gene expression libraries have generally been created using, as the specificity-determining factor, nucleic acid selected as being representative of the desired class of genes.
For example, to prepare a tissue-specific gene expression library, mRNA is prepared from the tissue of interest, converted into cDNA, and inserted into an appropriate expression vector. To eliminate genes shared among tissues, the mRNA may be annealed with cDNA from another tissue and the resulting hybrids removed prior to incorporation into the expression vector. The finished expression library can then be screened with an antibody directed toward the product of a specific gene of interest. Similar methods have been used to prepare libraries from nucleic acid defined by other characteristics, for example, nucleic acids present in virally infected cells or associated with a particular phase of the cell cycle. The problem with these approaches is that the selection procedures do not satisfactorily ensure that non-specific genes are eliminated or, conversely, that class-specific genes are retained.
It would be of particular interest to produce a gene expression library enriched for secretory proteins. Most clinically significant proteins are secreted by their tissues of origin and then exert their action at a distant location. The genes for a number of previously known secretory proteins have now been cloned, and the proteins produced in recombinant form, including, for example, growth hormone and other growth factors, cytokines, interferons, insulin, and erythropoietin. However, it is highly likely that a large number of clinically significant proteins remain to be identified.
It has long been known that the mRNAs for secretory proteins are associated, virtually exclusively, with membrane-bound polysomes contained in the "rough microsome" cellular fraction (Blobel and Dobberstein, 1975, J. Cell Biol. 67:835-851). The specific information for this association is contained within a "signal sequence" at the N-terminus of the nascent protein, which thus tags proteins for secretion or, in some cases, membrane insertion (Sabatini et al., 1982, Cold Spring Harbor Symp. Quant. Biol. 46:807-818). However, there is apparently no consensus amino acid sequence for the signal sequences of nascent secretory proteins. Accordingly, it has not been feasible to identify genes encoding secretory proteins by searching DNA sequence banks for DNA sequences that may encode signal sequences.
The purpose of the present invention is to provide a method for producing class-enriched gene expression libraries in which the clonal members of the library are defined not only by a nucleic acid selection procedure but also by antibody-based screening that relies on a functional attribute of the class of gene products. In preferred embodiments, this method is used to create an expression library enriched for secretory proteins.