A typical bacterial genome is about 3–8 Mb (mega base pairs) in size. For example, the Bacillus subtilis genome is known to have a size of 4.2 Mb and to contain a total of 4100 protein-coding genes. The function of 1200 gene products of Bacillus subtilis has been experimentally identified. At the time the genome sequence was finished, the function of 42% of the genes could not be predicted by similarity to known genes encoding proteins with known function. These genes could be divided into three groups: 12% showed similarity to genes with unknown function from other organisms, 4% showed similarity to other genes with unknown function in B. subtilis only, and the remaining 26% did not show homology to anything (F. Kunst et al. 1997. The complete genome sequence of the Gram-positive bacterium Bacillus subtilis. Nature 390:249–256).
Screening for enzymes, or other proteins or peptides, normally involves gene cloning in order to obtain a reasonable yield of the desired product encoded by a given gene. This is done by constructing a gene library, in which the genome is cut into fragments that are then ligated into a vector and transformed into a cloning host. If the genome of B. subtilis is randomly cut into fragments with an average size of 4 kb (kilo base pairs), at least about 1000 clones must be screened in order to cover the entire genome once. To ensure that all open reading frames of the genome are represented in full length, and thus that the entire genome is expressed, a much higher number of clones must be screened. Usually the number of screened clones is on the order of 5000–10,000.
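The coverage figures above can be checked with a short calculation. The following is a minimal sketch that assumes randomly and independently generated fragments; the second function uses the standard Clarke-Carbon estimate N = ln(1 − P) / ln(1 − f), which is not stated in the text, and the function names and the 99% target are illustrative assumptions.

```python
import math

def clones_for_one_coverage(genome_bp, insert_bp):
    """Minimum clones whose inserts add up to one genome length."""
    return math.ceil(genome_bp / insert_bp)

def clones_for_probability(genome_bp, insert_bp, p):
    """Clarke-Carbon estimate (an assumption, not taken from the text):
    clones needed so that any given point of the genome is present in
    the library with probability p, for random independent inserts."""
    f = insert_bp / genome_bp  # fraction of the genome carried by one clone
    return math.ceil(math.log(1 - p) / math.log(1 - f))

GENOME_BP = 4_200_000  # B. subtilis genome size, 4.2 Mb
INSERT_BP = 4_000      # average fragment size, 4 kb

one_fold = clones_for_one_coverage(GENOME_BP, INSERT_BP)
n99 = clones_for_probability(GENOME_BP, INSERT_BP, 0.99)  # 99% is illustrative

print(one_fold)  # 1050, i.e. "about 1000 clones" for single coverage
print(n99)       # roughly 4800
```

Under these assumptions the 99% estimate falls just below the 5000–10,000 range quoted above. The estimate only concerns the presence of a given point of the genome; representing every open reading frame in full length requires more clones still.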
The genomes of Aspergillus nidulans and Neurospora crassa are known to have a size of 31.0 Mb and 42.9 Mb of DNA, respectively (Dunn-Coleman, N. & Prade, R. 1998. Toward a global filamentous fungus genome sequencing effort. Nature Biotechnology, 16, 5; Radford and Parish, 1997. The genome and genes of Neurospora crassa. Fungal Genetics and Biology, 21, 258–266). The nuclear genome of Saccharomyces cerevisiae contains 13.0 Mb, and about 6200 open reading frames have been predicted (Zagulski, M., Herbert, C. J. & Rytka, J. 1998. Sequencing and functional analysis of the yeast genome. Acta Biochimica Polonica, 45, 627–643).
Screening for enzymes in fungi can be based on an expression-cloning method, which combines the ability of Saccharomyces cerevisiae to express heterologous (fungal) genes with the use of enzyme assays. The fungus of interest is fermented under conditions that give high-level enzyme activity; mRNA is prepared from the resulting biomass and a cDNA library is constructed in E. coli. Plasmid DNA is isolated from subpools of this library and transformed into S. cerevisiae. Subsequently, the yeast transformants are screened for enzyme activity.
We assume that for a fungal genome about 5000 genes are expressed. For statistical reasons, and due to the manner in which cDNA is prepared, a high number of clones must be screened in order to ensure that all expressed enzymes are identified, i.e. on the order of magnitude of 50,000–100,000.
In a typical screen for any given enzyme or other gene product, a functional assay is applied: for example, proteases are screened in an assay specific for proteases, amylases are screened in an assay specific for amylases, and so forth. The existing methods for traditional functional screening for extracellular enzymes are substantially limited by the screening assays applied. This means that screening of a genome provides a) only those enzymes for which a functional assay exists or can be designed, and b) only a single enzyme activity (or a very limited number of enzyme activities), i.e. the enzyme activity or activities that the assay is specific for or that can be derived from a single screening. Frequently, the same gene library is screened over and over again because it is desired to investigate several activities. This is ideally done in parallel, but as it is often not known at the outset which enzymes are of potential interest, gene libraries have to be newly constructed from the given wild-type organism or the library has to be screened several times in the various functional assays. This method of screening for enzymes or other proteins has the disadvantage of being both time-consuming and expensive.
An estimate of the total number of extracellular enzymes in B. subtilis was made by 2D gel analysis of extracellular enzymes and subsequent identification of spots by N-terminal sequencing. The number was predicted to be 150–180 extracellular enzymes (Hirose et al. 2000. Proteome analysis of Bacillus subtilis extracellular proteins: a two-dimensional protein electrophoretic study. Microbiology 146:65–75). This means that with a screening procedure designed to identify all secreted gene products, the number of hits would be about 200 clones from the total of 4100 open reading frames. In other words, if 10,000 clones are screened, 200–500 clones will carry a DNA fragment from the original genome expressing an extracellular protein or peptide. With a pre-screening for clones producing extracellular enzymes, functional screening should, in the ideal situation, only need to be performed on these fewer than 500 clones.
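As a rough plausibility check of the 200–500 figure, the share of secreted gene products among all open reading frames can be scaled to the number of screened clones. This is only a sketch: it assumes, as a simplification not made in the text, that the fraction of secreting clones in the library equals the fraction of secreted products among the open reading frames, and the function name is hypothetical.

```python
def expected_secreting_clones(n_clones, n_secreted, n_orfs):
    """Proportional estimate (simplifying assumption): the library contains
    secreting clones in the same proportion as the genome contains
    secreted gene products among its open reading frames."""
    return n_clones * n_secreted / n_orfs

N_CLONES = 10_000  # clones screened
N_ORFS = 4_100     # B. subtilis open reading frames

low = expected_secreting_clones(N_CLONES, 150, N_ORFS)   # lower 2D-gel estimate
high = expected_secreting_clones(N_CLONES, 180, N_ORFS)  # upper 2D-gel estimate

print(round(low), round(high))  # 366 439
```

Both values lie inside the 200–500 range given above.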
For fungi, the number of secreted gene products is assumed to be in the range of about 500–1000 for a given genome, so that only about 500–1000 clones from a total of approximately 25,000–40,000 screened clones are of real interest.
A tremendous saving in both time and money could be achieved by mining the gene libraries or cDNA libraries for clones expressing extracellular products. In a typical example of screening of a bacterial genome, the gene library could thus be initially screened in a secretion assay in which 5000–10,000 clones are screened and the approximately 200 clones that encode secreted gene products are detected. These 200 clones could then be screened using e.g. functional assays.
This means that a theoretical screening procedure based only on functional assays, in which a gene library might be screened in 200 different functional assays to detect all secreted gene products, would involve e.g. 5000 clones×200=1,000,000 screened clones. In the ideal case in which clones producing secreted products can be identified initially, the gene library is screened once for secreted products (e.g. 5000 clones) and the resulting approximately 200 secreting clones can subsequently be fingerprinted for biochemical activity in functional assays. Assuming again use of the same 200 functional assays, a total of only 200×200=40,000 clones would have to be investigated, in other words only 4% of the 1,000,000 clones that would have to be investigated using functional assays alone.
Typically, a gene library might be screened using about 10 different functional assays. With 5000 clones this gives a total of 50,000 clones that must be screened. In the ideal case in which clones producing secreted gene products can be identified at the outset, the 5000 clones are screened once, after which all secreting clones are detected and analysed in the 10 functional assays, corresponding to a screening workload of 200 clones screened in 10 assays, i.e. only 2000 clones need to be screened in the functional assays, again a total of only 4% of the number of clones that must be screened using the functional assays alone.
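The workload comparison in the two scenarios above can be reproduced with a few lines of arithmetic. The sketch below uses the figures from the text (5000 library clones, about 200 secreting clones, and 200 or 10 functional assays); the function names are illustrative, and the last column, which also counts the single secretion pre-screen, is an addition not made in the text.

```python
def functional_only(library_clones, n_assays):
    """Clone screenings when the whole library goes through every assay."""
    return library_clones * n_assays

def after_prescreen(secreting_clones, n_assays):
    """Clone screenings in the functional assays when only the secreting
    clones found in the pre-screen are tested."""
    return secreting_clones * n_assays

LIBRARY = 5000   # clones in the gene library
SECRETING = 200  # approximate number of secreting clones

for n_assays in (200, 10):
    baseline = functional_only(LIBRARY, n_assays)
    reduced = after_prescreen(SECRETING, n_assays)
    ratio = reduced / baseline
    # Addition not in the text: include the one-off pre-screen of the library.
    ratio_with_prescreen = (reduced + LIBRARY) / baseline
    print(n_assays, baseline, reduced, f"{ratio:.0%}", f"{ratio_with_prescreen:.1%}")
```

The ratio of functional-assay screenings is 200/5000 = 4% regardless of the number of assays. If the pre-screen itself is counted, the relative saving is largest when many functional assays are applied.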
For fungal cDNA libraries, the same statistical considerations apply.
In short, it would be a tremendous advantage to have a screening assay for secreted enzymes and other proteins from gene libraries wherein the relatively few clones producing secreted gene products could be identified at the outset, so that only these few clones have to be investigated in functional assays aimed at identifying proteins of interest. The present invention provides such an assay.
Prior art includes disclosures such as WO 89/08114 and Analytical Biochemistry, Vol. 270 (1) pp. 103–111 (1999), which relate to the use of specific monoclonal antibodies to identify compounds such as expression products, e.g. surface-bound antigenic products. However, the prior art techniques do not address the problem of identifying unknown compounds for which no suitable antibody can be identified.