Alkaliphilic Bacillus strains are a focus of industrial biotechnology. Alkaliphilic strains, and especially the enzymes of those species, have great potential in biotechnical applications where enzymatic activity (or even maximum activity) at high pH values is required (Horikoshi: Alkaliphiles: Some Applications of Their Products for Biotechnology; Microbiology and Molecular Biology Reviews, Vol. 63, No. 4, 1999). Examples of alkaliphilic Bacillus strains that serve as sources of novel catalysts are B. clausii, B. pseudofirmus, B. clarkii and B. gibsonii. In the pursuit of novel enzymes, it is also known to screen for such enzymes by subjecting potential candidates to specific enzyme assays. This approach is limited by the availability of enzyme assays and does not allow the identification of functional enzymes or polypeptides whose activity is still unknown.
Further, whole genome sequencing is a known method for obtaining information on all genes of a given microorganism, e.g. as described in Fleischmann et al.: Whole-genome random sequencing and assembly of Haemophilus influenzae Rd; Science 269: 496-512 (1995).
Most enzymes for industrial use are secreted into the medium by a microorganism. However, only a few percent of a microorganism's genome encodes secreted proteins. For example, only approx. 4% of the genome of Bacillus subtilis or of its closest relatives encodes secreted proteins (Van Dijl et al.: Protein transport pathways in Bacillus subtilis: a genome-based road map; in “Bacillus subtilis and its closest relatives: From genes to cells”; p. 337-355; A. L. Sonenshein (ed.); ASM Press 2002).
One disadvantage of genome sequencing is that the vast majority of the obtained sequences encode non-secreted proteins.
An additional disadvantage of genomic sequencing, particular to eukaryotes such as fungi, is that the genome is many times larger than a bacterial genome, which makes gene discovery by this method more costly and time-consuming.
The random sequencing of cDNAs (expressed sequence tags, or ESTs) is another approach that allows for the discovery of secreted proteins. In general, EST approaches suffer from two drawbacks with regard to the identification of secreted proteins: 1) Depending on the induction conditions used for the sequenced cDNA library, very few of the cDNAs encode secreted proteins, typically between 0.5% and 15%, or even between 1% and 5%. 2) The clones all come from a cDNA pool derived from mRNAs that are present in the organism in proportion to the induction level of each particular gene, so that highly expressed genes are sequenced repeatedly while weakly expressed genes are rarely sampled.
Also known is signal trapping, a method for identifying genes that include nucleotides encoding a signal peptide by means of a translational fusion to an extracellular reporter gene lacking its own signal sequence (WO 01/77315).