Knowledge about protective antigens of a pathogen is currently considered a prerequisite for vaccine development. Selection of relevant antigens is accomplished by standard immunological techniques, such as immunization of animals, in vitro T- and B-cell stimulation assays, as well as biochemical studies. Several important vaccines have been constructed as a result of such investigations. However, a number of significant infectious diseases, such as tuberculosis, malaria, and HIV, are still awaiting development of effective vaccines.
Employing genomic libraries of a pathogen (to encode a large collection of antigens), rather than analyzing proteins expressed in vitro, would ensure that the complete antigenic information of an organism is represented and would also overcome the problem that proteins produced in vitro can differ significantly from those synthesized in vivo. The use of genomic libraries to search for protective antigens, however, is hindered by the large number of constructs that have to be screened, because there is no selection system that identifies well-expressed open reading frames while discriminating against fragments that contain non-coding DNA sequences or stop codons. DNA vaccination techniques make it possible to immunize animals directly with constructs containing genetic material of a pathogen, and to measure the protective value of that material by challenging the animals with the infectious agent. Using this approach, single antigens of Mycobacterium tuberculosis (the heat shock protein hsp65, the 36 kDa proline-rich antigen and the antigen 85 complex) have recently been tested and shown to confer levels of protection comparable to those of Mycobacterium bovis-BCG (Tascon, et al., Nature Medicine, 2:888-892 (1996); Huygen, et al. Nature Medicine, 2:893-898 (1996)). Additionally, complete genomic expression libraries of Mycoplasma pulmonis have been constructed, and nucleic acid immunization employing such libraries proved to be protective in mice (Barry, et al., Nature, 377:632-635 (1995)). However, as powerful as this new strategy is, it is difficult to recover constructs from surviving animals and identify individual protective antigens.
Therefore, there remains a need for an improved method of identifying potential protective antigens of pathogenic organisms for which effective vaccines are not yet available. Current methods employ genomic libraries to search for protective antigens. A problem with libraries of randomly cloned genetic material is that they contain very little coding sequence. Accordingly, as there is currently no selection system for well-expressed open reading frames, a great number of clones might have to be analyzed in order to find whole or partial open reading frames. Therefore, methods are needed for the selection and identification of whole and partial open reading frames from a genomic library in order to ultimately identify protective antigens.
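The screening burden described above can be illustrated with a minimal sketch. It assumes the standard genetic code, reads each fragment only in the frame in which it was cloned, and, for the probability estimate, treats bases as uniform and independent, which real genomes are not. The function names `has_open_frame` and `prob_open_by_chance` are hypothetical and serve only as illustration.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_open_frame(fragment: str) -> bool:
    """Return True if the fragment reads through in frame 0 without a stop codon.

    A fragment whose length is not a multiple of three is rejected as well,
    because it would shift the reading frame of anything fused downstream.
    """
    seq = fragment.upper()
    if len(seq) % 3 != 0:
        return False
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return all(codon not in STOP_CODONS for codon in codons)


def prob_open_by_chance(n_codons: int) -> float:
    """Chance that n random codons contain no stop codon.

    Assumes uniform, independent bases: 61 of the 64 codons are sense codons.
    """
    return (61 / 64) ** n_codons


if __name__ == "__main__":
    print(has_open_frame("ATGGCTGCA"))   # in frame, no stop codon
    print(has_open_frame("ATGTAAGCA"))   # contains TAA in frame
    print(prob_open_by_chance(100))      # random 300 bp fragment
```

Under these simplifying assumptions, fewer than one in a hundred random 300 bp fragments would remain open in a given frame, which is consistent with the need to screen many clones when no selection for open reading frames is available.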
The concept of the present invention is based upon the unique protein splicing properties of a novel class of genetic elements once referred to as "intervening protein sequences", now correctly called "inteins" (Perler, et al., Nucleic Acids Research, 22:1125-1127 (1994)). Inteins were first discovered in 1990 when one was found in a yeast gene (Kane, et al., Science, 250:651-657 (1990); Hirata, et al., J. Biol. Chem., 265:6726-6733 (1990)). The investigators aligned the vacuolar ATPase VMA1 gene of Saccharomyces cerevisiae with similar genes from other organisms, and observed strong homology at both ends of the gene, but also found that a large portion in the middle of the gene had very little homology to other ATPases. Furthermore, the S. cerevisiae VMA1 gene was much larger than any other known similar gene, while the gene product was of the same size as other ATPases. After careful analysis of the transcription and translation process, the possibility of RNA splicing was ruled out, and the investigators concluded that protein splicing was occurring (Kane, et al., Science, 250:651-657 (1990); Hirata, et al., J. Biol. Chem., 265:6726-6733 (1990)). Since then, more cases of protein splicing have been found, and currently about 10 to 15 inteins have been identified (reviewed in Clyman, ASM News, 61:344-347 (1995); Colston and Davis, Mol. Microbiol., 12:359-363 (1994)). Inteins have been demonstrated in eukaryotes, eubacteria and archaea, thus spanning all three kingdoms. Very recently, the complete genome sequence of an archaeon, Methanococcus jannaschii, was published, and in the 38% of its genome which has homology to known genes from other organisms, 18 inteins were identified, of which only 2 had been previously recognized (Bult, et al., Science, 273:1058-1073 (1996)). Inteins have very little homology to each other, making it hard to identify new members of this class of genetic elements in databases (Pietrokowsky, 1994).
Virtually every intein described so far has been discovered accidentally, when researchers tried to clone a gene and aligned it with already known sequences. Inteins can be defined as protein sequences which are embedded in frame within a precursor protein, and which are removed by protein splicing. During that process, the two terminal portions become ligated by a peptide bond, and form a fusion protein which is called the host protein or extein. The amino acids found in the two hexapeptide motifs at the ends of the intein are crucial for the splicing process. These regions, which are also called splice sites, are extremely conserved in all inteins. The mechanism of protein splicing is not entirely understood, but involves several of these amino acids, particularly the C-terminal histidine, asparagine and cysteine/threonine/serine residues (Davis, et al., J. Bacteriol., 173:5653-5662 (1992); Hirata and Anraku, Biochem. Biophys. Res. Comm., 188:40-47 (1992); Hodges, et al., Nucleic Acids Research, 20:6153-6157 (1992); Cooper, et al., EMBO Journal, 12:2575-2583 (1993)). Splicing appears to be autocatalytic and does not require any host cell cofactor, since inteins can splice out of their precursor proteins in a variety of in vivo and in vitro expression systems, including phosphate buffered saline (Davis, et al., J. Bacteriol., 173:5653-5662 (1992); Xu, et al., Cell, 75:1371-1377 (1993); and reviewed in Colston and Davis, Mol. Microbiol., 12:359-363 (1994)). Hallmarks of protein splicing are that most of the amino acids of the splice sites cannot be altered at the translation level, and that most deletions, stop codons, as well as any frame shifts within the intein are deleterious. Most inteins described so far possess a second characteristic pattern: their central part shows homology to the HO endonuclease motifs found in group 1 RNA introns (reviewed in Belfort, et al., J. Bacteriol., 177:3897-3903 (1995)).
In fact, it has been reported that several inteins display restriction endonuclease activity once they have spliced out of their host proteins (Shub and Goodrich-Blair, Cell, 71:183-186 (1992)). An actual homing function, which is characteristic of group 1 RNA intron endonucleases, has so far been demonstrated only for the VMA1 intein of S. cerevisiae (Gimble and Thorner, Nature, 357:301-306 (1992); Gimble and Thorner, J. Biol. Chem., 268:21844-21853 (1993)).
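The definition of an intein given above (an in-frame sequence that is removed from a precursor protein, with the two flanking portions joined by a peptide bond) can be illustrated with a toy model on one-letter amino acid strings. This is a deliberately simplified sketch, not a description of the splicing chemistry: it assumes that the intein ends in histidine-asparagine and that the first residue after it is cysteine, serine or threonine, and it ignores every other conserved position of the splice sites. The function name `splice` and the example sequences are hypothetical.

```python
def splice(precursor: str, start: int, end: int) -> str:
    """Excise precursor[start:end] (the putative intein) and join the flanks.

    Simplified splice-site check (an assumption of this sketch):
      * the intein must end in His-Asn ("HN");
      * the residue immediately downstream must be Cys, Ser or Thr.
    Raises ValueError when either condition fails.
    """
    n_extein = precursor[:start]
    intein = precursor[start:end]
    c_extein = precursor[end:]
    if not intein.endswith("HN"):
        raise ValueError("intein does not end in His-Asn")
    if not c_extein or c_extein[0] not in "CST":
        raise ValueError("no Cys/Ser/Thr directly after the intein")
    # Ligation: the two terminal portions form one continuous polypeptide.
    return n_extein + c_extein


if __name__ == "__main__":
    # Hypothetical toy sequences, chosen only to satisfy the check above.
    n_part, intein, c_part = "MKTAYG", "CLAKGTHN", "SGGKE"
    precursor = n_part + intein + c_part
    print(splice(precursor, len(n_part), len(n_part) + len(intein)))
```

In this model, altering the terminal asparagine of the toy intein makes `splice` raise an error instead of returning a product, a crude analogue of the observation that most splice-site residues cannot be changed without impairing splicing.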