The present invention is directed to a method of selecting purified nucleotidic sequences or polynucleotides encoding proteins, or parts of proteins, that carry at least one function essential for the survival or the virulence of mycobacterium species, by a comparative genomic analysis in which the genome sequence of M. tuberculosis is aligned with the genome sequence of M. leprae. The selection, by the method of the invention, of these nucleotidic or peptidic sequences of interest encoding said essential functions of mycobacterium makes it possible to identify and characterize specific antigens or regulatory sequences, said antigens being chosen as potential candidates for an immunogenic or vaccine composition and also being useful for determining novel potential drug targets for the pharmaceutical industry. The molecules having essential functions that are encoded by these genes, or that correspond to regulatory elements, also represent new, highly specific targets for chemotherapy. The sequences of the polynucleotides according to the invention have the particularity of having been maintained during the evolution of the mycobacterium and have therefore been highly conserved in pathogenic mycobacterium species. The invention is directed to purified nucleic acids selected by the method of the invention, as well as to the purified polypeptides, encoded by these sequences, that have functions essential for the survival or the virulence of mycobacterium species. In a preferred embodiment, the invention is directed to genes that code for essential proteins to which functions have been attributed.
The invention is also directed to a process for the production of recombinant polypeptides and of chimeric polypeptides comprising them, to antibodies generated against these polypeptides, to immunogenic or vaccine compositions comprising at least one polypeptide useful as a protective antigen or capable of inducing a protective response in vivo or in vitro against mycobacterium infections, to immunotherapeutic compositions comprising at least one such polypeptide according to the invention, and to the use of such nucleic acids and polypeptides in diagnostic methods, vaccines, kits, or therapy.
To illustrate the new approach of comparative genomics for identifying essential molecules, such as regulatory nucleotidic sequences and proteins, for the survival or the virulence of mycobacterium species, the inventors provide several examples, which do not limit the scope of the present invention. The comparative genomic analysis that permitted the inventors to select the sequences encoding these essential molecules was made by analysis of the complete genome sequences of both Mycobacterium tuberculosis and Mycobacterium leprae. The whole-genome comparisons also led to the identification of genes that are present in both M. tuberculosis and M. leprae but have no counterparts elsewhere. The polypeptides having essential functions for the survival or the virulence of mycobacterium species are characterized by at least 40% identity at the protein level and at least 70% identity at the gene level between both genomic sequences. The amino acid sequences have been compared using the program GAP of the GCG (Genetics Computer Group) Wisconsin Sequence Analysis package (Program Manual), which uses the algorithm of Needleman and Wunsch (J. Mol. Biol. 48:443, 1970). The parameters are chosen as follows.
For amino acid comparisons:
    Gap penalty: 5
    Gap extension penalty: 0.30
    Length: the sequences to be compared are the following XXX SEQ ID NO:XXX and having XXX amino acids.
For nucleotide comparisons:
    Gap penalty: 50
    Gap extension penalty: 3
The parameters may also be adapted case by case.
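As an illustration only, the comparison described above can be sketched as a global alignment in the manner of Needleman and Wunsch with a gap penalty and a gap extension penalty, followed by a percent-identity test against the 40% (protein) and 70% (gene) thresholds. This is not the GAP program itself: the function names are hypothetical, the simple match/mismatch scores stand in for the substitution matrix a real tool would use, a gap of length k is assumed to cost gap_open + k * gap_extend, and identity is assumed to be counted over aligned columns that contain no gap.

```python
from typing import List, Tuple

NEG_INF = float("-inf")

def global_align(a: str, b: str, match: float = 1.0, mismatch: float = -1.0,
                 gap_open: float = 5.0, gap_extend: float = 0.30) -> Tuple[str, str, float]:
    """Global alignment with affine gaps (assumed cost: open + k * extend)."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + j * gap_extend)
    oe = gap_open + gap_extend  # cost of starting a new gap

    def prev(state: str, i: int, j: int) -> List[Tuple[float, str]]:
        """Candidate predecessor scores (before any match score) for a state."""
        if state == "M":
            return [(M[i-1][j-1], "M"), (X[i-1][j-1], "X"), (Y[i-1][j-1], "Y")]
        if state == "X":
            return [(M[i-1][j] - oe, "M"), (X[i-1][j] - gap_extend, "X"),
                    (Y[i-1][j] - oe, "Y")]
        return [(M[i][j-1] - oe, "M"), (X[i][j-1] - oe, "X"),
                (Y[i][j-1] - gap_extend, "Y")]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i-1] == b[j-1] else mismatch
            M[i][j] = max(prev("M", i, j))[0] + s
            X[i][j] = max(prev("X", i, j))[0]
            Y[i][j] = max(prev("Y", i, j))[0]

    # Trace back from the best final state to recover the aligned strings.
    score, state = max((M[n][m], "M"), (X[n][m], "X"), (Y[n][m], "Y"))
    i, j = n, m
    top: List[str] = []
    bottom: List[str] = []
    while i > 0 or j > 0:
        nxt = max(prev(state, i, j))[1]
        if state == "M":
            top.append(a[i-1]); bottom.append(b[j-1]); i -= 1; j -= 1
        elif state == "X":
            top.append(a[i-1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j-1]); j -= 1
        state = nxt
    return "".join(reversed(top)), "".join(reversed(bottom)), score

def percent_identity(top: str, bottom: str) -> float:
    """Identical columns / gap-free columns, in percent (assumed convention)."""
    pairs = [(x, y) for x, y in zip(top, bottom) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def meets_threshold(identity: float, level: str) -> bool:
    """Thresholds from the text: 40% at the protein level, 70% at the gene level."""
    return identity >= {"protein": 40.0, "gene": 70.0}[level]
```

For nucleotide comparisons the same sketch would be called with gap_open=50 and gap_extend=3, with match and mismatch scores rescaled accordingly, since the penalties are only meaningful relative to the scoring scheme in use.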
Other techniques for the comparison of sequences are known to the person skilled in the art. Reference can be made to the algorithm of Smith and Waterman (Adv. Appl. Math. 2: 482, 1982), the similarity search method of Pearson and Lipman (Proc. Natl. Acad. Sci. USA 85:2444, 1988), and Zhang et al., “A greedy algorithm for aligning DNA sequences” (J. Comp. Biol., 2000, February-April 7(1-2), p. 203-214). These algorithms are used by way of software tools (GAP, BLASTP, BLASTN, BLASTX, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Sciences Dr., Madison Wis.).
The recombinant clones carrying DNA from Mycobacterium tuberculosis and Mycobacterium leprae strains, containing genomic sequences of said bacteria, have been deposited at the Collection Nationale de Cultures de Microorganismes (C. N. C. M.), of Institut Pasteur, 28, rue du Docteur Roux, F-75724 Paris cedex 15, France, and are designated as follows.
        HRV37 genomic library, deposited on Nov. 19, 1997 under the accession number I-1945.
        A recombinant BAC containing a fragment of M. tuberculosis genome deposited on Feb. 20, 2001, at the C. N. C. M. under the accession number I-2625.
        A recombinant BAC containing a fragment of M. tuberculosis genome deposited on Feb. 20, 2001, at the C. N. C. M. under the accession number I-2626.
        A recombinant BAC containing a fragment of M. tuberculosis genome deposited on Feb. 20, 2001, at the C. N. C. M. under the accession number I-2627.
        A recombinant BAC containing a fragment of M. tuberculosis genome deposited on Feb. 20, 2001, at the C. N. C. M. under the accession number I-2628.
        A recombinant BAC containing a fragment of M. tuberculosis genome deposited on Feb. 20, 2001, at the C. N. C. M. under the accession number I-2629.
        A recombinant cosmid containing a fragment of M. leprae genome deposited on Feb. 21, 2001, at the C. N. C. M. under the accession number I-2632.
        A recombinant cosmid containing a fragment of M. leprae genome deposited on Feb. 21, 2001, at the C. N. C. M. under the accession number I-2633.
Leprosy, one of the oldest recorded diseases, remains a major public health problem. Although prevalence has been reduced extensively by WHO multidrug therapy and vaccination with BCG (Anon, Karonga Prevention Trial Group, Lancet, 348, 17-24, 1996; Nordeen, S. K., et al., eds. Walgate, R. & Simpson, K., 47-55, World Health Organisation, Geneva, 1993), the incidence of the disease remains worrying, with more than 690,000 new cases annually in the world (Anon, WHO Weekly Epidemiological Record, 73, 40, 1998). Leprosy was common in Europe in the Middle Ages but gradually disappeared.
In 1873, in the first convincing association of a microorganism with a human disease, Hansen (Hansen, G. H. A., Norsk Magazin for Laegervidenskaben (supplement), 4, 1-88, 1874) discovered the leprosy bacillus in skin biopsies but failed to culture Mycobacterium leprae. A century later, the nine-banded armadillo (Kirchheimer, W. K. et Int. J. Lep., 39, 693-702, 1971) was used as a surrogate host, enabling large quantities of the bacillus to be isolated for biochemical and physiological studies. Subsequent efforts to demonstrate multiplication in synthetic media have been equally fruitless although metabolic activity can be detected S., Antimicrobial Agents and Chemotherapy, 33, 2115-2117, 1989). The exceptionally slow growth of the bacillus, which has a doubling time days (Shephard, C. C., eds. Hastings, R. C., 269-286, Churchill Livingstone, Edinburgh, 1985), may contribute to these failures.
The means of transmission of leprosy is uncertain but, like tuberculosis, it is believed to be spread by the respiratory route, since lepromatous patients harbour bacilli in their nasal passages. The bacterium accumulates principally in the extremities of the body, where it resides within macrophages and infects the Schwann cells of the peripheral nervous system. Lack of myelin production by infected Schwann cells, and their destruction by host-mediated immune reactions, lead to nerve damage, sensory loss and the disfiguration that, sadly, are hallmarks of leprosy.
No data or technical information in the prior art permits the specific selection of potential new targets and protective antigens for new drugs and vaccine compositions to treat and prevent mycobacterial diseases, particularly tuberculosis and leprosy. Furthermore, there is a need for new tools for selecting genes that encode proteins, or regulatory nucleotidic sequences, essential for the survival or infection of mycobacterium species and useful for the design of antituberculosis drugs and vaccines based on the knowledge of comparative mycobacterial genomics.