Most of the drugs available today to cure infections are known to bind to specific protein target molecules in the cell of the causative organism. For example, several antibiotics are known to disrupt the function of ribosomes so that protein translation is affected. In these cases it has been found that the drugs bind either to the ribosomal RNA directly or to RNA-protein complexes (Wimberly et al., 1999). Chemical probing experiments have revealed that these drugs bind to certain nucleotide sequences of ribosomal RNA that are ‘invariant’ in structurally analogous regions in different organisms (Porse and Garrett, 1999). Other classes of drugs block other functions, such as transcription (Cutler et al., 1999) or fatty acid synthesis in the bacterial cell (McCafferty et al., 1999).
Recently, several drug-resistant strains of pathogenic bacteria have emerged (Ghannoum & Rice, 1999), rendering current treatment procedures ineffective against the infections these pathogens cause. This necessitates the identification of new drug targets and of the corresponding drugs. For this purpose, the availability of complete genome sequences from various microbes offers an opportunity to analyze all the proteins encoded in a given genome. Since most drugs known today target proteins, analyzing all the proteins in a given bacterium is likely to provide new valid drug targets.
Knowledge of conserved invariant sequences in a protein can be useful in understanding certain features of the protein's architecture, such as the buried versus exposed location of a segment or the presence of specific secondary structural elements (Rooman and Wodak, 1988; Presnell et al., 1992). The most important aspect of conserved invariant sequences, however, is their relation to the protein's functional role. The usual methods of sequence analysis include BLAST (Altschul et al., 1990) and FASTA (Wilbur and Lipman, 1983). These methods carry out sequence alignments whose quality is evaluated using an amino acid substitution matrix. Statistical calculations are performed and the results are output in ranked order, with the most similar sequence first. However, these methods are not designed to compare whole genomes simultaneously in order to identify invariant sequence motifs, which are of particular importance in this work.
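To illustrate how a substitution matrix evaluates an alignment, the following is a minimal sketch and not the scoring used by BLAST or FASTA. The scores in TOY_SCORES, the DEFAULT_MISMATCH value and the function names are hypothetical and chosen only for illustration; a real analysis would use a published matrix and would also handle gaps.

```python
from typing import Dict, Tuple

# Hypothetical toy scores, NOT taken from any published substitution matrix.
# Keys are stored once; pair_score() looks them up in either order.
TOY_SCORES: Dict[Tuple[str, str], int] = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5, ("D", "D"): 6,
    ("I", "L"): 2,    # conservative substitution: still scores positively
    ("D", "K"): -1,   # non-conservative substitution: penalized
}
DEFAULT_MISMATCH = -2  # assumed score for any pair not listed above


def pair_score(a: str, b: str) -> int:
    """Score one aligned residue pair, treating the matrix as symmetric."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES.get((b, a), DEFAULT_MISMATCH)


def ungapped_score(seq1: str, seq2: str) -> int:
    """Sum the pair scores of two equal-length, already aligned segments."""
    if len(seq1) != len(seq2):
        raise ValueError("segments must have the same length (no gaps here)")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


identical = ungapped_score("LKD", "LKD")     # every residue matches
conservative = ungapped_score("LKD", "IKD")  # L aligned with I
```

Under such scoring a conservative substitution lowers the score only slightly, so two segments can rank as highly similar without being identical. A search for invariant motifs, by contrast, requires exact identity.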
To compare each protein of one organism with all the proteins of several other organisms, one must either run BLAST on the proteins one by one or use a batch BLAST, which is highly time-consuming and therefore impracticable. Even if this were done, the exercise would yield only the overall similarity and the alignments of a set of homologous proteins.
The problem with multiple sequence alignment is that it is biased by the selection of proteins: only proteins that are functionally related give a clear picture of any relationship among them. Such procedures are labor-intensive and time-consuming, and they lead to results that need further processing and filtering. Moreover, these methods cannot compare all the proteins of several organisms and retrieve the conserved invariant peptides.
The present invention provides a novel computer-based method for finding invariant sequence motifs, which has the manifold uses mentioned above and obviates the drawbacks listed above.
The applicants' approach is based on the paradigm that sequence motifs that are invariant between different bacterial proteins must play an important role in the structure and function of the protein. Of the numerous ways in which drug targets can be identified, we have taken an approach based on comparative and structural genomics. Here, the invariant sequence motifs may be involved either directly or indirectly in the function of the subject protein molecule. The approach is derived from the concept that sequence motifs that have remained unchanged across bacteria, whether distantly or closely related, should have evolved a unique structural feature that cannot be compromised. Indeed, it is possible that even so-called conservative substitutions are not tolerated in these invariant sequence motifs. To this end, we have identified several invariant peptide motifs by direct sequence comparison between various bacterial genomes, without any a priori assumptions. This unbiased, assumption-free way of studying the sequences has the benefit of revealing previously unidentified sequence properties in the various genomes.
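The idea of direct comparison without a priori assumptions can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the applicants' actual procedure: it treats an invariant motif as a peptide of fixed length k that occurs with exact identity in at least one protein of every organism. The function names, the choice of k and the toy sequences are all hypothetical.

```python
from typing import Dict, List, Set


def peptides(sequence: str, k: int) -> Set[str]:
    """Return every length-k peptide (substring) of one protein sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def proteome_peptides(proteins: List[str], k: int) -> Set[str]:
    """Collect the length-k peptides found in any protein of one organism."""
    found: Set[str] = set()
    for protein in proteins:
        found |= peptides(protein, k)
    return found


def invariant_peptides(proteomes: Dict[str, List[str]], k: int) -> Set[str]:
    """Return the length-k peptides present, unchanged, in every organism."""
    peptide_sets = [proteome_peptides(p, k) for p in proteomes.values()]
    if not peptide_sets:
        return set()
    return set.intersection(*peptide_sets)


# Hypothetical toy sequences, invented for illustration only.
toy_proteomes = {
    "organism_A": ["MKTAGLSGKTTLAR", "MPPLQ"],
    "organism_B": ["MRRGLSGKTTQW"],
    "organism_C": ["AAGLSGKTTPL", "MKV"],
}

shared_6mers = invariant_peptides(toy_proteomes, 6)
shared_7mers = invariant_peptides(toy_proteomes, 7)
```

Because the comparison uses exact string identity, a peptide carrying even a conservative substitution in one organism is excluded, in line with the reasoning above. A set intersection of this kind needs no alignment and no prior selection of functionally related proteins.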
Since the invariant sequence motifs may be important for the function of the subject protein molecule, we aim to develop these peptide motifs as potential broad-spectrum antibacterial drug targets. A small molecule that binds specifically to these invariant sequences may disrupt the function of the subject protein molecule. It is envisaged that this in silico approach will provide new leads for experimental validation and help to derive functions from the protein sequences in the available databases.