With the progress of genome analysis and cDNA analysis of various organisms, including pathogenic microorganisms, the number of novel genes of unknown function, and of proteins encoded by those genes, is rapidly increasing. The nucleotide sequences of the whole genomes of several microorganisms, for example Mycoplasma genitalium (Fraser et al., Science 270, 397-403, 1995), Haemophilus influenzae (Fleischmann et al., Science 269, 496-512, 1995), and Methanococcus jannaschii (Bult et al., Science 273, 1058-1073, 1996), have already been determined, and numerous novel proteins predicted from these genome sequences have been discovered. For humans and mice, cDNA analysis is under way in combination with genome analysis, leading to the discovery of a great number of novel proteins.
Under such circumstances, predicting the function of a functionally unknown protein, or a functional site thereof, has become a significant issue. If a novel function or a novel functional site is discovered, not only in a novel protein but also in a protein with a known function, it may become possible to determine whether the protein is worth industrial or clinical application. Furthermore, such prediction of function may make it possible to prepare a modified protein with a further improved function.
Conventionally, whether a protein encoded by a gene elucidated by genome analysis or cDNA analysis is novel or has a known function has been determined by homology searching against protein databases such as Swiss-Prot. To predict a functional site, functionally identical proteins derived from various organisms are additionally extracted from a protein database and aligned, a region conserved in common among them is identified, and the conserved region is predicted to be a functional site.
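The conventional approach described above can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length with "-" as the gap character, and it treats a column as conserved only when every sequence carries the same residue. The function names, the `min_len` threshold, and the toy sequences are hypothetical and chosen purely for illustration; practical tools typically use substitution-matrix scores rather than strict identity.

```python
def conserved_columns(aligned):
    """Return one boolean per alignment column.

    A column is flagged True only if every sequence has the same
    non-gap residue there (strict identity, an assumption of this sketch).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    flags = []
    for column in zip(*aligned):
        residues = set(column)
        flags.append(len(residues) == 1 and "-" not in residues)
    return flags


def conserved_regions(aligned, min_len=3):
    """Return (start, end) column ranges, end exclusive, for runs of
    conserved columns that are at least min_len columns long."""
    flags = conserved_columns(aligned)
    regions = []
    start = None
    # The trailing False acts as a sentinel that closes a run at the end.
    for i, flag in enumerate(flags + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Made-up, pre-aligned toy sequences (not real proteins).
alignment = [
    "MKTAHGDSGLP-VA",
    "MRSAHGDSGIPQVA",
    "LKTGHGDSGLPEIA",
]
print(conserved_regions(alignment))  # [(4, 9)]
```

In this toy alignment the only run of three or more identical columns is "HGDSG" at columns 4 to 8, which the conventional method would nominate as the candidate functional site.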
However, this alignment method has the disadvantage that it cannot be used if a protein obtained by genome analysis or cDNA analysis is an entirely novel protein. Even if the protein has homology with known proteins in a protein database, when those proteins are derived from closely related organisms the conserved region occupies most of the amino acid sequence, so that the functional site cannot be predicted. As to protein modification, even if a functional site is predicted by alignment, modifying the conserved region generally risks impairing the function of the protein, regardless of whether that function is known or unknown. Accordingly, amino acid residues outside the conserved region should be modified to improve the function; in other words, a novel functional site must be found in the protein to be modified. With the conventional alignment method, however, a novel functional site cannot be discovered, nor can it be predicted which amino acid residue should be modified.
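The limitation for closely related organisms can be made concrete with a small Python sketch that measures what fraction of alignment columns is identical across all sequences. The sequences are made-up toy data, the function name is hypothetical, and the inputs are assumed to be pre-aligned to equal length with "-" as the gap character.

```python
def conserved_fraction(aligned):
    """Fraction of alignment columns in which every sequence carries
    the same non-gap residue. Assumes equal-length, pre-aligned input."""
    columns = list(zip(*aligned))
    identical = sum(
        1 for col in columns if len(set(col)) == 1 and col[0] != "-"
    )
    return identical / len(columns)


# Toy data: homologs from closely related organisms differ at one position.
close_homologs = [
    "MKTAHGDSGLPEVA",
    "MKTAHGDSGLPEVA",
    "MKTAHGDSGLPQVA",
]
# Toy data: homologs from more distant organisms differ at many positions.
distant_homologs = [
    "MKTAHGDSGLPEVA",
    "MRSAHGDSGIPQVA",
    "LKTGHGDSGLPEIA",
]

print(round(conserved_fraction(close_homologs), 3))    # 0.929
print(round(conserved_fraction(distant_homologs), 3))  # 0.5
```

When nearly every column is conserved, as in the first set, the conserved region no longer singles out any particular part of the sequence, so it gives no guidance on where the functional site lies.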
The present invention has been made in view of these circumstances. It is an object of the present invention to provide a novel method for predicting a functional site of a functionally unknown protein obtained by genome analysis or cDNA analysis.
It is a further object of the present invention to provide a system for predicting the function.
It is a still further object of the present invention to provide a method for predicting a novel functional site of a protein with an unknown function or with a known function, and subjecting the functional site to mutation to prepare a modified protein.
It is yet another object of the present invention to provide a protein with a function modified by the method described above.