Proteins are biopolymers built from 20 kinds of amino acids, and have structures in which about 50 to 1,000 amino acids are connected in a chain by peptide bonds (-CONH-). Various kinds of proteins are known, such as enzymes that catalyze substance conversion in organisms, receptors involved in inter- or intracellular signal transduction, receptors involved in the control of gene expression, cytokines secreted at the time of inflammation, proteins involved in the transport of substances, and others. In higher animals such as humans, there are 50,000 to 100,000 kinds of proteins, each of which plays a specific function and role.
Enzymes provide fields for chemical reactions in which specific products are obtained by acting on specific substrates, and they carry out stereospecific or regiospecific reactions under mild conditions. Receptors transduce signals through a structural change upon the binding of hormones and signal transmitters. A feature common to enzymes and receptors is that their biological functions arise from the formation of stable complexes with specific molecules (ligands). Protein molecules, which are long like strings, fold into defined steric structures and form structural sites (ligand binding sites) that bind specifically to particular biomolecules and to artificial molecules such as drugs. The ligand binding site is essential for the functions of enzymes and receptors to appear.
The steric structures of proteins can be determined by X-ray crystallographic analysis and NMR analysis. Owing to the remarkable progress and spread of these analytical techniques, determining the steric structure of a protein has become easy, and the number of proteins analyzed is increasing at an accelerating rate. The Protein Data Bank, a database of protein structures, at present stores three-dimensional coordinates of more than 7,000 proteins, and the data are available throughout the world. Accordingly, once the function of a protein is known, the relation between its structure and its function can be understood at the atomic level by analyzing the crystal structures of its complexes with appropriate ligands. Moreover, by using a crystallographically determined protein structure as a template and substituting the amino acid side chains, it has become possible to predict the steric structure of a protein having a highly homologous amino acid sequence (homology modeling).
Protein studies have so far been conducted by separating and purifying a protein using its biological function as a guide, and then determining its amino acid sequence in order to analyze its structure and function. Recently, however, as the analysis of genes has become easy, there are cases in which the existence of a protein is suggested from genetic information. For example, the existence of a considerable number of proteins has been revealed by a large-scale project aiming at analysis of the human genome, and these results are expected to be utilized for elucidating the causes of diseases and for drug design.
However, for the proteins successively found through genome analysis studies, only the amino acid sequences have been elucidated, and in most cases their biological functions cannot be predicted at all. For this reason, an enormous amount of study is necessary to predict or confirm the function of each protein, which is an obstacle to the effective use of genome information. Moreover, although the steric structure of a protein whose amino acid sequence has been elucidated can be determined more easily than before owing to the progress of crystallographic analysis and NMR analysis, there are many cases in which the function remains largely unknown even though the steric structure has been elucidated.
At present, no method for easily predicting the functions of novel proteins has been established. For example, one method in use compares the amino acid sequence of a novel protein with the amino acid sequences of proteins with known functions and, if a protein with high homology is found, predicts that the novel protein has functions similar to those of that known protein. Furthermore, for multiple proteins with the same function, information concerning the correlation between structure and function can be obtained by aligning the sequences so that the homologous parts become as large as possible. However, even among proteins with the same function, the homology is generally not very high when the biological species differ. Thus, the above-mentioned methods, which depend on alignment, are of no help for many proteins, whether or not their functions are known to be the same.
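The homology-based prediction described above can be illustrated with a minimal sketch. It assumes a flat match/mismatch/gap scoring scheme, a simple global alignment, and an arbitrary 40% identity cutoff. None of these values comes from the text, and practical tools typically use substitution matrices and faster search heuristics. The function names and the toy sequences in the usage note are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Globally align two sequences by dynamic programming (flat scoring, assumed values)."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned residues as a percentage of the alignment length."""
    al_a, al_b = global_alignment(a, b)
    if not al_a:
        return 0.0
    same = sum(1 for x, y in zip(al_a, al_b) if x == y and x != "-")
    return 100.0 * same / len(al_a)


def predict_function(query: str, known: Dict[str, Tuple[str, str]],
                     threshold: float = 40.0) -> Optional[Tuple[str, str, float]]:
    """Return (name, function, identity) of the best hit, or None below the cutoff.

    `known` maps a protein name to (sequence, function). The 40% cutoff is an
    illustrative assumption, not a value given in the text.
    """
    best: Optional[Tuple[str, str, float]] = None
    for name, (seq, function) in known.items():
        ident = percent_identity(query, seq)
        if best is None or ident > best[2]:
            best = (name, function, ident)
    if best is not None and best[2] >= threshold:
        return best
    return None
```

With a hypothetical table such as `{"toy_kinase": ("MKTAYIAKQR", "kinase")}`, a query that differs from the stored sequence by a single residue is assigned the stored function, whereas a query whose identity to every entry falls below the cutoff yields `None`. The second case corresponds to the limitation noted above: when proteins with the same function come from different species and share little sequence homology, a method of this kind gives no prediction.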