1. Field of the Invention
The present invention relates to an apparatus and method for comparing protein structures, which can search for a similar protein in a protein database such as the protein data bank (PDB) database; and, more particularly, to an apparatus and method for comparing protein structures, which can search a 3D protein database in real time for a protein similar to an inquiry protein by describing features of a protein structure using a three-dimensional (3D) relative directional angle (RDA) and a Fourier descriptor.
This work was supported by the Information Technology (IT) research and development program of the Korean Ministry of Information and Communication (MIC) and/or the Korean Institute for Information Technology Advancement (IITA) [2005-S-008-02, “SW Component Development of Bio Data Mining & Integrated Management”].
2. Description of Related Art
Generally, a protein database stores more than 25,000 kinds of proteins, and about 100 new protein entries are added each week. Thus, the cost of classifying and searching proteins keeps increasing.
There are two methods for comparing similar proteins.
The first method is a pairwise 3D structure comparison method. The pairwise 3D structure comparison method expresses the similarity of two 3D protein structures as a quantitative value. The pairwise 3D structure comparison is also called 3D structure matching or alignment. Generally, the 3D alignment problem is known to be NP-complete, so it has been solved using heuristic methods. These methods use different calculation methods to measure a similarity score.
The second method is to search a database for proteins similar to an inquiry protein. Generally, the second method returns M results (where M is a natural number) having a score equal to or greater than a threshold value. In this case, since the search is performed on all data of the protein database, N comparisons are required to search a database holding N proteins (where N is a natural number). As the size of the protein database increases, the efficiency of the data search is gradually degraded. Therefore, fast database search becomes an important factor.
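The exhaustive search described above can be illustrated by the following minimal sketch. It is not taken from any of the conventional tools. The function names `linear_scan` and `toy_score`, the representation of a structure as a list of numbers, and the scoring formula are all hypothetical and chosen only to show why the number of pairwise comparisons grows with the number of stored proteins.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def toy_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical similarity score in (0, 1]; 1.0 means identical.

    This is a placeholder, not a real structure alignment score.
    """
    n = min(len(a), len(b))
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / n
    return 1.0 / (1.0 + mean_diff)


def linear_scan(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    score: Callable[[Sequence[float], Sequence[float]], float],
    threshold: float,
) -> Tuple[List[Tuple[str, float]], int]:
    """Compare the query against every entry and keep scores >= threshold.

    Returns the hits sorted by descending score and the number of
    pairwise comparisons performed (always equal to len(database)).
    """
    hits: List[Tuple[str, float]] = []
    comparisons = 0
    for name, structure in database.items():
        comparisons += 1  # one pairwise comparison per stored protein
        s = score(query, structure)
        if s >= threshold:
            hits.append((name, s))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits, comparisons
```

Because every stored entry is visited once, the total cost is the cost of one pairwise comparison multiplied by the database size, which is why a slow pairwise method leads directly to a slow database search.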
The conventional pairwise 3D structure comparison methods include a sequential structure alignment program (SSAP), a distance alignment tool (DALI), a vector alignment search tool (VAST), and a combinatorial extension (CE). Most of these conventional methods are performed through two alignment steps. The first alignment step is to find a similarity of secondary structure elements or Cα backbone fragments, and the second alignment step is to align Cα atoms. Although these conventional methods provide very good results in terms of similarity, they have a very slow response time when searching the protein database.
Topscan and SCALE are also pairwise 3D structure comparison methods, but they use only the secondary structure elements. Since the two methods do not perform the second alignment step, their search speed is faster than that of SSAP, DALI, VAST, and CE. However, the two methods have disadvantages in that their search results are inaccurate and their response time is still too slow for them to be used as search methods in a large-scale protein database.
In addition, the response time of Guerra, protein structure indexing (PSI), and ProtDex is also too slow for them to be used as search methods in a large-scale protein database.