Advanced database system research faces a great challenge posed by the emergence of massive, complex structural data (e.g., sequences, lattices, trees, graphs and networks), which are encountered in applications such as bio-informatics, geo-informatics and chem-informatics. A particular challenge involves efficiently and accurately searching databases of such structural data.
Graphs are the most general form of structural data, and thus are used extensively in chem-informatics and bio-informatics datasets. For example, graph-structured databases have been used heavily in the development of chemical structure search and registration systems. Graph-structured databases are also being used in computer vision and pattern recognition, wherein graphs are used to represent complex structures, such as hand-drawn symbols, three-dimensional objects and medical images.
A number of methods have been developed to handle data queries involving complex structural data. See, for example, S. Berretti et al., Efficient Matching and Indexing of Graph Models in Content-Based Retrieval, 23 IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, pgs. 1089-1105 (2001); D. Shasha et al., Algorithmics and Applications of Tree and Graph Searching, IN PROC. 21st ACM SYMP. ON PRINCIPLES OF DATABASE SYSTEMS (PODS'02), pgs. 39-52 (2002) (hereinafter “Shasha”); X. Yan et al., Graph Indexing: A Frequent Structure-Based Approach, IN PROC. 2004 ACM INT. CONF. ON MANAGEMENT OF DATA (SIGMOD'04), pgs. 335-346 (2004) (hereinafter “Yan”); J. Raymond et al., Rascal: Calculation of Graph Similarity Using Maximum Common Edge Subgraphs, 45 THE COMPUTER JOURNAL, pgs. 631-644 (2002) (hereinafter “Raymond”); and N. Nilsson, Principles of Artificial Intelligence, Morgan Kaufmann, Palo Alto, Calif. (1980) (hereinafter “Nilsson”), the disclosures of which are incorporated by reference herein.
While these methods are very useful, they have important limitations. Specifically, none of the cited methods accommodates searching a database of structural data against a query structure to find structures in the database that are similar to, but not exactly the same as, a portion or portions of the query. As a result, highly relevant structures in the database inevitably will be overlooked.
Therefore, improved techniques for similarity searching of databases of structural data would be desirable.