1. Field of the Invention
The present invention relates to an indexing structure for searching a large capacity image database. Particularly, the present invention relates to an insertion method in a high-dimensional index structure for content-based image retrieval, in which a desired image can be efficiently searched for once a high-dimensional image database has been formed.
2. Description of the Prior Art
In handling a large capacity image database, the main problem is search efficiency, and to solve this problem a proper indexing technique has to be developed. A high-dimensional index structure is one which is capable of efficiently accessing the data based on various feature values. Such a structure can be usefully applied to content-based image retrieval.
Many studies have been carried out on high-dimensional index structures based on various features. However, in conventional index structures such as the R*-tree, the time and space requirements increase exponentially as the number of dimensions (features) increases. Therefore, for a high-dimensional image database (more than 6 dimensions), such a structure performs no better than sequential retrieval, with the result that the indexing structure loses its effect. In an attempt to solve this problem, two approaches have been proposed for the indexing structure of a content-based image retrieval system.
The first is the typical value representing (grouping) method. In this method, the number of dimensions of the image data is reduced so that the data can be used in an indexing structure for content-based image retrieval. This is attained by grouping the image features or by representing them with typical values. Because the image data are expressed by converting the high-dimensional feature space to a low-dimensional feature space, the search range becomes large when a large capacity image database is searched, with the result that the search performance is markedly degraded.
The second is a high-dimensional index structure capable of accommodating a large number of image features. Such indexing structures include the telescopic vector (TV) tree, the similarity search (SS) tree, the extended (X) tree, and the content-based image retrieval (CIR) tree. The TV tree varies the number of dimensions used for indexing in order to solve the conventional problem that the time or space requirement increases exponentially with the number of features. In this method, however, a great deal of overlap occurs during splits, with the result that the search performance is significantly degraded.
The SS tree is a dynamic structure designed to handle similarity queries efficiently. In this method, overlaps also occur frequently during splits, and the time requirement increases as the number of features increases.
The X tree is a high-dimensional index structure proposed to prevent the overlapping regions that cause the degradation of search performance. However, in this method also, the search performance is significantly degraded during the data processing.
The CIR tree uses the feature data variably, and uses a super node whose size is an integer multiple of that of a normal node and which can be stored contiguously on a disk. Therefore, it can efficiently accommodate high-dimensional features. Performance evaluation shows that the CIR tree is superior to the TV tree and the X tree.
As described above, in the conventional high-dimensional index structures, the time and space requirements increase exponentially as the number of dimensions increases. As a result, they cannot accommodate high-dimensional image data.
The present invention is intended to overcome the above described disadvantages of the conventional techniques.
Therefore, it is an object of the present invention to provide an insertion method in a high-dimensional index structure for content-based image retrieval, in which the conventional problem, i.e., the exponential increase of the time and space requirements with the increase of the number of dimensions, is solved, thereby making it possible to efficiently accommodate high-dimensional features.
In achieving the above object, the insertion method in a high-dimensional index structure for content-based image retrieval according to the present invention includes the steps of: (a) inserting an object into a root node if a tree consists of only a root node; (b) forming a new root node when the existing root node overflows; (c) inserting an object if lower nodes of the root node are branch nodes, and choosing among the branch nodes by preferring, in sequence, one having less overlap with nearby nodes, one having the same values in many dimensions, one showing less increase in minimum bounding region size, and one disposed closer to the center; (d) choosing reinsertion objects based on the weight from the branch node if the branch node overflows; (e) splitting the branch node if the branch node overflows, and otherwise adjusting the minimum bounding region; (f) choosing a lower node as a terminal node to insert an object if a lower node of the root node is a branch node; (g) choosing reinsertion objects based on the weight center from the terminal node if the terminal node with an object inserted into it overflows; and (h) dividing the terminal node, and then carrying out the steps (c) to (g) to insert a new object into the branch node, if the terminal node overflows after the re-insertion.
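The ordered choice among branch nodes in step (c) can be illustrated with a minimal sketch. It assumes that each minimum bounding region is a list of (low, high) intervals, that overlap is measured as the total overlap volume with sibling regions after enlargement, and that "same values" means dimensions in which the region has collapsed to the object's value. These measures and the names `enlarge`, `volume`, `overlap`, and `choose_branch` are hypothetical and are not taken from the specification.

```python
from math import dist, prod

def enlarge(mbr, point):
    """Return the region grown just enough to contain the point."""
    return [(min(lo, x), max(hi, x)) for (lo, hi), x in zip(mbr, point)]

def volume(mbr):
    """Product of the side lengths of the region."""
    return prod(hi - lo for lo, hi in mbr)

def overlap(a, b):
    """Overlap volume of two regions (0 if they are disjoint in any dimension)."""
    sides = [min(ah, bh) - max(al, bl) for (al, ah), (bl, bh) in zip(a, b)]
    return prod(sides) if all(s > 0 for s in sides) else 0

def choose_branch(mbrs, point):
    """Pick the index of a branch node using the four criteria in order.

    The tuple key is compared lexicographically, so a later criterion
    only matters when all earlier ones are tied (assumed tie-breaking).
    """
    def key(i):
        grown = enlarge(mbrs[i], point)
        # 1. less overlap with nearby (sibling) nodes
        total_overlap = sum(overlap(grown, other)
                            for j, other in enumerate(mbrs) if j != i)
        # 2. same values in many dimensions (negated: more is better)
        shared = sum(1 for (lo, hi), x in zip(mbrs[i], point) if lo == hi == x)
        # 3. less increase in minimum bounding region size
        growth = volume(grown) - volume(mbrs[i])
        # 4. closer to the center of the region
        center = [(lo + hi) / 2 for lo, hi in mbrs[i]]
        return (total_overlap, -shared, growth, dist(center, point))
    return min(range(len(mbrs)), key=key)

regions = [[(0, 2), (0, 2)], [(3, 5), (3, 5)]]
print(choose_branch(regions, (1, 1)))
print(choose_branch(regions, (4, 4)))
```

In this sketch an exact comparison decides ties; an implementation working with floating-point feature values would probably need a tolerance.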
In another aspect of the present invention, the recording medium according to the present invention includes: a means for judging the constituent elements of a tree; a means for inserting an object into a root node if the tree consists of a root node, and forming a new root node if the existing root node overflows; a means for inserting an object if lower nodes of the root node are branch nodes, and choosing among the branch nodes by preferring, in sequence, one having less overlap with nearby nodes, one having the same values in many dimensions, one showing less increase in minimum bounding region size, and one disposed closer to a center; a means for adding the mean values of the minimum bounding regions if the branch node overflows, obtaining a weight center by dividing by the number of entries, and choosing objects in order of remoteness from the weight center to insert them into the branch nodes; a means for splitting the branch node if the branch node overflows with the objects, and otherwise adjusting the minimum bounding region; a means for choosing a terminal node if a lower node of the root node is a terminal node or if a lower node of the branch node is a terminal node, then inserting an object into the terminal node, then adding the object values of the same dimension, then obtaining a weight center by dividing by the number of entries, and then choosing another object to insert it into the terminal node; and a means for dividing the terminal node if the terminal node overflows, and inserting a new object into the branch node, whereby a program for activating the above means is read by a computer.
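The weight center computation described above (summing the values of each dimension and dividing by the number of entries) can be sketched as follows. The sketch assumes that a branch-node entry is represented by the midpoint of its minimum bounding region and that distance is Euclidean. The number of objects to reinsert, `count`, is not stated in this section and is a hypothetical parameter, as are the function names.

```python
from math import dist

def mbr_midpoint(mbr):
    """Mean value of a minimum bounding region given as (low, high) intervals."""
    return [(lo + hi) / 2 for lo, hi in mbr]

def weight_center(points):
    """Per-dimension sum of the entry values divided by the number of entries."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def reinsertion_candidates(points, count):
    """Indices of the `count` entries farthest from the weight center.

    `count` is an assumed parameter; the text only says that objects are
    chosen in order of remoteness from the weight center.
    """
    center = weight_center(points)
    order = sorted(range(len(points)),
                   key=lambda i: dist(points[i], center),
                   reverse=True)
    return order[:count]

# Terminal node: the entries are the object feature vectors themselves.
objects = [(0, 0), (2, 0), (0, 2), (2, 2), (10, 10)]
print(weight_center(objects))
print(reinsertion_candidates(objects, 2))

# Branch node: each entry is first reduced to the midpoint of its region.
regions = [[(0, 2), (0, 2)], [(4, 6), (0, 2)], [(20, 22), (20, 22)]]
print(reinsertion_candidates([mbr_midpoint(r) for r in regions], 1))
```

Removing the most remote entries first tends to shrink the node's bounding region, which is the usual motivation for forced reinsertion before a split.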
In the present invention, the basic properties of the CIR tree are utilized, and at the same time a split algorithm with a search efficiency superior to that of the conventional CIR tree is employed. Further, an effective criterion for choosing lower nodes is provided, and a re-insertion algorithm that re-inserts objects based on a weight center is employed, thereby forming an ECIR (Extended CIR) tree as a high-dimensional index structure having an efficient tree structure.