With the development of information technologies, large databases are increasingly found in multimedia applications, e.g., image file management, industrial image/video monitoring, and medical image diagnosis. Consequently, rapid image retrieval and recognition over large databases has become an important topic of study.
Traditional image description methods based on a “bag of words” address this issue well and have been widely applied. One such method is introduced in “Recognition with local features: the kernel recipe” by C. Wallraven, et al., in Proc. ICCV, Vol. 1, pp. 257-264, 2003, where representative “visual words” are extracted from a large number of local image features, and the frequencies with which these visual words appear in an image are used to describe that image.
A description method based on a multilayer spatial structure of the image has also been proposed to address the limited ability of such features to describe spatial information. A method for describing an image in the form of a spatial pyramid has been disclosed in “Pyramid match kernels: Discriminative classification with sets of image features” by K. Grauman, et al., in Proc. ICCV, 2005.
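For illustration only, the following is a minimal sketch of a bag-of-visual-words description and a simple grid-based multilayer (spatial pyramid) extension. It is not the method of either cited paper. It assumes the codebook of visual words is already given (in practice it is commonly obtained by clustering training descriptors, e.g., with k-means), that each local feature is an (x, y, descriptor) triple, and that each pyramid level l divides the image into a 2^l x 2^l grid. The function names `nearest_word`, `word_counts`, `bow_histogram`, and `spatial_pyramid` are hypothetical, and per-level weighting is omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Feature = Tuple[float, float, Vector]  # hypothetical layout: (x, y, descriptor)


def nearest_word(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the visual word closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )


def word_counts(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Number of descriptors assigned to each visual word."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    return [counts[k] for k in range(len(codebook))]


def bow_histogram(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Bag-of-words description: relative frequency of each visual word."""
    counts = word_counts(descriptors, codebook)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def spatial_pyramid(
    features: Sequence[Feature],
    codebook: Sequence[Vector],
    width: float,
    height: float,
    levels: int = 2,
) -> List[float]:
    """Concatenate per-cell word counts over 1x1, 2x2, ... grids, normalized
    by the total number of features (assumes 0 <= x < width, 0 <= y < height)."""
    out: List[float] = []
    total = len(features)
    for level in range(levels):
        cells = 2 ** level
        grid: List[List[List[Vector]]] = [[[] for _ in range(cells)] for _ in range(cells)]
        # Assign each local feature to the grid cell containing its position.
        for x, y, desc in features:
            col = min(int(x * cells / width), cells - 1)
            row = min(int(y * cells / height), cells - 1)
            grid[row][col].append(desc)
        # Append this level's cell histograms in row-major order.
        for row_cells in grid:
            for cell in row_cells:
                out.extend(c / total if total else 0.0 for c in word_counts(cell, codebook))
    return out
```

For example, with the two-word codebook `[(0.0, 0.0), (1.0, 1.0)]` and descriptors `[(0.1, 0.0), (0.9, 1.1), (1.0, 0.8)]`, one descriptor maps to word 0 and two map to word 1, so `bow_histogram` yields frequencies of 1/3 and 2/3. The pyramid variant keeps this whole-image histogram as its first level and appends finer per-cell histograms, which retains coarse information about where each visual word occurs.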