Flexible retrieval from, manipulation of, and navigation through image databases have become important problems in the database management arts, with applications in video editing, photo-journalism, art, fashion, cataloguing, retailing, interactive computer aided design (CAD), geographic data processing and so forth.
An early content-based retrieval (CBR) system is ART MUSEUM. Reference in this regard can be made to K. Hirata and T. Kato, “Query by visual example, content based image retrieval”, in Advances in Database Technology—EDBT'92, A. Pirotte, C. Delobel, and G. Gottlob, Eds., Lecture Notes in Computer Science, vol. 580, 1992. In this particular CBR system the retrieval of image data is based entirely on edge features. An early commercial content-based image search engine that had profound effects on later systems is QBIC. Reference in this regard can be had to W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, and P. Yanker, “The QBIC project: Querying images by content using color, texture and shape”, in Proc. SPIE Storage and Retrieval for Image and Video Data Bases, pp. 172–187, 1994. For color representation this system uses a k-element histogram and the average of the (R,G,B), (Y,i,q), and (L,a,b) coordinates, whereas for the description of texture it implements the feature set of Tamura (see H. Tamura, S. Mori, and T. Yamawaki, “Textural features corresponding to visual perception”, IEEE Transactions on Systems, Man and Cybernetics, vol. 8, pp. 460–473, 1978). In a similar fashion, color, texture and shape are supported as a set of interactive tools for browsing and searching images in the Photobook system developed at the MIT Media Lab, as described by A. Pentland, R. W. Picard, and S. Sclaroff, “Photobook: Content-based manipulation of image databases”, International Journal of Computer Vision, vol. 18, no. 3, pp. 233–254, 1996. In addition to these elementary features, systems such as VisualSeek (see J. R. Smith, and S. Chang, “VisualSeek: A fully automated content-based query system”, in Proc. ACM Multimedia 96, pp. 87–98, 1996), Netra (see W. Y. Ma, and B. S. Manjunath, “Netra: A toolbox for navigating large image databases”, in Proc. IEEE Int. Conf. on Image Processing, vol. I, pp. 568–571, 1997) and Virage (see A. Gupta, and R. Jain, “Visual information retrieval”, Communications of the ACM, vol. 40, no. 5, pp. 70–79, 1997) support queries based on spatial relationships and color layout. Moreover, in the Virage system, users can select a combination of implemented features by adjusting weights according to their own “perception”. This paradigm is also supported in the RetrievalWare search engine (see J. Dowe, “Content based retrieval in multimedia imaging”, in Proc. SPIE Storage and Retrieval for Image and Video Databases, 1993). A different approach to similarity modeling is proposed in the MARS system, as described by Y. Rui, T. S. Huang, and S. Mehrotra, “Content-based image retrieval with relevance feedback in MARS”, in Proc. IEEE Conf. on Image Processing, vol. II, pp. 815–818, 1997. In the MARS system the main focus is not on finding a best representation, but rather on the use of relevance feedback to dynamically adapt multiple visual features to different applications and different users.
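By way of illustration only, and not as a description of any of the systems cited above, the two low-level ideas mentioned in this paragraph (a k-element color histogram, and a user-weighted combination of per-feature distances) can be sketched as follows. The function names, the uniform RGB quantization, and the L1 distance are assumptions chosen for brevity; the cited systems may use different color spaces and metrics.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed 8-bit (R, G, B) values


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Return a normalized histogram with k = bins_per_channel**3 elements."""
    if not pixels:
        raise ValueError("image has no pixels")
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..n-1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors differ in length")
    return sum(abs(x - y) for x, y in zip(a, b))


def combined_distance(
    query: Dict[str, List[float]],
    candidate: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-feature distances; weights are user-chosen."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        w * l1_distance(query[name], candidate[name]) for name, w in weights.items()
    )
    return weighted / total_weight
```

In such a sketch, raising the weight of one feature (for example "color") makes differences in that feature dominate the ranking, which is the kind of adjustment the text attributes to the user's own "perception".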
High-level semantic concepts play a large role in the way that humans perceive images and measure their similarity. Unfortunately, these concepts are not directly related to image attributes. Although many sophisticated algorithms have been devised to describe color, shape and texture features, as was made apparent above, these algorithms do not adequately model image semantics and thus are inherently limited when dealing with broad-content image databases. Yet, due to their computational efficiency, the low-level visual attributes are widely used by content-based retrieval and image navigation systems, leaving the user with the task of bridging the gap between the low-level nature of these primitives and the high-level semantics used to judge image similarity.
Apart from a few exceptions, most conventional image and video retrieval systems neglect the semantic content, and support the paradigm of query by example using similarity in low-level features such as color, layout, texture and shape. The traditional text-based query, which describes the semantic content of an image, has motivated recent research in human perception, semantic image retrieval and video indexing.
In image retrieval the problem of semantic modeling was primarily identified as a scene recognition/object detection task. One system of this type is IRIS (see T. Hermes, et al., “Image retrieval for information systems”, in Storage and Retrieval for Image and Video Databases III, Proc SPIE 2420, 394–405, 1995), which uses color, texture, regional and spatial information to derive the most likely interpretation of a scene and to generate text descriptors, which can be input to any text retrieval system. Another approach to capturing the semantic meaning of the query image is represented by techniques that allow a system to learn associations between semantic concepts and primitive features from user feedback. An early example of this type of system was “FourEyes”, as described by T. Minka, “An image database browser that learns from user interaction”, MIT Media Laboratory Technical Report #365, 1996. This system asks the user to annotate selected regions of an image, and then proceeds to apply the same semantic labels to areas with similar characteristics. This approach was also taken by Chang et al., who introduced the concept of a semantic visual template (S. F. Chang, W. Chen, and H. Sundaram, “Semantic visual templates: linking visual features to semantics”, in Proc. IEEE International Conference on Image Processing, Chicago, Ill., pp. 531–535, 1998). In the approach of Chang et al. the user is asked to identify a possible range of color, texture, shape or motion parameters to express the user's query, and the query is then refined using the relevance feedback technique. When the user is satisfied, the query is given a semantic label and stored in a database for later use. Over time, this query database becomes a “visual thesaurus” linking each semantic concept to the range of primitive image features most likely to retrieve relevant items. In video indexing and retrieval, recent attempts to introduce semantic concepts include those described by M. Naphade, and T. Huang, “Probabilistic framework for semantic video indexing, filtering and retrieval”, IEEE Transactions on Multimedia, vol. 3, no. 1, pp. 141–151, March 2001, and by A. M. Ferman, and M. Tekalp, “Probabilistic analysis and extraction of video content”, in Proc. IEEE Int. Conf. Image Processing, Kobe, Japan, October 1999.
The goal of these systems is to overcome the limitations of traditional image descriptors in capturing the semantics of images. By introducing some form of relevance feedback, these systems provide the user with a tool for dynamically constructing semantic filters. However, the ability of these filters to capture the semantic content depends entirely on the quality of the images, the willingness of the user to cooperate, and the degree to which the process converges to a satisfactory semantic descriptor.
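For illustration, one simple form of relevance feedback can be sketched as follows. This is a generic heuristic, assumed here for the sake of example and not taken from any of the cited systems: features on which the user-approved images agree closely are given more weight in the next query round. The function name `reweight_from_feedback`, the scalar feature values, and the smoothing constant `eps` are all hypothetical.

```python
from statistics import pstdev
from typing import Dict, List


def reweight_from_feedback(
    relevant: List[Dict[str, float]], eps: float = 1e-6
) -> Dict[str, float]:
    """Derive feature weights from images the user marked as relevant.

    Each entry maps a feature name to a scalar feature value. A feature whose
    values vary little across the relevant images receives a larger weight.
    The returned weights sum to 1.
    """
    if not relevant:
        raise ValueError("at least one relevant example is required")
    names = list(relevant[0].keys())
    # Inverse of the population standard deviation; eps avoids division by zero.
    raw = {
        name: 1.0 / (pstdev([example[name] for example in relevant]) + eps)
        for name in names
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}
```

The sketch also makes the stated limitation concrete: the resulting weights are only as good as the examples the user is willing to mark, and nothing guarantees that repeated rounds settle on a useful descriptor.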
Content-based retrieval (CBR) methods in medical databases have been designed to support specific tasks, such as retrieval of digital mammograms or 3D MRI images. However, these methods cannot be transferred to other medical applications since different imaging modalities require different types of processing. To enable content-based queries in diverse collections of medical images, the retrieval system must be familiar with the current image class prior to the query processing.
More specifically, medical information systems with advanced browsing capabilities play an increasingly important role in medical training, research, and diagnostics. Thus far, however, the utilization of online medical data has been limited by a lack of effective search methods, and text-based searches have been the dominant approach for medical database management. Since images represent an essential component of diagnosis, follow-up and research, it is very desirable to use medical images to support browsing and querying of medical databases. Existing content-based image retrieval (CBIR) systems depend on visual attributes, such as color, texture and shape, to classify and search for similar images. While this approach may provide satisfactory results when constrained to a single application domain, color, texture and shape features alone do not adequately model image semantics and thus have many limitations when applied to broad content image databases. This problem becomes even more apparent when dealing with the semantics of medical images. For this reason, CBIR methods in medical applications have been designed to support specific medical tasks, such as retrieval of tumor shapes in mammograms (see P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, “Fast and effective retrieval of medical tumor shapes”, IEEE Trans. on Knowledge and Data Engineering, vol. 10, no. 6, pp. 889–904, 1998), computed tomographies of the lung (see C. R. Shyu, C. E. Brodley, A. C. Kak, A. Kosaka, A. M. Aisen, and L. S. Broderick, “ASSERT: A physician-in-the-loop content based retrieval system for HRCT image databases”, Comp. Vision and Image Underst., 75(1/2), pp. 111–132, 1999), 3D MRI images in neurology (see J. Declerck, G. Subsol, J-P. Thirion, and N. Ayache, “Automatic retrieval of anatomical structures in 3D medical images”, Tech. Report 2485, INRIA, Sophia-Antipolis, France, 1995; A. Guimond, and G. Subsol, “Automatic MRI database exploration and applications”, Pattern Recognition and Artificial Intelligence, vol. 11, no. 8, December 1997; Y. Liu, F. Dellaert, and W. E. Rothfus, “Classification Driven Semantic Based Medical Image Indexing and Retrieval”, Tech. Report CMU-RI-TR-98-25, Robotics Institute, Carnegie Mellon University, 1998), or pathology (see D. Comaniciu, D. Foran, and P. Meer, “Shape-based image indexing and retrieval for diagnostic pathology”, Proc. 14th Int. Conference on Pattern Recognition, Brisbane, August 1998). However, these methods are task-specific and cannot be transferred to other medical applications, since different imaging modalities require different processing methods. Therefore, to enable content-based queries for research and diagnostic purposes, the information retrieval system must be familiar with the current image class prior to the query processing. Hence, medical images must first be categorized by imaging modality in order to support further queries. This need has not been adequately addressed prior to this invention.
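The two-stage arrangement implied by this paragraph, in which an image is first assigned to an imaging modality and only then processed by a method suited to that modality, can be sketched in outline as follows. This is a hypothetical illustration, not the technique of any cited reference: the classifier is supplied by the caller, and the names `index_image`, `classify_modality`, and `extractors` are assumptions introduced for the example.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # assumed grayscale image as rows of intensities
FeatureExtractor = Callable[[Image], List[float]]
ModalityClassifier = Callable[[Image], str]


def index_image(
    image: Image,
    classify_modality: ModalityClassifier,
    extractors: Dict[str, FeatureExtractor],
) -> Tuple[str, List[float]]:
    """Categorize an image by modality, then apply the matching extractor."""
    modality = classify_modality(image)  # stage 1: determine the image class
    try:
        extractor = extractors[modality]
    except KeyError:
        raise ValueError(
            f"no feature extractor registered for modality {modality!r}"
        ) from None
    return modality, extractor(image)  # stage 2: modality-specific processing
```

A query would then be compared only against stored images that carry the same modality label, using the features produced by the same extractor.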
As may be appreciated, these shortcomings are not limited to medical image databases. There is therefore a long-felt and unfulfilled need for an improved technique that automatically characterizes images according to their modalities, and that also employs semantic information for browsing, searching, querying and visualizing collections of digital images.