The proliferation of digital images and video has been a direct result of more efficient methods for gathering digital imagery. From recording devices such as digital cameras and digital video recorders, to digitizers for CCTV and broadcast media, to 2-D home and office scanners for producing digitized versions of photographs, the myriad digital recording methods have created an explosion of digital media.
For purposes of the present invention, the term “image” will signify a two-dimensional digital representation of a scene, typically created and digitized by an image capture device, signifying a standalone event or a momentary representation of a sequence of events. The term “object” will refer to a uniquely identifiable physical thing or item at least partially depicted in an image. The term “content” will refer to the subject matter of an image.
Along with the countless forms of digital imagery, many different methods have appeared that attempt to organize and structure the digital repositories in a way that permits retrieval of images in a structured or semi-structured fashion. Unfortunately, providing some structure to the imagery requires conformance to a certain format, necessitates an input process that structures the information, or requires human interaction to annotate the imagery for later retrieval. The most popular image search engines, such as Alta Vista and Google, rely exclusively on annotations and/or file names associated with the images as the only basis for determining whether an image contains desired visual information.
Some image search systems have attempted to overcome the “input requirements” of an annotated image repository by automating the extraction of image features for objects depicted within the data set of the image. This extraction results in significantly reduced representations of the contents of the image repository, thereby decreasing the search time for desired images. U.S. Pat. No. 6,438,130, for example, describes a multi-level filtering technique for reducing the data for content-based image searches. U.S. Pat. No. 6,271,840 discusses a browser interface for displaying reduced versions of images created by an integrated page renderer.
Several techniques exist for the indexing, categorization, and description of images that will be retrieved via a multitude of searching algorithms. Typical image indexing techniques utilize basic attributes like color, shape and texture to describe images or regions of interest within those images.
U.S. Pat. No. 6,778,697 discloses a color image processing method for retrieving a color feature descriptor for describing color features of an image. The color image processing method includes obtaining color vectors of an input image, classifying the color vectors to obtain dominant colors of the input image and their ratios, and representing the dominant colors and their ratios as a color feature descriptor of the input image to allow fast search and retrieval. U.S. Pat. No. 6,411,953 provides a perceptually-based system for pattern retrieval and matching which uses a predetermined vocabulary comprising one or more dimensions to extract color and texture information from an image selected by a user. The system then generates a distance measure characterizing the relationship of the selected image to another image stored in a database, by applying a grammar, comprising a set of predetermined rules, to the color and texture information extracted from the selected image and corresponding color and texture information associated with the stored image.
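The dominant-color descriptor attributed above to U.S. Pat. No. 6,778,697 (dominant colors paired with their ratios) can be illustrated with a minimal sketch. This is not the patented classification method: uniform quantization of each RGB channel is substituted here as a simple stand-in for classifying color vectors, and the function name, the `levels` and `top_n` parameters, and the sample pixels are all hypothetical.

```python
from collections import Counter


def dominant_color_descriptor(pixels, levels=4, top_n=3):
    """Return [(color, ratio), ...] for the top_n most common quantized colors.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    levels: quantization bins per channel (assumed to divide 256 evenly).
    """
    step = 256 // levels
    counts = Counter()
    total = 0
    for r, g, b in pixels:
        # Classify each color vector into a coarse bin (stand-in for clustering).
        counts[(r // step, g // step, b // step)] += 1
        total += 1
    if total == 0:
        return []
    descriptor = []
    for (qr, qg, qb), n in counts.most_common(top_n):
        # Represent each bin by its center color and its share of all pixels.
        center = tuple(q * step + step // 2 for q in (qr, qg, qb))
        descriptor.append((center, n / total))
    return descriptor


# Hypothetical 10-pixel image: 6 reddish, 3 bluish, 1 greenish pixel.
pixels = [(250, 10, 10)] * 6 + [(10, 10, 250)] * 3 + [(10, 250, 10)]
descriptor = dominant_color_descriptor(pixels, levels=4, top_n=2)
```

With these sample pixels, the descriptor holds the red bin with ratio 0.6 followed by the blue bin with ratio 0.3. Two images could then be compared on these short lists rather than on every pixel, which is the kind of reduction that makes fast search and retrieval plausible.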
U.S. Pat. No. 5,802,361 discloses an analysis and retrieval system that uses image attributes to retrieve selected images from a database. The patent provides a fundamental description of the process required to search large image archives for desired images.
U.S. Pat. No. 5,893,095 and U.S. Pat. No. 5,911,139 describe a system for content-based search and retrieval that utilizes primitives to operate on visual objects. The fundamental approach taught in these patents is the comparison of visual objects and the ability to identify similarities in visual perception of the images. Another approach that utilizes similarity of images is described in U.S. Pat. No. 6,463,432 and U.S. Pat. No. 6,226,636 ('636). The '636 patent discloses a system that builds a database which stores data corresponding to a plurality of images by dividing each image into several regions. The system then calculates a histogram of each region. The database may then be used to determine which images are similar to a query image. U.S. Pat. No. 6,502,105 teaches a system that builds a database of image-related data by inputting a plurality of images and, for each image, dividing the image into a plurality of regions, generating a graph based on the regions, and storing data for the graph in the database. The database may then be used to determine whether a query image is similar to one or more of the plurality of images by dividing the query image into regions, generating a graph based on the regions, and comparing the generated graph to other graphs in the database that correspond to the plurality of images. U.S. Pat. No. 6,240,424 teaches a method and apparatus for classifying and querying a database of images, in which the images in the database are classified using primary objects as a clustering center.
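The region-histogram approach described above for the '636 patent (divide each image into regions, compute a histogram per region, compare against a query image) can be sketched roughly as follows. The fixed grid, the grayscale input, and histogram intersection as the similarity measure are assumptions made for illustration only; the patent's actual region division and comparison may differ, and all names below are hypothetical.

```python
def region_histograms(image, grid=2, bins=4):
    """Split a grayscale image into grid x grid regions; return one
    normalized histogram per region.

    image: list of rows of 0-255 intensities, at least grid x grid in size.
    """
    height, width = len(image), len(image[0])
    histograms = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * height // grid, (gy + 1) * height // grid
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            hist = [0] * bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[image[y][x] * bins // 256] += 1
            area = (y1 - y0) * (x1 - x0)
            histograms.append([count / area for count in hist])
    return histograms


def similarity(hists_a, hists_b):
    """Mean histogram intersection over corresponding regions (1.0 = identical)."""
    scores = [sum(min(a, b) for a, b in zip(ha, hb))
              for ha, hb in zip(hists_a, hists_b)]
    return sum(scores) / len(scores)


# Hypothetical 4x4 images: one fully dark, one dark on the left and bright on the right.
dark = [[0] * 4 for _ in range(4)]
half = [[0, 0, 255, 255] for _ in range(4)]

database = {"dark": region_histograms(dark), "half": region_histograms(half)}
query = region_histograms(dark)
ranked = sorted(database, key=lambda name: similarity(query, database[name]),
                reverse=True)
```

Here the stored histograms play the role of the database, and `ranked` lists image names from most to least similar to the query. Note that such a comparison reflects only the distribution of pixel values per region, not what objects the regions depict.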
U.S. Pat. No. 5,983,237 discloses an image analysis system consisting of a visual dictionary, a visual query processing method, an image database, a visual query processing system, and the creation of feature vectors. This system discusses the processing of imagery as an integral part of storing the image in the database, whereby both annotated text and feature vectors can be associated with the image. The subsequent searching of images can occur via text searches based on the annotation and on visual searches based on the feature vectors of that image.
U.S. Pat. No. 6,084,595 discloses a method of utilizing indexed retrieval to improve computational efficiency for searching large databases of objects such as images. The premise of the approach is to eliminate the need for a query engine to search all vectors for an image within a feature vector database.
U.S. Pat. No. 6,389,417 discloses a more advanced approach to image retrieval based on determining characteristics of images within regions rather than for an entire image. The system is limited, however, to using query regions based on an image supplied by the user, thus limiting the invention to finding images that contain regions similar to a target region within a provided image.
U.S. Pat. No. 6,566,710 reveals a graphical search technique based on joint histograms. The invention discloses usable methods for achieving matches and/or near-matches of images based on similar joint histograms.
U.S. Pat. No. 6,574,378 describes a search and retrieval system based on “visual keywords derived using a learning technique from a plurality of visual tokens extracted across a predetermined number of visual elements.” As with the other systems and methods described above, this patent utilizes the analysis of the image and its contents, with no mention of understanding the various elements or objects that comprise the components within the scene.
U.S. Pat. No. 6,574,616 attempts to improve on image understanding of the preferences of a given user by using probability functions. The system suffers from the requirement that images must be iteratively selected by a user, with more images presented based on the iterative selections and the probability functions of the images.
U.S. Pat. No. 6,584,221 describes advances in image search and retrieval by separating images into regions of interest and making searches based on color and texture. While image color and texture of a region of interest are used in this patent, no attempt is made to determine the physical attributes of the objects depicted in the images.
U.S. Pat. No. 6,611,628 reveals an image feature-coding scheme that assigns a color and a relative area to regions within an image frame. This approach utilizes image and frame color for the feature coding, with no attempts made to interpret the actual color of the objects depicted in the images.
U.S. Pat. Nos. 5,579,471, 5,647,058, and 6,389,424 describe various techniques for creating high dimensional index structures that are used for content-based image retrieval. U.S. Pat. No. 5,647,058 discloses a high dimensional indexing method which takes a set of objects that can be viewed as N-dimensional data vectors and builds an indexed set of truncated transformed vectors which treats the objects like k-dimensional points. The set of truncated transformed vectors can be used to conduct a preliminary similarity search using the previously created index to retrieve the qualifying records.
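The idea of a preliminary search over truncated transformed vectors can be illustrated with a short filter-and-refine sketch. The choice of transform is an assumption: an orthonormal discrete cosine transform is used here because it preserves Euclidean distance, so the distance between vectors truncated to k coefficients cannot exceed the true N-dimensional distance (up to floating-point rounding). The transform actually used in U.S. Pat. No. 5,647,058 is not stated above, a linear scan stands in for a real index structure, and every name below is hypothetical.

```python
import math


def dct(vector):
    """Orthonormal DCT-II of a list of numbers (assumed transform)."""
    n = len(vector)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, v in enumerate(vector)))
    return out


def build_index(vectors, k):
    """Keep only the first k transform coefficients of each N-dimensional vector."""
    return [dct(v)[:k] for v in vectors]


def range_search(query, vectors, index, k, radius):
    """Return (candidates, matches) as lists of positions in `vectors`."""
    q_trunc = dct(query)[:k]
    # Preliminary search: the truncated distance is a lower bound on the true
    # distance, so this cheap filter should not discard any true match.
    candidates = [i for i, t in enumerate(index)
                  if math.dist(q_trunc, t) <= radius]
    # Refinement: verify the surviving candidates against the full vectors.
    matches = [i for i in candidates
               if math.dist(query, vectors[i]) <= radius]
    return candidates, matches


# Hypothetical 4-dimensional feature vectors.
vectors = [[2, 2, 2, 2], [1, 3, 1, 3], [9, 9, 9, 9], [2, 2, 2, 3]]
index = build_index(vectors, k=2)
candidates, matches = range_search([2, 2, 2, 2], vectors, index, k=2, radius=1.5)
```

In this example the vector at position 1 survives the preliminary search because it differs from the query mainly in high-frequency coefficients that truncation discards, and it is then rejected by the full comparison. The vector at position 2 is eliminated using only its two stored coefficients.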
U.S. Pat. No. 6,563,959 describes a perceptual similarity image retrieval system in which images are subdivided into spots with similar characteristics and each image is represented by a group of spot descriptors.
Techniques have been described for creating a so-called blobworld or regional representation of an image as part of region-based image querying. See Carson et al., “Region-Based Image Querying,” Proc. of IEEE CVPR Workshop on Content-Based Access of Images and Video Libraries, 1997; Lui et al., “Scalable Object-Based Image Retrieval,” .pdf paper; Ozer et al., “A Graph Based Object Description for Information Retrieval in Digital Image and Video Libraries,” .pdf paper; Fan et al., “Automatic Model-Based Semantic Object Extraction Algorithm,” IEEE Trans. on Circuits and Systems for Video Technology, Vol. 11, No. 10, October 2001, pp. 1073; and Ardizzoni et al., “Windsurf: Region-based image retrieval using wavelets,” Proc. of the 1st Int'l Workshop on Similarity Search, September 1999, pp. 167-173. These references describe different techniques for the creation, segmentation or hierarchical arrangement of representations of an image using a region-based, blobworld representation of that image.
Large image repositories will undoubtedly contain imagery from a multitude of input sources. Any image search and retrieval mechanism must be specifically designed to deal with the wide variation of attribute types that will result from the myriad input sources. For a repository as complex and diverse as the Internet, a flexible image search and retrieval engine should be able to interpret information from the original image independent of the type of input device or digitizer that was used to create the image.
Computer vision and image processing techniques alone are not adequate for robust image search and retrieval. In addition, implementing image retrieval on a large scale requires automated techniques for analyzing, tagging and managing the visual assets. What is needed for large-scale image search and retrieval is an automated system that can operate on unstructured data. This automated system must utilize techniques beyond traditional image processing in order to achieve the retrieval hit rate that demanding users will require. Furthermore, the system must be capable of dealing with a constantly changing target image set wherein the system has little or no control over the content placed within the image search space.