A picture is worth a thousand words. As human beings, we are able to tell a story from a picture based on what we have seen and what we have been taught. A 3-year-old child is capable of building models of a substantial number of concepts and recognizing them using the learned models stored in his or her brain. Hence, from a technological stance, it is appreciated that a computer program may be adapted to learn a large collection of semantic concepts from 2-D or 3-D images, build models of these concepts, and recognize these concepts based on these models.
Automatic linguistic indexing of pictures is important to content-based image retrieval and computer object recognition. It can potentially be applied to many areas, including biomedicine, commerce, the military, education, digital libraries, and Web searching. Decades of research have shown that designing a generic computer algorithm that can learn concepts from images and automatically translate the content of images into linguistic terms is extremely difficult. Much of the success achieved to date has been in recognizing a relatively small set of objects or concepts within specific domains.
Many content-based image retrieval (CBIR) systems have been developed. Most CBIR projects were aimed at general-purpose image indexing and retrieval systems that focused on searching for images visually similar to a query image or a query sketch. These systems were not adapted to assign comprehensive textual descriptions to pictures automatically, i.e., to perform linguistic indexing, because of the great difficulty in recognizing a large number of objects. However, this function is essential for linking images to text and consequently broadening the possible uses of an image database.
Many researchers have attempted to use machine learning techniques for image indexing and retrieval. One such system included a learning component wherein the system internally generated many segmentations or groupings of each image's regions based on different combinations of features, and then learned which combinations best represented the semantic categories provided as examples by the user. The system required supervised training on various parts of the image.
A growing trend in the field of image retrieval is to automate linguistic indexing of images by using statistical classification methods to group images into rough semantic classes or categories, such as textured versus nontextured or graph versus photograph. Potentially, the categorization enhances retrieval by permitting semantically adaptive searching methods and by narrowing the searching range in a database. The approach is limited, however, because these classification methods are problem-specific and do not extend straightforwardly.
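By way of illustration only, the narrowing of the searching range by categorization may be sketched as follows. The sketch is not taken from any of the systems discussed; it assumes that each image is already represented by a small numeric feature vector, and it substitutes a simple nearest-centroid classifier for the statistical classification methods referred to above. The image names, feature values, and function names are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train_centroids(database):
    """Compute one centroid per rough semantic class from labeled images."""
    by_class = {}
    for _, label, vec in database:
        by_class.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_class.items()}

def classify(vec, centroids):
    """Assign the class whose centroid lies closest to the feature vector."""
    return min(centroids, key=lambda label: distance(vec, centroids[label]))

def search(query_vec, database, centroids, k=2):
    """Classify the query, then rank only the images in the predicted class."""
    label = classify(query_vec, centroids)
    candidates = [(name, vec) for name, lab, vec in database if lab == label]
    ranked = sorted(candidates, key=lambda item: distance(query_vec, item[1]))
    return label, [name for name, _ in ranked[:k]]

# Hypothetical database entries: (image name, rough class, feature vector).
DATABASE = [
    ("chart_01", "graph", [0.9, 0.1]),
    ("chart_02", "graph", [0.8, 0.2]),
    ("beach_01", "photograph", [0.2, 0.7]),
    ("forest_01", "photograph", [0.1, 0.9]),
    ("city_01", "photograph", [0.3, 0.6]),
]

CENTROIDS = train_centroids(DATABASE)

if __name__ == "__main__":
    print(search([0.12, 0.88], DATABASE, CENTROIDS, k=2))
```

In this sketch the query is compared only against images in the predicted class, which is the sense in which categorization narrows the searching range. It also reflects the limitation noted above: the two classes are fixed in advance, and adding further categories would require new labeled examples and, in practice, features chosen for the new problem.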
Prior art methods for associating images explicitly with words share one major limitation: the algorithms used with these methods rely on semantically meaningful segmentation, which is generally unavailable for image databases. Thus, there is a need for a system and method of automatic linguistic indexing of images that overcomes the above disadvantage relative to segmentation and that provides scalability, allowing a large number of categories to be trained at once.