1. Field of the Invention
This invention relates to storage, indexing, and retrieval of image data, and more particularly to a method and system for generating and retrieving context vectors that represent high-dimensional abstractions of information in images.
2. Description of Background Art
Analysis of image subject content is a time-consuming and costly operation. This analysis is often required for the identification of images of interest in existing image databases and for the routing and dissemination of images of interest in a real-time environment. The conventional approach is to rely upon human intellectual effort to analyze the content of images. It would be desirable to reliably translate image data into representations that would enable a computer to assess the relative proximity of meaning among images in a database.
Certain known document retrieval systems use variable-length lists of terms as a representation, but without meaning sensitivity between terms. In such systems, pairs of terms are either synonyms or not synonyms.
So-called "vector space methods" can capture meaning sensitivity, but they require that the closeness of every pair of terms be known. A typical full-scale system with over 100,000 terms might require about 5 billion relationships, which is an impractical amount of information to obtain and store.
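By way of illustration only, the figure of about 5 billion relationships can be checked by counting unordered pairs of distinct terms. The sketch below assumes one closeness value per pair; the function name `pair_count` is hypothetical and does not appear in any cited system.

```python
def pair_count(num_terms: int) -> int:
    """Number of unordered pairs of distinct terms (assumes one relationship per pair)."""
    return num_terms * (num_terms - 1) // 2


if __name__ == "__main__":
    # 100,000 terms is the vocabulary size named in the text above.
    print(pair_count(100_000))  # 4999950000, i.e. about 5 billion
```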
Methods have also been proposed for searching documents with fixed-length vectors. However, such methods require work on the order of at least the square of the sum of the number of documents and the number of terms. This is impractical for a large corpus of documents, images, or terms.
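The quadratic growth described above can be made concrete with a short sketch. The corpus sizes used here are hypothetical and chosen only for illustration, and the function name `quadratic_work` is an assumption; the value returned is a growth term, not a measured operation count.

```python
def quadratic_work(num_documents: int, num_terms: int) -> int:
    """Growth term (documents + terms) squared, as stated in the text."""
    return (num_documents + num_terms) ** 2


if __name__ == "__main__":
    # Hypothetical corpus: 1,000,000 documents and 100,000 terms.
    base = quadratic_work(1_000_000, 100_000)
    # Doubling both counts quadruples the growth term.
    doubled = quadratic_work(2_000_000, 200_000)
    print(base)             # 1210000000000
    print(doubled // base)  # 4
```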
A document retrieval model based on neural networks that captures some meaning sensitivity has been proposed. A neural network consists of a collection of cells and connections among cells, where every connection has an associated positive or negative number, called a weight or component value. Each cell employs a common rule to compute an output, which is then passed along connections to other cells. The particular connections and component values determine the behavior of the network when some specified "input" cells receive a set of values. A search in a document retrieval system employing a neural network requires a number of multiplications equal to twice the product of the number of documents and the number of keywords for each of a plurality of cycles.
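The multiplication count stated above can be illustrated with a minimal sketch. The layout is an assumption and is not taken from the proposed model: one layer of keyword cells and one layer of document cells, every keyword cell connected to every document cell, the same weight used in both directions, and a plain weighted sum as the common rule. Under these assumptions each cycle performs one pass from keywords to documents and one pass back, which gives twice the product of the two counts. The function name `spread_activation` is hypothetical.

```python
def spread_activation(weights, keyword_activation, cycles):
    """Run `cycles` of keyword -> document -> keyword passes.

    weights[d][k] is the assumed weight on the connection between
    document cell d and keyword cell k. Returns the document activations
    from the last cycle and the total number of multiplications performed.
    """
    num_docs = len(weights)
    num_keywords = len(weights[0])
    multiplications = 0
    doc_activation = [0.0] * num_docs
    for _ in range(cycles):
        # Pass 1: each document cell sums its weighted keyword inputs.
        doc_activation = []
        for d in range(num_docs):
            total = 0.0
            for k in range(num_keywords):
                total += weights[d][k] * keyword_activation[k]
                multiplications += 1
            doc_activation.append(total)
        # Pass 2: each keyword cell sums its weighted document inputs.
        next_keywords = []
        for k in range(num_keywords):
            total = 0.0
            for d in range(num_docs):
                total += weights[d][k] * doc_activation[d]
                multiplications += 1
            next_keywords.append(total)
        keyword_activation = next_keywords
    return doc_activation, multiplications


if __name__ == "__main__":
    # Hypothetical weights: 2 documents, 3 keywords.
    w = [[0.5, 0.0, 0.1],
         [0.0, 0.2, 0.3]]
    _, count = spread_activation(w, [1.0, 0.0, 0.0], cycles=2)
    print(count)  # 24 = 2 * (2 documents * 3 keywords) * 2 cycles
```

Because the count scales with the full product of documents and keywords on every cycle, the cost of a single search grows with the size of the entire corpus.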
Other document retrieval methods use vector representations in a Euclidean space. The kernel, or core, used in one such method comprises non-overlapping documents, which results in low-dimensional vectors on the order of seven values. Vectors are generated from the core documents based upon whether or not a term appears in a document. As an alternative, the method starts with a kernel of terms that never co-occur.
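A minimal sketch of such presence-based vectors follows. It assumes that each core document is treated as a set of terms, that "non-overlapping" means the core documents share no terms, and that a term vector has one component per core document. The three-document core and all names are hypothetical, and the sketch is not a reproduction of the cited method.

```python
import math


def term_vector(term, core_documents):
    """One component per core document: 1.0 if the term appears in it, else 0.0."""
    return [1.0 if term in doc else 0.0 for doc in core_documents]


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Hypothetical core of three documents with no shared terms.
    core = [
        frozenset({"cat", "dog"}),
        frozenset({"engine", "wheel"}),
        frozenset({"river", "lake"}),
    ]
    cat = term_vector("cat", core)
    dog = term_vector("dog", core)
    wheel = term_vector("wheel", core)
    print(cat)                             # [1.0, 0.0, 0.0]
    print(euclidean_distance(cat, dog))    # 0.0, same core document
    print(euclidean_distance(cat, wheel))  # about 1.414, different core documents
```

With a non-overlapping core, every term vector has at most one nonzero component, so the vector dimension equals the number of core documents.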
It would be desirable to have a computing system that can derive accurate, efficient, and manageable representations of images for later recall, retrieval, and association.