The present invention relates to an improved document classification system, and in particular to a document classification system that incorporates eye gaze information.
In traditional information management systems a document was considered a homogeneous set of data to be stored and retrieved as a single unit. However, as the need arose to use the same information in different environments and in different cognitive contexts, the concept of the document has evolved. For example, typical medical documents are composed of anagraphic (patient demographic) data, anamnesis (past medical history), reports, and images. Each of the different portions of such medical documents may need to be queried differently. For example, a general physician might consider the whole document as a specific patient description, and therefore ask for comments linked to a given person's name. On the other hand, a specialist might focus on classes of diagnosis from radiologic exams and might want to formulate a related query for images with analogous pathological contents. Accordingly, many document retrieval and identification systems need to be capable of searching documents that include text, images, and structured data.
The primary problem in automated document management is properly indexing all of the documents. Indexing involves assigning to each document, or portion of a document, a synthetic descriptor facilitating its retrieval. The assignment of such a descriptor is generally performed by the steps of: (1) extracting relevant entities or characteristics as index keys; (2) choosing a representation for the keys; and (3) assigning a specific meaning to the keys. A detailed description of such indexing is described in De Marsico et al., Indexing pictorial documents by their content: a survey of current techniques, Image and Vision Computing, 15 (1997), pp. 119–141, incorporated by reference herein.
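The three indexing steps above can be sketched, for the simple case of textual index keys, as a minimal inverted index. This is an illustrative sketch only; the key-extraction rule and the sample documents are hypothetical, not part of any particular indexing system.

```python
# Minimal sketch of the three indexing steps for textual keys
# (the key-extraction rule below is a hypothetical example).

def extract_keys(document_text):
    """Step 1: extract relevant entities as index keys (here: distinct words)."""
    return {word.strip(".,").lower()
            for word in document_text.split() if len(word) > 3}

def build_index(documents):
    """Steps 2 and 3: represent each key as a string mapped to document ids,
    giving the key the meaning 'this document mentions the entity'."""
    index = {}
    for doc_id, text in documents.items():
        for key in extract_keys(text):
            index.setdefault(key, set()).add(doc_id)
    return index

documents = {
    1: "Radiologic exam report with pathological findings",
    2: "Past medical history and anamnesis of the patient",
}
index = build_index(documents)
print(index["radiologic"])  # documents whose descriptor contains this key
```

A retrieval request then reduces to a lookup of the query keys in the index, which is the role of the matching system described below with respect to FIG. 1.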
Images deserve special attention within a document management system because of the difficulty of addressing the content of an image using traditional textual query languages and indices. Images are no longer considered as pure communication objects or appendices of a textual document, but rather images are now considered self-describing entities that contain related information (content) that can be extracted directly from the image. For this reason, prior to storing an image in a database, a description activity is performed to process the image, analyze its contents, interpret its contents, and classify the results. Accordingly, the need arises to develop systems to allow content-based image extraction and retrieval.
Textual entities are readily extracted from documents by automated systems and stored in a database for later use. In contrast, it is difficult to formulate rules for the identification of relevant objects to be extracted from images. This difficulty is partly a result of the multitude of factors influencing the image acquisition, namely, instrumentation tuning and precision, sampling, resolution, visual perspective, and lighting. All of these factors introduce noise into the visual rendering of pictorial objects, modifying their morphological and geometric characteristics. Further, objects from a natural scene show a high degree of variation in their characteristics. For example, while it might be easy to define a set of rules that identify a pattern of pixels representing a circle, it is much more difficult to define a set of rules to detect a pattern of pixels representing a tree. This increased difficulty necessitates the adoption of image analysis systems based on the general similarity of a known object, as opposed to an exact match of a known object.
A typical image analysis system first identifies and extracts objects from an image and then represents their relations. Spatial entities can be represented in many complementary ways depending on the task requirements. For example, the same object may be represented by the chain code of its contour, by the minimum rectangle enclosing it, by a set of rectangles covering its area, or by related graphs.
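Two of the representations mentioned above, the minimum enclosing rectangle and the contour chain code, can be sketched as follows. This is a minimal illustration assuming the object is given as pixel coordinates; the sample square contour is hypothetical.

```python
# Sketch of two complementary spatial representations for the same object,
# given as (row, col) pixel coordinates (the sample contour is hypothetical).

def bounding_rectangle(pixels):
    """Minimum axis-aligned rectangle enclosing the object."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))

def chain_code(contour):
    """8-directional Freeman chain code of an ordered contour,
    encoding each step between successive contour pixels as a digit 0-7."""
    directions = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
                  (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}
    return [directions[(b[0] - a[0], b[1] - a[1])]
            for a, b in zip(contour, contour[1:])]

square = [(0, 0), (0, 1), (1, 1), (1, 0)]    # a tiny closed contour
print(bounding_rectangle(square))            # (0, 0, 1, 1)
print(chain_code(square + [square[0]]))      # [0, 6, 4, 2]
```

The two representations trade precision for compactness: the rectangle supports fast spatial queries, while the chain code preserves the exact contour shape.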
Once the image analysis system has represented the object, the objects and spatial relations from the image are classified, i.e., associated with real object features, and described according to the observer's interest. Image classification is not unique in that the same pictorial entity can be associated with different real objects. For example, a circular shape can be interpreted as a wheel, a ball, or a disk. Whether this level of semantic discrimination is necessary depends on the informative context. Although image classification and derived indexing methods are not unique, they can be effective for specific applications where the pictorial entities are well-defined. However, general indexing for images is much harder and as yet an unsolved problem.
FIG. 1 shows a typical document management system 10 in which a user 20 formulates his information retrieval request 12 as a query 14 in a query language. The query 14 is received by a matching system 16 that matches it against documents in a document database 18. Documents containing relevant data are retrieved and forwarded to the user 20.
The primary goal of the document management system 10 is to easily, efficiently, and effectively retrieve from the database 18 documents relevant to a certain user's need. This requires the system to have a meaningful indexing scheme for all documents. In the case of images, a meaningful indexing scheme means that the extracted information from an image should be related to the represented pictorial entities (objects), to their characteristics, and their relations.
The indices representing image content may be a textual string obtained by manual annotation or by an automatic analysis module. In the latter case, many of the approaches to indexing require pattern recognition techniques.
The automatic analysis of image content requires the design of efficient and reliable segmentation procedures. In applications such as mechanical blueprints, there are features that are exactly defined and easily recognizable. In contrast, natural images have few features that are easily identifiable. Accordingly, present algorithms are only capable of effectively dealing with limited classes of images. In particular, they work with a small number of non-overlapping objects on an easily identifiable and separable background, and in general require knowledge of the lighting conditions, of the acquisition devices, and of the object context and its features.
One analysis technique used to extract information from an image is to perform interactive segmentation by providing semi-automatic object outlining. The user assists the system by indicating with a pointer or box the exterior contour of the object of interest. Alternatively, the system may use edge pixels having a high color gradient (not necessarily identifying the complete contour of an object) which are matched with known edge patterns from a database. In either case, the outline of the object must be identified for the system. In particular, this requires a closed loop area and not merely a general region of the image where the object is located.
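The gradient-based selection of candidate edge pixels described above can be sketched as a simple threshold on the local intensity difference. The threshold value and the 4×4 grayscale patch below are hypothetical; a real system would operate on color gradients and match the resulting pixels against an edge-pattern database.

```python
# Sketch of locating candidate edge pixels by a high intensity gradient
# (the threshold value and image data are hypothetical examples).

def edge_pixels(image, threshold):
    """Return (row, col) positions whose forward-difference gradient is large."""
    edges = []
    for r in range(len(image) - 1):
        for c in range(len(image[0]) - 1):
            gr = image[r + 1][c] - image[r][c]   # vertical difference
            gc = image[r][c + 1] - image[r][c]   # horizontal difference
            if (gr * gr + gc * gc) ** 0.5 > threshold:
                edges.append((r, c))
    return edges

# A 4x4 grayscale patch: dark object on the left, bright background on the right.
image = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
print(edge_pixels(image, threshold=100))  # edge runs down column 1
```

Note that the pixels found this way need not form a closed loop, which is why the interactive approaches above still require the user to supply or confirm the complete contour.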
There exist many automatic techniques for analyzing pictorial images to extract relevant information therefrom. Some of the techniques may be grouped as color histograms, texture identification, shape identification, and spatial relations. The color histogram technique determines the predominant colors. For example, a predominant green color may be a lawn or forest, and a predominant blue color may be an ocean (if within the lower portion of the image) or a sky (if within the upper portion of the image).
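The color histogram idea can be sketched as counting coarse color bins and reporting the predominant one. The bin names and the sample pixel data below are hypothetical; a practical system would use finer quantization and, as noted above, combine the result with the position of the colored region.

```python
# Minimal sketch of the color-histogram technique: count coarse color bins
# and report the predominant one (bin rules and pixel data are hypothetical).

from collections import Counter

def predominant_color(pixels):
    """Classify each RGB pixel into a coarse bin and return the most common."""
    def bin_of(rgb):
        r, g, b = rgb
        if g > r and g > b:
            return "green"
        if b > r and b > g:
            return "blue"
        return "other"
    return Counter(bin_of(p) for p in pixels).most_common(1)[0][0]

# A mostly-green pixel list standing in for a lawn or forest scene.
pixels = [(30, 180, 40)] * 8 + [(40, 60, 200)] * 2
print(predominant_color(pixels))  # green
```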
The texture extraction technique is used to extract relevant information from an image based on the texture of the image, which is normally characterized by its frequency content. Typically, the frequency content of the image is obtained from its power spectrum density, which is computed by a Fourier transform. The texture pattern is matched against known texture patterns to identify objects.
The shape identification technique is used to extract relevant information from an image. Shape identification typically uses either a function identifying a closed loop contour of an object or a closed loop edge identification of an image, and then matches the closed loop contour or edge against known objects. This technique may be used, for example, to identify faces, which are generally round. Unfortunately, it is difficult to distinguish between features with similar shapes, such as distinguishing faces from clocks.
The spatial relations technique is used to extract relevant information by matching the relative arrangement of known patterns within the image. Such a pattern may be, for example, a tank within the image.
Any of the aforementioned techniques may be used in combination and further may include a prediction of where to expect to find particular features. For example, the document management system may expect to locate circular faces on the upper center portion of the image, and may expect to locate blue sky on the upper portion of the image.
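The positional prediction described above can be sketched as a set of region priors: a detected feature only counts if it falls where that feature is expected. The region boundaries and feature names below are hypothetical examples, not part of any particular system.

```python
# Sketch of combining detection with a positional prior: a feature match
# only counts inside its expected image region (regions are hypothetical).

def classify_region(feature, row, col, height, width):
    """Accept a detected feature only inside its expected image region."""
    expectations = {
        "sky":   lambda r, c: r < height / 2,                # upper portion
        "face":  lambda r, c: (r < height / 2 and
                               width / 4 < c < 3 * width / 4),  # upper center
        "ocean": lambda r, c: r >= height / 2,               # lower portion
    }
    check = expectations.get(feature)
    return check is not None and check(row, col)

print(classify_region("sky", row=10, col=50, height=100, width=100))  # True
print(classify_region("sky", row=80, col=50, height=100, width=100))  # False
```

Such priors cheaply reject implausible matches, but they remain mechanistic rules of the kind whose limitations are discussed next.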
The aforementioned systems are mechanical in nature and require mathematical mechanistic processing of each image to extract information that is then compared to a large number of possibilities in order to identify image content. While it is possible to supplement the aforementioned mechanistic system with the assistance of a person identifying closed loop outlines of images, or identifying the nature of the image with textual entries, this becomes a burdensome task, especially if a large number of images are involved. Further, for complex images, these techniques often perform poorly because the specific element of interest in the image may not be a dominant contributor to the overall color, texture, shape, and spatial relations.
What is desired, therefore, is a technique for image identification that increases the likelihood of identifying the content of an image while reducing the processing required for such identification.