The volume of documents stored in computer databases is expanding rapidly. Despite this expansion, paper documents remain in wide use. It is therefore useful to be able to convert paper documents into a form that a computer can store or otherwise process. A common technique for this conversion is to create a "document image," which is typically a bitmap representation of the paper document. A bitmap representation is a matrix of digital values in which each value represents one black-and-white, grey-scale, or color pixel, and the pixels are arranged to form an image of the document. A computer converts the digital values into pixels that are displayed to a user on a display unit, such as a computer monitor. Together, these pixels form a document image that the user reads from the monitor.
Although a document image is an appropriate form for representing most, if not all, of the information on a paper document, e.g., words and pictures, it is generally not suitable for textual operations performed by a computer. An example of a textual operation is searching for documents that match the terms or keywords of a query entered by a user. A representation better suited to computer-implemented textual operations is a text code, in which each character of the document is encoded as a separate entity in a standard encoding format, e.g., ASCII. Because each character is encoded separately, a search engine, for example, can efficiently examine the textual content of a document and determine whether the document matches a query.
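As a rough illustration of why a text code lends itself to searching, the following Python sketch encodes each character as one ASCII code and then tests the decoded text against query terms. It assumes ASCII-only text and case-insensitive whole-word matching; both assumptions, and the function names `to_text_code` and `matches_query`, are hypothetical and not part of the description above.

```python
import re


def to_text_code(text: str) -> bytes:
    """Encode each character as one ASCII byte (raises UnicodeEncodeError for non-ASCII input)."""
    return text.encode("ascii")


def matches_query(text_code: bytes, query_terms: list[str]) -> bool:
    """Hypothetical matcher: True if every query term appears as a whole word in the text code."""
    # Decode the per-character codes back to a string and collect its lowercase words.
    words = set(re.findall(r"[a-z0-9]+", text_code.decode("ascii").lower()))
    return all(term.lower() in words for term in query_terms)


code = to_text_code("Invoice for office chairs")
print(list(code[:7]))  # one ASCII code per character of "Invoice"
print(matches_query(code, ["office", "chairs"]))
print(matches_query(code, ["desk"]))
```

Because the text is already a sequence of discrete character codes, the search reduces to string operations; no image analysis is needed at query time.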
A text code, however, cannot represent non-textual content of the document, such as pictures. When a document is converted into a form for use with a computer, it is generally desirable to be able both to perform textual operations on the document and to display its image. Accordingly, some computer systems maintain both a document image and document text for each stored document. The document text is used for textual operations, such as searching a database of document texts for search terms, while the document image of each matching document is displayed to the user, so that the graphical information in the document is presented along with the textual information.
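The dual-representation arrangement described above can be sketched in Python as a record holding both a bitmap and its text, where the search runs over the text but the result handed to the display is the image. This is a minimal sketch under stated assumptions: the `StoredDocument` record, the substring-based `search`, and the character-based `render` stand-in for a display unit are all hypothetical, and the bitmap is assumed to be 1-bit (0 = white, 1 = black).

```python
from dataclasses import dataclass


@dataclass
class StoredDocument:
    """Hypothetical record holding both representations of one paper document."""
    doc_id: str
    image: list[list[int]]  # bitmap: matrix of pixel values (assumed 0 = white, 1 = black)
    text: str  # document text, used only for textual operations


def search(documents: list[StoredDocument], term: str) -> list[StoredDocument]:
    """Search the document text, returning whole records so the caller can display the image."""
    term = term.lower()
    return [doc for doc in documents if term in doc.text.lower()]


def render(image: list[list[int]]) -> str:
    """Stand-in for a display unit: draw black pixels as '#' and white pixels as '.'."""
    return "\n".join("".join("#" if px else "." for px in row) for row in image)


docs = [
    StoredDocument("d1", [[1, 0], [0, 1]], "Quarterly sales report"),
    StoredDocument("d2", [[0, 1], [1, 0]], "Meeting minutes"),
]
for hit in search(docs, "sales"):
    print(hit.doc_id)
    print(render(hit.image))
```

Note that the rendered image carries no indication of which part of the text produced the match; the link between the two representations exists only at the level of the whole document.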
One disadvantage of such conventional computer systems, however, is that the user cannot easily determine from the displayed document image which parts of the document matched the search terms, or whether the matching document is relevant to the user.