The use of digital input scanners, which can successively scan a set of sheets and record the images thereon as digital data, is becoming common in the office context, for example in digital copiers and electronic archiving. In any situation in which digital image data is created and accumulated in memory and/or transmitted through communication channels, it is desirable that the memory and the communication bandwidth be used efficiently. In particular, it is desirable to avoid filling the memory and the communication channels with redundant data.
A common type of input scanning in an office context is scanning and recording images from forms, slide presentations, or other documents in which multiple page images share a common "template" or appearance cues, such as logos, letterheads, PowerPoint templates and so forth. A typical slide presentation uses a standard template slide design that includes, for instance, a logo and a border. In some slides only the text changes, while in others the interior may include a graphic, table or spreadsheet. The present embodiment is directed toward a technique for efficiently recording such documents in memory, with image indexes for easier retrieval later.
In an office environment, image store and recall is becoming an increasingly important feature, wherein scanned documents are stored on the storage disk of a multifunction device (MFD) for later retrieval. This allows multiple users to store their jobs in the MFD, from which they can be retrieved later either by the same users or by others. As more paper documents are digitized and stored, the ability to search through them by content has become very important. Optical character recognition (OCR) has advanced considerably over the years, making searches for a string of text simpler and more accurate. However, there is a growing need for image-based search and retrieval techniques in today's multifunction devices. Searching by text is often insufficient: most documents stored in an MFD are in image format, so a system is needed that provides a condensed list of documents likely to contain the same image.
It would be desirable to have a method of indexing stored documents and images that would facilitate easy retrieval at a later time.