Today's businesses rely heavily on paper for many of their daily functions. For instance, most corporate information resides in paper documents. Also, the majority of transactions require either updating existing paper documents or creating new ones. This dependence on paper will continue to characterize businesses for some time to come. For this reason, businesses are always looking for new and efficient means of handling paper documents so that they can respond rapidly to events and reduce costs.
Currently, manual operations continue to be the method of choice for processing paper documents. In general, a human operator first identifies the document and routes it appropriately. The document may then go through several stations before its processing is judged to be complete. At the end of the cycle, the document is typically archived in a filing cabinet according to some preset procedure. If this same document is needed again at any later time, a human operator retrieves it and the cycle starts over. The major drawbacks of this approach are slow retrieval time, a high probability of erroneous filing, and the excessive cost of the storage space.
The need for efficient methods to process paper documents is not new to the business community. In fact, this need has evolved over the last ten to fifteen years. In the past, businesses spoke of the need for better data management as a way to control the information that flows in and out of an organization. Currently, businesses speak of the need for better document management techniques instead. In the context of paper documents, this is taken to mean the need for more advanced methods to automate the handling of paper documents within an enterprise.
Approaches that attempt to address this problem are collectively referred to as document imaging systems. The basic function of a document imaging system is to convert the paper document into an image bitmap. This image bitmap, rather than the paper copy, is then stored in the system. Other functions may include document identification, attachment of user-identifying information, extraction of either partial or full text from the image, attachment of indexing information, attachment of tracking information, filing into a specific folder, routing over the network, archiving in a specific location, and retrieval.
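The functions listed above can be pictured as fields and operations on a per-document record. The following Python sketch is illustrative only: the names `DocumentRecord` and `ImageStore`, and every field name, are hypothetical and are not taken from any particular document imaging system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DocumentRecord:
    """Hypothetical record kept for one scanned document."""
    image_bitmap: bytes                     # stored in place of the paper copy
    document_type: Optional[str] = None     # result of document identification
    user_id: Optional[str] = None           # user-identifying information
    extracted_text: str = ""                # partial or full text from the image
    index_terms: List[str] = field(default_factory=list)     # indexing information
    tracking_notes: List[str] = field(default_factory=list)  # tracking information
    folder: Optional[str] = None            # folder the document is filed into
    archive_location: Optional[str] = None  # where the document is archived


class ImageStore:
    """Hypothetical in-memory store covering storage, filing, and retrieval."""

    def __init__(self) -> None:
        self._records: Dict[int, DocumentRecord] = {}
        self._next_id = 1

    def store(self, record: DocumentRecord) -> int:
        """Store a record and return the identifier assigned to it."""
        doc_id = self._next_id
        self._next_id += 1
        self._records[doc_id] = record
        return doc_id

    def file_into(self, doc_id: int, folder: str) -> None:
        """File an already stored document into a specific folder."""
        self._records[doc_id].folder = folder

    def retrieve(self, doc_id: int) -> DocumentRecord:
        """Return the stored record for the given identifier."""
        return self._records[doc_id]
```

Routing over the network and archiving are left out of the sketch; in a real system they would depend on infrastructure that the text does not specify.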
Document imaging systems aim to provide greater efficiency, better reusability, a reduction in product cycle time, and significant savings. However, this technology is still in its infancy and has been slow to deliver on its promise. A major hurdle has been that these systems are very difficult to automate fully. Human operators are still needed to identify and organize documents before they can be scanned into the system. This operation is time-consuming and can reduce or eliminate the intended savings. Human operators are also needed to enter the keywords by which scanned documents can be retrieved. Manual entry of keywords is both slow and cumbersome, which reduces the overall efficiency of the system. Additional manual operations may be needed to perform other tasks such as attachment of tracking information, filing into a specific folder, routing over the network, and archiving in a specific location. Manual functions both slow the response time of the overall system and increase its cost.
Optical Character Recognition technology has made it possible to automate the entry of keywords for the purpose of retrieving documents. It does so by converting the text in the image of the document to ASCII or another character code. Any word in the extracted ASCII text can then be used to search for the document in question. This solution does not, however, address some rather common business needs. For instance, a typical business processes several classes of documents on any given day. In some situations, it may be desirable to attach a different list of keywords to each class of documents. This list may be used alone or in addition to the text extracted from the image. The list of special keywords may include the type of the document, the user ID, the owner of the document, the folder where the document is stored, and, perhaps, other attributes that are relevant only to the class of documents to which they are attached. In other situations, one may wish to extract keywords only from a limited set of fields in the scanned document. In both of these cases, Optical Character Recognition alone is not sufficient.
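The two needs described above, class-specific keyword lists and keywords drawn only from selected fields, can be illustrated with a small keyword index. This is a minimal Python sketch that assumes the Optical Character Recognition step has already produced text; `KeywordIndex`, `tokenize`, `keywords_from_fields`, and the `type:`/`owner:` keyword convention are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set


def tokenize(text: str) -> Set[str]:
    """Split recognized text into lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keywords_from_fields(fields: Dict[str, str], wanted: Iterable[str]) -> Set[str]:
    """Collect keywords only from the named fields of a scanned document."""
    words: Set[str] = set()
    for name in wanted:
        words |= tokenize(fields.get(name, ""))
    return words


class KeywordIndex:
    """Hypothetical index mapping each keyword to the documents that carry it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, ocr_text: str = "",
            special_keywords: Iterable[str] = ()) -> None:
        # Every word of the extracted text becomes searchable.
        for word in tokenize(ocr_text):
            self._postings[word].add(doc_id)
        # Class-specific keywords are stored whole, in addition to the text.
        for keyword in special_keywords:
            self._postings[keyword.lower()].add(doc_id)

    def search(self, term: str) -> Set[int]:
        """Return the identifiers of all documents indexed under the term."""
        return set(self._postings.get(term.lower(), set()))
```

In this sketch, a document indexed with the special keyword `type:invoice` is found by that keyword even if the word "invoice" never appears in its recognized text, which is the case that text extraction alone cannot cover.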
Cover sheets or forms-based methods have been proposed to deal with the problem of identifying documents at scan time. These same approaches have also attempted to handle other tasks such as attaching tracking information, filing into a specific folder, routing over the network, and archiving in a specific location. Existing solutions are, however, very limited, document specific, and not easy to generalize. Another issue inherent to document imaging systems is the limited amount of resources available for storing document images. This problem is exacerbated when duplicate images are stored because documents are mistakenly input into the system multiple times. Therefore, there is a need for a file storage and retrieval system which allows any user to enter documents into the system and have the correct actions performed upon each document, and which alerts the user upon recognizing duplicate documents so that the user can delete the duplicate images and conserve storage space.
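As a rough illustration of the duplicate alert described above, the following Python sketch flags an image whose bytes have been stored before. The text does not say how duplicates are to be recognized; comparing SHA-256 digests of the image bytes is an assumption made here for simplicity, and it only catches byte-identical images. A second scan of the same paper page would generally produce different bytes and would need some form of image or content comparison instead. The class name `DuplicateAwareStore` is hypothetical.

```python
import hashlib
from typing import Dict, Tuple


class DuplicateAwareStore:
    """Hypothetical store that reports when an identical image already exists."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, int] = {}
        self._next_id = 1

    def add(self, image_bytes: bytes) -> Tuple[int, bool]:
        """Return (document id, is_duplicate) for the submitted image.

        Assumption: two images are duplicates only if their bytes are identical.
        """
        digest = hashlib.sha256(image_bytes).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            # Nothing new is stored; the caller can alert the user instead.
            return existing, True
        doc_id = self._next_id
        self._next_id += 1
        self._by_digest[digest] = doc_id
        return doc_id, False
```

Returning the identifier of the earlier copy lets the calling code show the user which stored document the new input duplicates before anything is deleted.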