In many archival systems, documents are archived based on particular attributes of the documents. On computers, folders are often used to create a set of documents that fall in a particular category. For example, a document generated on a computer could be archived in a folder bearing the name of the client for whom it was generated, with the further attribute of case number stored in the document name. Those attribute values can later be used to retrieve the document from its storage location by executing a search for a file with a name including the case number or by searching for a folder with the client name.
A problem with archiving based on a particular set of attributes arises, however, when a document or document set that has a particular attribute is desired and the documents are not archived based on the desired attribute, making retrieval difficult and potentially expensive. This problem is exacerbated when a document is not desired for a particular attribute of the document but is instead desired based on the similarity between the attributes of the archived document and the attributes of another document. For example, if a particular form is used during a transaction, a user may wish to retrieve all forms having the same format. Since the documents may not have been archived based on the document format, the user would have to retrieve each document and individually compare it to the desired form.
The problem is even more difficult when both paper and electronic documents are involved. At the present time, electronic documents may be located based on a predefined set of attributes or properties stored by the computer. In Windows 95®, an operating system that runs on IBM-compatible personal computers, electronic documents may be searched based on values such as the date the document was modified, the size of the document, or simple text searches for words in the document. The limitation of this system, however, is that a search can use only those attributes that the computer has stored as part of the document properties. This limitation is evident when a search is performed in order to locate documents similar to another document.
A further problem with standard electronic archival systems is the inability to integrate the attributes of paper documents that have been converted into a digital format with those of previously stored electronic documents. When a document is scanned into the computer, the computer can generate a list of associated properties only through user input, such as an entry form that can be filled in by the user, or by creating artificial properties of the document, such as using the date on which the document was scanned as the date of creation.
What is needed, then, is a system and method for archiving both scanned paper documents and electronic documents based on attribute values located in the documents such that the documents can later be retrieved based on those attributes. What is further needed is a system and method for locating archived documents based on the similarities between the archived document and a paper or electronic document.