In an office setting, it is typical to find many documents, both in paper and electronic form. These documents may contain important information relating to business, government or individuals and, as such, it is important to be able to securely store and reliably retrieve this information.
In the normal course of business, a large number of documents are generated each day. The information in these documents is, most likely, needed at a later time. It is sometimes difficult to find the correct document in an office setting because documents are sometimes misfiled, misplaced or simply lost. In some cases, no system of organizing documents is used, so documents often cannot be retrieved in a timely manner.
Digital technology has been introduced to help manage this problem. Digital scanners, fax machines and the like allow a user to create an electronic version of documents which would otherwise be stored only on paper. This allows vast numbers of documents to be stored in a relatively small space using a personal computer.
The personal computer contains a disk drive, and the documents are stored as files on this disk drive. The user has the option of organizing the documents into a system of directories using a hierarchical file system on the computer. Such hierarchical file systems are common to personal computer operating systems such as Windows, a product of Microsoft Corporation, and Linux, which is available for free.
The hierarchical method for organizing documents requires ongoing organization work to be performed. It is often impractical for busy people to fastidiously organize their documents. Thus, even though documents are stored on a disk drive in a personal computer, they often still cannot be retrieved in a timely manner.
One product that has appeared to address the problem of automatically indexing and providing access to documents is the Ricoh eCabinet. This device receives documents via standard protocols such as SMTP or FTP, which are widely used and well known to those skilled in the art. Each document is indexed using a text indexing engine. It is often necessary to perform some preprocessing of the document, such as OCR, before the document is ready to be indexed. The eCabinet performs this preprocessing when needed.
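The ingestion flow described above (preprocess with OCR when needed, then index the text) can be illustrated with a minimal sketch. The names `Document`, `fake_ocr` and `TextIndex` are hypothetical and are not taken from the eCabinet; the OCR step is a stand-in that merely decodes bytes, and the index is a simple word-to-document mapping assumed for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: int
    text: str = ""      # extracted text; empty for a scan that has not been OCRed
    image: bytes = b""  # raw scan data, if any


def fake_ocr(image: bytes) -> str:
    """Stand-in for a real OCR engine (assumption): the 'scan' is encoded text."""
    return image.decode("utf-8")


class TextIndex:
    """Toy inverted index: maps each word to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def ingest(self, doc: Document) -> None:
        # Preprocess only when needed: run OCR if no text is available yet.
        text = doc.text or fake_ocr(doc.image)
        for word in text.lower().split():
            self.postings[word].add(doc.doc_id)


index = TextIndex()
index.ingest(Document(1, text="Quarterly budget report"))
index.ingest(Document(2, image=b"Scanned budget memo"))
```

In this sketch, both documents become searchable by the word "budget" even though only the second one required the OCR step.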
When a user desires to locate a document, he is able to communicate with the eCabinet using a web-based interface. He then enters information regarding the content of the document he desires, or other information regarding the metadata of the document. Upon entering this information, the user submits it by clicking a button presented in the user interface. The information is transmitted to the eCabinet as a “query.” The eCabinet receives the query and processes it using its internal database. The results of the query are compiled into a result set within the eCabinet. This result set contains pointers to the desired documents in the database.
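The notion of a result set that holds pointers to documents, rather than the documents themselves, can be sketched as follows. The in-memory `DOCUMENTS` table, its fields and the `run_query` function are assumptions made for illustration only and do not describe the eCabinet's internal database.

```python
# Hypothetical document table keyed by document id (the "pointer").
DOCUMENTS = {
    101: {"title": "Budget 2001", "keywords": {"budget", "finance"}},
    102: {"title": "Travel policy", "keywords": {"travel", "policy"}},
    103: {"title": "Budget memo", "keywords": {"budget", "memo"}},
}


def run_query(terms):
    """Return a result set: sorted ids of documents whose keywords include every term."""
    wanted = {term.lower() for term in terms}
    return sorted(
        doc_id
        for doc_id, meta in DOCUMENTS.items()
        if wanted <= meta["keywords"]
    )


result_set = run_query(["budget"])       # ids only, not document contents
titles = [DOCUMENTS[doc_id]["title"] for doc_id in result_set]
```

The result set here is only a list of identifiers; the document contents are looked up afterward, when a page of results is prepared.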
The eCabinet then prepares a web page to be presented to the user that contains the results of the query. Sometimes the results of the query are too numerous to present on one web page, so buttons are provided to navigate through multiple web pages that present the result.
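Splitting a result set across multiple web pages can be sketched as below. The `paginate` function and the page size are hypothetical; the actual number of results per page used by the eCabinet is not stated here.

```python
def paginate(result_set, page_size):
    """Split a result set into consecutive pages of at most page_size entries."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [
        result_set[start:start + page_size]
        for start in range(0, len(result_set), page_size)
    ]


# Seven results shown three per page require three pages.
pages = paginate([1, 2, 3, 4, 5, 6, 7], page_size=3)
```

Navigation buttons would then simply select which entry of `pages` is rendered.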
The user of the eCabinet may decide to perform actions on documents that he finds in the web page(s). The user selects document icons presented in the web page and specifies a desired action. An action specification, which contains the selections made by the user, is sent to the eCabinet, which receives the action specification and performs the action. Such actions include copying a document, changing the summary, adding information such as keywords or a hierarchy path to aid in locating the document later, or deleting the document.
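An action specification of the kind described above may be thought of as a small record naming the action, the selected documents and any parameters. The following is a minimal sketch under that assumption; the `ActionSpec` fields, the action names and the `perform` function are hypothetical and cover only a subset of the listed actions.

```python
from dataclasses import dataclass, field


@dataclass
class ActionSpec:
    action: str                                  # e.g. "delete", "add_keywords"
    doc_ids: list                                # documents selected by the user
    params: dict = field(default_factory=dict)   # action-specific parameters


def perform(store, spec):
    """Apply the requested action to every selected document in the store."""
    for doc_id in spec.doc_ids:
        if spec.action == "delete":
            store.pop(doc_id, None)
        elif spec.action == "add_keywords":
            store[doc_id]["keywords"].update(spec.params["keywords"])
        elif spec.action == "set_summary":
            store[doc_id]["summary"] = spec.params["summary"]
        else:
            raise ValueError(f"unknown action: {spec.action}")


store = {
    1: {"summary": "", "keywords": set()},
    2: {"summary": "", "keywords": set()},
}
perform(store, ActionSpec("add_keywords", [1, 2], {"keywords": {"invoice"}}))
perform(store, ActionSpec("set_summary", [1], {"summary": "March invoice"}))
perform(store, ActionSpec("delete", [2]))
```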
As the eCabinet receives more and more documents, its internal database gradually grows larger. Although it is able to recover disk space by moving documents to other storage, such as an NFS server, the nature of the database is that it must remain intact in order to allow the user to locate documents by their content and metadata. Thus, the database gradually grows to consume all available disk space.
Another problem occurs when users submit information to the eCabinet too quickly. The eCabinet becomes swamped with incoming work. In some cases, documents arrive faster than the eCabinet can process them. Thus, the eCabinet can never catch up and falls further behind with the passage of time. This presents an additional problem: because documents are processed in queues, a submitted document may wait a long time while previously submitted documents are processed. Additionally, the eCabinet's performance is somewhat degraded while it is performing OCR and/or document indexing.
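The backlog effect can be made concrete with simple arithmetic. The sketch below assumes constant hourly arrival and processing rates; the function names and the figures of 120 and 100 documents per hour are illustrative assumptions, not measurements of any eCabinet.

```python
def backlog_after(hours, arrivals_per_hour, processed_per_hour, initial=0):
    """Queue length after a number of hours at constant arrival and service rates."""
    backlog = initial
    for _ in range(hours):
        # The queue cannot go negative: idle capacity is simply unused.
        backlog = max(0, backlog + arrivals_per_hour - processed_per_hour)
    return backlog


def wait_hours(backlog, processed_per_hour):
    """Time a newly submitted document waits behind the current backlog."""
    return backlog / processed_per_hour


# Assumed rates: 120 documents arrive per hour, 100 are processed per hour.
queued = backlog_after(8, arrivals_per_hour=120, processed_per_hour=100)
delay = wait_hours(queued, processed_per_hour=100)
```

Under these assumed rates the queue grows by 20 documents every hour, so after an eight-hour day 160 documents are waiting and a new submission is delayed by 1.6 hours. When arrivals stay below the processing rate, the backlog remains at zero.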
One may attempt to solve these workload problems by introducing multiple eCabinets. This method works well for ingesting documents faster, but it requires the user to specify the eCabinet to which the documents are sent, even though he may be completely ignorant of which eCabinet(s) are busy and which are idle.
Another problem with multiple eCabinets is that a person searching for a document is often in a hurry. Thus, it is bothersome to have to interrogate multiple eCabinets, especially if there are a large number of them.
Of course, it is natural for different departments in an organization to purchase and operate separate eCabinets. Sometimes, it is desirable for a high level manager or other person with a general query to interrogate the eCabinets in these different departments. Thus, the problem of interrogating multiple eCabinets is not restricted to the case where the workload is too great for one eCabinet; multiple eCabinets are a natural occurrence in a large organization.
Accordingly, what is needed is a method, apparatus and system for distributing queries and actions that does not suffer from the problems described above.