The present invention relates to document imaging systems for storing and retrieving images of scanned documents. Specifically, the invention relates to the batch processing of raw image data which has been scanned into the memory of a processing system.
Document scanning to create electronic media from paper documents and microfilmed records is conducted on a high-speed, high-volume production basis. Electronic media offer advantages over print media in storage space efficiency, preservation, retrieval and text searching. High-speed scanners can rapidly scan 10,000 documents, creating a stored pixel-based image of each side of each scanned page. Each image is identified by an ASCII index file record which contains various indicia relating to the image. The index file record is created at the time of scanning and contains address links to respective document images. Personnel conducting a scanning operation include bar code data or other readable indicia on each page being imaged. Additionally, break pages may be included which identify the beginning and end of a document, folder or box. This information is inserted in the ASCII index file by the image scanner and is used to locate the linked image.
The Kodak Image Link scanning system produces an ASCII text file containing the readable indicia for each scanned page including the inserted break pages. The image data for each scanned image comprises a pixel array which is transferred with the related ASCII index file over a local area network for storage in a database in accordance with a protocol specified by the manufacturer of the scanner controller.
The images of the document residing in the database of the processing system are accessed from information contained in the ASCII index file. A computer operator networked to the database controller can call up and search the ASCII index file for the record or records which correspond to the image(s) to be retrieved for viewing on a retrieval device or for export to other devices which utilize the image data, such as CD-ROM storage.
The stored image data may require further processing to correct for scanning anomalies, such as image skew and image defects. Image skew can be corrected by rotating the pixel array, whereas poor image quality may require the document to be rescanned into the system.
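By way of illustration only, skew correction by rotating the pixel array can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular scanner or retrieval system: the function name `rotate_pixels`, the list-of-rows image representation, the nearest-neighbor sampling and the `fill` value for uncovered pixels are all hypothetical choices made for the example.

```python
import math


def rotate_pixels(pixels, angle_degrees, fill=0):
    """Rotate a 2-D pixel array about its center (hypothetical helper).

    pixels is a list of equal-length rows. Each output pixel is filled by
    inverse mapping: its coordinates are rotated back into the source
    array and the nearest source pixel is copied. Output positions that
    map outside the source array receive the fill value.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Center of rotation, in pixel coordinates.
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0

    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Source coordinates corresponding to output pixel (x, y).
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < width and 0 <= iy < height:
                out[y][x] = pixels[iy][ix]
    return out
```

In practice the skew angle would first have to be estimated from the image, and a production system would likely interpolate between neighboring pixels rather than copy the nearest one; neither step is shown here.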
The use of the stored images in text search and retrieval applications is limited by the content of the ASCII index files. For enhanced search and retrieval capabilities, the ASCII files must be augmented with information which is searchable. Improved searching capabilities may be realized if an OCR file is made of the image data and linked to the respective image. This permits full text search capabilities of the OCR version of the image.
The retrieval system which corrects the image errors and augments the ASCII index file does so on an image-by-image basis. A work station operator must call up each index file record and its corresponding image individually. The workload represented by these post-scanning processes reduces the overall system throughput.
Batch processing of the stored image data and index file record can organize images and index file records so that only stored information which relates to a specific job is queued up for these additional processes. The capability of providing multiple computer work stations for performing different tasks on the same image records and index records also aids throughput. System throughput suffers when the images are subjected to tasks which are executed in sequence, requiring workstations which execute some tasks to wait until the earlier tasks of the sequence are executed.
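The grouping of index file records by job described above can be sketched as follows. This is an illustrative sketch only: the record layout (a dictionary with hypothetical `job` and `image` keys) and the function name `build_job_queues` are assumptions made for the example, since the actual index file record format is defined by the scanner manufacturer.

```python
from collections import defaultdict, deque


def build_job_queues(index_records):
    """Group index file records into one FIFO queue per job.

    index_records is an iterable of dictionaries, each with a
    hypothetical "job" key naming the job the record belongs to.
    Records keep their original scan order within each queue.
    """
    queues = defaultdict(deque)
    for record in index_records:
        queues[record["job"]].append(record)
    return queues


# Hypothetical index records for two interleaved jobs.
records = [
    {"job": "A", "image": "img001"},
    {"job": "B", "image": "img002"},
    {"job": "A", "image": "img003"},
]
queues = build_job_queues(records)
```

A work station assigned to job "A" would then draw only from `queues["A"]`, so that records belonging to other jobs are never called up for that task.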
A remaining problem in the document retrieval system is providing a format for the index file which is compatible with the output devices of various users. Each manufacturer of these devices has its own input format requirements. Thus, a transfer of information from the processed records to the input file of the retrieval device requires the ability to map records of one format to records of an input file having a different format.
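The mapping of records from one format to another can be illustrated with a short sketch. Everything named here is an assumption made for the example: the field names, the `map_record` function, and the delimited-text target format are hypothetical and do not correspond to the input format of any actual retrieval device.

```python
def map_record(record, target_layout, delimiter="|"):
    """Map one processed index record to a line of a target input file.

    record is a dictionary of source field names to values.
    target_layout is an ordered list of (target_field, source_field)
    pairs describing which source field supplies each target field.
    A source field missing from the record yields an empty target field.
    """
    values = [str(record.get(source, "")) for _target, source in target_layout]
    return delimiter.join(values)


# Hypothetical layout for one device: three fields in a fixed order.
device_layout = [
    ("DOC_ID", "barcode"),
    ("IMAGE_PATH", "image_link"),
    ("BOX", "box_number"),
]

line = map_record(
    {"barcode": "000123", "image_link": "vol1/img001", "page": 4},
    device_layout,
)
```

Keeping the layout as data rather than code means that supporting a device with a different input format would, under these assumptions, require only a new layout table and not a change to the mapping routine.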