1. Technical Field
The present invention relates generally to a storage management system for a document image database, and more particularly to a method of managing storage in a document image database using document analysis to partition documents into logical regions and document reduction means for reducing storage size of the regions according to various storage preference rules.
2. Discussion
Storage management is a central issue in document image database systems. Users are expressing an increasing interest in being able to save and retrieve documents in image form. However, despite the growing size of hard disks and removable media, current storage capacity in document image database systems is inadequate for supporting a paperless office. To illustrate the problem, a standard 8½ × 11 inch page (with 1 inch margins on all sides) scanned at 300 dpi would measure 1,950 × 2,700 = 5,265,000 pixels. In grayscale, each pixel requires one byte to represent, and thus the page would require approximately 5 megabytes of storage. The same scanned page would require approximately 15 megabytes in 24-bit color and 658,125 bytes in bitonal form. Hence, 1,000 similar scanned pages could require between roughly 600 megabytes and 15 gigabytes to store in a document image database. Since the average office contains far more than 1,000 pages, good techniques are needed for effectively managing storage in a document image database.
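The storage figures above can be reproduced with a short calculation. The following Python sketch is illustrative only; it assumes uncompressed storage and uses the page size, margins, and resolution stated above. The names `page_bytes` and `BITS_PER_PIXEL` are chosen for this example and do not appear elsewhere in this description.

```python
# Page geometry from the text: 8.5 x 11 inch page, 1 inch margins, 300 dpi.
DPI = 300
width_px = int((8.5 - 2 * 1.0) * DPI)    # 6.5 inches of scanned width
height_px = int((11.0 - 2 * 1.0) * DPI)  # 9 inches of scanned height
pixels = width_px * height_px

# Bit depth per pixel for each representation discussed in the text.
BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "color": 24}


def page_bytes(bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page at the given depth."""
    return pixels * bits_per_pixel // 8


sizes = {name: page_bytes(bpp) for name, bpp in BITS_PER_PIXEL.items()}
for name, size in sizes.items():
    print(f"{name}: {size:,} bytes per page, "
          f"{size * 1000:,} bytes per 1,000 pages")
```

Multiplying each per-page size by 1,000 gives the approximate range quoted above, from several hundred megabytes for bitonal pages to roughly 15 gigabytes for 24-bit color pages.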
Within a document image database, there is a classic trade-off between the quality of a document image and the size of its stored data file. Generally, a higher quality representation of a document requires more space to store, so maintaining an acceptable level of quality for every document requires a document image database with excessive capacity. By reducing the storage requirements for the less important parts of documents, the required storage capacity may be reduced while maintaining the high quality of the important parts. Typically, storage management begins by scanning every document at the same predetermined depth and resolution, such that the minimum settings required to maintain acceptable image quality in one particular document are applied to all documents. Scanning each part of every document at the same depth and resolution, irrespective of its contents, requires excessive storage space. Furthermore, once a document has been entered into the system, its storage size is not analyzed again for possible reduction. Alternatively, storage management may begin with a system user manually specifying the scanning depth and resolution for each document entered into the system. In this way, the scanning parameters and image representation details can be tailored to each situation, but only at the high cost of repeated user intervention. Moreover, these manual storage management strategies are applied only at the document level and only at the time a document is input into the system.
Accordingly, a need exists for an efficient method for managing storage within a document image database. Sophisticated document analysis and storage management techniques should be used to decrease the size of the document image database while maintaining a high level of document image quality. It is further desirable that the document analysis methods automatically locate and identify regions in a scanned document image. Different storage management techniques can then be applied to each region, thereby reducing the overall size of a stored document image while maintaining the quality of important regions within the document. Over time, documents can be reanalyzed and storage management techniques can be reapplied to further reduce the storage size of previously stored document images.
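By way of illustration only, the idea of applying different storage settings to different regions can be sketched as follows. This Python example is a hypothetical sketch, not the method of the invention: the region kinds, the `RULES` table, and the depth and resolution values are all assumptions made for this example, and storage is assumed to be uncompressed.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str       # hypothetical label, e.g. "text", "photo", "background"
    width_px: int
    height_px: int


# Hypothetical storage preference rules (assumed values, not from the text):
# region kind -> (bits per pixel, resolution scale factor).
RULES = {
    "text": (1, 1.0),          # bitonal at full resolution
    "photo": (8, 0.5),         # grayscale at half resolution
    "background": (1, 0.25),   # bitonal at quarter resolution
}


def region_bytes(region: Region, bits_per_pixel: int, scale: float = 1.0) -> int:
    """Uncompressed bytes for one region at the given depth and scale."""
    w = int(region.width_px * scale)
    h = int(region.height_px * scale)
    return (w * h * bits_per_pixel + 7) // 8  # round up to whole bytes


def reduced_size(regions: list[Region]) -> int:
    """Total bytes when each region is stored according to RULES."""
    return sum(region_bytes(r, *RULES[r.kind]) for r in regions)


def uniform_size(regions: list[Region], bits_per_pixel: int = 8) -> int:
    """Total bytes when every region is stored at one fixed depth."""
    return sum(region_bytes(r, bits_per_pixel) for r in regions)


# A hypothetical 1,950 x 2,700 pixel page partitioned into three regions.
page = [
    Region("text", 1950, 1800),
    Region("photo", 1000, 900),
    Region("background", 950, 900),
]

print("uniform grayscale:", uniform_size(page), "bytes")
print("per-region rules: ", reduced_size(page), "bytes")
```

In this sketch the regions together cover the same 5,265,000 pixels as the example page discussed earlier, so the uniform grayscale total matches the approximately 5 megabyte figure, while the per-region total is considerably smaller. The actual savings depend entirely on the rules chosen and on how a given document is partitioned.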