1. Technical Field
The present invention relates generally to image data, and more particularly, to a system and method for managing image storage size.
2. Related Art
With the Internet becoming an integral part of life, the ability to provide adequate data storage for image inventories is increasingly important. One exemplary industry where image inventory storage size is a pressing concern is the United States banking industry. In this industry, digitized, compressed documents are initially stored on write-once media and archived for the legally required seven years. Documents to be imaged generally carry strong (high-contrast) information, such as letters and correspondence with printed text and possibly a company logo. Some documents, e.g., checks, have background scenes, but the vital information is usually printed in black or handwritten in blue or black ink. One common compression standard is that promulgated by the Joint Photographic Experts Group (JPEG). Despite advanced compression techniques, a typical digital check record, which includes a header followed by compressed image segments of the front and back of the check, averages 40-50 kilobytes in total. Since approximately 80 billion checks are written per year in the United States, a seven-year image inventory amounts to approximately 23,000-28,000 trillion bytes (23-28 petabytes) of compressed data for a single copy. Even a small bank's portion of this data is large.
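The storage figure above can be checked with simple arithmetic. The following Python sketch multiplies the approximate annual check volume by the seven-year retention period and the 40-50 kilobyte average record size; the function and constant names are illustrative only. Because the result depends on whether a kilobyte is taken as 1,000 or 1,024 bytes, the sketch computes both conventions: about 22,400-28,000 trillion bytes with 1,000-byte kilobytes and about 22,900-28,700 trillion bytes with 1,024-byte kilobytes, consistent with the approximate range stated.

```python
"""Back-of-the-envelope estimate of a seven-year check-image inventory."""

CHECKS_PER_YEAR = 80_000_000_000  # approx. checks written per year in the US
RETENTION_YEARS = 7               # legally required archive period
TRILLION = 10**12


def inventory_trillion_bytes(record_kb: int, bytes_per_kb: int = 1000) -> float:
    """Return the size of one copy of the inventory, in trillions of bytes."""
    total_records = CHECKS_PER_YEAR * RETENTION_YEARS
    return total_records * record_kb * bytes_per_kb / TRILLION


if __name__ == "__main__":
    # Evaluate the 40-50 KB record-size range under both kilobyte conventions.
    for bytes_per_kb in (1000, 1024):
        low = inventory_trillion_bytes(40, bytes_per_kb)
        high = inventory_trillion_bytes(50, bytes_per_kb)
        print(f"1 KB = {bytes_per_kb} B: {low:,.0f} to {high:,.0f} trillion bytes")
```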
In addition to the archived version, many banks also provide online document images covering, for example, the preceding three months, to allow quick access by commercial and individual customers. Hence, two copies of at least a portion of a bank's image inventory are often maintained, which strains available data storage. One mechanism to reduce online image storage requirements while still allowing online document image selection is a visual index of thumbnails, such as that disclosed in U.S. Pat. No. 6,154,295 to Freuland et al. In this setting, the customer can order additional copies of data from the “index” print. However, both the high-resolution image and the dimension-reduced thumbnail are later discarded.
The data storage problem is magnified because many industries increasingly want to provide access to imaged documents for longer periods of time. For example, in the banking industry, it is preferred to provide imaged documents online for at least the past fifteen and a half months (i.e., from January of one year to April of the following year) for tax purposes. At a roughly constant image volume, this requires about five times the online storage needed for three months of images. Unfortunately, the increased data storage requirements make this service difficult to provide.
Some banks provide three months' worth of imaged documents online by keeping only a small portion of the images online (e.g., one month's worth) and using batch processes to retrieve older images from the archive version. However, batch processing creates other problems. One problem is that a batch process can take a long time to complete. Since customers would like to browse and do research quickly, batch processing for older imaged documents is unacceptable. For example, an item cleared six months ago can take a week or longer to retrieve. Moreover, a first inquiry sometimes does not retrieve the correct item. In addition, banks often charge customers a large service fee for retrieving the image from the archive version. In summary, batch processing-based image retrieval is inefficient and slow, and is not an adequate remedy for reducing image inventory size.
Another potential remedy for storage requirements is further data compression. However, since continuous-tone compression techniques (e.g., JPEG) are generally lossy processes, care must be taken lest increased compression remove vital information. “Lossy” means that the decompressed image is not exactly the same as the original image. In some applications, such as in the banking industry, document information must be preserved. Lossless methods for reducing the storage size of an image are available, but they require more complex compression techniques, such as transcoding from generic Huffman tables to custom Huffman tables (i.e., Huffman tables tuned for each image) and transcoding from Huffman entropy coding to arithmetic entropy coding.
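To illustrate what a Huffman table “tuned for each image” means, the following Python sketch derives code lengths from the symbol frequencies of a single sample and compares the resulting bit count with a table built for uniform statistics, which stands in here for a generic table. The function names and sample data are hypothetical, and the sketch is simplified: actual JPEG entropy coding assigns codes to run-length/size symbols rather than raw values and limits code lengths to 16 bits, and a real transcoder must also rewrite the compressed bit stream. None of that is shown.

```python
"""Sketch: build a Huffman table tuned to one image's own symbol statistics."""
import heapq
from collections import Counter
from itertools import count


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code fitted to `symbols`."""
    freqs = Counter(symbols)
    if len(freqs) == 1:
        # A lone symbol still needs a 1-bit code.
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: 0}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol in them
        # moves one level deeper, i.e. its code grows by one bit.
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encoded_bits(symbols, lengths):
    """Total number of bits needed to encode `symbols` with the given code lengths."""
    return sum(lengths[s] for s in symbols)


if __name__ == "__main__":
    # Hypothetical "image": four symbol values with skewed frequencies.
    image = "a" * 50 + "b" * 30 + "c" * 15 + "d" * 5
    tuned = huffman_code_lengths(image)     # table fitted to this image
    generic = huffman_code_lengths("abcd")  # table for uniform statistics
    print("tuned:", tuned, encoded_bits(image, tuned), "bits")
    print("generic:", generic, encoded_bits(image, generic), "bits")
```

With the sample counts used (50, 30, 15, and 5 occurrences of four symbols), the tuned table encodes the sample in 170 bits versus 200 bits for the uniform table. Because Huffman coding is lossless, the saving comes without any change to the decoded data.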
Another banking industry objective hindered by image inventory storage requirements is the use of imaged documents in day-to-day clearing operations. Currently, one type of clearing operation is completed by providing document images to commercial customers on compact discs (CDs). As a result, hundreds of CDs are mailed to commercial customers every day, which increases the bank's operational expenses. Consequently, commercial customers' expenses for this service are high, and the service is essential because the commercial banks often confirm that the checks are not fraudulent before authorizing payment.
Another problem related to image inventory data storage size is transmission speed: the larger the images, the longer they take to transmit.
The above-described problems in the banking industry are also found in other industries where image inventories are used. Other exemplary industries include photography developers, photographic news agencies, catalog shopping, other Internet-based activities, old books scanned by libraries, scanned ledgers, genealogy material, business records, and all incoming mail in paperless office environments.
In many of these industries, pages are scanned and the primary interest is in the content rather than the presentation. In these cases, optical character recognition (OCR) may have been applied to the scanned images to capture as much of the critical information as possible and convert it into coded text, such as ASCII characters, so that text search and data mining techniques can be applied. Unfortunately, the OCR error rate is still significant. Easy access to a lower-quality but still legible image of the original document would help settle accuracy questions quickly.
In view of the foregoing, there is a need in the art for a system and method for managing image data storage size, such as for images provided online, that reduces storage requirements, increases transmission speed, and meets customer requirements.