High-volume, nonvolatile data storage devices, such as magnetic or optical disks, tapes, PCMCIA cards, or the like, are commonly used with computer systems to store large quantities of read-accessed data. Studies of computer systems accessing data from such mass storage media have established that data caching can materially improve computer system performance, especially where the bulk of the data being processed is read in large units from the storage media, as is common with CD video processing. Furthermore, studies have established that the larger the cache, the greater the performance improvement. Such system-level performance effects are somewhat expected, given that the data storage media operate at average data transfer rates anywhere from 1 to 3 orders of magnitude slower than the capabilities of the processors.
An ideal mass storage media cache is adequate in size to receive and store multiple relatively large units of data, each unit being composed of tens of thousands of bytes. Though the transfer of data from the mass storage media to such a cache is accomplished in large units, it occurs at a relatively slow average data read rate. In contrast, the reading of data from the cache is conventionally performed using smaller units, but at faster data transfer rates. Given that a relatively large cache is needed to materially improve computer system performance, and that the cost of DRAM cache is often a significant part of the storage media cost, cache size is often compromised to provide a lower mass storage media component cost. A particular example is the highly competitive magnetic hard disk drive marketplace, where there is constant tension between reducing the mass storage media component cost and improving its data transfer rate performance.
Though data compression techniques have been utilized to reduce memory size in various data processing applications, lossless data compression has not been applied to mass storage media cache systems. In part, this is attributable to the variations in compressibility experienced with lossless data compression, which depend on the data content. Because the compressibility of data units varies unpredictably, cache storage allocation and deallocation control, as well as the high data compression and decompression speeds required, have proven to be major obstacles. Therefore, though compressing data for storage in a cache is a recognized and desirable design objective, its realization in an efficient and fast system has heretofore been lacking.
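By way of illustration only, the content dependence of lossless compressibility described above can be observed with a short sketch. The sketch assumes a hypothetical unit size of 32 KiB and uses the DEFLATE implementation in Python's standard `zlib` module as a stand-in compressor; neither the unit size, the sample data, nor the choice of compressor is taken from the present disclosure.

```python
import os
import zlib

# Hypothetical unit size, chosen only to match "tens of thousands of bytes".
UNIT_SIZE = 32 * 1024


def compressed_size(unit: bytes) -> int:
    """Return the number of bytes a unit occupies after lossless compression."""
    return len(zlib.compress(unit))


def make_units() -> dict:
    """Build three equally sized units whose contents differ in redundancy."""
    text = b"sector header; payload; checksum\n"  # illustrative filler only
    return {
        "all zeros": bytes(UNIT_SIZE),
        "repeated text": (text * (UNIT_SIZE // len(text) + 1))[:UNIT_SIZE],
        "random bytes": os.urandom(UNIT_SIZE),
    }


def survey() -> dict:
    """Map each unit's label to its compressed size in bytes."""
    return {name: compressed_size(unit) for name, unit in make_units().items()}


if __name__ == "__main__":
    for name, size in survey().items():
        print(f"{name:>14}: {UNIT_SIZE} -> {size} bytes")
```

Units of identical uncompressed size thus occupy widely differing amounts of storage once compressed, and a unit of random content does not shrink appreciably. A cache holding such units cannot reserve a fixed-size slot per unit without either wasting space or overflowing, which is the allocation difficulty noted above.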
Accordingly, there is a need for a system and method by which data retrieved from a mass storage medium can be compressed, stored in a fast and efficiently managed cache, randomly accessed from the cache, and decompressed in a timely manner for transmission to the data processing system.