Data compression is employed to increase the amount of data that can be stored on a mass storage medium, such as a disk, thereby decreasing the storage cost per unit of customer data. Incidentally, the use of data compression may also improve the performance of the disk. It is expected that existing and future low-cost hardware implementations of data compression algorithms will be attractive to many users, and that the use of data compression will become more widespread.
When data compression is used with disk-based products, conventional practice divides the data being compressed into equal-size compression units. Typically, the compression/decompression hardware operates only upon full compression units. That is, it is not possible to retrieve (read) a particular data element from the compression unit unless all of the data preceding the data element in the compression unit is first decompressed by the compression/decompression hardware. When saving a data element, the entire compression unit containing that element is compressed by the compression/decompression hardware before it is saved on the disk. As can be appreciated, small reads (e.g., reads of data elements such as single records) that are considerably smaller than a compression unit are inefficient, because data that was not requested must be decompressed in order to reach the requested element.
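The cost of a small read from a compression unit can be illustrated with a minimal sketch. Here Python's zlib module stands in for the compression/decompression hardware, and the names UNIT_SIZE, compress_unit, and read_record, as well as the 4096-byte unit size, are hypothetical choices made only for illustration.

```python
import zlib

UNIT_SIZE = 4096  # hypothetical compression-unit size in bytes


def compress_unit(unit):
    """Compress one full compression unit (zlib stands in for the hardware)."""
    assert len(unit) == UNIT_SIZE
    return zlib.compress(unit)


def read_record(image, offset, length):
    """Read `length` bytes at `offset` within a compressed unit.

    Returns the record and the number of bytes that had to be
    decompressed to reach the end of it. Assumes offset + length > 0.
    """
    d = zlib.decompressobj()
    # Decompression must start at the beginning of the unit; stop as
    # soon as the requested record has been produced.
    prefix = d.decompress(image, offset + length)
    return prefix[offset:offset + length], len(prefix)


# Build a unit of 64 fixed-size records of 64 bytes each.
unit = b"".join(b"rec%05d" % i + b"." * 56 for i in range(64))
image = compress_unit(unit)
record, decompressed = read_record(image, 2048, 64)
```

In this sketch a 64-byte record located halfway through the unit cannot be returned until everything before it in the unit has been decompressed as well.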
Furthermore, the compression factor (ratio) of a given compression unit is data dependent. As a result, two compression units (having the same uncompressed length) may yield compressed images of different lengths. In order to use all of the disk space, compressed images are stored consecutively on the disk. Thus, to enable direct access, the location of each compression unit is stored in a directory having a size that is proportional to the number of compression units. For performance reasons the directory is preferably implemented with a fast (and relatively expensive) control memory. As a result, there is a strong cost and performance incentive to make the compression units large, since larger units mean fewer directory entries.
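The directory arrangement described above can be sketched as follows, under stated assumptions: zlib again stands in for the compression hardware, a byte string stands in for the disk, and the names pack_units and read_unit are hypothetical. The directory holds one start offset per compression unit, plus one closing entry marking the end of the last image.

```python
import zlib


def pack_units(units):
    """Store compressed images back to back; return (disk, directory)."""
    disk = bytearray()
    directory = []  # start offset of each compressed image on the "disk"
    for unit in units:
        directory.append(len(disk))
        disk += zlib.compress(unit)
    directory.append(len(disk))  # closing entry: end of the last image
    return bytes(disk), directory


def read_unit(disk, directory, index):
    """Locate a compressed image via the directory and decompress it."""
    start, end = directory[index], directory[index + 1]
    return zlib.decompress(disk[start:end])


# Two units of equal uncompressed length but different content.
units = [bytes(4096), bytes(range(256)) * 16]
disk, directory = pack_units(units)
image_lengths = [directory[i + 1] - directory[i] for i in range(len(units))]
```

Because the two images compress to different lengths, the start of the second image cannot be computed from its index alone, and the directory grows by one entry for every compression unit stored.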
However, one major performance drawback of data compression-oriented mass storage systems, in particular disk systems, is the connection time and the response time required for small read operations, such as record reads. Conventional approaches to solving or at least ameliorating this problem require additional control information, and therefore consume a significant amount of control memory space. If the control memory space is implemented with semiconductor memory, then additional cost, packaging volume, and power consumption are incurred.
It is thus an object of this invention to provide a method that reduces connection time during record read operations while adding but a small amount of information to the directory, relative to that required when using smaller size compression units.
It is a further object of this invention to reduce both the connection time and the response time experienced during small read operations from compressed disk tracks, while not significantly increasing the amount of additional control information.