The present invention relates to data compression. In particular, the present invention relates to a system, structure and procedure for using a direct access storage device for sequential data access with optional embedded hardware data compression.
Random access data storage devices (random access devices) are sometimes used to archive data in a compressed format using software data compression techniques. Random access devices include, for example, hard disk drives, disk drives with removable storage media (removable cartridges), magneto-optical drives, near-field optical storage disks, and diskettes. Using hardware data compression techniques to archive data offers a number of benefits compared to using software data compression techniques. For example, hardware data compression techniques typically operate faster than software compression techniques. Additionally, hardware data compression is transparent, meaning that it operates without any outside intervention by, for example, another software program or an end user. In contrast, software data compression (often implemented as file compression utilities) does not operate transparently, but rather requires explicit intervention by a software program or an end user to compress and decompress data.
Because data storage systems typically include a number of random access devices, and because random access devices are inexpensive, it would be beneficial to implement hardware data compression techniques on random access devices. One known example of such a data storage system is a Redundant Array of Inexpensive Disks (RAID). Unfortunately, because of the nature of data access in random access devices, hardware data compression cannot be used to compress and decompress data on random access devices. Therefore, random access devices are not able to realize the benefits of hardware data compression.
To understand why this is the case, a brief discussion of the nature of data access on a random access device is presented. Random access devices are direct access devices, meaning that data can be read from and written to any location, or address, on a random access device. To read data from, or write data to, a random access device, two items of information must be supplied through a programmatic interface: a begin address and a data transfer length. (Such programmatic interfaces are known, for example, SCSI read and write commands.)
The begin address specifies the particular address on the physical medium of the storage device at which to begin reading or writing data. The data transfer length indicates the number of bits of data that are to be read from, or written to, the device. Because a random access device allows direct read and write access to any location on its physical medium, and because the size of compressed data can vary depending on the amount of redundancy in the data, use of hardware compression on a random access device presents a significant problem: data that was previously written on the device can be erroneously overwritten and corrupted.
For example, suppose that a random access data storage device driver (driver) receives a command to write 100 blocks of data A to a direct access device starting at location X. The data is compressed using hardware compression, and compresses to a length of 70 blocks. These 70 blocks of compressed data are then written by the driver to the device at location X. The driver receives another command to write some number of blocks of data B to the device starting at location X+100. This data is then compressed to some length, and written by the driver to the device at location X+70 (recall that data A was compressed from 100 blocks to 70 blocks). The driver receives yet another command to write 100 blocks of data to the device starting at location X. This data is then compressed. However, in this instance, the data only compresses to a size of 80 blocks. Because the recompressed 80 blocks of data will not fit into the 70 blocks of storage space available at location X, when the driver writes the recompressed data to location X, 10 blocks of data B (previously stored at location X+70) will be erroneously overwritten and corrupted.
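The overwrite described above can be reproduced with a small simulation. The following is only an illustrative sketch, not part of the described driver: the medium is modeled as a list of block slots, location X is assumed to be address 0, the medium size is arbitrary, and the compressed length of data B (40 blocks) is an assumed value, since the text leaves it unspecified. The function and variable names are hypothetical.

```python
# Hypothetical model: the medium is a list of block slots, each holding the
# label of the data that owns it (None means the slot is empty).
MEDIUM_BLOCKS = 300  # assumed medium size, for illustration only


def write_compressed(medium, start, compressed_len, label):
    """Write compressed_len blocks at start; return counts of clobbered blocks."""
    clobbered = {}
    for addr in range(start, start + compressed_len):
        old = medium[addr]
        # A block is clobbered when it held data with a different label.
        if old is not None and old != label:
            clobbered[old] = clobbered.get(old, 0) + 1
        medium[addr] = label
    return clobbered


def run_scenario():
    medium = [None] * MEDIUM_BLOCKS
    x = 0  # location X, assumed to be address 0 in this sketch
    # 1. 100 blocks of data A compress to 70 blocks and are written at X.
    first = write_compressed(medium, x, 70, "A")
    # 2. Data B, addressed by the host at X+100, is packed at X+70.
    #    Its compressed length of 40 blocks is an assumed value.
    second = write_compressed(medium, x + 70, 40, "B")
    # 3. Data A is rewritten at X but now compresses to 80 blocks.
    third = write_compressed(medium, x, 80, "A")
    return first, second, third


if __name__ == "__main__":
    first, second, third = run_scenario()
    print("blocks clobbered by the rewrite:", third)
```

In this model the first two writes clobber nothing, while the third write reports 10 overwritten blocks belonging to data B, matching the corruption described in the example.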
Sequential access data storage devices (sequential access devices), for example SCSI tape drives, are also used to archive data in a compressed format. Sequential access devices typically use embedded hardware data compression to compress data before it is stored onto the device's physical medium. In contrast to how data is written to and read from a random access device, data stored on a sequential access device is organized as a linear sequence of data blocks, wherein a set of data blocks is read or written in a sequential manner.
Sequential access devices allow data to be written onto the physical medium at only two locations: the beginning of the recording medium, and the end of the last previously written data block on the recording medium. If data is written to the beginning of the medium, any data that was previously written to the medium will become unavailable. Otherwise, data is always written to the medium at a location that immediately follows the last block of data that was previously written to the medium. Using these procedures, the location on the medium where data is to be written will always be empty, and writing data to the medium will not corrupt any previously written data. Therefore, hardware compression can be used to increase storage space on sequential access devices, because the compressed data does not need to fit into a predetermined number of bits on the device's physical storage medium.
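The two permitted write positions can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of any actual device: the class and method names are hypothetical, the medium is modeled as an in-memory list of records, and Python's software zlib compression stands in for the embedded hardware compression discussed in the text.

```python
import zlib


class SequentialStore:
    """Hypothetical append-only store holding variable-length compressed records."""

    def __init__(self):
        self._records = []  # compressed records, kept in write order

    def rewind_and_write(self, data: bytes) -> int:
        # Writing at the beginning of the medium makes all earlier data unavailable.
        self._records.clear()
        return self.append(data)

    def append(self, data: bytes) -> int:
        # A write always lands immediately after the last written record, so the
        # compressed size never has to fit into a previously allocated slot.
        self._records.append(zlib.compress(data))
        return len(self._records) - 1

    def read(self, index: int) -> bytes:
        # Decompress the record before returning it to the caller.
        return zlib.decompress(self._records[index])

    def record_count(self) -> int:
        return len(self._records)

    def stored_size(self) -> int:
        # Total number of compressed bytes currently held.
        return sum(len(r) for r in self._records)
```

Because every record is placed after the previous one, a record that compresses poorly simply occupies more space at the end of the sequence and cannot overlap earlier data.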
Because random access devices allow direct read and write access to data locations on their physical media, hardware data compression cannot be used on random access devices without risking corruption of previously written data. What is needed is an apparatus and procedure for using hardware compression on a random access device, such that the benefits of hardware data compression can be realized in the random access device.
In one embodiment, a method of the invention is performed on a disk drive that includes a controller that is connected to a physical medium. The controller includes a processor that is connected to a controller memory. The disk drive is responsive to communication from an external computer. In response to communication from the computer, the processor performs I/O to and from the physical medium using sequential data access techniques.
In yet another embodiment, the invention is a system that includes a disk drive and a host computer connected to the disk drive. The disk drive is responsive to communication from the computer. The disk drive includes a disk drive memory and a controller coupled to the disk drive memory. The controller includes a processor and a controller memory connected to the processor. The controller memory includes a set of computer program instructions and data to write data to the disk drive memory as a linear sequence of data bytes in response to a write data command from the computer.
The invention includes other embodiments, for example, a disk drive and a computer program product for performing sequential data access in response to commands from a computer. In yet other embodiments, the system, apparatus, method and computer program product use hardware data compression to compress data before it is written to a disk drive, and to decompress compressed data before it is returned to a computer.