1. Field of the Invention
The present invention relates to data storage systems, and in particular, to a method and apparatus for storing and retrieving data in hard disk data sectors of non-standard sizes.
2. Description of the Related Art
The ability to manage massive amounts of information in large scale databases has become of increasing importance in recent years. As businesses begin to rely more heavily on large scale database management systems, the consequences of hardware-related data losses intensify, and the security, reliability, and availability of those systems become paramount.
One way to increase the security, reliability and availability of data stored in large databases is to employ a technology known as a redundant array of inexpensive disks, or RAID. This technique is described in the paper “A Case for Redundant Arrays of Inexpensive Disks (RAID),” by David A. Patterson, Garth Gibson, and Randy H. Katz, presented at the ACM SIGMOD Conference 1988, pages 109-116 (1988), which is herein incorporated by reference.
At least five RAID “levels” have been defined. RAID-0 writes data across the drives in the array, one segment at a time. This is also referred to as a “striped” configuration. Striping offers high I/O rates since read and write operations may be performed simultaneously on multiple drives. RAID-0 does not increase reliability, since it does not provide for additional redundancy.
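By way of illustration only, the segment placement implied by striping can be sketched as follows. The function name `stripe_location` and the assumption of a simple rotating (round-robin) assignment of segments to drives are hypothetical and are not taken from any particular RAID-0 implementation.

```python
def stripe_location(segment_index: int, num_drives: int) -> tuple[int, int]:
    """Map a logical segment to (drive number, segment slot on that drive).

    Assumes plain round-robin striping; real arrays may use other layouts.
    """
    if num_drives <= 0:
        raise ValueError("num_drives must be positive")
    if segment_index < 0:
        raise ValueError("segment_index must be non-negative")
    return segment_index % num_drives, segment_index // num_drives


# Consecutive segments land on different drives, so they may be
# read or written at the same time.
for segment in range(6):
    drive, slot = stripe_location(segment, num_drives=3)
    print(f"segment {segment} -> drive {drive}, slot {slot}")
```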
RAID-1 writes data to two drives simultaneously. If one drive fails, data can still be retrieved from the other member of the RAID set. This technique is also known as “mirroring.” Mirroring is the most expensive RAID option, because it doubles the number of disks required, but it offers high reliability.
In RAID-2, each bit of a data word is written to a data disk drive, and its Hamming error correcting code (ECC) is recorded on an ECC disk. When the data is read, the ECC verifies the correct data or corrects single disk errors.
In RAID-3, the data block is striped and written on the data disk. Stripe parity is generated on writes, recorded on a parity disk, and checked on read operations. RAID-3 provides high read and write transfer rates, and a low ratio of parity disks, but can yield a transaction rate that does not exceed that of a single disk drive. The controller implementing a RAID-3 array may be implemented in hardware or software. Software RAID-3 controllers are difficult to implement, and hardware RAID-3 controllers are generally of medium complexity.
In RAID-4, each entire block is written on a data disk. Parity for blocks of the same rank is generated for data writes and recorded on a parity disk. The parity data is checked on read operations. RAID-4 provides a high read data transaction rate, but can require a complex controller design. RAID-4 arrays generally have a low write transaction rate and it can be difficult to rebuild data in the event of a disk failure.
In RAID-5, each data block is written on a data disk. Parity for blocks in the same rank is generated on write operations, and recorded in locations distributed among the storage disks. Parity is checked during read operations. RAID-5 is similar to RAID-3, except that the parity data is spread across all drives in the array. RAID-5 offers high read transaction rates. Disk failures can compromise throughput, however, and RAID-5 controllers can be difficult to implement.
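By way of illustration only, the parity generation and recovery described for the parity-based levels can be sketched as follows, assuming bytewise exclusive-OR (XOR) parity as is customary for such arrays. The names `xor_parity`, `rebuild_block`, and `parity_drive_for_rank` are hypothetical, and the rotation rule used to distribute parity is only one of several possible layouts.

```python
def xor_parity(blocks: list[bytes]) -> bytes:
    """Return the bytewise XOR of equally sized blocks of one rank."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    parity = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_block(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing block of a rank from the survivors and parity."""
    return xor_parity(surviving_blocks + [parity])


def parity_drive_for_rank(rank: int, num_drives: int) -> int:
    """Hypothetical rotation: the parity location moves one drive per rank."""
    return (num_drives - 1 - rank) % num_drives


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)
# Simulate losing the second data block and rebuilding it.
assert rebuild_block([data[0], data[2]], parity) == data[1]
```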
A RAID-5 array presents its storage to the user in terms of user sectors which are typically 512 bytes in size, and actually writes data on the surface of the storage disks to these user sectors. In many cases, a few bytes of control information are appended to the user sectors, but individual sectors are written to independently.
Unfortunately, this 512 byte size does not efficiently use the storage disk surface, thus reducing disk capacity from the theoretical ideal by ten percent or more. Further, limiting data sector sizes to 512 bytes has also negatively impacted the performance of computer operating systems.
In view of the foregoing, it can be seen that permitting larger sector sizes promises significant storage and throughput performance improvements. However, increasing sector sizes can substantially increase the associated read/write overhead, thus decreasing read/write data throughput. For example, if a user writes a single 512 byte sector in the middle of a longer 4096 byte sector, the existing data in the sector must be read (4096 bytes), the new 512 byte data inserted, and the larger block written back to the disk. This read operation is the source of the additional overhead that decreases performance. What is needed is a data storage system and method which can implement larger sector sizes in RAID architectures without unduly increasing associated read/write overhead. The present invention satisfies that need.
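By way of illustration only, the read, insert, and write-back sequence of the foregoing example can be sketched as follows. The disk is modeled as an in-memory `bytearray`, and the name `write_user_sector` and its return value (the number of bytes that had to be read before writing) are hypothetical conveniences for exposing the overhead.

```python
USER_SECTOR_BYTES = 512
PHYSICAL_SECTOR_BYTES = 4096


def write_user_sector(disk: bytearray, user_sector: int, write_data: bytes) -> int:
    """Write one 512 byte user sector onto a disk of 4096 byte physical sectors.

    Returns the number of bytes read before the write, i.e. the overhead.
    """
    if len(write_data) != USER_SECTOR_BYTES:
        raise ValueError("write_data must be exactly one user sector")
    if user_sector < 0:
        raise ValueError("user_sector must be non-negative")
    byte_address = user_sector * USER_SECTOR_BYTES
    start = (byte_address // PHYSICAL_SECTOR_BYTES) * PHYSICAL_SECTOR_BYTES
    end = start + PHYSICAL_SECTOR_BYTES
    if end > len(disk):
        raise ValueError("user_sector lies beyond the end of the disk")

    existing = bytes(disk[start:end])      # the extra 4096 byte read
    offset = byte_address - start          # position inside the physical sector
    merged = existing[:offset] + write_data + existing[offset + USER_SECTOR_BYTES:]
    disk[start:end] = merged               # the full physical sector is written back
    return len(existing)


disk = bytearray(b"\xaa" * (2 * PHYSICAL_SECTOR_BYTES))
bytes_read = write_user_sector(disk, 10, b"\x55" * USER_SECTOR_BYTES)
print(f"read {bytes_read} bytes in order to write {USER_SECTOR_BYTES}")
```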
To address the requirements described above, the present invention discloses a method, apparatus, article of manufacture, and a memory structure for storing and retrieving data in physical sectors which are larger than the sector size presented to the user. The method comprises the steps of receiving at least one user sector comprising write data, and writing the user sector to a portion of the physical sector of the storage device. In one embodiment, data is written to the data storage disk using existing data that was read from the storage disk for a RAID parity calculation. In writing the data in this way, larger sector sizes can be implemented with no overhead penalty. To write user sectors to a portion of the physical sector of the storage device, user sectors are mapped to associated physical sectors. Existing data stored in the associated physical sectors is then read, the write data is merged with the existing data stored in the associated physical sector, and the merged data is written to the storage device. In one embodiment, the method further comprises the steps of computing a data delta from the existing data and the write data, computing a parity value for the write data from the data delta and a parity of the existing data, and writing the write data to the mapped physical sector.
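By way of illustration only, the merge and parity steps summarized above can be sketched as follows, assuming XOR parity. The names `xor_bytes` and `merge_and_update_parity` are hypothetical. The sketch shows that the existing data, once read for the parity calculation, also serves as the source for the merge, so no additional read is issued.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally sized byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def merge_and_update_parity(existing_data: bytes, existing_parity: bytes,
                            offset: int, write_data: bytes) -> tuple[bytes, bytes]:
    """Return (merged physical sector, new parity) for a partial-sector write.

    existing_data is the physical sector already read for the parity update.
    """
    if offset < 0 or offset + len(write_data) > len(existing_data):
        raise ValueError("write_data does not fit in the physical sector")
    # Merge the write data into the existing physical sector contents.
    merged = (existing_data[:offset] + write_data
              + existing_data[offset + len(write_data):])
    # The data delta is nonzero only where the contents actually changed.
    data_delta = xor_bytes(existing_data, merged)
    # Applying the delta to the old parity gives the parity of the new rank.
    new_parity = xor_bytes(existing_parity, data_delta)
    return merged, new_parity


existing = bytes(range(256)) * 16           # one 4096 byte physical sector
other_disk = b"\x3c" * 4096                 # the rest of the rank (assumed)
old_parity = xor_bytes(existing, other_disk)
merged, new_parity = merge_and_update_parity(existing, old_parity, 1024, b"\xff" * 512)
# The incrementally updated parity matches a full recomputation.
assert new_parity == xor_bytes(merged, other_disk)
```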
The apparatus comprises a plurality of storage devices, each comprising media segmented into a plurality of physical sectors, operatively coupled to a controller. The controller manages the storage and retrieval of data in the storage devices and comprises an I/O module for writing user sectors to physical sectors. The article of manufacture comprises a data storage device tangibly embodying instructions to perform the method steps described above. The present invention also describes a memory for storing data in a RAID array of storage disks. The memory is structured into a plurality of physical sectors, typically 4096 bytes long, each of which is associable via a mapping with eight 512 byte user sectors. Data is written to the physical sectors using existing data read from a storage disk during the RAID parity calculation.
The foregoing allows writing data to disks in larger data sectors, thereby increasing the utilization of the disk media. This larger sector size is nominally 4096 bytes, but may be any multiple of the user-visible sector size.
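By way of illustration only, a mapping between user sectors and physical sectors of the kind summarized above can be sketched as follows. The name `map_user_sector` and the assumption that consecutive user sectors occupy consecutive positions within a physical sector are hypothetical. The physical sector size is a parameter because it may be any multiple of the user-visible sector size.

```python
def map_user_sector(user_sector: int, user_bytes: int = 512,
                    physical_bytes: int = 4096) -> tuple[int, int]:
    """Return (physical sector number, byte offset within it) for a user sector."""
    if user_bytes <= 0 or physical_bytes <= 0:
        raise ValueError("sector sizes must be positive")
    if physical_bytes % user_bytes != 0:
        raise ValueError("physical size must be a multiple of the user size")
    if user_sector < 0:
        raise ValueError("user_sector must be non-negative")
    sectors_per_physical = physical_bytes // user_bytes   # eight for 4096 / 512
    physical_sector, position = divmod(user_sector, sectors_per_physical)
    return physical_sector, position * user_bytes


# User sectors 0 through 7 share physical sector 0; user sector 8 starts the next one.
for user_sector in (0, 7, 8):
    print(user_sector, map_user_sector(user_sector))
```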