The present invention relates generally to data processing systems, and more particularly to a method that distributes multiple copies of data across multiple disk drives of a storage system for improved and parallel access to that data by multiple processors.
There are many factors that can operate against optimum performance of a data processing system. One such factor stems from the relative disparity between the time it takes to perform a data access (e.g., read or write) of a peripheral storage of a data processing system and the operating speed of a data processor making that access. This disparity is made more evident with today's penchant for clustered systems in which most, if not all, of the multiple processors of the system compete for access to the available data storage systems. Unfortunately, the storage systems in these and other multiple processor environments tend to form a bottleneck when being accessed by several of the processors of the system at the same time. The problem is worse with poor storage system design that makes it difficult for the storage system to handle multiple, simultaneous input/output (I/O) requests, severely impacting system performance. In addition, poor storage system design can create an environment that gives rise to possible irreparable loss of data.
Among prior solutions are those that use data redundancy both to back up the data, protecting against loss, and to allow parallel access for improved system performance. Such solutions include redundant arrays of independent (or inexpensive) disks (RAID). There are various RAID configurations or levels. Some use data striping (spreading out blocks of each file across multiple disks) and error correction techniques for data protection, but do not use redundancy. Thus, although these RAID configurations tend to improve performance, they do not deliver fault tolerance. Data redundancy is used, however, by a RAID level (RAID1) that employs disk mirroring, thereby providing redundancy of data and fault tolerance. RAID1 is a well-known technique for increasing I/O performance. Typically, the disk mirroring employed by RAID1 incorporates a group of several disk drives but presents a single disk drive image to servers.
Storage systems employing a RAID1 architecture will usually limit read/write outside accesses to a master disk drive. When an I/O write request is received by a RAID1 storage system, the data of the request is written to the master disk. A disk controller of the storage system will then handle replication of that data by writing it to all of the mirrored disks. The end result is that each and every disk of the storage system will have the same data.
When an I/O read request is received, a disk selector module, typically found in the disk controller, will select one of the mirrored disks to read in order to balance the loads across the disk drives of the system. A disk controller is capable of reading data from multiple disk units in parallel. This is why disk mirroring increases the performance of data read operations.
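The disk selection described above can be illustrated with a minimal sketch. The text does not state which balancing policy a RAID1 disk selector uses, so the "fewest outstanding requests" rule below is an assumption, and the names Raid1Selector, select and complete are hypothetical.

```python
class Raid1Selector:
    """Illustrative disk selector for read requests (not an actual controller API)."""

    def __init__(self, num_disks):
        # Number of reads currently in flight on each mirrored disk.
        self.outstanding = [0] * num_disks

    def select(self):
        # Assumed policy: pick the disk with the fewest in-flight reads.
        # Ties go to the lowest-numbered disk.
        disk = min(range(len(self.outstanding)),
                   key=self.outstanding.__getitem__)
        self.outstanding[disk] += 1
        return disk

    def complete(self, disk):
        # Called when a read on `disk` finishes.
        self.outstanding[disk] -= 1


selector = Raid1Selector(num_disks=3)
first_three = [selector.select() for _ in range(3)]  # one read per disk
selector.complete(1)                                 # disk 1 becomes idle
next_disk = selector.select()                        # the idle disk is chosen
```

Because every mirrored disk holds the same data, any of them can serve a given read, which is what allows the selector to spread reads freely.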
But this technology has at least two problems. First, processor elements of the system can be subjected to high loads, which restrict the number of I/O requests that the disk controller can process in a given period of time. Second, when an I/O write request is received by the storage device, the requesting system element (e.g., a processor) must wait for a response until the disk controller has written the data to all the disk drives. This can introduce latency in data write operations.
Broadly, the present invention relates to a method of allocating each of a number of processor units to a corresponding one of a number of disk storage units. In this way, each processor unit can read data from its allocated disk storage unit with minimal conflict with other read and/or write operations conducted at or about the same time by other processor units. Multiple, simultaneous accesses for data will not create or encounter a bottleneck. In addition, the redundancy produced by this approach provides a storage system with fault tolerance.
The invention, then, is directed to a processing system that includes a number of processor elements connected to disk storage having a plurality of disk storage units for maintaining data. One of the processor elements, designated a "Mount Manager," is responsible for assigning a disk storage unit to a corresponding one of the other processor elements so that, preferably, there is a one-to-one correspondence between a disk storage unit and a processor element. One of the disk storage units is designated a master disk unit, and the remaining disk storage units are designated "mirrored" disk units. A disk controller of the storage system controls the writing to and reading from the disk storage units. The disk controller receives I/O write requests from the processor elements and writes the data of each request only to the master disk unit. A sync daemon running on the disk controller copies the written data to the mirrored disk units. Each of the processor elements issues I/O read requests to, and reads data from, the mirrored disk unit assigned to it by the Mount Manager. If, however, an I/O read request is issued before the allocated mirrored disk unit has been updated with data recently written to the master disk unit, the requested data will be read from the master disk unit. To detect such a situation, the disk controller and the sync daemon use a bitmap status table that indicates, for each disk block in each mirrored disk drive, whether the block holds stale data or updated data.
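By way of illustration only, the write path, the sync daemon, and the bitmap-guided read path summarized above might be sketched as follows. This is a simplified in-memory model and not the claimed implementation: the names MirroredStore, assign_mirrors, write, sync_once and read are hypothetical, disks are modeled as Python lists of blocks, and the bitmap status table is modeled as one list of booleans per mirrored disk.

```python
def assign_mirrors(processors, num_mirrors):
    """Mount Manager role (sketch): give each processor its own mirrored disk."""
    if len(processors) > num_mirrors:
        raise ValueError("not enough mirrored disks for a one-to-one mapping")
    return {proc: index for index, proc in enumerate(processors)}


class MirroredStore:
    def __init__(self, num_blocks, num_mirrors):
        self.master = [b""] * num_blocks
        self.mirrors = [[b""] * num_blocks for _ in range(num_mirrors)]
        # Bitmap status table: stale[m][b] is True while block b of
        # mirror m has not yet caught up with the master disk.
        self.stale = [[False] * num_blocks for _ in range(num_mirrors)]

    def write(self, block, data):
        # Writes go only to the master; every mirror copy becomes stale.
        self.master[block] = data
        for bits in self.stale:
            bits[block] = True

    def sync_once(self):
        # Sync daemon step: copy one stale block, then clear its bit.
        # Returns False when no stale block remains.
        for m, bits in enumerate(self.stale):
            for b, is_stale in enumerate(bits):
                if is_stale:
                    self.mirrors[m][b] = self.master[b]
                    bits[b] = False
                    return True
        return False

    def read(self, mirror, block):
        # A stale block is served from the master instead of the mirror.
        if self.stale[mirror][block]:
            return self.master[block]
        return self.mirrors[mirror][block]


store = MirroredStore(num_blocks=2, num_mirrors=2)
mounts = assign_mirrors(["cpu0", "cpu1"], num_mirrors=2)
store.write(1, b"new")
early = store.read(mounts["cpu1"], 1)   # mirror still stale: read from master
while store.sync_once():
    pass
late = store.read(mounts["cpu1"], 1)    # mirror updated: read from mirror
```

In this sketch a write returns as soon as the master is updated, and a read never observes stale mirror data because the bitmap redirects it to the master until the sync step has run.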
In an alternate embodiment of the invention, the mirrored disks are not updated immediately. Rather, the data on the mirrored disks is fixed as of the point in time at which the disks were last updated. Changes to that data on the master disk unit are not written to the mirrored disks until a processor element issues a "SNAPSHOT" request to the storage system. At that time, the sync daemon of the disk controller will determine and identify which data needs to be written to the mirrored disk units to update them. The sync daemon will then update those mirrored disk storage units that need updating. In addition, when data is to be written to the master disk unit, the disk controller first checks to see if the data that will be overwritten has been copied to the mirrored disk units. If not, the data that will be overwritten is first copied to the mirrored disk units before being changed.
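The snapshot behavior of this alternate embodiment might be sketched, under several assumptions, as follows. The names SnapshotStore, dirty, pending, snapshot, sync_all and read are hypothetical. The text does not say how a read is handled for a block that has been identified by a SNAPSHOT request but not yet copied; the sketch assumes such a read is served from the master disk, as in the first embodiment.

```python
class SnapshotStore:
    def __init__(self, num_blocks, num_mirrors):
        self.master = [b""] * num_blocks
        self.mirrors = [[b""] * num_blocks for _ in range(num_mirrors)]
        self.dirty = set()    # blocks changed on the master since the last SNAPSHOT
        self.pending = set()  # blocks identified by a SNAPSHOT, not yet copied

    def _copy_to_mirrors(self, block):
        for mirror in self.mirrors:
            mirror[block] = self.master[block]
        self.pending.discard(block)

    def write(self, block, data):
        # If the data about to be overwritten still has to reach the
        # mirrors, copy it there first, then change the master.
        if block in self.pending:
            self._copy_to_mirrors(block)
        self.master[block] = data
        self.dirty.add(block)

    def snapshot(self):
        # SNAPSHOT request: identify the blocks the mirrors need.
        self.pending |= self.dirty
        self.dirty.clear()

    def sync_all(self):
        # Sync daemon: update the mirrors with every identified block.
        while self.pending:
            self._copy_to_mirrors(next(iter(self.pending)))

    def read(self, mirror, block):
        # Assumption: a block awaiting copy is read from the master.
        if block in self.pending:
            return self.master[block]
        return self.mirrors[mirror][block]


snap = SnapshotStore(num_blocks=4, num_mirrors=2)
snap.write(0, b"v1")
before = snap.read(0, 0)   # no SNAPSHOT yet: mirror keeps its old content
snap.snapshot()
snap.write(0, b"v2")       # b"v1" is copied to the mirrors before the overwrite
frozen = snap.read(1, 0)   # mirrors hold the snapshot-time data
snap.snapshot()
snap.sync_all()
latest = snap.read(0, 0)
```

The copy-before-overwrite check in write is what keeps the mirrored disks consistent with the moment of the SNAPSHOT request even though the copying itself is deferred.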
A number of advantages are achieved by the present invention. First, by providing redundant data through mirroring the content of the master disk unit, and by assigning specific ones of the mirrored disk units to corresponding ones of the processor elements, parallel read accesses may be made, thereby improving system operation.
These and other advantages of the present invention will become apparent to those skilled in this art upon a reading of the following description of the specific embodiments of the invention, which should be taken in conjunction with the accompanying drawings.