FIG. 1 attached to the present specification illustrates in simplified fashion the general architecture of a data processing system. The data processing system 1 comprises a central processing unit CPU, controllers Ctl.sub.1 through Ctl.sub.n, and one or more peripheral subsystems S/SP. These subsystems can comprise, for example, single disks, redundant arrays of disks, magnetic tapes, or even printers.
The redundant arrays of disks can have various architectures, including those known by the acronym "RAID" (for "Redundant Array of Independent Disks").
The arrays of disks with a "RAID" architecture are in turn subdivided into several subcategories. Among others, it is possible to cite the architectures "RAID-1" and "RAID-5," though this is not exhaustive.
To begin with, let us briefly summarize the main characteristics of these two architectures, which are used to advantage within the scope of the invention.
To obtain redundancy of the "RAID-1" type, mirrored disks are used. According to this method, the data are recorded normally on a first disk, and redundantly on a second disk, physically distinct from the first one, which represents the "mirror" of the first one. When a "normal" disk is malfunctioning, the data can be read- and/or write-accessed from its "mirror" disk. This naturally requires doubling the storage capacity, and hence the number of physical disks, relative to what is actually needed.
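By way of illustration only, the mirroring principle described above can be sketched as follows. This is a minimal toy model, not the implementation of any actual disk subsystem: the class name MirroredPair, its fields, and the use of in-memory dictionaries in place of physical disks are all assumptions made for the example.

```python
class MirroredPair:
    """Toy model of RAID-1 redundancy (hypothetical names throughout)."""

    def __init__(self):
        # Index 0 stands for the "normal" disk, index 1 for its "mirror".
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block_no, data):
        # Record the block on every disk that is still working.
        written = False
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data
                written = True
        if not written:
            raise IOError("both disks have failed")

    def read(self, block_no):
        # Read from the first working disk; the mirror takes over
        # when the "normal" disk is malfunctioning.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both disks have failed")
```

In this sketch every block occupies space on both disks, which reflects the doubling of storage capacity noted above.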
Redundancy of the "RAID-5" type requires less additional storage capacity. The data is divided into segments of several blocks of a given length, which can be called "usable" data blocks. A redundant segment composed of parity blocks is associated with a given number of segments.
According to this method, several physical disks are also used. In general, the disks are partitioned into "slices" and a "rotating parity" data recording schema is used.
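The role of the parity segment can be illustrated by a short sketch. It assumes that parity is computed as a bytewise exclusive-OR of the usable data segments, which is the usual choice for redundancy of the "RAID-5" type but is not stated in the present description; the function names xor_blocks and rebuild are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte strings (assumed parity rule)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recompute one lost data segment from the surviving segments and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Under this assumption, one parity segment protects s segments of usable data against the loss of any single segment, so the additional capacity required is one segment in s+1 rather than the doubling required by mirroring.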
FIG. 2a attached to the present specification illustrates this recording method with "RAID-5" type redundancy and rotating parity. By way of example, it is assumed that the data storage subsystem comprises five physical disks D.sub.1 through D.sub.5 under the control of a single controller Ctl, for example equipped with interfaces of the type known by the name "SCSI" for "Small Computer System Interface," SCSI.sub.1 through SCSI.sub.5. The controller Ctl also comprises an interface of the same type SCSI.sub.0 connected to the central processor (not represented). The bottom part of FIG. 2a represents the logical configuration of the memory, with storage equivalent to the five disks D.sub.1 through D.sub.5. Each group of disks D.sub.1 through D.sub.5 is called a physical device PD.
This array is divided into y slices, t.sub.1 through t.sub.y. It is assumed that only one segment is recorded in any slice of a disk, for example the segment S.sub.0 ("Seg. 0") in the slice t.sub.1 of the disk D.sub.2. If one parity segment P.sub.1 (stored in the slice t.sub.1 of the disk D.sub.1) is associated with four segments of usable data, S.sub.0 through S.sub.3, it is easy to see that there is a shift of the storage position of the next parity segment P.sub.2: naturally, the latter is stored in the slice t.sub.2 (in the example described), but on the disk D.sub.2, not the disk D.sub.1. A regular shift also exists for the segments P.sub.3 through P.sub.5, respectively stored on the disks D.sub.3 through D.sub.5. Once again, a parity segment P.sub.6 is stored on the disk D.sub.1, in the slice t.sub.6. Therefore, there is a shift of modulo 5 and more generally of modulo d, d being the number of physical disks, and s=d-1 being the number of segments of usable data associated with a parity segment.
For the example illustrated by FIG. 2a, the map of the distribution among the disks D.sub.1 through D.sub.5 and the slices t.sub.1 through t.sub.y of the usable data segments (S.sub.0 = "Seg. S.sub.0" through S.sub.x+3 = "Seg. S.sub.x+3") and the parity segments ("Parity P.sub.0" through "Parity P.sub.y") is shown in "TABLE 1," located at the end of the present specification.
Naturally, other distribution schemata are possible, but if rotating parity recording is desired, the relationship among the accumulated number of usable data segments, the number of associated parity segments, and the number of physical disks cannot be arbitrary.
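The modulo-d shift described above can be expressed as a short sketch. Disks and slices are numbered from 1, as in FIG. 2a. The function names are hypothetical, and the order in which the usable data segments fill the remaining disks of a slice is an assumption: it is chosen only so that the segment S.sub.0 falls in the slice t.sub.1 of the disk D.sub.2, as in the example.

```python
def parity_disk(slice_no, d):
    """Disk (1-based) holding the parity segment of slice slice_no (1-based)."""
    return (slice_no - 1) % d + 1


def slice_layout(slice_no, d):
    """Map each disk of one slice to the segment it stores.

    Assumption: data segments fill the non-parity disks in increasing
    disk order, with s = d - 1 data segments per slice.
    """
    s = d - 1
    p = parity_disk(slice_no, d)
    first = (slice_no - 1) * s          # index of the first data segment in the slice
    data_disks = [k for k in range(1, d + 1) if k != p]
    layout = {p: f"P{slice_no}"}
    for offset, disk in enumerate(data_disks):
        layout[disk] = f"S{first + offset}"
    return layout
```

For d=5, parity_disk returns 1 for the slice t.sub.1, 2 for the slice t.sub.2, and 1 again for the slice t.sub.6, which matches the positions given for P.sub.1, P.sub.2 and P.sub.6.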
As defined above, the data storage space constitutes a de facto virtual memory space or logic unit LUN.
This virtual memory space can be further subdivided into several logic units LUN.sub.0 through LUN.sub.2, as is illustrated more particularly by FIG. 2b. Each logic unit LUN.sub.0 through LUN.sub.2 comprises a certain number of slices, the total number of segments (usable data and parity data) being equal to the number of slices multiplied by the number of physical disks. In the example described in FIG. 2b, it is assumed that the virtual disk array has been divided into three logic units LUN.sub.0 through LUN.sub.2. In other words, a "vertical" partitioning (by physical disks) has been replaced by a "horizontal" partitioning, for a given physical device PD. The number of partitions is chosen so as to obtain optimized performance as a function of the capacity of the elementary disks D.sub.1 through D.sub.5, and hence of their accumulated capacity (physical device PD). In the example described, each logic unit LUN.sub.0 through LUN.sub.2 forms a virtual disk with a capacity equal to one third of the accumulated capacity, i.e., of the capacity of the physical device PD.
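The "horizontal" partitioning into logic units can be sketched as follows, assuming that each logic unit receives a contiguous range of slices and that the slices are shared out as evenly as possible. Both assumptions, and the function names, are introduced for the example only; the description does not specify how the slices are allotted.

```python
def partition_slices(total_slices, lun_count):
    """Split slices 1..total_slices into lun_count contiguous ranges (assumed scheme)."""
    base, extra = divmod(total_slices, lun_count)
    luns = []
    start = 1
    for i in range(lun_count):
        size = base + (1 if i < extra else 0)   # spread any remainder over the first LUNs
        luns.append(range(start, start + size))
        start += size
    return luns


def segments_in_lun(slice_range, d):
    """Total segments (usable data and parity) = number of slices x number of disks."""
    return len(slice_range) * d
```

With twelve slices, three logic units and five disks, each logic unit in this sketch spans four slices and therefore twenty segments, i.e., one third of the sixty segments of the physical device PD.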
The use of a redundant disk architecture for data makes it possible to solve only some of the problems caused by hardware failures. In fact, although not represented in the preceding figures, the disks or arrays of disks are placed under the control of at least one disk controller. In the event of a failure of this unit, access to all or part of the information would be seriously compromised. It has been proposed that redundancy also be introduced at this level, as illustrated in detail in FIG. 3a. According to this architecture, the multiple disk array, with the common reference MD, is placed under the control of two disk controllers, in this case two data storage processors SP-A and SP-B, operating redundantly. The multiple disk array MD can comprise one or more physical devices PD (FIGS. 2a and 2b), and hence, a fortiori, one or more logic units (FIG. 2b: LUN.sub.0 through LUN.sub.2). Normally, some of the disk space, and hence some of the logic units (a priori half, or a value approximating half, as a function of the redundant architecture adopted), are assigned to one of the data storage processors, for example SP-A, and the rest to the other processor, for example SP-B.
In the normal operating mode, access to the first partition of the total disk space is gained via the processor SP-A, and access to the second partition of the disk space is gained via the processor SP-B. If a logic unit LUN#m (m being arbitrary and falling between 0 and n, n+1 being the maximum number of logic units) is assigned to SP-A, it is necessary to organize a redundancy of access to the resource LUN#m through the processor SP-B in the event of a failure of the processor SP-A. However, many types of processors available on the market would not make it possible to "see" the logic unit LUN#m directly through the processor SP-B.
Ordinarily, two methods are used, which will be explained in reference to FIGS. 3b and 3c.
The first method is illustrated schematically by FIG. 3b. The input-output controller Ctl of the central processing unit CPU communicates through separate busses B.sub.0 and B.sub.1 with the processors SP-A and SP-B, under the control of the operating system OS of the central processing unit CPU. The operating system OS can be of any type. For example, it can be a "UNIX" or "GCOS" (registered trademarks) operating system. In the normal mode, access to the logic unit LUN#m assigned to the processor SP-A takes place via the bus B.sub.0 (the solid lines in FIG. 3b). When a (software or hardware) failure that inhibits access to this logic unit LUN#m via the bus B.sub.0 and the processor SP-A is detected, at least the logic unit LUN#m, or possibly all of the logic units assigned to the processor SP-A, are switched to the processor SP-B. Access is then gained via the bus B.sub.1 (in broken lines), and the operation moves into the "degraded" mode. To do this, a command generally known as a "trespass," meaning a "forced assignment," is used. Naturally, the process for organizing the switching of the logic units assigned to the processor SP-B to the processor SP-A is entirely similar.
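The sequence of operations of this first method can be sketched as follows. The sketch is a toy model under stated assumptions: the class StorageArray, its methods, and the function os_access are hypothetical names, and the "trespass" is modeled simply as a change of owner in a table, not as the actual command of any storage processor.

```python
# Assumed pairing of storage processors and busses, after FIG. 3b.
BUS_FOR = {"SP-A": "B0", "SP-B": "B1"}


class StorageArray:
    """Toy model of a disk array MD controlled by SP-A and SP-B."""

    def __init__(self, assignment):
        self.owner = dict(assignment)   # logic unit -> owning storage processor
        self.failed = set()             # storage processors currently unusable

    def trespass(self, lun, new_owner):
        # "Forced assignment" of a logic unit to the other processor.
        if new_owner in self.failed:
            raise IOError(new_owner + " is not available")
        self.owner[lun] = new_owner

    def access(self, lun, via):
        if via in self.failed or self.owner[lun] != via:
            raise IOError(lun + " is not reachable through " + via)
        return BUS_FOR[via]             # bus actually used for the access


def os_access(array, lun):
    """Work left to the operating system OS under the first method."""
    primary = array.owner[lun]
    try:
        return array.access(lun, primary)
    except IOError:
        backup = "SP-B" if primary == "SP-A" else "SP-A"
        array.trespass(lun, backup)     # switch, then retry in "degraded" mode
        return array.access(lun, backup)
```

In this sketch the detection of the failure, the trespass and the retry are all carried out in os_access, i.e., at the level of the operating system.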
This method has the drawback of returning to the highest software "layers," i.e., to the level of the operating system OS of the central processing unit CPU. This results in a probable overload of the latter. It is even necessary to modify the code of some operating systems in order to be able to handle such specific tasks.
The second method is illustrated schematically by FIG. 3c. Only one bus B linking the controller Ctl to the redundant processors SP-A and SP-B is used. In the event that a failure is detected, a programmed function known as an "auto-trespass" is used, which organizes the automatic switching from one processor to the other.
This method, although it frees up the operating system OS of the central processing unit CPU, is nevertheless not without drawbacks. The fact that only one bus is available results in an overload, even in the normal mode. Moreover, only one physical channel is available between the controller Ctl and the storage processors SP-A and SP-B.
The object of the invention is to eliminate the drawbacks of the processes of the prior art, while maintaining, or even improving, the redundancy of access to the data storage resources. In particular, its object is to provide an access to the redundant data storage resources that is transparent for the main system.
To this end, the control and detection of failures are made to "descend" into the system input-output controller, which masks these tasks from the operating system. The physical architecture adopted is similar to that described in connection with FIG. 3b. In particular, two separate busses are used, one of which serves as the "normal" transmission path for a first disk controller, the other serving as a redundant (i.e., backup) transmission path for the other disk controller (the latter also being used redundantly), and vice versa.
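The principle of handling the choice of path inside the input-output controller, rather than in the operating system, can be sketched as follows. This is only an illustration under stated assumptions: the class IOController, its fields, and the representation of a path by the name of its bus are hypothetical, and the way in which a failure is actually detected is not modeled.

```python
class IOController:
    """Toy model of an input-output controller with a normal and a backup path per logic unit."""

    def __init__(self, paths):
        # paths: logic unit -> (normal bus, backup bus); assumed configuration.
        self.paths = dict(paths)
        self.bus_ok = {"B0": True, "B1": True}
        self.mode = "normal"

    def submit(self, lun):
        """Return the bus used for a request; the caller never selects it."""
        normal, backup = self.paths[lun]
        if self.bus_ok[normal]:
            return normal
        if self.bus_ok[backup]:
            self.mode = "degraded"      # switching stays inside the controller
            return backup
        raise IOError("no path to " + lun)
```

In this sketch each bus is the "normal" path for the logic units of one disk controller and the backup path for those of the other, and a caller of submit sees the same interface in the normal mode and in the "degraded" mode.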