This invention relates to an external memory unit for a computer or high-performance computer system, and more particularly to an array disk system employing a large number of small disk drives and to control of its maximum power supply current requirement, e.g. during power-on or head seek.
In current computer systems, the data required by the host side, e.g., by the CPU (central processing unit), is stored in a secondary storage system and the data is written to and read from the secondary storage system as required by the CPU.
The increasing sophistication of information systems in recent years has led to a need for higher-performance secondary storage systems. One answer to this need is the array disk system which, as will be clear from the following description, consists of a large number of relatively small-capacity magnetic disk drives. The array disk system conducts parallel processing. Specifically, the data transferred from the CPU is subdivided and simultaneously stored in a plurality of magnetic disk drives and, during data read, the subdivided data is simultaneously read from the magnetic disk drives, the original data is regenerated from it, and the result is transferred to the CPU at high speed. The magnetic disk drives that carry out this parallel processing are divided into groups as indicated in FIG. 12(a). Each group constitutes a unit within which all member magnetic disk drives operate in the same manner.
The secondary storage system generally uses nonvolatile storage media, typically magnetic disk drives, optical disk drives or the like.
This type of array disk system is discussed, for example, by D. Patterson, G. Gibson and R. H. Katz in a paper titled "A Case for Redundant Arrays of Inexpensive Disks (RAID)" read at the ACM SIGMOD Conference, Chicago, Ill. (June 1988). This paper reports on the results of studies into the performance and reliability of both array disk systems which subdivide data and process it in parallel and array disk systems which treat distributed data independently. The two array disk systems referred to in this paper are considered to be the most common types in use today.
The array disk system which subdivides data and processes the subdivided data in parallel will now be explained. The array disk system has a large number of relatively small-capacity magnetic disk drives. As shown in FIG. 14, the data transferred from the CPU is subdivided and simultaneously stored in parallel in a plurality of data disk drives 7 and a parity disk drive 8 that constitute a parity group 4. During data read, the procedure is reversed, i.e., the subdivided data is simultaneously read in parallel from the disk drives, the original data is regenerated from it, and the result is transferred to the CPU. This parallel processing enables the data to be transferred at high speed. For enhancing the reliability of the array disk system, parity data is generated from the subdivided data and stored in the parity disk drive P (8). In this way, when a problem arises making it impossible to read data from one of the magnetic data disk drives D (7) in which the subdivided data is stored, the data stored in the disabled drive can be reconstructed from the data stored in the remaining data disk drives 7 and the parity data of disk drive 8. The provision of parity disks is necessary for improving the reliability of a system which, like the array disk system, consists of a large number of magnetic disk drives.
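As a minimal sketch of the subdivide, parity, and reconstruct cycle described above, the Python below stripes data byte by byte across a hypothetical number of data drives and computes the parity strip as the bytewise XOR of the data strips. The text does not specify the parity function; even (XOR) parity is assumed here because it lets any single lost strip be rebuilt from the survivors. The function names are illustrative only.

```python
from __future__ import annotations

from functools import reduce
from operator import xor


def stripe_with_parity(data: bytes, n_data: int) -> list[bytes]:
    """Subdivide data byte-wise over n_data drives and append an XOR parity strip."""
    padded_len = -(-len(data) // n_data) * n_data  # round up to a whole stripe
    data = data.ljust(padded_len, b"\0")
    strips = [data[i::n_data] for i in range(n_data)]  # strip i goes to data drive i
    parity = bytes(reduce(xor, column) for column in zip(*strips))
    return strips + [parity]  # last entry plays the role of the parity drive


def reconstruct(strips: list[bytes | None]) -> list[bytes]:
    """Rebuild the one missing strip (None) as the XOR of all surviving strips."""
    missing = [i for i, s in enumerate(strips) if s is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one lost drive")
    survivors = [s for s in strips if s is not None]
    rebuilt = bytes(reduce(xor, column) for column in zip(*survivors))
    return [rebuilt if s is None else s for s in strips]


def regenerate(strips: list[bytes], n_data: int) -> bytes:
    """Re-interleave the data strips (parity excluded) into the original byte order."""
    return bytes(b for column in zip(*strips[:n_data]) for b in column)


strips = stripe_with_parity(b"ARRAYDISK", 4)
strips[2] = None  # data drive 3 becomes unreadable
restored = reconstruct(strips)
print(regenerate(restored, 4).rstrip(b"\0"))  # b'ARRAYDISK'
```

Because XOR is its own inverse, the same routine rebuilds either a lost data strip or a lost parity strip; losing two strips of one parity group is beyond what a single parity drive can repair.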
Systems in which a high transfer rate is realized by simultaneously reading from and writing to an array of disks are disclosed in Japanese Unexamined Patent Publication Disclosure 1(1989)-250158 and Electronic Design, Nov. 12, 1987, p. 45. As shown in FIG. 2, these systems define a plurality of disk drives 211-215 as an array. Preferably a rotation synchronization circuit 220 rotation-synchronizes these disk drives with an external reference clock or with one disk drive among the plurality making up the array. A sequencer 240 subdivides the data transferred from the host 210 through an interface 230 into bits, bytes, blocks or some other arbitrary unit, and also generates parity or other ECC (error checking and correction) data. These data are written to the disk drives 211-214 substantially simultaneously by disk drive control circuits 250. During regeneration, the sequencer 240 reconstructs the original data from data read simultaneously from the disk drives and outputs the regenerated data to the host through the interface 230. Buffers 260 situated between the control circuits 250 and the sequencer 240 absorb rotational discrepancies among the disks. The interface 230, sequencer 240, control circuits 250 and buffers 260 are controlled by a processor 270.
When reading and writing of data are conducted with respect to N+1 disks (the +1 indicating the parity disk 215) in this manner, the apparent transfer rate becomes N times the transfer rate of the individual disk drives. Moreover, the provision of a redundant disk (the parity disk 215 in this example) makes it possible to ensure accurate data regeneration even if one disk drive should break down.
Further, as shown in FIG. 3, COMPCON '89 Spring, February 1989, p. 118 discloses an arrangement in which a plurality of interconnected disk drive arrays 281-284 (referred to below as parity groups) are each constituted in the manner of FIG. 2. High-speed transfer is realized by having the disks within each of the parity groups 281-284 conduct read and write operations simultaneously. When a disk within a group breaks down, its data is reconstructed within the group concerned. This reference further discloses the formation of separate groups 291-295 (referred to below as power groups) arranged perpendicular to the parity groups. Each power group constitutes a separate unit as regards the supply of electric power to the disk drives and cooling fans. With this arrangement, the breakdown of a single power group makes it impossible to read the data of only one disk in each parity group. As a result, the aforesaid error checking and correction capability remains intact and the data can be regenerated.
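The benefit of arranging power groups perpendicular to parity groups can be checked with a small model. In this sketch each drive is identified by its parity group and its position within that group; the two assignment functions and the 4 × 5 layout in the example (chosen to match the counts 281-284 and 291-295 in FIG. 3) are assumptions for illustration. The function reports the largest number of drives any one parity group loses when a single power group fails.

```python
from collections import Counter
from typing import Callable


def perpendicular(parity_group: int, position: int) -> int:
    # Power group = position within the parity group (FIG. 3 style layout).
    return position


def aligned(parity_group: int, position: int) -> int:
    # Counter-example: each parity group fed entirely by its own power group.
    return parity_group


def worst_loss(assign: Callable[[int, int], int], n_parity_groups: int, drives_per_group: int) -> int:
    """Most drives any single parity group loses when one power group fails."""
    drives = [(p, j) for p in range(n_parity_groups) for j in range(drives_per_group)]
    worst = 0
    for failed in {assign(p, j) for p, j in drives}:
        lost = Counter(p for p, j in drives if assign(p, j) == failed)
        worst = max(worst, max(lost.values()))
    return worst


# Hypothetical layout: 4 parity groups, each of 4 data drives + 1 parity drive.
print(worst_loss(perpendicular, 4, 5))  # 1: every parity group stays recoverable
print(worst_loss(aligned, 4, 5))        # 5: a whole parity group is lost
```

With one parity drive per group, a worst-case loss of 1 is exactly the condition under which every parity group can still regenerate its data.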
The aforesaid arrangements do not, however, take into account the fact that the initial current becomes large when a large number of disk drives are started up simultaneously. As shown in FIG. 4, the power supply current required immediately after start-up of a disk drive is more than twice that during steady-state operation. This large current following start-up continues to flow for no more than several tens of seconds. Assume that a single power supply serves D disk drives (D being equal to the number of parity groups), that the steady-state current is I (A), and that a current equal to k times the steady-state current is required immediately after start-up. The power supply is thus required to be capable of supplying, albeit for only a short period, a current of $I \times k \times D$ (A).
Japanese Unexamined Patent Publication Disclosure 57(1982)-3265 discloses a technique for staggering the times at which power-on is conducted with respect to the disk drives. While this method makes it possible to reduce the required capacity of the power supply, it considerably prolongs the time required for start-up of the entire system when applied to a system which, like the array disk system, has a large number of disk drives that have to be supplied with power.
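The trade-off between these two extremes can be put in numbers. The sketch keeps the assumptions stated above (steady-state current I per drive, initial current k·I) and adds two of its own: a fixed settling time `t_settle` per start-up (the text says only "several tens of seconds"), and, in the staggered case, powering on each drive only after the previous one has settled.

```python
from __future__ import annotations


def simultaneous_start(D: int, k: float, I: float, t_settle: float) -> tuple[float, float]:
    """All D drives powered on together: (peak current, total start-up time)."""
    return k * I * D, t_settle


def one_at_a_time_start(D: int, k: float, I: float, t_settle: float) -> tuple[float, float]:
    """Drives powered on singly, each after the previous one has settled."""
    # Worst moment: D - 1 drives already at steady current I while the last one surges at k*I.
    peak = I * (D - 1) + k * I
    return peak, D * t_settle


# Hypothetical figures: 16 drives, k = 2, I = 1 A, 30 s to settle.
print(simultaneous_start(16, 2, 1.0, 30.0))   # (32.0, 30.0)
print(one_at_a_time_start(16, 2, 1.0, 30.0))  # (17.0, 480.0)
```

Under these assumptions full staggering nearly halves the peak current but multiplies the start-up time by D, which is the drawback noted for systems with many drives.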
An object of this invention is to provide an array disk system, and a method of controlling it, that reduces the electric current required by the array disk system, e.g. the current required during the power-on sequence, so that the disk system can be started up within a prescribed period of time using relatively small power supplies.
For achieving this object, the present invention divides the disk drives within the disk system into a number of groups and separately starts up the respective disk drive groups.
The number of disk drives constituting the individual groups ordinarily decreases in the order that the groups are started up. This is because, for example, the reserve power of the power supply after the start-up of the first group is equal to the rated capacity of the power supply minus the amount of current required for maintaining the disk drives of the first group in the steady state. It suffices to set the number of disk drives in the first group to be started up so as not to exceed the capacity of the power supply being used. This number can be decided by the following method.
Assume that D disk drives are started up using a single power supply, that the steady-state current per disk drive is I(A), and that an initial current k times as large as the steady-state current is required at the time of start-up. Then, if the number of disk drives first started up is set at D/k, the start-up current, (D/k) × kI = ID (A), will not exceed the current drawn when all of the disk drives are operating in the steady state.
Next, the manner of determining the number of disks to be included in the second and following groups will be explained. Basically, it suffices if the number of disk drives in each of the second and following groups is such that the current required for starting them up does not exceed the reserve capacity of the power supply. For optimum effect, however, the following method can be considered. After the first group of D/k disk drives has reached the steady state (e.g., after several tens of seconds), the next group of disk drives is started up. The number x of disk drives in this second group is then set to the reserve current capacity of the power supply while D/k disk drives are operating in the steady state, ID − (D/k)I (A), divided by the start-up current per drive, kI. This can be expressed by the following equation:
$x = \frac{1}{k}\left(1 - \frac{1}{k}\right)D$
Since the number of disk drives must be an integer, any fractional part of D/k, and of the value obtained from the foregoing equation, is dropped, i.e., these values are rounded down. When this method is used for determining the numbers of disk drives, it may happen that a single disk drive remains at the end. For starting up this disk drive, however, a maximum power supply current of $I(D - 1 + k)$ (A) is sufficient, since D − 1 drives are then drawing the steady-state current I and only the last drive draws kI.
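The sizing rule can be applied repeatedly: once S drives are in the steady state, the reserve is I(D − S), so the next group can hold floor((D − S)/k) drives. The sketch below is one way to do this. The first group comes out as D/k rounded down, and when D/k is an integer the second group equals x. Because it uses the reserve actually left after rounding, it can occasionally admit one drive more than the rounded x while still staying within ID. Starting any leftover drives singly once fewer than k remain is an assumption that extends the single-remaining-drive case in the text; it keeps the peak at or below I(D − 1 + k). The function names are hypothetical.

```python
from __future__ import annotations

import math
from fractions import Fraction


def plan_startup_groups(D: int, k: float) -> list[int]:
    """Sizes of successive power-on groups for D drives with start-up current k*I."""
    groups: list[int] = []
    started = 0
    while started < D:
        # Reserve while `started` drives run at steady current I is I*(D - started);
        # each newly started drive needs k*I, so the quotient is rounded down.
        size = math.floor(Fraction(D - started) / Fraction(k))
        size = min(max(size, 1), D - started)  # fewer than k left: start singly (assumption)
        groups.append(size)
        started += size
    return groups


def peak_current(groups: list[int], k: float, I: float) -> float:
    """Largest supply current while the groups are started in sequence."""
    peak, steady = 0.0, 0
    for size in groups:
        peak = max(peak, I * steady + k * I * size)  # earlier groups settled, this one surging
        steady += size
    return peak


groups = plan_startup_groups(16, 2)
print(groups)                        # [8, 4, 2, 1, 1]
print(peak_current(groups, 2, 1.0))  # 17.0, i.e. I(D - 1 + k)
```

For D = 16 and k = 2 the system starts in five steps instead of sixteen, and the supply needs 17 A rather than the 32 A a simultaneous start would demand (with I = 1 A as a placeholder value).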
One disk drive of a parity group is sometimes designated as a master disk serving as the reference for rotation synchronization. In such a case, this master disk has to be started up before the other disks. If the master disk drives are few enough to be started up simultaneously, they are therefore included in the first group to be started up. Alternatively, the master disk drives can be started up one by one before the other disks are started.
Since the disk drive groups are started up at different times so that their initial currents do not overlap, the maximum current output of the power supply can be reduced. Since the disk drives are organized into a number of groups rather than started one by one, the disk system can be started up within a prescribed period of time.
An example magnetic disk drive of the type illustrated herein requires a maximum current of 4.5 A, which breaks down into 1 A for rotating the disks, 2.8 A for seek operation and 0.7 A for other purposes. When seek operations occur simultaneously during parallel processing in an array disk system consisting of a large number of such disk drives, a very large current becomes necessary. Moreover, as protection against power outages or other such mishaps during operation of such an array disk system, battery backup must be provided so that data in the course of being stored can be stored completely. Supplying such a large current requires a very large battery.
An object of this invention is to provide an array disk system, and a method of controlling it, that reduces the electric current required by the array disk system, particularly during seek operation, and thereby reduces the capacity required of a battery provided as a backup power source for use during power outages and the like.
For achieving the aforesaid object, the present invention provides an array disk system, as shown, for example, in FIGS. 12(a), (b), and (c), in which a large number of disk drives are divided into a plurality of groups and controlled such that the timing at which seek operations start (moving the read/write heads to change the track positions at which they are located) is varied among at least some of the groups, and such that, within each group, the timing of the start of seek operations is either the same for all of the disk drives or varied among at least some of them.
The control for causing the seek operation start timing to vary among the groups or among the disk drives of a group can be provided by rotation-synchronizing the disk drives such that the positions of indices provided on the disks as references for the start of data read/write are offset among the groups or among the disk drives.
In this case, parallel processing can readily be conducted by providing the controller with a data processing function that stores the subdivided data, transferred at the same time to the respective groups, in buffer memories within the respective groups and conducts read/write processing of the data from the buffers in accordance with the positional offset of the indices.
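A minimal timing sketch of the index-offset method, assuming (hypothetically) that the G groups' indices are spaced evenly around the disk, that every group finishes its track transfer a fixed time `data_end_ms` after its own index, and that it starts seeking immediately afterwards. The revolution time and `data_end_ms` values are illustrative, not taken from the text.

```python
from __future__ import annotations


def index_offsets_ms(n_groups: int, rev_ms: float) -> list[float]:
    """Time offset of each group's index within one revolution (evenly spaced, assumed)."""
    return [g * rev_ms / n_groups for g in range(n_groups)]


def index_offsets_deg(n_groups: int) -> list[float]:
    """The same offsets expressed as angular positions on the disk."""
    return [g * 360.0 / n_groups for g in range(n_groups)]


def seek_start_ms(n_groups: int, rev_ms: float, data_end_ms: float) -> list[float]:
    """When each group starts seeking, measured from group 0's index."""
    # Every group ends its transfer data_end_ms after its own index, then seeks.
    return [(off + data_end_ms) % rev_ms for off in index_offsets_ms(n_groups, rev_ms)]


# Hypothetical: 4 groups, 16 ms per revolution, transfers end 14 ms after the index.
print(index_offsets_deg(4))          # [0.0, 90.0, 180.0, 270.0]
print(seek_start_ms(4, 16.0, 14.0))  # [14.0, 2.0, 6.0, 10.0]
```

In this model the buffer memory of group g has to hold that group's share of the data for up to its index offset, since the data arrives for all groups at once but is written only when each group's index comes around.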
Alternatively, the control for causing the timing of the start of the seek operations to vary among the groups or among the disk drives of a group can be provided, as shown in FIGS. 18(a) and (b) for example, by deliberately offsetting the seek operation start timing among the groups or among the disk drives, without offsetting the positions of the indices on the disks. Since all of the indices are positionally aligned in this case, there is the advantage that rotation synchronization control is easy to conduct.
Further, the control for causing the timing of the start of the seek operations to differ among the groups or among the disk drives of a group can be provided, as shown in FIGS. 19(a) and (b) for example, by varying the head addresses for the start of data reading and writing among the groups or among the disk drives, without offsetting the positions of the indices on the disks among the groups. This method simplifies the control since the head addresses can easily be varied among the groups by software techniques.
The control used by the invention for achieving the aforesaid objects is further characterized in that the seek operations for moving the read/write heads to change the track positions at which the heads are positioned are prevented from occurring simultaneously in at least some of the disk drives.
For preventing seek operations from occurring simultaneously the control will offset the position of the indices on the disks to vary the seek operation start timing or vary the head addresses for the start of reading and writing.
In preventing seek operations from occurring simultaneously, it is preferable, from the standpoint of reducing power consumption, to divide the large number of disk drives into groups of a plurality of disk drives each, to prevent seek operations from occurring simultaneously in different groups, and to form the groups such that the seek operations of different groups occur at different times within the period of one disk revolution and all of these staggered seek operations are completed within that period.
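The requirement that staggered seeks in different groups all finish within one revolution reduces to simple arithmetic if each seek takes a fixed time. The sketch below assumes a constant seek time `seek_ms` and places the groups' seeks back to back from the start of the revolution; both the constant seek time and the back-to-back placement are assumptions, and the numbers are hypothetical.

```python
from __future__ import annotations


def max_groups(seek_ms: float, rev_ms: float) -> int:
    """How many groups can seek one after another inside a single revolution."""
    return int(rev_ms // seek_ms)


def stagger_seeks(n_groups: int, seek_ms: float, rev_ms: float) -> list[tuple[float, float]]:
    """Back-to-back (start, end) seek windows, one per group, within one revolution."""
    if n_groups > max_groups(seek_ms, rev_ms):
        raise ValueError("staggered seeks would not all finish within one revolution")
    return [(g * seek_ms, (g + 1) * seek_ms) for g in range(n_groups)]


# Hypothetical: 4 ms per seek, 16 ms per revolution.
print(max_groups(4.0, 16.0))        # 4
print(stagger_seeks(3, 4.0, 16.0))  # [(0.0, 4.0), (4.0, 8.0), (8.0, 12.0)]
```

The number of groups is thus bounded by the ratio of the revolution period to the seek time; dividing the drives into more groups than that would push some seeks into the next revolution.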
In a disk system which conducts parallel processing, the positional relationship among the heads situated over the disks is generally such that the many disk drives making up the system operate as if they were an integrated unit. Specifically, the disks are rotation-synchronized with each other and the heads operate such that their track position relationships are all the same. In such a system, if the many disk drives which conduct parallel processing are divided into a number of groups and each group is treated as a separate read/write unit, the time at which seek operation is conducted can be offset among the groups, so that the large current that would result if the seek currents of all the individual drives flowed simultaneously is avoided. The current supplied to the array disk system as a whole is therefore lowered, and the capacity required of a battery for providing backup power during power outages and the like can be reduced.
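Using the per-drive figures given earlier for the example drive (1 A rotation, 2.8 A seek, 0.7 A other), the effect of staggering can be estimated. The sketch assumes, as a worst case, that the rotation and "other" currents flow continuously and that the drives are split as evenly as possible into G groups whose seeks never overlap; the drive and group counts in the example are hypothetical.

```python
from __future__ import annotations

import math

ROTATION_A = 1.0  # per-drive spindle current (example drive in the text)
SEEK_A = 2.8      # per-drive seek current
OTHER_A = 0.7     # per-drive current for other purposes


def peak_all_seeking(n_drives: int) -> float:
    """Every drive seeks at the same instant."""
    return n_drives * (ROTATION_A + SEEK_A + OTHER_A)


def peak_staggered(n_drives: int, n_groups: int) -> float:
    """Seeks of different groups never overlap; only the largest group seeks at once."""
    largest_group = math.ceil(n_drives / n_groups)
    return n_drives * (ROTATION_A + OTHER_A) + largest_group * SEEK_A


print(round(peak_all_seeking(20), 1))   # 90.0
print(round(peak_staggered(20, 4), 1))  # 48.0
```

For 20 drives in 4 groups this gives 20 × 4.5 A = 90 A with simultaneous seeks versus 20 × 1.7 A + 5 × 2.8 A = 48 A with staggered seeks, and the backup battery can be sized for the lower figure.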
Offsetting the positions of the indices on the disks among the groups makes it possible to offset among the groups the timing at which seek operation starts for data exchange between the heads and the tracks during one revolution and, thereby, to hold the seek current to a low level.
As explained above, a prescribed seek time is required within each revolution for conducting a seek operation. During this time, the disk continues to rotate irrespective of whether or not data is being exchanged. It is thus preferable to make effective use of this period, during which no data is being processed, for carrying out the seek operation separately in each group. If this expedient is adopted, then, by deliberately offsetting the timing of the seek operations among the groups within this period, it becomes possible, without offsetting the positions of the indices, to use this period to good advantage and thus to reduce the seek current.
Since in one and the same disk drive the seek operation is conducted after the head at a specific head address (e.g., the bottommost head in FIG. 2) has completed data exchange with a track on the disk, changing the head address at which data read/write is started among the groups changes the timing at which seek operation is conducted among the groups, so that the seek current can be reduced.
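A small simulation shows why the starting head address alone can separate the seeks. It assumes, hypothetically, that each drive transfers one track per revolution, stepping through head addresses 0 to `n_heads - 1` and seeking to the next cylinder immediately after the last head; groups that begin at different head addresses then seek at the ends of different revolutions.

```python
from __future__ import annotations


def seek_revolutions(start_head: int, n_heads: int, n_revs: int) -> list[int]:
    """Revolutions (0-based) at whose end the group seeks to the next cylinder."""
    seeks = []
    head = start_head
    for rev in range(n_revs):
        if head == n_heads - 1:      # last head of the cylinder has finished
            seeks.append(rev)
        head = (head + 1) % n_heads  # next head; after a seek, wrap to head 0
    return seeks


def seeks_overlap(start_heads: list[int], n_heads: int, n_revs: int) -> bool:
    """True if any two groups would seek at the end of the same revolution."""
    all_seeks = [r for h in start_heads for r in seek_revolutions(h, n_heads, n_revs)]
    return len(all_seeks) != len(set(all_seeks))


# Hypothetical: 4 heads per drive, four groups starting at head addresses 0-3.
for h in range(4):
    print(h, seek_revolutions(h, 4, 8))
# 0 [3, 7]
# 1 [2, 6]
# 2 [1, 5]
# 3 [0, 4]
print(seeks_overlap([0, 1, 2, 3], 4, 8))  # False
print(seeks_overlap([0, 0, 2, 3], 4, 8))  # True
```

Because each group keeps its phase after every seek, distinct starting head addresses (modulo the number of heads) keep the groups' seeks apart indefinitely, which is why the offset can be set entirely in software.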
Up to this point, the explanation has been directed to the case where the seek operation timing is varied among the groups. It is, however, similarly possible to reduce the seek current by varying the seek operation timing among disk drives in one and the same group according to the above teachings.
Since reducing the seek current reduces the amount of electric power that has to be supplied to the array disk system as a whole, it decreases the capacity required of the backup battery for providing power during power outages and the like, increases the reliability of system operation during such emergencies, and enables the equipment for supplying power to be made more compact.