1. Field of the Invention
The present invention relates to the configuration of a data storage system for use as an external storage device of a computer, and to a data storage control apparatus, and more particularly to a data storage system in which units are combined and connected so as to connect a multiplicity of disk devices with high performance and flexibility, and to a data storage control apparatus.
2. Description of the Related Art
In recent years, as a variety of data has been converted into electronic form and handled by computers, the importance of a data storage apparatus (external storage apparatus) capable of storing a large amount of data with good efficiency and high reliability, independently of a host computer for processing the data, has been increasing.
As such a data storage apparatus, a disk array apparatus constituted of a large quantity of disk devices (for example, magnetic disk drives and optical disk drives) and a disk controller for controlling the large quantity of disk devices has been put into use. The disk array apparatus can control such a large quantity of disk devices by accepting disk access requests simultaneously from a plurality of host computers. In recent years, there has been provided a single disk array apparatus capable of controlling a few thousand disk devices or more, or, in terms of storage capacity, a disk device group of a few hundred terabytes or more.
Such a disk array apparatus incorporates a memory that plays the role of a disk cache. With this, it becomes possible to reduce a data access time required when a read request or a write request is received from a host computer, making it possible to obtain high performance.
In general, the disk array apparatus is constituted of a plurality of kinds of major units, namely, channel adaptors provided in the connection portion to the host computer, disk adaptors provided in the connection portion to disk drives, control units taking charge of controlling the cache memory and the disk array apparatus as a whole, and a large quantity of disk drives.
FIG. 15 shows a configuration diagram of a disk array apparatus 100 according to a first conventional example. As shown in FIG. 15, the conventional disk array apparatus 100 is structured of a plurality of major units, including control managers (shown as CM in the figure) 10 each having a cache memory and a cache control unit, channel adaptors (shown as CA in the figure) 11 for interfacing with host computers (not shown in the figure), disk enclosures 12 each having a plurality of disk drives, and disk adaptors (shown as DA in the figure) 13 for interfacing with the above disk enclosures 12.
Furthermore, there are provided routers (shown as RT in the figure) 14 for interconnecting the control managers 10, the channel adaptors 11, and the disk adaptors 13 to perform data transfer and communication among these major units.
There are provided four control managers 10 in the above disk array apparatus 100. Also, four routers 14 are provided corresponding to the control managers 10. The above control managers 10 and routers 14 are interconnected in one-to-one correspondence. With this, the connections between the plurality of control managers 10 become redundant, so as to increase availability (see, for example, Japanese Unexamined Patent Publication No. 2001-256003).
Namely, in the event that one of the routers 14 becomes faulty, the connections among the plurality of control managers 10 can be secured by passing data through another router 14. Thus, the disk array apparatus 100 can continue normal operation even in such a case.
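The redundancy described above may be illustrated by the following minimal sketch. It assumes that every control manager has a path to every router, so that any surviving router can relay traffic between any two control managers. The names `surviving_routers` and `relay_router` are hypothetical and are introduced only for illustration.

```python
# Hypothetical model of the interconnect in FIG. 15: every control manager (CM)
# is assumed to have a path to every router (RT). Sizes follow the text.
NUM_CM = 4
NUM_RT = 4

def surviving_routers(failed):
    """Return the routers that can still relay traffic between CMs."""
    return [rt for rt in range(NUM_RT) if rt not in failed]

def relay_router(src_cm, dst_cm, failed):
    """Pick a working router for CM-to-CM traffic, or None if all have failed.

    Because every CM is assumed to reach every router, the choice does not
    depend on src_cm or dst_cm; the first surviving router is used.
    """
    alive = surviving_routers(failed)
    return alive[0] if alive else None
```

In this sketch, a fault in one router simply causes another router to be selected, and communication among the control managers is lost only when all routers have failed.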
Further, in the above disk array apparatus 100, two channel adaptors 11 and two disk adaptors 13 are connected to each router 14. The channel adaptors 11 and disk adaptors 13 can communicate with any control manager 10 through the interconnections between the control managers 10 and routers 14.
Also, the channel adaptors 11 are connected to host computers (not shown) processing data, through, for example, Fibre Channel and Ethernet (registered trademark). Meanwhile, the disk adaptors 13 are connected to the disk enclosures 12 (typically, a group of the disk drives) through, for example, a Fibre Channel cable.
Further, user data from the host computers, as well as a variety of types of information for maintaining the consistency of the operation inside the disk array apparatus 100 (for example, data mirroring processing among a plurality of cache memories), are exchanged between the channel adaptors 11 and the control managers 10, as well as between the disk adaptors 13 and the control managers 10.
In the above disk array apparatus 100, a control manager taking charge of the cache function is assigned in advance on the basis of each address of each mounted disk. Therefore, on receipt of a disk access request from a host, the disk array apparatus 100 must first perform an operation for determining the control manager that takes charge of the requested address. Further, since the cache memory is structured of a volatile memory, it is necessary to perform mirroring, that is, storing the identical data into a cache memory of another control manager, to prepare for the occurrence of a fault.
Namely, in the case of a write operation from a host computer, data from the host computer are first received in the channel adaptor 11. The channel adaptor 11 asks one control manager 10 which control manager 10 is taking charge of the disk requested by the host. Thereafter, the channel adaptor 11 writes the data into the cache memory provided in the control manager 10 in charge. When the write operation is completed normally, the channel adaptor 11 sends a completion notification to the host computer.
Similarly, on receipt of a read request from a host computer, the channel adaptor 11 asks one control manager 10 which control manager 10 is taking charge of the requested data. Thereafter, the channel adaptor 11 requests the control manager 10 in charge to send the read data.
The control manager 10 that has received the request immediately passes the read data to the channel adaptor 11 if the data of interest exists in the cache memory. On the other hand, if the data of interest does not exist in the cache memory, the control manager 10 requests the disk adaptor 13 to read out the data from the disk.
The disk adaptor 13 reads out the data from the disk and then writes the data into the cache memory of the control manager 10 in charge. In response to the above data write, the control manager 10 in charge notifies the channel adaptor 11 that it has become possible to read out the data. On receipt of the above notification, the channel adaptor 11 reads out the data from the cache memory, and then transfers the read data to the host computer.
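The write and read sequences of the first conventional example may be sketched as follows. This is a simplified model only: the modulo mapping in `cm_in_charge` and the choice of the next control manager as the mirror in `mirror_of` are assumptions made for illustration, since the text states only that the control manager in charge is assigned per address and that identical data is stored in another control manager.

```python
NUM_CM = 4

class ControlManager:
    def __init__(self):
        self.cache = {}  # address -> data held in this CM's cache memory

def cm_in_charge(address):
    # Assumption: a simple modulo mapping from address to control manager.
    return address % NUM_CM

def mirror_of(cm_index):
    # Assumption: the mirror copy is kept in the next control manager.
    return (cm_index + 1) % NUM_CM

def write(cms, address, data):
    """Write into the cache of the CM in charge and mirror it to another CM."""
    master = cm_in_charge(address)
    cms[master].cache[address] = data
    cms[mirror_of(master)].cache[address] = data
    return "completion"  # notification returned to the host

def read(cms, disk, address):
    """Return (data, source), staging from disk on a cache miss."""
    cm = cms[cm_in_charge(address)]
    if address in cm.cache:
        return cm.cache[address], "cache hit"
    cm.cache[address] = disk[address]  # disk adaptor writes into the cache
    return cm.cache[address], "staged from disk"
```

In this sketch, the first read of an address not yet cached returns the data with the label "staged from disk", and a subsequent read of the same address returns it with the label "cache hit".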
FIG. 16 shows an explanatory diagram of a second conventional technique. A disk array apparatus 102 shown in FIG. 16 includes four (4) control managers (cache memories and control units) 10. Each control manager (CM) 10 is connected to the channel adaptor (CA) 11 and the disk adaptor (DA) 13.
Further, the four control managers 10 are interconnected by a pair of routers 14 so as to enable communication with each other. The channel adaptor 11 is connected to a host computer(s) (not shown) through Fibre Channel or Ethernet (registered trademark). Also, the disk adaptor 13 is connected to the disk drives in the disk enclosure 12, through, for example, Fibre Channel cables.
Further, the disk enclosure 12 has two ports (for example, Fibre Channel ports) connected to different disk adaptors 13. With this, redundancy is provided in the configuration, so as to increase fault tolerance.
Through the above routers 14, exchanges of a variety of types of information (for example, data mirroring processing among a plurality of cache memories) are performed so as to maintain the consistency of the operation inside the disk array apparatus 102.
In the above second conventional example, the channel adaptor 11 receives the write data from the host computer, and transfers the write data to the control manager 10 to which it is connected. On receipt of the write data, the control manager 10 confirms which control manager 10 is in charge, and if the control manager having received the data is in charge, that control manager 10 notifies the channel adaptor 11 that data write processing has been completed. Meanwhile, if another control manager 10 is taking charge of the relevant data, the data is transferred to the other control manager 10 in charge, and the completion of data processing is notified to the channel adaptor 11. On receipt of the notification from the control manager 10 in charge, the channel adaptor 11 sends a write completion notification to the host.
Also in the case of receiving a read request from the host computer, the channel adaptor 11 first issues a request to the control manager 10 to which it is connected. On receipt of the above request, the control manager 10 confirms the control manager in charge. If the control manager having received the request is in charge, that control manager 10 either extracts the data from the cache memory, or reads out the data from a disk via the disk adaptor 13, and then transfers the readout data to the channel adaptor 11.
On the other hand, in case another control manager 10 is taking charge, a request is sent to the relevant control manager 10 in charge. The control manager 10 in charge then transfers to the channel adaptor 11 the returned data through the read operation similar to the above description. The channel adaptor 11 then transfers the data received from the control manager 10 to the host computer.
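The request handling of the second conventional example may be sketched as follows. The sketch models only the request direction and records the additional transfer that arises when the control manager receiving the request is not the one in charge. The modulo mapping in `cm_in_charge` and the function name `handle_request` are assumptions made for illustration.

```python
NUM_CM = 4

def cm_in_charge(address):
    # Assumption: a simple modulo mapping from address to control manager.
    return address % NUM_CM

def handle_request(receiving_cm, address):
    """Return (CM in charge, list of transfers made by the request)."""
    master = cm_in_charge(address)
    path = [f"CA -> CM{receiving_cm}"]
    if master != receiving_cm:
        # The request must be passed on to the CM in charge through a router.
        path.append(f"CM{receiving_cm} -> CM{master} via router")
    return master, path
```

For example, under the assumed mapping, a request for address 5 received by control manager 1 is handled locally, whereas the same request received by control manager 0 incurs one additional transfer through a router.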
With the spread of electronic data in recent years, there are demands for a data storage system of larger capacity and higher speed. In each of the storage apparatuses shown in the above-mentioned two conventional examples, high availability and flexibility have been attained. However, in some respects, they are insufficient to support a plurality of host interface types.
Namely, the support types are different depending on the difference in protocols and throughputs of the host interface. For example, for Fibre Channel and iSCSI (Internet Small Computer System Interface), which are interfaces for so-called open system host computers like a UNIX (registered trademark) server or an IA (Internet Appliance) server, a high throughput of 200 MB/s or more is required. In contrast, in FICON (registered trademark) and ESCON (registered trademark), which are the interfaces for mainframe host computers, it is sufficient if the throughput of 20 MB/s to 200 MB/s or of that order is provided.
Also, there is a difference in the response time expected by the hosts. In the case of the open system host, after a request is transmitted, the connection to a storage device is temporarily disconnected, and in the meantime, other processing is performed. On the other hand, in the case of the mainframe host, a series of processing from the first request transmission through data transfer to status reception is performed, in most cases, while the connection to the storage device is maintained. Accordingly, the mainframe host requires a short response time for one data transfer.
In the case that such a plurality of host interfaces having different types of protocols or throughputs are to be supported, according to the configuration using the first conventional technique, a throughput bottleneck is apt to be produced in the router, because all the paths between the channel adaptors and the control managers, between the disk adaptors and the control managers, and among the respective control managers pass through the router. In short, with such a configuration, it is hard to provide the channel adaptor with sufficient throughput.
Further, according to the configuration using the second conventional technique, a throughput problem does not occur because the buses connecting the channel adaptors and the control managers, the disk adaptors and the control managers, and the respective control managers to one another are entirely independent. However, there are some cases in which it is hard to satisfy the response speed required by the host.
The above situation will be described below, taking an exemplary case in which the response speed becomes substantially slow. Consider a case of rewriting a portion of data on the disk with data from a host. Since the disk data are protected by check codes given on the basis of a certain unit of data, when a portion of the data is to be rewritten, it is necessary to generate the check code afresh, using the remaining portion of the unit of data as well as the exact portion to be rewritten. If the remaining data does not exist in the cache, a readout operation from the disk becomes necessary in spite of the processing being a write, and it takes a substantially long time to respond.
In particular, according to the second conventional technique, there are cases in which the control manager taking charge of data requested from a host is not connected to the channel adaptor having received the request from the host. In such cases, the response time becomes still longer. In the following description, for the sake of explanation, a control manager to which a channel adaptor having received a request from a host is connected is referred to as CM-R (receive-CM), a control manager taking charge of the data concerned is referred to as CM-M (master-CM), and a control manager having mirror data of the cache data is referred to as CM-S (slave-CM).
(1) The channel adaptor 11 receives write data from a host.
(2) In order to generate a check code, the disk adaptor 13 reads out the remaining data from the disk.
(3) The disk adaptor 13 writes the data into the control manager CM-M.
(4) The control manager CM-M transfers the data to the control manager CM-R.
(5) The channel adaptor 11 writes the data into the control manager CM-R.
(6) The control manager CM-R generates a check code corresponding to the new data, and transfers the generated check code to both control managers CM-M, CM-S.
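The reason a partial rewrite may require a disk readout can be illustrated by the following minimal sketch. The type of check code is not specified in the above description; a byte-wise XOR over the unit of data is used here purely as a stand-in, and the names `check_code` and `partial_write` are hypothetical. The transfers among CM-R, CM-M and CM-S in steps (1) to (6) are not modeled.

```python
from functools import reduce

def check_code(blocks):
    """XOR of every byte in the unit (a stand-in for the real check code)."""
    return reduce(lambda acc, byte: acc ^ byte,
                  (byte for block in blocks for byte in block), 0)

def partial_write(disk_unit, cache, index, new_block):
    """Rewrite one block of a unit and regenerate the check code.

    disk_unit: list of blocks (bytes) of one unit as stored on the disk
    cache:     dict mapping block index -> cached block
    Returns (new check code, number of blocks read out from the disk).
    """
    disk_reads = 0
    for i in range(len(disk_unit)):
        if i != index and i not in cache:
            cache[i] = disk_unit[i]  # remaining data is staged from the disk
            disk_reads += 1
    cache[index] = new_block
    unit = [cache[i] for i in range(len(disk_unit))]
    return check_code(unit), disk_reads
```

In this sketch, the first partial write to a unit whose remaining block is not cached incurs one disk readout, whereas a second partial write to the same unit incurs none, because the remaining block is then available in the cache.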
Further, according to the aforementioned second conventional disk array apparatus, in order to increase the capacity and/or the speed, when an additional set(s) of control manager 10, channel adaptor 11 and/or disk adaptor 13 are to be installed, it is necessary to increase the number of ports of the disk enclosure 12, and also increase the number of connection cables between the disk adaptor 13 and the disk enclosure 12.
When the number of ports of the disk enclosure 12 is increased, the number of cables must also be increased in accordance with the number of disk adaptors connected to one disk enclosure. This requires a larger mounting space, and therefore brings about a larger device size. Further, since two path systems for one disk enclosure are sufficient in view of a redundant configuration, it is not advisable to increase the number of ports. Furthermore, since the number of connected disk adaptors is not constant but varies depending on a user's request, if a large number of ports are added, they are wasted when only a small number of disk adaptors are connected. On the other hand, if only a small number of ports are added, it becomes impossible to cope with a large number of disk adaptors. Namely, versatility is lost.
Meanwhile, in the first conventional disk array unit, when configuring a large-scale disk array unit provided with a multiplicity of major units, the number of connection lines between control managers 10 and routers 14 abruptly increases. This produces a complicated connection relationship, making physical mounting difficult.
For example, in the configuration shown in FIG. 15, as shown in FIG. 17, the disk array apparatus has a mounting structure such that four (4) control managers 10 and four (4) routers 14 are connected through a back panel 15. In this case, as described earlier, the number of signal lines becomes 4×4×[the number of signal lines per path], as shown in FIG. 15. For example, as described before, when one path connection is constituted of a 64-bit PCI (parallel bus), the number of signal lines including control lines on back panel 15 becomes approximately 1,600 (=4×4×100). In order to wire the above signal lines, the printed board for back panel 15 requires six signal layers.
In the case of a larger scale configuration, for example, one constituted of eight (8) control managers (four sheets) 10 and eight (8) routers (four sheets) 14 connected through back panel 15, the required number of signal lines reaches approximately 6,400 (=8×8×100). The printed board for back panel 15 in this case requires four times as many signal layers as the above, namely 24 layers, which is hard to realize.
In place of the 64-bit PCI bus, when assuming a case of connection through a 4-lane PCI-Express bus of reduced signal lines, the required number of signal lines becomes 1,024 (=8×8×16). As compared to a PCI bus of 66 MHz, the PCI-Express bus is a high-speed bus of 2.5 Gbps in speed. In order to maintain the signal quality of the high-speed bus, it is necessary to use an expensive material for the substrate.
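The signal line counts given above follow from a simple product, which may be checked by the following minimal sketch. The function names are hypothetical. The proportional scaling of the number of signal layers in `scaled_layers` is an assumption that merely reproduces the six-layer and 24-layer figures stated above, and is not a general rule of printed board design.

```python
def signal_lines(num_cm, num_rt, lines_per_path):
    """Signal lines on the back panel: one path per CM-router pair."""
    return num_cm * num_rt * lines_per_path

def scaled_layers(lines, base_lines=1600, base_layers=6):
    # Assumption: signal layers grow in proportion to the line count,
    # starting from six layers for approximately 1,600 lines.
    return base_layers * lines // base_lines

pci_small = signal_lines(4, 4, 100)   # 64-bit PCI, 4 CMs and 4 routers
pci_large = signal_lines(8, 8, 100)   # 64-bit PCI, 8 CMs and 8 routers
pcie_large = signal_lines(8, 8, 16)   # 4-lane PCI-Express, 8 CMs and 8 routers
```

Here `pci_small`, `pci_large` and `pcie_large` evaluate to 1,600, 6,400 and 1,024 respectively, and `scaled_layers(pci_large)` evaluates to 24.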
Furthermore, when using a low-speed bus, it is possible to switch signals among wiring layers using vias. On the other hand, use of vias in the high-speed bus degrades signal quality, and is therefore to be avoided. Accordingly, when using the high-speed bus, it is necessary to lay out all the signal lines so that they do not intersect one another. As compared to the case of a low-speed bus having the same number of signal lines, the substrate requires approximately twice as many signal layers, for example twelve signal layers. Further, the substrate is to be structured of expensive material, which is also not realistic.
Moreover, in the first conventional disk array apparatus 100, if a fault occurs in one of the routers 14, the channel adaptors 11 and the disk adaptors 13 connected in subordination to the failed router 14 immediately become unavailable.
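The effect of a router fault on the subordinate adaptors may be illustrated by the following minimal sketch, which assumes the arrangement described for FIG. 15, namely four routers each having two channel adaptors and two disk adaptors connected thereto. The adaptor labels and function names are hypothetical.

```python
NUM_RT = 4
ADAPTORS_PER_ROUTER = 2  # two CAs and two DAs are connected to each router

def attached_adaptors(router):
    """Channel adaptors and disk adaptors connected under one router."""
    first = router * ADAPTORS_PER_ROUTER
    ids = range(first, first + ADAPTORS_PER_ROUTER)
    return [f"CA{i}" for i in ids] + [f"DA{i}" for i in ids]

def available_adaptors(failed_router):
    """Adaptors still reachable after the given router has failed."""
    return [a for rt in range(NUM_RT) if rt != failed_router
            for a in attached_adaptors(rt)]
```

In this sketch, a fault in one router removes the two channel adaptors and two disk adaptors connected to it, leaving twelve of the sixteen adaptors available.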