In a storage system, when a plurality of devices such as HDDs (Hard Disk Drives) compatible with SAS (Serial Attached SCSI (Small Computer System Interface)) are mounted, a SAS expander (hereinafter referred to as an expander) is often used.
FIG. 10 is a diagram illustrating a configuration example of a storage system that includes a SAS initiator (hereinafter referred to as an initiator) 200, expanders 300-1 and 300-2, and a plurality of SAS drives (hereinafter simply referred to as drives) 400-1 to 400-6.
As illustrated in FIG. 10, the storage system includes a storage apparatus 100 that includes a controller enclosure (hereinafter referred to as a CE) 100a and a drive enclosure (hereinafter referred to as a DE) 100b, and a host computer 500.
The host computer 500 is a device that issues a data access request such as requests to write and read data to and from the SAS drive 400 included in the storage apparatus 100. An information processing apparatus such as a server is an example of the host computer 500.
The CE 100a performs control associated with the data access request issued from the host computer 500 to the drive 400, and includes the initiator 200, the expander 300-1, and the drives 400-1 to 400-3.
The DE 100b includes the expander 300-2 and the drives 400-4 to 400-6.
In the following description, the expanders 300-1 and 300-2 will be denoted by reference sign 300 when they are not distinguished from each other, and the drives 400-1 to 400-6 will be denoted by reference sign 400 when they are not distinguished from each other. Further, the drives 400-1 to 400-6 are sometimes referred to as drives A to F, respectively, using the reference signs in FIG. 10. The reference signs A to F represent the SAS addresses of the drives 400, respectively.
The initiator 200 is a module that manages the subordinate drives 400 and allocates an appropriate drive 400 to the data access request from the host computer 500.
Moreover, the initiator 200 accesses an access target drive 400 via the subordinate expander 300-1 or the expanders 300-1 and 300-2.
The expander 300-1 is a SAS switch for expanding a SAS connection, and includes a SAS control device and a Phy which is a SAS physical port.
Moreover, the expander 300-1 is connected to the initiator 200, the drives 400-1 to 400-3, and the expander 300-2 in the subordinate DE 100b via the Phy and transmits information such as commands and data associated with the access between the initiator 200 and the access target drive 400.
The expander 300-2 has the same configuration as the expander 300-1. The expander 300-2 is connected to the superordinate expander 300-1 and the drives 400-4 to 400-6 via a Phy (a physical port) (not illustrated) and transmits information associated with the access between the initiator 200 and the access target drive 400.
Next, an example of the sequence associated with writing and reading of data from the initiator 200 to the drive 400 will be described with reference to FIGS. 11 and 12.
FIG. 11 is a diagram for explaining an example of the sequence associated with a write command in the SAS protocol, and FIG. 12 is a diagram for explaining an example of the sequence of a read command in the SAS protocol.
In FIGS. 11 and 12, “INIT” represents the initiator 200, and “TARG” represents the access target drive 400. Moreover, arrows denoted by broken lines represent primitives, and arrows denoted by solid lines represent frames. For the sake of simple description, FIGS. 11 and 12 illustrate the command sequence of a case (a P to P case) where the initiator 200 and the access target drive 400 are connected directly, without going through the expander 300.
First, a write sequence will be described. As illustrated in FIG. 11, the initiator 200 issues OPEN_Address_Frame (hereinafter referred to as OAF) to the drive 400. If the drive 400 accepts the frame, the drive 400 returns OPEN_ACCEPT and connection is established (see (a) in FIG. 11).
Subsequently, the initiator 200 issues Wt_CMD (write command) to the drive 400. When Wt_CMD is properly received, the drive 400 returns ACK (see (b) in FIG. 11).
Moreover, the initiator 200 and the drive 400 exchange DONE and CLOSE, and the connection is closed (see (c) in FIG. 11). Hereinafter, the sequence (c) will be collectively referred to as a “Close process” as indicated by arrows denoted by one-dot chain lines.
Subsequently, when the drive 400 issues OAF to the initiator 200, the initiator 200 returns OPEN_ACCEPT to the drive 400, and connection is established (see (d) in FIG. 11).
Further, the drive 400 issues XFER_Rdy to the initiator 200 and closes the connection (see (e) in FIG. 11). The initiator 200 determines a data amount of write data (Wt_DATA) that the drive 400 receives in the next connection according to the value of a burst length (or a burst size which will be also referred to as a BS) included in the XFER_Rdy.
Moreover, when the initiator 200 issues the OAF and the connection is established (see (f) in FIG. 11), the initiator 200 embeds, in DATA_FRAME, the amount of write data designated by the BS of the XFER_Rdy received in (e) and sends the DATA_FRAME to the drive 400 (see (g) in FIG. 11). The drive 400 returns ACK when the received DATA_FRAME is proper, and returns RRDY when the drive 400 can receive the DATA_FRAME.
When transmission of the write data ends, the close process is performed to close the connection (see (h) in FIG. 11).
When there is write data which was not transmitted in (g), the drive 400 reissues XFER_Rdy and receives write data (see (i) in FIG. 11).
Upon receiving all items of write data, the drive 400 establishes the connection and sends Good_Status to the initiator 200, and the write sequence ends (see (j) in FIG. 11).
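As a rough illustration of steps (e) to (i), the following Python sketch computes how much write data is sent in each connection when the write data is larger than the burst length. The function name split_write_data is hypothetical, and the sketch assumes that the drive 400 reports the same BS in every XFER_Rdy, which the description above does not require.

```python
from typing import List


def split_write_data(total_bytes: int, burst_size: int) -> List[int]:
    """Return the number of bytes sent in each connection.

    Assumption (not stated in the description): the BS in every XFER_Rdy
    is the same, so each connection carries at most burst_size bytes.
    """
    if total_bytes < 0 or burst_size <= 0:
        raise ValueError("total_bytes must be >= 0 and burst_size must be > 0")

    chunks: List[int] = []
    remaining = total_bytes
    while remaining > 0:
        # One XFER_Rdy / DATA_FRAME exchange: send at most burst_size bytes.
        sent = min(burst_size, remaining)
        chunks.append(sent)
        remaining -= sent
    return chunks


if __name__ == "__main__":
    # 100 Kbytes of write data to a drive whose BS is 30 Kbytes.
    print(split_write_data(100 * 1024, 30 * 1024))
```

Under this assumption, the number of XFER_Rdy exchanges equals the length of the returned list, and Good_Status in (j) follows the last chunk.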
Next, a read sequence will be described. As illustrated in FIG. 12, when a connection is established as in (a) of FIG. 11, the initiator 200 issues Rd_CMD (read command) to the drive 400. When the Rd_CMD is properly received, the drive 400 returns ACK and closes the connection (see (k) in FIG. 12).
Subsequently, when the drive 400 issues OAF to the initiator 200, the initiator 200 returns OPEN_ACCEPT to the drive 400, and connection is established (see (l) in FIG. 12).
After the connection is established, the drive 400 transmits read data (Rd_DATA) to the initiator 200 (see (m) in FIG. 12). If it is not possible to transmit all of the data in one connection, connection is reestablished, and the remaining Rd_DATA is transmitted (see (n) in FIG. 12).
When the drive 400 has transmitted the data properly, the drive 400 sends Good_Status to the initiator 200, and the read sequence ends (see (o) in FIG. 12).
The OAF described above includes AWT (Arbitration Wait Time). The AWT represents a wait period after the drive 400 requests a connection. When the expander 300 receives OAF from a plurality of drives 400, the expander 300 transmits the OAF having the largest AWT among the OAFs issued by the drives 400 preferentially to the initiator 200, and establishes the connection with the corresponding drive 400.
Hereinafter, a connection establishment sequence when a plurality of drives 400 issues OAF to the initiator 200 in a storage system will be described with reference to FIGS. 13 to 15.
FIGS. 13 to 15 are diagrams for explaining an example of a connection establishment sequence between the initiator 200 and the drive 400 in the storage system illustrated in FIG. 10.
In FIGS. 13 to 15, OAF is illustrated as OPEN FRAME.
The expander 300-2 receives OAF from subordinate drives D to F for a predetermined period. Hereinafter, this predetermined period will be referred to as an OAF reception period.
For example, it is assumed that the drives D and F have issued OAF to the expander 300-2 in order to establish a connection with the initiator 200 (see (1) in FIG. 13).
After the elapse of the OAF reception period, the expander 300-2 sends OAF having the greatest AWT among the OAFs received from the drives D and F to the expander 300-1. In this case, since the AWT of the drive D is the same as the AWT of the drive F, the expander 300-2 sends the OAF of the drive F, of which the SAS address has the larger value, to the expander 300-1 (see (2) in FIG. 13).
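The selection rule described above (largest AWT first, larger SAS address on a tie) can be sketched in Python as follows. The class OpenAddressFrame, its field names, and the function arbitrate are hypothetical names chosen for illustration, and the integer SAS addresses 0xD and 0xF merely stand in for the drives D and F.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OpenAddressFrame:
    """Minimal stand-in for an OAF; the field names are illustrative."""
    source_sas_address: int
    awt_us: int  # Arbitration Wait Time in microseconds


def arbitrate(frames: Sequence[OpenAddressFrame]) -> Optional[OpenAddressFrame]:
    """Pick the OAF to forward after one OAF reception period.

    The OAF with the largest AWT wins; when AWTs are equal, the OAF
    whose source SAS address has the larger value wins.
    """
    if not frames:
        return None
    return max(frames, key=lambda f: (f.awt_us, f.source_sas_address))


if __name__ == "__main__":
    drive_d = OpenAddressFrame(source_sas_address=0xD, awt_us=0)
    drive_f = OpenAddressFrame(source_sas_address=0xF, awt_us=0)
    # Equal AWTs, so the larger SAS address (drive F) is selected.
    print(arbitrate([drive_d, drive_f]))
```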
On the other hand, the expander 300-1 also receives OAF from the drives A to C and the subordinate expander 300-2 for a predetermined period (OAF reception period). After the elapse of the OAF reception period, the expander 300-1 sends the OAF having the greatest AWT among the received OAFs to the initiator 200. In the example of FIG. 13, since the only OAF received is the OAF of the drive F transmitted from the expander 300-2, the expander 300-1 transmits the OAF of the drive F to the initiator 200 (see (3) in FIG. 13).
Subsequently, as illustrated in (4) of FIG. 14, upon receiving the OAF, the initiator 200 transmits OPEN_ACCEPT to the drive F which is the issuer of the received OAF via the expanders 300-1 and 300-2.
Upon receiving the OPEN_ACCEPT, the drive F transmits a predetermined frame to the initiator 200 (see (5) in FIG. 14).
When all frames have been transmitted, the drive F exchanges DONE and CLOSE primitives with the initiator 200 and performs a close process (see (6) in FIG. 14).
Here, it is assumed that 200 μs has passed until the connection of the drive F ends in (6) of FIG. 14 after the drive D transmits OAF in (1) of FIG. 13.
Moreover, as illustrated in (7) of FIG. 15, it is assumed that the drives D and F issued OAF within another OAF reception period. In this case, the drive D sets, in the AWT, the wait period (μs) that has elapsed since the OAF issued in (1) of FIG. 13, and includes the AWT in the issued OAF. The respective drives 400 issue the OAF periodically when establishing connection.
After the elapse of the OAF reception period, the expander 300-2 sends the OAF having the greatest AWT received from the drives D and F to the expander 300-1. In this case, since the AWT (200 μs) of the drive D is greater than the AWT (0 μs) of the drive F, the expander 300-2 sends the OAF of the drive D to the expander 300-1 (see (8) in FIG. 15).
Moreover, it is assumed that, within the OAF reception period of the expander 300-1, the expander 300-2 has issued the OAF of the drive D to the expander 300-1 (see (8) in FIG. 15) and the drive A has issued its OAF to the expander 300-1 (see (9) in FIG. 15).
After the elapse of the OAF reception period, the expander 300-1 compares the AWT (200 μs) of the OAF received from the expander 300-2 and the AWT (0 μs) of the OAF of the drive A and issues the OAF having the greater AWT, that is, the OAF of the drive D issued from the expander 300-2, to the initiator 200 (see (10) in FIG. 15).
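The two-stage selection in (8) to (10) of FIG. 15 can be sketched in Python as below. The names Oaf, select, and cascade are hypothetical. The drive letters stand in for the SAS addresses, and the sketch assumes that a tie at the upper expander is also broken by the source SAS address, which the example itself does not exercise.

```python
from typing import List, NamedTuple, Optional, Sequence


class Oaf(NamedTuple):
    drive: str   # drive letter, standing in for the source SAS address
    awt_us: int  # Arbitration Wait Time in microseconds


def select(oafs: Sequence[Oaf]) -> Optional[Oaf]:
    """Choose the OAF with the greatest AWT; ties go to the larger address."""
    if not oafs:
        return None
    return max(oafs, key=lambda o: (o.awt_us, o.drive))


def cascade(lower: Sequence[Oaf], upper_local: Sequence[Oaf]) -> Optional[Oaf]:
    """Model expander 300-2 feeding its winner into expander 300-1.

    lower:       OAFs received by the subordinate expander (300-2).
    upper_local: OAFs received by the upper expander (300-1) directly
                 from its own drives.
    """
    forwarded = select(lower)
    candidates: List[Oaf] = list(upper_local)
    if forwarded is not None:
        candidates.append(forwarded)
    return select(candidates)


if __name__ == "__main__":
    # Drive D has waited 200 us; drives F and A have just issued OAF.
    winner = cascade(lower=[Oaf("D", 200), Oaf("F", 0)],
                     upper_local=[Oaf("A", 0)])
    print(winner)
```

With the values of FIG. 15, the OAF of the drive D reaches the initiator 200. With all AWTs at 0 and no local OAF, as in FIG. 13, the same function selects the drive F.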
Moreover, upon receiving the OAF, the initiator 200 transmits OPEN_ACCEPT to the drive D which is the issuer of the received OAF via the expanders 300-1 and 300-2, and connection is established (see (11) in FIG. 15).
Patent Document 1: Japanese Laid-Open Patent Publication No. 2004-118546
However, as described above, in the storage system, in the write sequence according to the SAS protocol, before the write data is transmitted from the initiator 200 to the drive 400, a command (XFER_Rdy) is issued from a write target drive 400 (see (e) in FIG. 11).
The XFER_Rdy includes the BS that represents the size of the write data that the drive 400 receives in the connection, and the initiator 200 transmits the write data according to the BS.
Here, a case where the drive 400 is in the busy state, that is, the drive 400 issues a new OAF immediately after the connection close process, will be considered. In this case, when drives 400 having different burst lengths (BS) are present together in the storage system, there is a problem in that the performance of the drive 400 having a small burst length is degraded.
For example, an example of a data transmission operation when OAF is issued from the drives A and B in the storage system illustrated in FIG. 10, and the burst length of the drive A is shorter than the burst length of the drive B will be described with reference to FIG. 16.
FIG. 16 is a diagram illustrating an example of a data transmission operation in a storage system where drives A and B having different burst lengths are present together.
As illustrated in FIG. 16, a case where the BS of the drive A is 30 Kbytes, and the BS of the drive B is 150 Kbytes will be considered.
First, the drives A and B issue OAF within the OAF reception period of the expander 300-1. Since the AWTs of the received OAFs have the same value, the expander 300-1 transmits the OAF of the drive B, of which the SAS address is the greater, to the initiator 200, and connection is established between the drive B and the initiator 200 (see (i) in FIG. 16).
Subsequently, when the drive B transmits XFER_Rdy including the BS to the initiator 200, the initiator 200 transmits an amount of write data (150 Kbytes) corresponding to the data amount requested in the XFER_Rdy and closes the connection when the transmission is completed (see (ii) in FIG. 16).
Since the connection is closed, the expander 300-1 receives OAF again. In this case, the drive A issues OAF, in which A1 μs which is the wait period from the previous OAF is set to AWT, to the expander 300-1. Moreover, the drive B issues OAF in which 0 μs is set to AWT. The expander 300-1 transmits the OAF of the drive A, of which the value of the AWT is the greatest among a plurality of received OAFs, to the initiator 200, and connection is established between the drive A and the initiator 200 (see (iii) in FIG. 16).
Subsequently, when the drive A transmits XFER_Rdy to the initiator 200, the initiator 200 transmits an amount of write data (30 Kbytes) corresponding to the data amount requested in the XFER_Rdy and closes the connection when the transmission is completed (see (iv) in FIG. 16).
In the next OAF reception period, the drive A issues OAF, in which 0 μs is set to the AWT, to the expander 300-1. Moreover, the drive B issues OAF in which B1 μs which is the wait period from the previous OAF is set to the AWT. The expander 300-1 transmits the OAF of the drive B, of which the value of the AWT is the greater, to the initiator 200, and connection is established between the drive B and the initiator 200 (see (v) in FIG. 16).
Subsequently to (v) in FIG. 16, the drives A and B successively issue OAF. That is, the drive A that has issued the OAF of which the AWT is A2 μs acquires connection after the elapse of the next OAF reception period, and the drive B that has issued the OAF of which the AWT is B2 μs acquires connection after the elapse of the subsequent OAF reception period.
As described above, when drives having different BSs, which are the sizes of data that can be transmitted in one connection, are present together, the drive B can receive 150 Kbytes of data in one connection, whereas the drive A can receive only 30 Kbytes of data. Thus, as illustrated in FIG. 16, when both of the drives A and B successively issue OAF, the drive A has to wait for the drive B to receive 150 Kbytes of data each time before the drive A can receive its next 30 Kbytes of data. Thus, there is a problem in that the performance of drives such as the drive A having a small BS is degraded greatly as compared to their original performance. This problem is more noticeable in high-performance drives with a small BS.
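The imbalance in FIG. 16 can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions that are not part of the description: both drives are always busy, each connection transfers exactly one burst, transfer time is proportional to the burst size (1 Kbyte per time unit), and connection overhead is ignored. The function name simulate is hypothetical, and the drive names stand in for the SAS addresses in the tie-break.

```python
from typing import Dict


def simulate(burst_kb: Dict[str, int], connections: int) -> Dict[str, int]:
    """Return the Kbytes received by each drive after a number of connections.

    Assumptions: every drive reissues OAF immediately after each close,
    the winner's AWT restarts from 0, and a loser's AWT grows by the
    duration of the connection it had to wait for.
    """
    awt = {name: 0 for name in burst_kb}
    received = {name: 0 for name in burst_kb}
    for _ in range(connections):
        # Greatest AWT wins; ties are broken by the larger address (name).
        winner = max(awt, key=lambda n: (awt[n], n))
        received[winner] += burst_kb[winner]
        elapsed = burst_kb[winner]  # 1 Kbyte per time unit (assumption)
        for name in awt:
            awt[name] = 0 if name == winner else awt[name] + elapsed
    return received


if __name__ == "__main__":
    totals = simulate({"A": 30, "B": 150}, connections=6)
    share_a = totals["A"] / sum(totals.values())
    print(totals, share_a)
```

Under these assumptions the two drives simply alternate, so the drive A receives 30 Kbytes out of every 180 Kbytes transferred, that is, one sixth of the data, even though it obtains half of the connections.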
In FIG. 16, although a case where the expander 300-1 receives the OAFs from the subordinate drives A and B has been described as an example, the above-described problem also occurs when the expander 300-1 receives another OAF from the expander 300-2 and when the expander 300-2 receives the OAFs from the subordinate drives D to F.
Moreover, in FIG. 16, although an operation of writing data to the drive 400 has been described as an example, the above-described problem also occurs in a read operation and other operations when a plurality of drives 400 having different performance issue OAF to the initiator 200.
Without being limited to the case where drives having different BSs are present together, the problem in which the performance of high-performance drives is greatly degraded as compared to the original performance occurs in a case where drives with different performances are present due to various causes such as a write or read speed and a disc rotation speed.