Upon receiving a data write request from a host, a storage device notifies the host that the data has been received properly and writes the received data to a disk. If the data has not been received properly, the storage device notifies the host of the failure and then receives the same data again from the host. A host that has been notified by the storage device that the data has been properly received erases the data it holds and uses the resources that held the data for other work. The storage device is therefore required both to notify the host as quickly as possible that reception of the data has been completed and to write the data properly.
As a result, the storage device is designed to be able to quickly notify the host that the reception of data has been completed. For example, upon receiving data from the host, the storage device first writes the received data to a memory. The storage device then notifies the host that the reception has been completed before actually writing the data to the disk. Because the data is held only in the memory until it is written to the disk, storage devices are often equipped with a memory having an error detecting and correcting function, with which data corruption is detected and corrected by the memory. The error detecting and correcting function may be based on, for example, an error-correcting code (ECC).
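The early-notification behavior described above can be sketched as follows. This is a minimal illustration only, not the implementation of any actual storage device; the class name `StorageDevice` and the methods `receive` and `flush` are hypothetical names introduced for this example.

```python
class StorageDevice:
    """Toy model: the host is notified once the data is buffered in memory."""

    def __init__(self):
        self.memory = []  # write buffer (stands in for the memory)
        self.disk = []    # stands in for the disk

    def receive(self, data: bytes) -> str:
        # Write the received data to memory first.
        self.memory.append(data)
        # Notify the host before the data is actually written to the disk.
        return "ACK"

    def flush(self) -> None:
        # Later, move buffered data from memory to the disk in arrival order.
        while self.memory:
            self.disk.append(self.memory.pop(0))
```

Between `receive` and `flush`, the only copy of the data on the storage device side is the one in `memory`, which is why corruption in the memory matters.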
Japanese Laid-Open Patent Publications No. H09-288619 and No. 2008-158779 disclose related techniques.
Problems
However, there is a problem in that the data to be written may be lost if the memory and the memory controller are not equipped with an error detecting and correcting function.
The problem of the loss of the data to be written is described in detail with reference to FIGS. 8 and 9. In FIGS. 8 and 9, the memory is described as a dual inline memory module (DIMM) and the error detecting and correcting function is described as ECC. FIG. 8 describes a process of writing to a disk when the DIMM is ECC-compatible. FIG. 9 describes a process of writing to a disk when the DIMM is not ECC-compatible.
As illustrated in FIG. 8, a storage device 900 includes controller modules (CMs) CM#0 to CM#3. CM#0 and CM#1 are redundant, CM#2 and CM#3 are redundant, and CM#0 to CM#3 are interconnected through a peripheral component interconnect express (PCIe) switch. CM#0 and CM#3 are each connected to a host. CM#0 to CM#3 each have a central processing unit (CPU) 910, DIMMs 920, and a memory controller 930. Furthermore, CM#0 to CM#3 each have channel adapters (CAs) 940, disk interfaces (DIs) 950, direct memory access (DMA) chips 960, and a PCIe switch 970. The DIMMs 920 are ECC-compatible memory modules. The memory controller 930 is an ECC-compatible controller and controls the DIMMs 920. The CAs 940 are interfaces with a host. The DIs 950 are interfaces with a disk. The DMA chips 960 are DMA interfaces between the CMs. DMA is a data transfer method in which data is transferred directly without involving the CPU 910. The PCIe switch 970 connects the memory controller 930, the CAs 940, the DIs 950, and the DMA chips 960 as input/output (I/O) devices.
When the storage device 900 has received, from a host, a write request to a disk area managed by CM#0 and CM#1, the writing process is conducted as described below. First, a CA 940 in CM#3 that has received the data write request from the host transfers the data to be written to a DIMM 920 using a DMA function and expands the transferred data in the DIMM 920 (O31 in FIG. 8). If an error detecting code, for example, a cyclic redundancy check (CRC), has not been added to the data at this time, the CA 940 adds an error detecting code to the data before transferring the data to the DIMM 920. Even if data corruption occurs in the DIMM 920, the corruption is automatically corrected since the DIMM 920 is ECC-compatible.
Then, when the CA 940 notifies the CPU 910 that the data transfer has been completed, the CPU 910 notifies the host that reception of the data has been completed (O32 in FIG. 8). The data to be written is then erased on the host side.
The CPU 910 then requests a DMA chip 960 to transfer the data to CM#0 and CM#1 managing the disk. The DMA chip 960 transfers the data to DIMMs 920 in CM#0 and CM#1 and expands the data in the respective DIMMs 920 (O33 in FIG. 8). Even if data corruption is detected, the data corruption is automatically corrected by the DIMM 920 since the DIMM 920 is ECC-compatible.
A DI 950 then reads the data from the DIMM 920, checks for errors in the read data, and writes the data to the disk if there are no errors. Even if there is an error, the DI 950 is able to write the correct data to the disk by reading the data again from the DIMM 920 (O34 in FIG. 8).
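The check-and-reread step performed by the DI can be sketched as follows. This is an illustrative sketch under assumptions: the source does not specify the error detecting code format, so a CRC-32 appended as four trailing bytes is assumed here, and the function names `add_crc`, `crc_ok`, and `write_to_disk` are hypothetical.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (assumed format) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def crc_ok(frame: bytes) -> bool:
    """Return True if the trailing CRC-32 matches the payload."""
    return zlib.crc32(frame[:-4]).to_bytes(4, "big") == frame[-4:]


def write_to_disk(read_frame, disk: list, max_reads: int = 2) -> bool:
    """Read from memory, check for errors, and reread if the check fails."""
    for _ in range(max_reads):
        frame = read_frame()
        if crc_ok(frame):
            disk.append(frame[:-4])  # write only the payload to the disk
            return True
    return False  # every read failed the check
```

In this sketch the reread succeeds only if `read_frame` returns correct data on a later call, which corresponds to the ECC-compatible case in which the memory corrects the corruption.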
A process of writing to a disk when the DIMM is not ECC-compatible will be described with reference to FIG. 9. DIMMs 920A are non-ECC-compatible memory modules. As illustrated in FIG. 9, after a data write request is output by a host, a CA 940 in CM#3 first transfers the data to be written to a DIMM 920A using a DMA function and expands the transferred data in the DIMM 920A (O31 in FIG. 9). If an error detecting code, a CRC for example, has not been added to the data, the CA 940 adds an error detecting code to the data before transferring the data to the DIMM 920A. If data corruption occurs in the DIMM 920A, the corruption is not automatically corrected since the DIMM 920A is not ECC-compatible, and thus the data remains corrupted.
Then, when the CA 940 notifies the CPU 910 that the data transfer has been completed, the CPU 910 notifies the host that reception of the data has been completed (O32 in FIG. 9). The data to be written is then erased on the host side.
The CPU 910 then requests a DMA chip 960 to transfer the data to CM#0 and CM#1 managing the disk. The DMA chip 960 transfers the data to the DIMMs 920A in CM#0 and CM#1 and expands the data in the respective DIMMs 920A (O33 in FIG. 9). Even if data corruption is detected, the data corruption is not automatically corrected since the DIMM 920A is not ECC-compatible.
A DI 950 then reads the data from the DIMM 920A, checks for errors in the read data, and writes the data to the disk if there are no errors (O34 in FIG. 9). However, when there is an error, the check may fail again even if the DI 950 reads the data again from the DIMM 920A in CM#0, because the data held there remains corrupted. Similarly, the check may fail again even if the DI 950 reads the data again from the DIMM 920A in CM#3. Consequently, the data is lost, since it has already been erased on the host side.
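The failure mode described above can be illustrated with a small simulation. This is a sketch under assumptions, not a model of the actual hardware: a CRC-32 appended as four trailing bytes is assumed as the error detecting code, a single bit flip stands in for the data corruption, and the function name `simulate_non_ecc_write` is hypothetical.

```python
import zlib


def crc_ok(frame: bytes) -> bool:
    """Return True if the trailing 4-byte CRC-32 matches the payload."""
    return zlib.crc32(frame[:-4]).to_bytes(4, "big") == frame[-4:]


def simulate_non_ecc_write(payload: bytes, max_reads: int = 3):
    """Hypothetical model of the FIG. 9 flow; returns (written, host_copy)."""
    # The CA appends the error detecting code and stores the frame in memory.
    stored = bytearray(payload + zlib.crc32(payload).to_bytes(4, "big"))
    stored[0] ^= 0x01  # a bit flips in memory; without ECC it stays flipped
    host_copy = None   # the host erased its copy after the early notification

    for _ in range(max_reads):
        # Every reread returns the same corrupted bytes, so the check fails.
        if crc_ok(bytes(stored)):
            return True, host_copy
    return False, host_copy
```

For any payload, the function returns `(False, None)`: the write never succeeds and no copy remains on the host side, which corresponds to the data loss described above.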