1. Field of the Invention
The present invention relates to a direct memory access control technique, and in particular to a technique for transferring information between arbitrary nodes.
2. Description of the Related Art
Efficient data transfer between memories, or between an input/output (I/O) device and memory, is very important for improving the performance of a computer system, and Direct Memory Access (DMA) is a well-known technique for this purpose. In DMA, a DMA control circuit, which is dedicated hardware, controls data transfers on the basis of an instruction from the central processing unit (CPU), thereby executing the control of data transfers within the computer system in place of the CPU.
As an example of a technique related to the DMA controller, reference patent document 1 discloses a configuration comprising a master processor (MP), a plurality of processor elements (PE), a DMA controller and the like. This configuration is capable of issuing a plurality of commands in a batch from the MP to the PEs and the respective DMA controllers, and is also capable of issuing a subsequent command without waiting for a response to the previously issued command.
Reference patent document 2 discloses a communication control processing apparatus comprising a CPU unit that performs communication control processing by means of a program (i.e., firmware) stored in a memory unit (i.e., an MEM unit), and a DMA unit that transfers data between a communication interface and the MEM unit by direct memory access without program execution (i.e., without involving the CPU). The apparatus is further provided with a monitor unit for monitoring the processing of the firmware, so that the DMA unit transfers data specified by the monitor unit to a trace data storage unit. Specifically, a plurality of process modules stored in a firmware storage unit are assigned respective labels, and each process module reports its own label to the monitor unit. The monitor unit retains the reported label(s) and, if it detects an abnormality, controls the DMA unit so as to transfer the detailed data of the process module corresponding to the label(s) to the trace data storage unit.
In addition, reference patent document 3 relates to an inter-processor data transfer control method for use in a register-combination multiprocessor system in which the frame sizes differ between the transmission side and the reception side. In such a system, when the data of the final frame on the transmission side is transferred to the register unit, the transmission side performs data transfer completion processing by sending an END notification to the reception side, and the reception side likewise performs data transfer completion processing by sending an END notification. When the last piece of data remains in the register unit, however, a program different from the ordinary data transfer process must be executed.
According to the technique disclosed by patent document 3, in order to solve this problem, the direct memory access control unit on the transmission side supplies to an interrupt control unit, as a transmission completion signal, the logical product (AND) of a completion signal indicating that the final data has been transferred to the register unit and an acknowledgment (ACK) received from the direct memory access control unit on the reception side. The interrupt control unit issues an interrupt signal to the processor on the basis of the transmission completion signal.
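The completion condition attributed to patent document 3 reduces to a logical AND of two conditions. The following minimal Python sketch models the two inputs as booleans and the transmission completion signal as the return value; the function name is hypothetical, and signal timing and the actual circuit are not modeled.

```python
def transmission_complete(final_data_in_register: bool, ack_received: bool) -> bool:
    """Transmission completion signal (hypothetical model).

    Asserted only when the final data has been transferred to the register
    unit AND the ACK from the reception-side DMA control unit has arrived.
    The interrupt control unit would interrupt the processor on this signal.
    """
    return final_data_in_register and ack_received


# Truth table: only the last combination leads to an interrupt.
for final, ack in [(False, False), (False, True), (True, False), (True, True)]:
    print(final, ack, transmission_complete(final, ack))
```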
Meanwhile, the PCI_Express architecture has attracted attention in recent years as, for example, an I/O interface for connecting an input/output apparatus to a host apparatus.
Patent document 1: Laid-Open Japanese Patent Application Publication No. 2002-163239
Patent document 2: Laid-Open Japanese Patent Application Publication No. H07-93233
Patent document 3: Laid-Open Japanese Patent Application Publication No. H05-334260
In recent years, the PCI_Express architecture has been adopted in, for example, RAID systems.
RAID is the abbreviation for Redundant Array of Inexpensive Disks or Redundant Array of Independent Disks, and is a storage redundancy technique that manages a plurality of storage devices (e.g., hard disks) together as a single logical disk in order to speed up processing and improve reliability.
FIG. 8 exemplifies an information processing system including a RAID adopting the PCI_Express architecture.
In the information processing system shown in FIG. 8, a plurality of storage modules 50 constituting the RAID are interconnected by way of PCI_Express serial buses 70 and a PCI_Express switch 80. The PCI_Express switch 80 comprises individual ports (not shown in FIG. 8) for connecting the respective serial buses, and routes packets between arbitrary ports.
Each storage module 50 comprises a storage (e.g., a hard disk) 53 and a node 51 connected to the storage 53. The node 51 transmits and receives data to and from another storage module 50 by way of the PCI_Express serial bus 70 and PCI_Express switch 80.
Each storage module 50 is also connected to a host computer 60 by way of a channel adapter (CA) 52. Upon receiving a request (e.g., a data update request) from any host computer 60, the storage module 50, in accordance with the request, updates the data in its own storage 53 and also transfers the data to another storage module 50, for example a mirroring destination, thereby causing that storage module 50 to update its data as well.
In the following description, a node receiving such a request from the host computer is called a local node, while a node at a data transfer destination, such as a mirroring destination, is called a remote node.
Each node comprises a CPU, local memory and a DMA controller (none of which are shown in FIG. 8). The CPU executes a series of processes by means of built-in firmware. Upon receiving a request from, for example, the host computer, the firmware temporarily stores the received data in the local memory and initiates the DMA controller, thereby requesting it to perform the data transfer process.
The firmware issues a descriptor when storing the data in the local memory. That is, when the CPU requests the DMA controller to transfer arbitrary data, the CPU usually stores the data to be transferred in the local memory under its own management, and also expands in the local memory a descriptor containing the transfer control information used by the DMA controller (e.g., a transfer source address, a transfer destination address and a transfer data size). The CPU then initiates the DMA controller, which, upon initiation, first reads the descriptor and analyzes the transfer control information, then reads data of the specified size from the transfer source address (i.e., the storage position of the data in the local memory), and transfers the read data to the transfer destination node. The transferred data include various messages in addition to the data proper (e.g., update data, hereinafter referred to as DATA).
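The descriptor-driven flow just described (firmware stores data and a descriptor in local memory, then the DMA controller reads the descriptor, fetches the data and sends it) can be sketched in Python as follows. This is only an illustration under stated assumptions: the descriptor layout `<QQI` (64-bit source address, 64-bit destination address, 32-bit size), the names `Node`, `firmware_prepare` and `dma_run`, and the modeling of the destination node as a second byte array are all hypothetical and are not taken from the text.

```python
import struct

# Hypothetical descriptor layout (not specified in the text):
# transfer source address, transfer destination address, transfer data size.
DESC_FORMAT = "<QQI"
DESC_SIZE = struct.calcsize(DESC_FORMAT)  # 20 bytes


class Node:
    """A node reduced to its local memory (hypothetical model)."""

    def __init__(self, mem_size: int) -> None:
        self.memory = bytearray(mem_size)


def firmware_prepare(node: Node, data: bytes, data_addr: int,
                     desc_addr: int, dst_addr: int) -> None:
    # Store the data to be transferred in local memory.
    node.memory[data_addr:data_addr + len(data)] = data
    # Expand the descriptor (transfer control information) in local memory.
    struct.pack_into(DESC_FORMAT, node.memory, desc_addr,
                     data_addr, dst_addr, len(data))


def dma_run(local: Node, desc_addr: int, remote: Node) -> int:
    # 1. Read and analyze the descriptor.
    src, dst, size = struct.unpack_from(DESC_FORMAT, local.memory, desc_addr)
    # 2. Read `size` bytes starting at the transfer source address.
    payload = bytes(local.memory[src:src + size])
    # 3. Transfer the data to the destination node (modeled as a memory copy).
    remote.memory[dst:dst + size] = payload
    return size


if __name__ == "__main__":
    local, remote = Node(256), Node(256)
    firmware_prepare(local, b"DATA", data_addr=64, desc_addr=0, dst_addr=128)
    print(dma_run(local, 0, remote), bytes(remote.memory[128:132]))
```

Keeping the descriptor in the same memory as the data reflects the text's statement that the descriptor is expanded in the local memory; in real hardware the DMA controller would fetch it over the memory bus rather than through a function call.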
FIG. 9 is a sequence chart of transferring data between nodes in the configuration shown in FIG. 8.
As shown in the data transfer sequence of FIG. 9, the transfer of arbitrary data involves exchanging a series of messages between the local and remote nodes before and after DATA is actually transferred. Each time the firmware 101 of the local node has the DMA controller 102 transmit a single message, it stores the message and a descriptor in the local memory and requests the DMA controller 102 to transmit that message. The DMA controller 102 then reads and analyzes the descriptor (whose storage position is set in advance) and transfers the message to the remote node. The same applies on the remote node side, and a DATA transmission is performed in the same way as a message transmission.
The “cmd” message shown in FIG. 9 is a request to the remote node to secure a resource (i.e., memory), while the “adr” message is a report of the secured memory address from the remote node. When these messages preceding the data transmission have been exchanged normally, DATA is transmitted. Prescribed messages are also exchanged after the DATA transmission: the “done” message shown in the drawing is a report of data transmission completion to the remote node, and the “cmp” message is a report of data reception completion from the remote node.
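The ordering of the FIG. 9 exchange can be captured as a small state machine. The Python sketch below encodes each message with its sender and accepts only the next expected step; the class and method names are hypothetical, and retransmission, error handling and the completion responses at the DMA level are deliberately left out.

```python
from enum import Enum


class Msg(Enum):
    CMD = "cmd"    # local -> remote: request to secure a resource (memory)
    ADR = "adr"    # remote -> local: report of the secured memory address
    DATA = "DATA"  # local -> remote: the data being transferred
    DONE = "done"  # local -> remote: report of data transmission completion
    CMP = "cmp"    # remote -> local: report of data reception completion


# (message, sender) pairs in the order of the FIG. 9 sequence.
SEQUENCE = [
    (Msg.CMD, "local"),
    (Msg.ADR, "remote"),
    (Msg.DATA, "local"),
    (Msg.DONE, "local"),
    (Msg.CMP, "remote"),
]


class TransferSession:
    """Tracks one data transfer (hypothetical model)."""

    def __init__(self) -> None:
        self.step = 0

    def observe(self, msg: Msg, sender: str) -> bool:
        """Advance if (msg, sender) is the next expected step; otherwise return False."""
        if self.complete or (msg, sender) != SEQUENCE[self.step]:
            return False
        self.step += 1
        return True

    @property
    def complete(self) -> bool:
        return self.step == len(SEQUENCE)
```

For example, a session that sees DATA before “cmd” and “adr” rejects it, which matches the rule that DATA is sent only after the preceding messages have been exchanged normally.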
Referring to FIG. 9, upon receiving the “cmd” message, the DMA controller 112 of the remote node notifies the firmware 111 (i.e., the CPU) of its own node of the reception of the “cmd” message (more specifically, the DMA controller 112 stores the “cmd” message in the local memory and interrupts the firmware 111), and returns a completion response to the DMA controller 102 of the local node. Upon receiving the completion response, the DMA controller 102 of the local node notifies the firmware 101 of its own node of the completion (i.e., issues an interrupt). From the DMA initiation described above until it receives the completion notification, the firmware 101 of the local node waits for the interrupt and is therefore unable to perform any other process.
Meanwhile, upon being notified by the DMA controller 112 of the reception of the “cmd” message, the firmware 111 of the remote node executes the prescribed firmware process corresponding to the “cmd” message (i.e., the process of securing a resource [i.e., memory] in the above example), and then requests the DMA controller 112 to transmit an “adr” message in response to the “cmd” message. That is, the firmware 111 stores the “adr” message and a descriptor in the local memory and initiates the DMA controller 112. The initiated DMA controller 112 transmits the “adr” message to the local node and, upon receiving a completion response to the “adr” message, notifies the firmware 111 of its own node of the completion (i.e., interrupts the firmware 111). From the DMA initiation until it receives the completion notification, the firmware 111 of the remote node waits for the interrupt and is therefore unable to perform any other process.
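The conventional blocking behavior described above can be illustrated with a Python thread standing in for the DMA controller and a `threading.Event` standing in for the completion interrupt. This is a hypothetical sketch: the function names and log strings are invented for illustration, and the network round trip is collapsed into two log entries. It shows only that the firmware's next task cannot start until the whole initiation-to-completion interval has elapsed.

```python
import threading


def dma_controller(message: str, log: list, completion: threading.Event) -> None:
    """Hypothetical DMA controller: transmit, receive the completion response, interrupt."""
    log.append(f"dma: transmit {message}")
    log.append(f"dma: completion response received for {message}")
    completion.set()  # stands in for the completion interrupt to the firmware


def firmware_send(message: str, log: list) -> None:
    """Conventional firmware: initiate DMA, then block until the completion interrupt."""
    completion = threading.Event()
    log.append(f"fw: initiate DMA for {message}")
    dma = threading.Thread(target=dma_controller, args=(message, log, completion))
    dma.start()
    completion.wait()  # waiting for the interrupt: no other process runs here
    dma.join()
    log.append(f"fw: completion notified for {message}")


log: list = []
firmware_send("adr", log)
log.append("fw: other task")  # can start only after the full round trip
print("\n".join(log))
```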
Likewise, for the transmission of the “done” and “cmp” messages shown in FIG. 9, the firmware waits for an interrupt and is therefore unable to perform any other process from the DMA initiation until it receives the completion notification.
As described above, each node waits for an interrupt, and is thus unable to perform any other process, from the time it requests the DMA controller to transmit a message (i.e., from the DMA initiation) until it receives a completion notification (e.g., for up to approximately 2 milliseconds). Conventionally, however, this has not been a problem.
That is, this interrupt wait has not conventionally been a performance bottleneck for the apparatus because, given the CPU performance of the time, the time required before the CPU (i.e., the firmware) became ready to switch to another task exceeded the time until the completion response to the transmitted message was received (i.e., the interrupt wait time), as shown in FIG. 10A.
In recent years, however, improvements in CPU performance have greatly shortened the time before the firmware becomes ready to switch to another task, as shown in FIG. 10B. Because the firmware still waits for an interrupt and cannot perform another process (i.e., another task) until it receives the completion notification, the interrupt wait time has become a performance bottleneck for the apparatus.
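One way to read the comparison of FIG. 10A and FIG. 10B is that only the part of the interrupt wait that exceeds the CPU's readiness time is lost. The Python sketch below encodes that reading; it is an interpretation, not a formula from the text. The function names are hypothetical, and the example values are illustrative only: 2.0 ms is the approximate maximum wait mentioned above, while the readiness times of 3.0 ms and 0.5 ms and the assumption that the local node blocks on three transmissions (“cmd”, DATA, “done”) are assumptions.

```python
from __future__ import annotations


def lost_time_ms(wait_ms: float, ready_ms: float) -> float:
    """Time the firmware spends waiting after it could already have switched tasks.

    wait_ms:  time from DMA initiation to the completion notification.
    ready_ms: time before the CPU (firmware) is ready to switch to another task.
    """
    return max(0.0, wait_ms - ready_ms)


def total_lost_time_ms(waits_ms: list[float], ready_ms: float) -> float:
    """Sum of lost time over several blocking transmissions."""
    return sum(lost_time_ms(w, ready_ms) for w in waits_ms)


# Illustrative values only (hypothetical): three blocking transmissions of 2.0 ms each.
waits = [2.0, 2.0, 2.0]
print(total_lost_time_ms(waits, ready_ms=3.0))  # slower CPU, FIG. 10A-like case
print(total_lost_time_ms(waits, ready_ms=0.5))  # faster CPU, FIG. 10B-like case
```

Under this reading, a faster CPU turns a wait that used to be hidden behind the task-switch preparation into directly exposed idle time, which is the bottleneck described above.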
This problem occurs not only with the PCI_Express architecture but also with conventional serial interfaces and conventional parallel interfaces.
An object of the present invention is to improve latency as seen from the firmware, and in particular to provide a DMA controller, a program therefor and the like that, when exchanging the messages carried out before and after a data transmission, notify the firmware of a simulated response, depending on the type of the transmitted message, instead of waiting for the actual response, thereby reducing the time spent waiting for a response and enabling another process to be executed.