1. Field of the Invention
The present invention relates to a PCI.Express communication system and communication method thereof and in particular relates to improvements in the transmission protocol of the transaction layer packets (TLP) on error generation.
2. Description of the Related Art
A “PCI.Express” (registered trademark) bus is a type of high-speed serial interface using a point-to-point connection that has been developed in recent years for transmission of data in computer systems and other electronic devices. Such a serial interface bus occupies less area on the circuit board than conventional parallel transmission and so makes possible further miniaturization: its application is being studied in many fields.
The details of this standard are laid down as the PCI.Express base specification by the PCI-SIG (Peripheral Component Interconnect-Special Interest Group), which is the umbrella organization for establishing the PCI standard, and explanations of this standard have also been published. An example of technical documentation available in Japan is “Introduction to PCIe”, a joint work by Nobutake Arai, Naoshi Satomi, and Akihiro Tanaka (published by Denpa Shinbunsha, 1 Apr. 2007), Chapters 1 to 5 (hereinbelow referred to as non-patent reference 1).
First of all, this PCI.Express communication system (sometimes referred to as PCIe communication system) will be outlined with reference to FIG. 1 to FIG. 3. As shown for example in FIG. 1, a PCI.Express communication system comprises the following devices: a root complex 1, a switch 2 and endpoints 3 (3a, 3b, 3c and 3d).
Also, the root complex 1 and switch 2 respectively have a plurality of ports; the PCI.Express buses 7a to 7e that mutually connect these with the endpoints 3 have a three-layer construction, as shown in FIG. 2.
These layers exchange data in the form of packets and respectively comprise: a transaction layer 101 that guarantees reliable end-to-end communication of data and provides conventional PCI-compatible services to the upper-layer software, namely the drivers and application software of the uppermost layer; a data link layer 102 that guarantees reliable data communication between adjacent components; and a physical layer 103 that exchanges communication packets on the physical medium.
In addition, the root complex 1 is located at the uppermost layer of the tree structure of the PCI.Express communication system and is connected with a CPU 5 through a system bus and with a memory 6 through a memory bus (neither bus is indicated by a reference numeral in the drawing).
In this layout, in communication between the root complex 1 and the endpoint 3a, the switch 2 constitutes a TLP relay device, and, in communication between the endpoint 3a and the endpoint 3d, the switch 2 and the root complex 1 constitute relay devices.
The connection of the transmission paths between devices of a PCI.Express communication system constructed in this way is a point-to-point connection; in a dual simplex system employing two differential amplifiers in one direction, the link speed is 2.5 G bps, so that a bandwidth of 5 G bps is provided in both directions.
Furthermore, by increasing the number of sets of such bidirectional transmission paths (called lanes) from two to 32, the bus bandwidth can be made scalable, and data transmission can be performed by exchange of packets on these transmission paths.
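As a quick check of the scaling described above, the following minimal Python sketch computes the raw bandwidth for a given number of lanes. The function name is hypothetical, and only the raw signalling rate of 2.5 G bps per lane per direction stated above is considered; encoding and protocol overhead are ignored.

```python
LINK_SPEED_GBPS = 2.5  # per lane, per direction, as stated in the text


def raw_bandwidth_gbps(lanes):
    """Return (one-direction, both-directions) raw bandwidth in G bps."""
    if lanes < 1:
        raise ValueError("lanes must be >= 1")
    one_way = LINK_SPEED_GBPS * lanes
    return one_way, 2 * one_way


for lanes in (1, 2, 32):
    print(lanes, raw_bandwidth_gbps(lanes))
```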
As shown in FIG. 2, the packets generated in the transaction layer and data link layer are respectively called transaction layer packets (TLP) and data link layer packets (DLLP).
Physical layer packets (PLP) are also generated in the physical layer for link control purposes.
Also, the packets of each layer are exchanged between layers that are connected to the same target via a link and, as shown in FIG. 3, information is added at the beginning and at the end of a packet in the lower protocol layer, before being finally transmitted onto the transmission path (lane). The information at the beginning and end of a packet is deleted in each protocol layer when the packet is received, before the packet is handed over to the upper protocol layer.
In more detail, the TLP that performs end-to-end communication is constituted in the transaction layer by a TLP header, a data payload, and an optional TLP digest (called ECRC or end-to-end CRC (Cyclic Redundancy Code)); when these packets are handed to the data link layer, a sequence number and LCRC are added, and, on reception, these are inspected and then deleted.
DLLP are short packets used for exchange of information such as the response to transmission of a TLP (positive response Ack and negative response Nak) in both directions of a link.
In addition, control characters (STP and END) for detecting the beginning and end of a TLP are added at both ends of each TLP in the physical layer at the transmitting end. Likewise, control characters (SDP and END) for detecting the beginning and end of a DLLP are added at both ends of each DLLP.
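The layering of a TLP described above can be illustrated with a minimal Python sketch. All function names are hypothetical. The checksum is zlib's CRC-32, used only as a stand-in and not as the exact ECRC/LCRC computation of the standard. The control characters are modelled as the strings "STP" and "END" rather than as line symbols, and the 12-byte header is merely an illustrative size.

```python
import zlib


def crc32_bytes(data):
    # Stand-in checksum (assumption): NOT the exact ECRC/LCRC of the standard.
    return zlib.crc32(data).to_bytes(4, "big")


def transaction_layer(header, payload, with_digest):
    """Build a TLP: header + data payload + optional TLP digest (ECRC)."""
    tlp = header + payload
    if with_digest:
        tlp += crc32_bytes(tlp)
    return tlp


def data_link_layer(tlp, seq):
    """Prepend a sequence number and append an LCRC."""
    body = (seq & 0x0FFF).to_bytes(2, "big") + tlp
    return body + crc32_bytes(body)


def physical_layer(dl_packet):
    """Frame the packet with start and end control characters."""
    return ["STP", *dl_packet, "END"]


def receive(frame):
    """Strip framing, check and strip LCRC and sequence number, return the TLP."""
    if frame[0] != "STP" or frame[-1] != "END":
        raise ValueError("bad framing")
    dl_packet = bytes(frame[1:-1])
    body, lcrc = dl_packet[:-4], dl_packet[-4:]
    if crc32_bytes(body) != lcrc:
        raise ValueError("LCRC mismatch")
    return body[2:]  # drop the sequence number before handing the TLP upward


tlp = transaction_layer(bytes(12), b"\x01\x02\x03\x04", with_digest=True)
frame = physical_layer(data_link_layer(tlp, seq=5))
print(len(tlp), len(frame), receive(frame) == tlp)
```

Each layer adds its own information on transmission and removes it on reception, so the transaction layer at the receiving end sees the same TLP that was built at the transmitting end.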
Next, problems connected with error processing in the transaction layer on the occurrence of an error in the PCI.Express communication system constructed in this way will be described with reference to FIG. 4A to FIG. 4D and FIG. 5.
FIG. 4A to FIG. 4D are views given in explanation of problems that arise when errors are generated during transmission of transmission data from an endpoint 3a in a PCI.Express communication system comprising a root complex 1, switch 2 and endpoints 3a to 3c, and FIG. 5 is a view given in explanation of problems that arise regarding the fault tolerance function when errors are generated.
For example, as in the case of the endpoint 3a shown in FIG. 5, the layout of a PCIe device comprises a PCI.Express communication section 3a1, a data buffer (memory) 3a3 that stores the transmission data, and a local controller 3a2 that receives a request for transmission data from the PCI.Express communication section 3a1 and controls writing of the data to be transmitted to the data buffer 3a3.
In order to establish the integrity of the transmission data transmitted from the PCI.Express device, it is usually necessary for the PCI.Express communication section 3a1 to detect errors in the data that it reads from the data buffer 3a3. This error detection covers errors in the transmission data caused by, for example, soft errors of the data buffer 3a3, or hardware faults in the upper layer circuitry of the PCI.Express communication section 3a1, such as the local controller 3a2, the data buffer 3a3, and the interfaces between these and the PCI.Express communication section 3a1 and between the local controller 3a2 and the data buffer 3a3.
First of all, data transmission in the transaction layer will be described. An EP bit is provided in the TLP header. Regarding the TLP transmission data, if for example the error detection circuit that is provided as part of the circuitry of the transaction layer of the PCI.Express communication section 3a1, as described above, detects an error, and this error is irrecoverable, the error detection circuit of the transaction layer transmits the TLP packet after setting 1 as the EP bit. By referencing this EP bit, the receiving end can tell that the received data contains an error: end-to-end data integrity can thus be guaranteed.
However, since the header carrying the EP bit is transmitted ahead of the data, in order for the error detection circuit of the transaction layer at the transmission end to set 1 in the EP bit of the header, all of the transmission data must be temporarily accumulated in the data buffer and checked before transmission starts, so the throughput of the PCI.Express communication system is lowered.
If the bandwidth of the bus of the upper layer circuitry of the PCI.Express described above is higher than the bandwidth of the PCI.Express, it is desirable that the circuitry of the transaction layer should transmit the transmission data that is transferred from the upper layer circuitry from the PCI.Express lane (transmission path) sequentially without temporary accumulation in the data buffer. Such a transmission mode is called “cut through”.
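The difference between the two transmission modes can be shown with a minimal Python sketch. The function names and the representation of data as (word, is_bad) pairs are assumptions made for illustration only; what the transmitter does after a late error is outside this sketch.

```python
def store_and_forward(words):
    """Accumulate all words first, so the header can carry the correct EP bit."""
    buffered = list(words)  # full accumulation: transmission cannot start yet
    ep = int(any(bad for _, bad in buffered))
    return [("HDR", ep)] + [("DATA", w) for w, _ in buffered]


def cut_through(words):
    """Send the header at once, then forward each word as it arrives."""
    out = [("HDR", 0)]  # EP is fixed before any data word has been checked
    late_error = False
    for w, bad in words:
        if bad:
            late_error = True  # too late: the header has already been sent
            break
        out.append(("DATA", w))
    return out, late_error


words = [(1, False), (2, True), (3, False)]
print(store_and_forward(words)[0])  # header with EP bit set
print(cut_through(words))           # header with EP = 0, error found afterwards
```

In the store-and-forward case the EP bit reflects the error, at the cost of waiting for all the data. In the cut-through case the data flows without waiting, but an error found partway through can no longer be signalled in the header.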
Next, a case where delay in error processing in a device adopting such a cut-through transmission mode presents a problem will be described with reference to FIG. 4A to FIG. 4D. FIG. 4A illustrates the condition where the endpoint 3b commences transmission of TLP2 of 1024 double words (hereinbelow abbreviated as DW) addressed to the root complex 1 during transmission from the endpoint 3a of a completion TLP1 with data attached addressed to the root complex 1, when an irrecoverable error is detected in the untransmitted data of TLP1 within the endpoint 3a. 
In this case, the endpoint 3a nullifies TLP1 by appending “EDB” (EnD Bad) instead of “END” as the control character at the tail of the TLP, as shown in FIG. 4B and then attempts to transmit to the root complex 1 an error message TLP3 indicating that this error is fatal.
Since TLP2 that was transmitted from the endpoint 3b is in a waiting condition in the buffer in the switch 2 until transmission of TLP1 has been completed, as shown in FIG. 4C, when transmission of TLP1 has been completed, the switch 2 transmits TLP2 to the root complex 1. But since TLP2 is now being transmitted, the error message TLP3 is kept waiting in the buffer of the switch 2.
Then, as shown in FIG. 4D, when transmission of TLP2 has been completed, transmission of TLP3 to the root complex 1 is commenced.
In the case described above, if the number of lanes at 2.5 G bps is 1, the problem arises that the error message TLP3 is delayed by an amount of, at the maximum, about 16 μsec (1024 DW×16 ns/DW). This delay may become even larger in the case of a system in which the switch 2 has a large number of endpoints connected thereto.
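The delay estimate above can be reproduced with a short Python sketch. The 16 ns/DW figure is taken from the text; it is consistent with 10 bits being sent per byte on a 2.5 G bps lane (40 bits per DW). The function names are hypothetical, header and framing overhead are ignored, and the division by the lane count is an idealized assumption.

```python
NS_PER_DW_X1 = 16  # transfer time of one DW on a single 2.5 G bps lane (from the text)


def tlp_time_ns(length_dw, lanes=1):
    """Approximate time to transmit a payload of length_dw double words."""
    return length_dw * NS_PER_DW_X1 / lanes


def error_message_wait_ns(queued_tlp_lengths_dw, lanes=1):
    """Time an error message waits behind TLPs already queued in the switch."""
    return sum(tlp_time_ns(n, lanes) for n in queued_tlp_lengths_dw)


print(error_message_wait_ns([1024]))        # one 1024 DW TLP ahead: 16384 ns, about 16 us
print(error_message_wait_ns([1024] * 3))    # three such TLPs ahead: 49152 ns
```

The second call illustrates why the delay grows when more endpoints are connected to the switch: each additional maximum-size TLP queued ahead of the error message adds roughly another 16 μsec on a single lane.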
Also, in cases where the endpoint 3c is constituted as a device that operates in a standby fashion with regard to the endpoint 3a and changeover of the device is triggered by this error message TLP3, there is the problem that such delay in the changeover time represents a period of malfunction of the system.
Next, the case where problems are experienced due to lowering of the fault tolerance function of the system when errors are generated will be described with reference to FIG. 5.
FIG. 5 is a layout diagram given in explanation of the error recovery operation of the endpoint 3a in the case where the root complex 1 transmits a 1024 DW memory read request TLP31 in respect of the endpoint 3a provided with a local controller 3a2, and the endpoint 3a detects an error in untransmitted data in the data buffer 3a3 during the course of transmission of the completion TLP 32 in respect of this request.
In the PCI.Express device, usually the following recovery operations are performed when cut-through is employed for data transfer of the transaction layer.
(1) On receipt of a memory read request TLP 31, the PCI.Express communication section 3a1 of the endpoint 3a requests (s41) 1024 DW of data from the local controller 3a2.
(2) The local controller 3a2 transfers (s43) the data to the data buffer 3a3 and reports completion of data preparation (s42) to the PCI.Express communication section 3a1.
(3) The PCI.Express communication section 3a1 transmits a completion TLP 32 while performing burst read (s44) from the data buffer 3a3.
If, at this point, an error is detected in the data at the data buffer address CF4h, the control character “EDB” is appended at the tail of the completion TLP 32, so as to nullify the TLP 32 during transmission.
(4) Then, in order to perform error recovery of the data buffer 3a3, the PCI.Express communication section 3a1 again requests data from the local controller 3a2.
The problems associated with the error recovery operation of the data buffer 3a3 of the endpoint 3a will now be described separately for the case of a system designed under the assumption that the error processing response is to deal with a transient soft error and for the case of a system designed under the assumption that the error processing response is to deal with an irrecoverable permanent fault.
In the former case, usually, the PCI.Express communication section 3a1 requests 1024 DW of data from the local controller 3a2 in the same way. In this case, there is the problem that, if the buffer address CF4h is permanently faulty, writing the same data to the same region of the data buffer 3a3 will not result in recovery of the data error of the address CF4h.
Also, since correct read data cannot be returned to the root complex 1, the PCI.Express communication section 3a1 transmits an error message of an irrecoverable error (Fatal Error) to the root complex 1.
Thus, on receiving this error message, the root complex 1 halts or resets the system.
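The behaviour of the retry-based recovery described above can be illustrated with a minimal Python sketch. The buffer model, class and function names, and retry count are all hypothetical. The address CF4h is treated as a byte address, so the faulty double word has index CF4h / 4; this interpretation is an assumption.

```python
class DataBuffer:
    """Toy model of a data buffer with optional transient and permanent faults."""

    def __init__(self, size, soft=(), stuck=()):
        self.cells = [0] * size
        self.soft = set(soft)    # one-shot transient errors (hypothetical model)
        self.stuck = set(stuck)  # permanently faulty locations (hypothetical model)

    def write(self, index, value):
        self.cells[index] = value

    def read(self, index):
        """Return (value, ok); ok is False when an error is detected."""
        if index in self.stuck:
            return self.cells[index], False
        if index in self.soft:
            self.soft.discard(index)  # a transient error does not recur
            return self.cells[index], False
        return self.cells[index], True


def send_completion(buf, data):
    """One attempt: fill the buffer, then burst-read it while transmitting."""
    for i, value in enumerate(data):
        buf.write(i, value)
    for i in range(len(data)):
        _, ok = buf.read(i)
        if not ok:
            return "EDB"  # TLP nullified during transmission
    return "END"


def read_with_retry(buf, data, retries=1):
    """Repeat the same request to the same buffer region after a nullified TLP."""
    for _ in range(1 + retries):
        if send_completion(buf, data) == "END":
            return "completed"
    return "fatal error message"


FAULTY = 0xCF4 // 4  # assumption: CF4h is a byte address
data = list(range(1024))
print(read_with_retry(DataBuffer(1024, soft={FAULTY}), data))   # completed
print(read_with_retry(DataBuffer(1024, stuck={FAULTY}), data))  # fatal error message
```

Repeating the same request recovers from a transient error, but with a permanently faulty location every attempt is nullified and the only remaining outcome is the fatal error message.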
In the latter case, a method is available in which the PCI.Express communication section 3a1 requests the local controller 3a2 to transfer the data in units of small data size, so that the completion TLP is transmitted to the root complex 1 in more than one transmission. This has the problem that a long time is required to complete transmission.
Another method is to duplicate the data buffer 3a3, using each data buffer alternately: however, this involves increased memory costs.
Incidentally, regarding this PCI.Express TLP, the technique has been disclosed of entrusting data error detection to LCRC of the data link layer function, utilizing the TLP digest field independently, without employing the ECRC of the transaction layer function. An example is US Patent Application Laid-open No. 2009/0006932 (hereinbelow referred to as patent reference 1).
In the PCI.Express standard, storage of the ECRC in the TLP digest is an optional specification for guaranteeing end-to-end data integrity.
However, as stated in patent reference 1, if reliability of the relay device is sufficiently guaranteed and if a data error detection function, such as for example parity in the transmission and reception data buffer, is provided in the transaction layer, guarantee of end-to-end data integrity can be achieved by supplementation by the LCRC, so it may be concluded that the ECRC is unnecessary.
As an embodiment for employing the TLP digest independently in patent reference 1, the TD bit of the TLP header is used to indicate the presence or absence of a TLP digest and, by utilizing a reserved bit of the header, it is possible to indicate whether the TLP digest is being used to store independently-specifiable information or is being used to store the ECRC.
However, if a PCI.Express communication system is constructed in which the reserved bit is utilized with this objective, and the PCI.Express standard is revised in future so that a new definition is allocated to this reserved bit, it is possible that compatibility with the future PCI.Express standard will be lost.
It is therefore necessary to ensure that, when the TLP digest is used independently, the specification of independence of the TLP digest can be shared with PCIe devices in the system without needing to employ the reserved bit of the header.
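For reference, the header-bit scheme of patent reference 1 discussed above can be sketched as a small decoder in Python. The function name, the parameter names, and the polarity of the flag taken from the header are assumptions made for illustration; the source does not state which bit is used or which value selects which meaning.

```python
def digest_usage(td_bit, header_flag):
    """Interpret the TLP digest from the TD bit and one additional header bit.

    td_bit      -- 1 if a TLP digest is present, 0 otherwise
    header_flag -- hypothetical flag; 1 is assumed here to mean independent use
    """
    if not td_bit:
        return "no digest"
    return "independent information" if header_flag else "ECRC"


for td in (0, 1):
    for flag in (0, 1):
        print(td, flag, digest_usage(td, flag))
```

Because the meaning of the digest depends on a bit that the standard may later redefine, a receiver built to a revised standard could misinterpret the flag, which is the compatibility concern raised above.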
As described above, in regard to TLP transactions based on the conventional PCI.Express specification, there are problems concerning time for fault recovery in the event of an error and concerning fault-tolerance.