1. Field of the Invention
The present invention relates to the field of inter-chip communication.
2. Description of the Related Art
Current computing systems include a set of different chips, e.g., microprocessors, I/O chips and memory chips, and have a system-wide control structure for the major configuration, control and recovery functions. Such computer systems either use dedicated interfaces between the different chips for all communication related to these tasks, or use special command types that travel through the system over the main data path or interfaces.
When mainframes are coupled via high-speed interfaces such as InfiniBand, special redundancy features are needed for synchronizing system times. If a communication link between the coupling facility (CF), i.e., a communication means, and a system breaks, there can be several reasons: a cable may be broken or unplugged, a communication means may have entered a check-stop state because of an internal error, and/or an entire system may have gone down and stopped the communication means.
For coupling software or communication software, it is important to distinguish between these cases. In particular, it is important to identify the case in which the system stopped the communication means because the whole system went down.
From the point of view of the communication means, it may not be possible to distinguish whether the system went down or stopped the chip for some other reason. An exemplary implementation might be a mainframe system in which a dedicated error line, embedded in the main communication interface from the root complex to the communication means, can stop the communication means either because of an internal error or because the whole system went down. If the information that the system error line was active can be communicated to the other end of the link, the system software there can correlate events from different links and draw the right conclusions for recovering from this situation.
With today's methodology, the error information can be transferred over the communication link using manufacturer-specific special flow control packets (SFCP), defined by OpCodes (Operation Codes) that are not used by the standard interface protocol. These vendor-specific packets can carry only a small payload for transferring data from one side to the other.
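As a rough illustration of such a packet, the following minimal sketch assumes an SFCP consisting of a single opcode byte followed by a small fixed-size payload. The opcode value `0xF5`, the 4-byte payload limit and the helper names `encode_sfcp`/`decode_sfcp` are hypothetical and are not taken from the InfiniBand or PCI Express specifications; they only show how an otherwise unused OpCode can mark a packet as vendor-specific while leaving little room for data.

```python
# Hypothetical values for illustration only: the opcode is assumed to lie in a
# range left unused by the standard protocol, and the payload limit is assumed.
SFCP_OPCODE = 0xF5
MAX_PAYLOAD = 4  # bytes

def encode_sfcp(payload: bytes) -> bytes:
    """Build an SFCP: one opcode byte followed by a fixed-size payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload exceeds the small SFCP capacity")
    # Pad to a fixed length so the receiver needs no length field.
    return bytes([SFCP_OPCODE]) + payload.ljust(MAX_PAYLOAD, b"\x00")

def decode_sfcp(packet: bytes) -> bytes:
    """Return the payload if the packet carries the vendor opcode."""
    if len(packet) != 1 + MAX_PAYLOAD or packet[0] != SFCP_OPCODE:
        raise ValueError("not a vendor-specific flow control packet")
    return packet[1:]

pkt = encode_sfcp(b"\x01")  # hypothetical code for "system error line active"
print(pkt.hex())            # f501000000
print(decode_sfcp(pkt))     # b'\x01\x00\x00\x00'
```

The fixed payload size reflects the point made above: the packet can signal that an error occurred and perhaps its kind, but not much more.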
FIG. 1 shows a device for handling communication link problems between a first communication means 10 and a second communication means 20, in accordance with an embodiment of the prior art. The first communication means 10 includes a first control means 12 connected to a first interface means 14, and the second communication means 20 includes a second control means 22 connected to a second interface means 24. The first communication means 10 and the second communication means 20 are each part of a respective mainframe system 1, 2, wherein data signals and/or control signals and/or error information are transferred between the first communication means 10 and the second communication means 20 over the communication link 5 built between the first interface means 14 and the second interface means 24. The special flow control packets (SFCP), defined by OpCodes (Operation Codes) that are not used by the standard interface protocol, are stored in a memory means 16. In normal operation, the connected first control means 12 feeds the first interface means 14 with a continuous sequence of data to be transferred; in the case of a high-speed serial interface, as used for the InfiniBand or PCI Express protocols, these are so-called ordered sets. These ordered sets are serialized and transferred over the communication link 5. In the case of a communication problem, the first control means 12 transfers the corresponding special flow control packets (SFCP) from the memory means 16 to the first interface means 14, which sends the corresponding error information to the second communication means 20. In the second communication means 20, the second control means 22 reports the error information to the error structure of the system.
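The prior-art flow of FIG. 1 can be sketched as a cycle-by-cycle model. This is a simplified sketch, assuming one transmitted unit per cycle and representing ordered sets and SFCPs as plain strings; the function names, the string `"OS"` and the error kinds in the lookup table are hypothetical stand-ins for the first control means 12, the second control means 22 and the memory means 16.

```python
from typing import Dict, Iterator, List, Optional

ORDERED_SET = "OS"  # stands in for one ordered set sent in normal operation

def first_control_means(n_cycles: int, error_at: Optional[int], error_kind: str,
                        memory_means: Dict[str, str]) -> Iterator[str]:
    """Feed interface means 14 one unit per cycle.

    In normal operation this is a continuous sequence of ordered sets; in the
    cycle where a communication problem is detected, the stored SFCP for that
    error kind is fetched from memory means 16 instead.
    """
    for cycle in range(n_cycles):
        if cycle == error_at:
            yield memory_means[error_kind]
        else:
            yield ORDERED_SET

def second_control_means(link: List[str], error_structure: List[str]) -> None:
    """Report every non-ordered-set unit arriving over link 5 to the error structure."""
    for unit in link:
        if unit != ORDERED_SET:
            error_structure.append(unit)

# Hypothetical SFCP table; real packets would be opcode + payload bytes.
memory_means_16 = {"system_down": "SFCP:system_down",
                   "internal_error": "SFCP:internal_error"}
link_5 = list(first_control_means(6, 3, "system_down", memory_means_16))
errors: List[str] = []
second_control_means(link_5, errors)
print(link_5)   # ['OS', 'OS', 'OS', 'SFCP:system_down', 'OS', 'OS']
print(errors)   # ['SFCP:system_down']
```

The model deliberately ignores serialization and link latency; it only shows that the error information travels in-band, in place of an ordered set, and is picked out on the receiving side.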
A drawback of this approach is that the chip clock signals Clk coupled to the first control means 12 cannot be stopped immediately when the communication problem occurs, but must keep running until the special flow control packets (SFCP) have been transferred from the first control means 12 to the first interface means 14 and further over the communication link 5 to the second interface means 24 of the second communication means 20. This delayed clock stop results in lower-quality debug data, because the debug data reflects a much later point in time than the point in time at which the communication problem occurred.
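The effect of the delayed clock stop can be estimated with simple arithmetic. The sketch below assumes that debug data is held in a trace buffer that records one entry per Clk cycle and keeps only its most recent entries; the buffer depth, the cycle counts for each transfer stage and the 500 MHz clock are hypothetical numbers chosen for illustration, not values from any particular chip.

```python
from collections import deque
from typing import List

def stop_delay_cycles(transfer: int, serialize: int, flight: int) -> int:
    """Cycles Clk must keep running after the problem is detected.

    transfer:  moving the SFCP from memory means 16 via control means 12
               to interface means 14
    serialize: serializing the SFCP onto communication link 5
    flight:    reaching the second interface means 24
    """
    return transfer + serialize + flight

def cycles_to_ns(cycles: int, clk_mhz: float) -> float:
    """Convert a cycle count to nanoseconds for a given clock frequency."""
    return cycles * 1000.0 / clk_mhz

def capture_trace(error_cycle: int, stop_delay: int, depth: int) -> List[int]:
    """Cycle numbers left in a trace buffer of `depth` entries when Clk stops."""
    trace = deque(maxlen=depth)  # oldest entries are overwritten
    for cycle in range(error_cycle + stop_delay + 1):
        trace.append(cycle)
    return list(trace)

delay = stop_delay_cycles(transfer=8, serialize=16, flight=40)
print(delay, cycles_to_ns(delay, clk_mhz=500.0))  # 64 128.0
print(100 in capture_trace(100, 0, 8))            # True
print(100 in capture_trace(100, delay, 8))        # False
```

With an immediate stop, the cycle in which the problem occurred is still in the buffer; with the assumed 64-cycle delay, it has already been overwritten, which is the loss of debug quality described above.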