When an error occurs on one node of a Scalable Coherent Interface (SCI) system, it can cause other errors which quickly propagate through the complete system, giving rise to a number of error signals at different nodes, all perhaps stemming from the original error condition. Each of these errors must be cleared before the system is back to full health. However, it can be very difficult to determine which of the many error signals represent the original error and which error signals are derivative therefrom.
An example of this situation is a simple timeout at a particular node. The local node detects a timeout error on an access to a memory location and logs it. In the meantime, a remote node could be attempting to access that same memory location, and the remote node will then also log a timeout error. From the perspective of the remote node, it is not easy to know whether the memory at the target node is malfunctioning or whether the link connecting the remote node to the target memory is malfunctioning.
The most important fact in debugging this particular error would be that the local error was logged first, since that ordering would point to the local error as the original one.
This sequencing of errors is not currently possible since the errors are logged without time information being associated therewith. The error is just a bit logged in a certain location at a node. Using a time stamp for each error would be very difficult because it would involve synchronizing the clocks of different nodes to a very high accuracy, perhaps even down to nanoseconds. The overhead involved with such a system would be prohibitive.
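The loss of ordering information can be illustrated with a minimal sketch. It assumes a hypothetical error log that is a plain bit field per node, as described above; the names `Node`, `TIMEOUT_ERROR`, and `run` are illustrative only and do not correspond to any actual SCI register or interface.

```python
from dataclasses import dataclass

TIMEOUT_ERROR = 0x1  # hypothetical bit position for a timeout error


@dataclass
class Node:
    """A node whose error log is a bit field with no time information."""
    name: str
    error_bits: int = 0

    def log_error(self, bit: int) -> None:
        # Setting a bit records that the error happened, but not when.
        self.error_bits |= bit


def run(order):
    """Log a timeout at each named node in the given order; return final logs."""
    nodes = {"local": Node("local"), "remote": Node("remote")}
    for name in order:
        nodes[name].log_error(TIMEOUT_ERROR)
    return {name: node.error_bits for name, node in nodes.items()}


local_first = run(["local", "remote"])
remote_first = run(["remote", "local"])

# Both orderings leave identical logs, so the original error cannot be
# distinguished from the derivative one by inspecting the logs alone.
print(local_first == remote_first)
```

In this sketch the final state of the logs is the same whichever node logged its error first, which is the difficulty described above.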
Thus, a need exists in the art for a system and method for isolating errors that occur at one node of a multi-node system but which can cause error conditions to be logged at multiple other nodes.
A further need exists in the art for such a system that does not significantly increase the overhead associated with such logged errors.
A still further need exists in the art for a system and method for determining the order of occurrence of errors that can occur at multiple nodes as a result of an error condition at one of the nodes.