Some computer systems have many chips working under the control of one or more processors. Hardware faults that the chips suffer or observe, and then report to the processors, often manifest almost simultaneously. An initial hardware fault may trigger multiple error reports, all of which are transmitted to the system processor. This multiplicity of reports from a single triggering event can make the problem behind the initial error difficult to diagnose, because it is often hard to reconstruct which of the reported errors occurred first.
Determining when each error occurred is difficult because chips working under the control of one or more processors frequently have local time counters that are not synchronized. A local time counter may increment on every clock tick (e.g., every 16 nanoseconds, or at whatever rate the clock in the electronic device runs). Even when two chips both use counters that increment on the clock tick, however, the counter values may differ because the counters may have started from different baselines. Because each chip's local time counter operates independently, comparing the counters to identify the first event in a string of events is frequently quite difficult. Furthermore, the propagation time of an error from a chip to the operating system may not be uniform across chips, which can result in inaccurate error times being assigned to the errors.
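The baseline problem described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Chip` and `ErrorReport` classes, the `start_tick` field, and the tick values are assumptions, not part of any described system. It assumes each chip's counter starts at zero at a different moment and then increments once per tick of a shared clock.

```python
from dataclasses import dataclass

TICK_NS = 16  # example clock period taken from the text; real devices differ


@dataclass
class Chip:
    name: str
    start_tick: int  # hypothetical: global tick at which this chip's counter began at zero

    def local_counter(self, global_tick: int) -> int:
        """Value of the chip's free-running counter at a given global tick."""
        return global_tick - self.start_tick


@dataclass
class ErrorReport:
    chip: str
    local_count: int  # the only timestamp the processor receives


def first_by_local_count(reports):
    """Naive ordering: trust the raw local counter values."""
    return min(reports, key=lambda r: r.local_count).chip


# Assumed scenario: chip A faults at global tick 1000, chip B at tick 1010,
# so A's fault is the true first event (160 ns earlier at 16 ns per tick).
chip_a = Chip("A", start_tick=0)
chip_b = Chip("B", start_tick=500)  # counter started later, so it reads lower

reports = [
    ErrorReport("A", chip_a.local_counter(1000)),
    ErrorReport("B", chip_b.local_counter(1010)),
]

naive_first = first_by_local_count(reports)
true_gap_ns = (1010 - 1000) * TICK_NS
```

In this assumed scenario, chip B reports the smaller counter value even though its fault happened later, so ordering the reports by raw local counter values picks the wrong first error.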
Conventionally, computer systems have generated hardware fault reports and sent them to controlling processors. The controlling processor can accumulate the hardware error reports and present them to a human user. Unfortunately, without some way of determining which error occurred first, diagnosing the initial cause of the fault is exceedingly difficult.