A computer system may include a plurality of bus units (e.g., logical units such as microprocessors, memory management processors, input/output (I/O) processors and/or the like) that are coupled via one or more buses and that may require access to one or more memories of the system. The bus units may be arranged in a hierarchy. For example, the system may include a first group of bus units in a first chip and a second group of bus units in a second chip. Further, the first and second chips may be on the same card or on different cards of the system.
During operation, one of the bus units may issue a coherent command on a bus. While pending, the command may require access to an address (e.g., a cacheline) included in a memory of the system. In a conventional system, to maintain coherence, each of the remaining bus units of the system is required to respond to the issuing bus unit to indicate whether that remaining bus unit locally stores the cacheline and, if so, the state of the locally-stored cacheline. However, due to the hierarchy of the bus units, a response from one or more of the remaining bus units to the issuing bus unit may take a long time and may therefore increase command latency. For example, if the issuing bus unit is in the first chip and the first and second chips are on the same card, respective responses from the bus units in the second chip may require a long time. If the first and second chips are on different cards, respective responses from the bus units in the second chip may require an even longer time. Accordingly, improved methods and apparatus for reducing command processing latency while maintaining coherence are desired.
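For illustration only, the conventional response collection described above may be modeled with the following minimal Python sketch. The names `Location`, `BusUnit`, `conventional_snoop` and `RESPONSE_COST`, the cost values, and the `"shared"` state label are hypothetical and are not taken from any particular system. The sketch assumes that the issuing bus unit waits for every remaining bus unit, so the latency is set by the slowest response.

```python
from dataclasses import dataclass, field
from enum import Enum


class Location(Enum):
    """Where a responding bus unit sits relative to the issuing bus unit."""
    SAME_CHIP = "same chip"
    SAME_CARD = "different chip, same card"
    OTHER_CARD = "different chip, different card"


# Hypothetical response costs in arbitrary time units (assumed values).
RESPONSE_COST = {
    Location.SAME_CHIP: 1,
    Location.SAME_CARD: 10,
    Location.OTHER_CARD: 100,
}


@dataclass
class BusUnit:
    name: str
    location: Location
    # Maps a cacheline address to the state of the locally stored copy.
    cache: dict = field(default_factory=dict)

    def snoop(self, address):
        """Return the local cacheline state, or None if not stored locally."""
        return self.cache.get(address)


def conventional_snoop(address, remaining_units):
    """Collect a response from every remaining bus unit.

    Returns the responses keyed by unit name and the latency, modeled
    as the cost of the slowest response (0 if there are no units).
    """
    responses = {}
    latency = 0
    for unit in remaining_units:
        responses[unit.name] = unit.snoop(address)
        latency = max(latency, RESPONSE_COST[unit.location])
    return responses, latency
```

In this model, a single bus unit on a different card is enough to raise the latency of every coherent command to the highest cost, even when all other responses arrive quickly.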