Modern computer systems are designed with heterogeneous processing units that perform data processing operations on data values stored in memory. One example of such a system includes an addressable memory region of DRAM and one or more small addressable memory regions. To access a particular data value, a processing unit implements a request address bus that designates the memory location to be accessed.
Processing units may communicate with other processing units and with memory through a transport mechanism. In such a system, addresses may be transmitted between units via buses in the transport mechanism and may be stored in transaction tables. If the system contains cache coherent processing units, addresses may also be stored in cache tags. Many processing units and other interconnect agents implement directories. A directory is used to track which agents or processors in the system share data. For every agent that is tracked, a tracking bit is needed in each tag line of the directory. Thus, as the number of agents grows, the directory storage needed for tracking grows with it, because every tag line gains one bit for each added agent.
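The per-agent tracking bit described above can be illustrated with a minimal sketch. It assumes a full bit-vector directory in which each tag line carries one presence bit per tracked agent; the names `DirectoryEntry`, `sharers`, and `tracking_bits` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """One directory tag line (hypothetical layout, for illustration only)."""
    tag: int          # address tag identifying the tracked cache line
    sharers: int = 0  # bit i set means agent i may hold a copy of the line

    def add_sharer(self, agent_id: int) -> None:
        # Set the tracking bit for this agent.
        self.sharers |= 1 << agent_id

    def remove_sharer(self, agent_id: int) -> None:
        # Clear the tracking bit for this agent.
        self.sharers &= ~(1 << agent_id)

    def sharer_ids(self, num_agents: int) -> list:
        # List the agents whose tracking bit is currently set.
        return [i for i in range(num_agents) if (self.sharers >> i) & 1]


def tracking_bits(num_entries: int, num_agents: int) -> int:
    # Assumption: one tracking bit per agent in every tag line,
    # so the tracking storage is entries multiplied by agents.
    return num_entries * num_agents
```

Under this assumption, a directory with 1024 tag lines that tracks 8 agents needs 1024 × 8 = 8192 tracking bits, in addition to the storage for the tags themselves.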
Storing full addresses, especially in structures such as cache tags, uses a significant amount of silicon area, which drives manufacturing cost. Transmitting full addresses requires additional wires, which further increase silicon area. In addition, operating on full addresses incurs significant logic gate delay, which limits clock speed and system performance.
Occasionally, errors occur in the address information for a system directory. For example, a directory entry may have an error, and that error may be uncorrectable. Whenever an uncorrectable error is detected at a directory entry during a directory lookup, typical implementations disable directory accesses to preserve system coherency. After this point, all accesses to the system directory or directory segment behave as if the directory were a null directory. As a result, the system sends snoops to all agents associated with the system directory, which degrades overall system performance.
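The fallback behavior described above can be sketched as follows. This is a simplified model rather than any particular implementation: the `Entry` and `Directory` classes, the `uncorrectable` flag, and the `snoop_targets` method are hypothetical names, and error detection itself (for example by an error-correcting code) is assumed to have already marked the entry.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    sharers: int                 # one tracking bit per agent
    uncorrectable: bool = False  # hypothetical flag set by error detection


class Directory:
    def __init__(self, num_agents: int):
        self.num_agents = num_agents
        self.entries = {}      # maps address tag -> Entry
        self.disabled = False  # True once the directory acts as a null directory

    def snoop_targets(self, tag: int, requester: int) -> list:
        """Return the agents that must be snooped for this request."""
        everyone_else = [a for a in range(self.num_agents) if a != requester]
        if self.disabled:
            # Null directory: no tracking information is trusted,
            # so every other agent is snooped.
            return everyone_else
        entry = self.entries.get(tag)
        if entry is None:
            # Line is not tracked, so no agent holds it.
            return []
        if entry.uncorrectable:
            # Typical response sketched here: disable the directory
            # from this point on to preserve coherency.
            self.disabled = True
            return everyone_else
        # Normal case: snoop only the agents whose tracking bit is set.
        return [a for a in everyone_else if (entry.sharers >> a) & 1]
```

In this model, a single lookup that hits a bad entry turns every later lookup into a broadcast, including lookups for lines whose entries are intact. That extra snoop traffic is the performance cost the paragraph above refers to.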
Therefore, what is needed is a system and method that allows recovery of the system directory when an uncorrectable error is detected, without degrading system performance.