As computer architectures continue to increase in processing power, the associated processing “real estate” continues to get smaller. While this is clearly beneficial, it creates new issues in the design, and particularly the testing, of these systems. In the earlier days of computing, designs were typically tested by manually probing access points in the system, which facilitated design verification and error discovery. The advent of semiconductor chips introduced smaller, more efficient systems, but testing still typically involved manual detection of problems. As software became more effective and popular, software compiling and debugging tools allowed computer programs to be more easily tested, yet hardware was still tested by analyzing signals at hardware access points. Test systems often included the use of dedicated connectors on printed circuit boards (PCBs), “bed-of-nails” test facilities, and the like. These systems, while useful in their day, are of diminishing value in the modern computer era, where in some cases computing systems can reside on a single chip. Even large-scale computers, while increasing in computational power, have been greatly reduced in size through the use of programmable logic devices, application-specific integrated circuits (ASICs), etc. Thus, while aggregating computational power into a small number of chips is highly beneficial to increase computer performance and marketability, it has made collecting information for purposes of testing, debugging and servicing increasingly challenging.
As integrated circuits continue to operate at faster speeds and have greater cell densities, it becomes more difficult to detect errors and capture information that assists in locating and identifying the errors. For example, as ASICs continue to become more densely populated, an increasing amount of the ASIC circuitry is embedded and unavailable for direct monitoring. Information from within the ASIC must somehow be captured and provided externally for analysis.
The information collected may be used for debugging problems found during the simulation and hardware checkout phases. Initial errors occurring during the design phase can be corrected with the help of an effective debugging mechanism. The information may also be used for analyzing problems reported from customer sites, as it is imperative that error discovery and analysis be provided in order to service customers without having to replace an entire system. The performance of the particular computer system under analysis can also be gauged by collecting and monitoring information generated by an operational computer system.
These issues are exacerbated in the context of multi-processor systems, and particularly those that employ cache or other memory that is shared between these processors or other agents coupled to the system bus. More particularly, some computing systems require that each processor or agent have access to the same physical memory, generally through the same system bus. When all processors share a single image of the memory space, that memory is said to be coherent: data retrieved by each processor from the same memory address will be the same data. Coherence becomes more difficult to maintain when high-speed cache or other similar memory is used. For example, when a processor reads data from a system memory location, it may store that data in a cache memory. A subsequent read operation from the same system memory address is instead satisfied from the cache, which improves access speed. Likewise, write operations to the same system memory address may result in a write operation to the cache, which can lead to data incoherence if not properly managed. Because each processor maintains its own copy of system-level memory within its cache, subsequent data writes can cause the copies held in the respective caches to diverge.
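The divergence described above can be illustrated with a minimal sketch. The `Processor` class, the address `0x100`, and the write-back-only policy are hypothetical simplifications chosen for illustration; they do not model any particular hardware.

```python
class Processor:
    """A processor with a private cache and no coherence protocol (illustrative only)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory   # shared system memory: address -> value
        self.cache = {}        # private cache: address -> value

    def read(self, addr):
        # The first read fetches from system memory; later reads hit the cache.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Assumption: the write is absorbed by the private cache only,
        # with no notification to system memory or to other caches.
        self.cache[addr] = value


memory = {0x100: 1}
cpu_a = Processor("A", memory)
cpu_b = Processor("B", memory)

cpu_a.read(0x100)            # both processors now cache the value 1
cpu_b.read(0x100)
cpu_a.write(0x100, 2)        # A updates only its own cached copy

stale = cpu_b.read(0x100)    # B still reads its old cached copy
fresh = cpu_a.read(0x100)    # A reads the value it just wrote
```

After these operations the two processors hold different values for the same address, which is the incoherence the paragraph describes.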
Cache coherency protocols ensure that one processor's cached copy of a shared memory location is invalidated, or alternatively updated, when another processor writes to that location. Thus, each processor in a multi-processor system is typically responsible for snooping the system bus to maintain currency of its own cache. Coherent memory requests may communicate the memory accessed by one processor to the other processors on the bus through the use of bus “snooping” functions, so that stale data is not used. For example, when a cache update is performed on shared cache data, the update can be announced over the bus or otherwise made ascertainable via the bus. Each processor monitors or “snoops” the bus to perceive this information, and reacts accordingly. If a particular processor has a copy of the cache line that is being requested by another processor or agent, it may have to surrender its exclusive copy of the data, change the state of its shared copy, etc.
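As a rough illustration of the snooping behavior just described, the following sketch models a write-invalidate scheme. The `Bus` and `SnoopingProcessor` classes and the write-through policy are assumptions made for brevity; real protocols track additional per-line states and are considerably more involved.

```python
class Bus:
    """Hypothetical shared bus that announces writes to every other agent."""

    def __init__(self):
        self.agents = []

    def attach(self, agent):
        self.agents.append(agent)

    def broadcast_write(self, source, addr):
        # Every agent except the writer snoops the announced address.
        for agent in self.agents:
            if agent is not source:
                agent.snoop_write(addr)


class SnoopingProcessor:
    """Processor whose private cache is kept current by snooping the bus."""

    def __init__(self, name, memory, bus):
        self.name = name
        self.memory = memory   # shared system memory: address -> value
        self.cache = {}        # private cache: address -> value
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        # A miss (including a miss caused by invalidation) refetches from memory.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Assumption: write-through to memory, then announce the write on the bus.
        self.cache[addr] = value
        self.memory[addr] = value
        self.bus.broadcast_write(self, addr)

    def snoop_write(self, addr):
        # React to another agent's write by invalidating the local copy.
        self.cache.pop(addr, None)


memory = {0x100: 1}
bus = Bus()
cpu_a = SnoopingProcessor("A", memory, bus)
cpu_b = SnoopingProcessor("B", memory, bus)

cpu_a.read(0x100)
cpu_b.read(0x100)
cpu_a.write(0x100, 2)                    # B's cached copy is invalidated
b_has_line_after_write = 0x100 in cpu_b.cache
value_seen_by_b = cpu_b.read(0x100)      # B refetches the current value
```

Because the write is announced on the bus, processor B discards its stale copy and its next read returns the value written by processor A.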
During a bus snoop phase, the various processors snoop the bus and provide snoop results that correspond to the address associated with the request. These snoop results may provide the status of the cache line(s) associated with the respective processor, and indicate whether the transaction will be completed. For example, if a processor cannot determine the status of its associated cache or is otherwise unable to determine whether a transaction will be completed, valid snoop results for that processor will not be provided. In these cases, the snoop results may be delayed by one or more clock cycles. Such delays may recur many times before valid snoop results are provided.
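The delayed snoop phase can be sketched as a loop over per-cycle samples. The `Snoop` result names (`STALL`, `MISS`, `HIT`, `HIT_MODIFIED`) and the `snoop_phase` function are hypothetical and are not taken from any specific bus specification; the sketch assumes one result per agent per clock cycle.

```python
from enum import Enum


class Snoop(Enum):
    """Illustrative snoop results; STALL means no valid result yet."""
    STALL = "stall"
    MISS = "miss"
    HIT = "hit"
    HIT_MODIFIED = "hit_modified"


def snoop_phase(samples):
    """Return (cycle, results) for the first cycle in which every agent
    reports a valid result, or (None, None) if that never happens.

    samples: list indexed by clock cycle; each entry is a list holding
    one Snoop value per agent on the bus.
    """
    for cycle, results in enumerate(samples):
        if all(result is not Snoop.STALL for result in results):
            return cycle, results
    return None, None


# Agent 0 answers immediately; agent 1 stalls for two cycles.
samples = [
    [Snoop.MISS, Snoop.STALL],
    [Snoop.MISS, Snoop.STALL],
    [Snoop.MISS, Snoop.HIT],
]
cycle, results = snoop_phase(samples)
```

In this example the snoop phase yields valid results only in the third sampled cycle; the two earlier cycles carry stall information alone.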
It is important to be able to debug, troubleshoot, or otherwise analyze the system using the snoop results. However, because information is present on the bus throughout the potentially many snoop delays, captured snoop data is difficult to analyze, as it is riddled with data associated with those delays. Further, such irrelevant data could flood a memory device, crowding out needed data and thereby requiring larger memories or risking loss of relevant data.
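The flooding problem can be made concrete with a small sketch, offered only as an illustration and not as the mechanism of any particular system. The `capture` function, the `STALL` marker, the sample sequence, and the three-entry buffer depth are all hypothetical; the bounded buffer is assumed to overwrite its oldest entries when full.

```python
from collections import deque

STALL = "STALL"   # hypothetical marker for a cycle with no valid snoop result


def capture(bus_samples, depth, skip_stalls):
    """Record (cycle, result) pairs into a bounded trace buffer.

    When the buffer is full, the oldest entry is overwritten.
    If skip_stalls is True, stall cycles are not recorded at all.
    """
    trace = deque(maxlen=depth)
    for cycle, result in enumerate(bus_samples):
        if skip_stalls and result == STALL:
            continue
        trace.append((cycle, result))
    return list(trace)


# Three valid snoop results separated by stall cycles.
samples = ["HIT", STALL, STALL, STALL, "MISS", STALL, STALL, "HIT"]

naive = capture(samples, depth=3, skip_stalls=False)
selective = capture(samples, depth=3, skip_stalls=True)
```

With every cycle recorded, the three-entry buffer ends up holding two stall entries and only the last valid result; with stall cycles skipped, the same buffer retains all three valid results.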
Accordingly, there is a need for an effective debugging and analyzing system and method that will allow for the selective capture and recording of only relevant information relating to cache or other memory coherency operations. The present invention fulfills these and other needs, and offers other advantages over prior art approaches.