Today's computing architectures are designed to provide the sophisticated computer user with increased Reliability, Availability, and Scalability (RAS). To that end, the rise of the Microsoft Windows NT/2000 operating environment has presented a relatively low cost solution to the traditional high-end computing environment. The introduction of the Enterprise Edition has extended the scalability and resilience of the NT Server to provide a powerful and attractive solution to today's largest and most mission critical applications.
The Cellular MultiProcessing (CMP) architecture is a software/hardware environment that is developing as the enabling architecture that allows the Windows NT/2000 based servers to perform in such mission critical solutions. The CMP architecture incorporates high performance Intel processors using special hardware and middleware components that build on standard interface components to expand the capabilities of the Microsoft Windows server operating systems. The CMP architecture utilizes a Symmetric MultiProcessor (SMP) design, which employs multiple processors supported by high throughput memory, Input/Output (IO) systems and supporting hardware elements to bring about the manageability and resilience required for enterprise class servers.
Key to the CMP architecture is its ability to provide multiple, independent partitions, each with their own physical resources and operating system. Partitioning requires the flexibility required to support various application environments with increased control and greater resilience. Multiple server applications can be integrated into a single platform with improved performance, superior integration and lower costs to manage.
The objectives of the CMP architecture are multifold and may consist at least of the following: 1) to provide scaling of applications beyond what is normally possible when running Microsoft Windows server operating systems on an SMP system; 2) to improve the performance, reliability and manageability of a multiple application node by consolidating them on a single, multi-partition system; 3) to establish new levels of RAS for open servers in support of mission critical applications; and 4) to provide new levels of interoperability between operating systems through advanced, shared memory techniques.
The concept of multiprocessors sharing the workload in a computer relies heavily on shared memory. True SMP requires each processor to have access to the same physical memory, generally through the same system bus. When all processors share a single image of the memory space, that memory is said to be coherent, where data retrieved by each processor from the same memory address is going to be the same. Coherence is threatened, however, by the widespread use of onboard, high speed cache memory. When a processor reads data from a system memory location, it stores that data in high speed cache. A successive read from the same system memory address results instead, in a read from the cache, in order to provide an improvement in access speed. Likewise, writes to the same system memory address results instead to writes to the cache, which ultimately leads to data incoherence. As each processor maintains its own copy of system level memory within its cache, subsequent data writes cause the memory in each cache to diverge.
A common method of solving the problem of memory coherence in SMP dedicated cache systems is through bus snooping. A processor monitors the address bus for memory addresses placed on it by other processors. If the memory address corresponds to an address whose contents were previously cached by any other processor, then the cache contents relating to that address are marked as a cache fault for all processors on the next read of that address, subsequently forcing a read of system memory. One major difficulty, however, in a multi-processor environment, is overloading the memory bus through the use of bus snooping, which results in a scalability limitation.
Another problem that exists within SMP designs is the lack of visibility to the system bus that is shared by each processor. Often, problems arise in the operation of the SMP, which may generally point to a problem with the system bus and/or the signals on the system bus. Lack of further visibility into the bus specifics, such as address dependent, data dependent, or function dependent system bus errors tend to complicate the trouble shooting process of the system bus errors.
Another problem that exists within SMP designs is that when history data is provided, no maskable operation is allowed which provides a user with the ability to dynamically select various trace modes and trace data formats to suit his particular needs.
A need exists, therefore, to provide a mechanism that allows a trace of system bus transactions to be stored for later retrieval and troubleshooting. Further, a need exists which allows a customization of the system bus trace such that error dependencies based on address, data or function may be easily determined.