Today's computing architectures are designed to provide the sophisticated computer user with increased Reliability, Availability, and Scalability (RAS). To that end, the rise of the Microsoft Windows NT/2000 operating environment has presented a relatively low cost solution to the traditional high-end computing environment. The introduction of the Enterprise Edition has extended the scalability and resilience of the NT Server to provide a powerful and attractive solution to today's largest and most mission critical applications.
The Cellular MultiProcessing (CMP) architecture is a software/hardware environment that is developing as the enabling architecture that allows the Windows NT/2000 based servers to perform in such mission critical solutions. The CMP architecture incorporates high performance Intel processors using special hardware and middleware components that build on standard interface components to expand the capabilities of the Microsoft Windows server operating systems. The CMP architecture utilizes a Symmetric MultiProcessor (SMP) design, which employs multiple processors supported by high throughput memory, Input/Output (IO) systems and supporting hardware elements to bring about the manageability and resilience required for enterprise class servers.
Key to the CMP architecture is its ability to provide multiple, independent partitions, each with their own physical resources and operating system. Partitioning requires the flexibility required to support various application environments with increased control and greater resilience. Multiple server applications can be integrated into a single platform with improved performance, superior integration and lower costs to manage.
The objectives of the CMP architecture are multifold and may consist at least of the following: 1.) to provide scaling of applications beyond what is normally possible when running Microsoft Windows server operating systems on an SMP system; 2.) to improve the performance, reliability and manageability of a multiple application node by consolidating them on a single, multi-partition system; 3.) to establish new levels of RAS for open servers in support of mission critical applications; and 4.) to provide new levels of interoperability between operating systems through advanced, shared memory techniques.
The concept of multiprocessors sharing the workload in a computer relies heavily on shared memory. True SMP requires each processor to have access to the same physical memory, generally through the same system bus. When all processors share a single image of the memory space, that memory is said to be coherent, where data retrieved by each processor from the same memory address is going to be the same. Coherence is threatened, however, by the widespread use of onboard, high speed cache memory. When a processor reads data from a system memory location, it stores that data in high speed cache. A successive read from the same system memory address results instead, in a read from the cache, in order to provide an improvement in access speed. Likewise, writes to the same system memory address results instead to writes to the cache, which ultimately leads to data incoherence. As each processor maintains its own copy of system level memory within its cache, subsequent data writes cause the memory in each cache to diverge.
A common method of solving the problem of memory coherence in SMP dedicated cache systems is through bus snooping. A processor monitors the address bus for memory addresses placed on it by other processors. If the memory address corresponds to an address whose contents were previously cached by any other processor, then the cache contents relating to that address are marked as a cache fault for all processors on the next read of that address, subsequently forcing a read of system memory. One major difficulty, however, in a multi-processor environment, is overloading the memory bus through the use of bus snooping, which results in a scalability limitation.
Another problem exhibited by SMP systems, is multiple processors often may request bus reads from the same cache line. Whether or not the cache reads result in a cache hit or miss, two separate responses are required to be generated for each bus read request received. This condition results in the production of redundant bus transactions within an outgoing bus request queue, thus needlessly occupying precious space within the queue for redundant transactions. A need exists, therefore, for a mechanism within the SMP system that links bus requests to the same cache line and then reduces the number of queued response requests by spawning multiple bus requests from the single linked request.