Today's computing architectures are designed to provide the sophisticated computer user with increased Reliability, Availability, and Scalability (RAS). To that end, the rise of the Microsoft Windows NT/2000 operating environment has presented a relatively low cost solution to the traditional high-end computing environment. The introduction of the Enterprise Edition has extended the scalability and resilience of the NT Server to provide a powerful and attractive solution to today's largest and most mission critical applications.
The Cellular MultiProcessing (CMP) architecture is a software/hardware environment that is developing as the enabling architecture that allows the Windows NT/2000 based servers to perform in such mission critical solutions. The CMP architecture incorporates high performance Intel processors using special hardware and middleware components that build on standard interface components to expand the capabilities of the Microsoft Windows server operating systems. The CMP architecture utilizes a Symmetric MultiProcessor (SMP) design, which employs multiple processors supported by high throughput memory, Input/Output (IO) systems and supporting hardware elements to bring about the manageability and resilience required for enterprise class servers.
Key to the CMP architecture is its ability to provide multiple, independent partitions, each with their own physical resources and operating system. Partitioning requires the flexibility required to support various application environments with increased control and greater resilience. Multiple server applications can be integrated into a single platform with improved performance, superior integration and lower costs to manage.
The objectives of the CMP architecture are multifold and may consist at least of the following: 1.) to provide scaling of applications beyond what is normally possible when running Microsoft Windows server operating systems on an SMP system; 2.) to improve the performance, reliability and manageability of a multiple application node by consolidating them on a single, multi-partition system; 3.) to establish new levels of RAS for open servers in support of mission critical applications; and 4.) to provide new levels of interoperability between operating systems through advanced, shared memory techniques.
The concept of multiprocessors sharing the workload in a computer relies heavily on shared memory. True SMP requires each processor to have access to the same physical memory, generally through the same system bus. When all processors share a single image of the memory space, that memory is said to be coherent, where data retrieved by each processor from the same memory address is going to be the same. Coherence is threatened, however, by the widespread use of onboard, high speed cache memory. When a processor reads data from a system memory location, it stores that data in high speed cache. A successive read from the same system memory address results instead, in a read from the cache, in order to provide an improvement in access speed. Likewise, writes to the same system memory address results instead to writes to the cache, which ultimately leads to data incoherence. As each processor maintains its own copy of system level memory within its cache, subsequent data writes cause the memory in each cache to diverge.
A common method of solving the problem of memory coherence in SMP dedicated cache systems is through bus snooping. A processor monitors the address bus for memory addresses placed on it by other processors. If the memory address corresponds to an address whose contents were previously cached by any other processor, then the cache contents relating to that address are marked as a cache fault for all processors on the next read of that address, subsequently forcing a read of system memory. One major difficulty, however, in a multi-processor environment, is overloading the memory bus through the use of bus snooping, which results in a scalability limitation.
Another problem associated with SMP systems results from managing the many requests submitted to the common system bus shared by the multiple processors or their associated bus interface controllers. As the requests traverse their various stages of execution, the data used by the various execution stages was necessarily transferred to the particular data buffer utilized by that particular execution stage causing unnecessary data transfers to take place. The unnecessary data transfer directly correlating to increased time required to complete each request.
A need exists, therefore, to prevent the necessity of data transfer when the request traverses various stages of execution. Avoidance of unnecessary data transfer would result in decreased execution time per request and would decrease the amount of hardware and related software needed to manage such data transfers.