Today's computing architectures are designed to provide the sophisticated computer user with increased Reliability, Availability, and Scalability (RAS). To that end, the rise of the Microsoft Windows NT/2000 operating environment has presented a relatively low-cost alternative to the traditional high-end computing environment. The introduction of the Enterprise Edition has extended the scalability and resilience of the NT Server to provide a powerful and attractive solution for today's largest and most mission-critical applications.
The Cellular MultiProcessing (CMP) architecture is a software/hardware environment that is emerging as the enabling architecture that allows Windows NT/2000 based servers to perform in such mission-critical solutions. The CMP architecture incorporates high performance processors using special hardware and middleware components that build on standard interface components to expand the capabilities of the Microsoft Windows server operating systems. The CMP architecture utilizes a Symmetric MultiProcessor (SMP) design, which employs multiple processors supported by high throughput memory, Input/Output (I/O) systems and supporting hardware elements to bring about the manageability and resilience required for enterprise class servers.
Key to the CMP architecture is its ability to provide multiple, independent partitions, each with its own physical resources and operating system. Partitioning provides the flexibility required to support various application environments with increased control and greater resilience. Multiple server applications can be integrated into a single platform with improved performance, superior integration and lower costs to manage.
The objectives of the CMP architecture are multifold and include at least the following: 1.) to provide scaling of applications beyond what is normally possible when running Microsoft Windows server operating systems on an SMP system; 2.) to improve the performance, reliability and manageability of multiple application nodes by consolidating them on a single, multi-partition system; 3.) to establish new levels of RAS for open servers in support of mission-critical applications; and 4.) to provide new levels of interoperability between operating systems through advanced, shared memory techniques.
The concept of multiprocessors sharing the workload in a computer relies heavily on shared memory. True SMP requires each processor to have access to the same physical memory, generally through the same system bus. When all processors share a single image of the memory space, that memory is said to be coherent, meaning that data retrieved by each processor from the same memory address is the same. Coherence is threatened, however, by the widespread use of onboard, high speed cache memory. When a processor reads data from a system memory location, it stores that data in high speed cache. A successive read from the same system memory address results instead in a read from the cache, in order to provide an improvement in access speed. Likewise, writes to the same system memory address result instead in writes to the cache, which ultimately leads to data incoherence. As each processor maintains its own copy of system level memory within its cache, subsequent data writes cause the memory in each cache to diverge.
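By way of illustration only, the divergence described above can be sketched with a minimal model in which two processors each hold a private cache and no coherence mechanism exists. The class name `Processor`, the function `demo_incoherence`, and the address and data values are hypothetical and chosen solely for this example; the sketch assumes that a write is held in the cache and is never propagated to system memory.

```python
# Illustrative model only; all names and values are hypothetical.
class Processor:
    """A processor with a private cache and no coherence protocol."""

    def __init__(self, memory):
        self.memory = memory  # shared system memory (address -> value)
        self.cache = {}       # private cache (address -> value)

    def read(self, addr):
        # The first read fills the cache; successive reads hit the cache.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Assumption: the write is captured by the cache and never
        # reaches system memory or any other processor's cache.
        self.cache[addr] = value


def demo_incoherence():
    memory = {0x100: 1}
    p0, p1 = Processor(memory), Processor(memory)
    p0.read(0x100)       # both processors cache the same value
    p1.read(0x100)
    p0.write(0x100, 2)   # each processor then writes its own cached copy
    p1.write(0x100, 3)
    return p0.read(0x100), p1.read(0x100), memory[0x100]


print(demo_incoherence())  # (2, 3, 1): three different views of one address
```

In this sketch, the two caches and system memory each report a different value for the same address, which is the incoherence that a coherence mechanism must prevent.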
A common method of solving the problem of memory coherence in SMP systems with dedicated caches is through bus snooping. A processor performs bus snooping by monitoring the address bus for memory addresses placed on it by other processors. If the memory address corresponds to an address whose contents were previously cached by any other processor, then the cache contents relating to that address are marked invalid, so that the next read of that address by any such processor results in a cache fault, subsequently forcing a read of system memory.
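By way of illustration only, the snooping behavior described above can be sketched as follows. The names `Bus`, `SnoopingCache`, and `demo_snooping` are hypothetical, and the sketch assumes a write-through, invalidate-on-write policy so that the forced read of system memory returns current data; actual snooping protocols may differ.

```python
# Illustrative model only; names are hypothetical and a write-through,
# invalidate-on-write policy is assumed.
class Bus:
    """Shared bus that exposes every write address to all attached caches."""

    def __init__(self, memory):
        self.memory = memory
        self.snoopers = []

    def attach(self, cache):
        self.snoopers.append(cache)

    def write(self, source, addr, value):
        self.memory[addr] = value          # write-through to system memory
        for cache in self.snoopers:
            if cache is not source:
                cache.snoop(addr)          # other caches observe the address


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                    # cached contents (address -> value)
        self.memory_reads = 0              # counts reads served by system memory
        bus.attach(self)

    def snoop(self, addr):
        # A matching address invalidates the local copy, if one exists.
        self.lines.pop(addr, None)

    def read(self, addr):
        if addr not in self.lines:         # cache fault: read system memory
            self.memory_reads += 1
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)


def demo_snooping():
    bus = Bus({0x100: 1})
    c0, c1 = SnoopingCache(bus), SnoopingCache(bus)
    c0.read(0x100)
    c1.read(0x100)
    c0.write(0x100, 2)        # c1 snoops the address and drops its copy
    value = c1.read(0x100)    # cache fault forces a read of system memory
    return value, c1.memory_reads


print(demo_snooping())  # (2, 2): current data, at the cost of a second memory read
```

The second memory read by the snooping cache is the slower system memory access that, on a real bus, may cause the response to be deferred.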
A common problem with SMP systems, however, is the excessive time consumed by bus transactions and bus transaction management when individual acknowledgments are required for each request and its associated response(s). In particular, the response to data previously requested by a requesting agent may be deferred when, for example, a cache fault is detected. In such an instance, data access from system memory is required, which generally requires more time to execute than is required for retrieving information from the cache. The response to the request may therefore be deferred until such time that the data becomes available from system memory. Once the deferred data is available, a defer phase is entered, whereby the deferred data may be transferred to the requesting agent.
A problem with the defer phase, however, is that the deferring agent must gain ownership of the data bus each time deferred data is to be transferred, thereby delaying any other transfers that may be pending. The delay is further exacerbated when deferred data from multiple requests is pending. A complete defer cycle may be required for each deferred data response, resulting in cumulative bus idle cycles due to the bus handshaking that is required to resume normal bus activity. Thus, prior art methodologies that require bus ownership for each deferred data transfer increase the delay caused by interleaved defer phases.
Accordingly, a need exists for a method and apparatus that obviates the need to individually acknowledge similar transactions, such as defer phase data transfers, and that instead allows similar transactions to be streamed to reduce bus transaction time.
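By way of illustration only, the potential benefit of streaming can be estimated with a simple cycle-count model. The function names and the cycle counts used below are hypothetical and are not taken from any particular bus specification; the model assumes that each defer phase incurs a fixed handshake overhead to gain and release bus ownership, and that a streamed sequence incurs that overhead only once.

```python
# Illustrative cost model only; function names and cycle counts are assumptions.
def per_transfer_cycles(n_responses, data_cycles, handshake_cycles):
    """Bus cycles when every deferred response needs its own defer phase."""
    return n_responses * (handshake_cycles + data_cycles)


def streamed_cycles(n_responses, data_cycles, handshake_cycles):
    """Bus cycles when deferred responses are streamed under one ownership."""
    if n_responses == 0:
        return 0
    return handshake_cycles + n_responses * data_cycles


# Four pending deferred responses, 4 data cycles each, 3 handshake cycles.
print(per_transfer_cycles(4, 4, 3))  # 28
print(streamed_cycles(4, 4, 3))      # 19
```

Under these assumptions, streaming saves the handshake overhead for every response after the first, so the saving grows with the number of pending deferred responses.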