There is an ever-increasing need for more powerful computing platforms, e.g., servers, to meet the demands of modern transaction processing systems and Internet data providers. A variety of architectural technologies exist to meet such demands. Among these technologies are clustering and multiprocessing (e.g., symmetric multiprocessor (“SMP”) systems). Clusters are popular due to their low cost, reliability, and scalability; however, they are also associated with substantial overhead in system management and maintenance. SMP systems provide better performance and simplify system management and maintenance; however, due to technology limitations, SMP systems cannot scale beyond a limited number of processors. Both technologies are part of mainstream computing. More recently, a third technology, referred to as cache-coherent nonuniform memory access (“ccNUMA”) architecture, has provided another approach to the problem of meeting increased processing demands. In particular, ccNUMA obviates the scalability limits of SMP systems while continuing to provide a single-system image that simplifies management and maintenance. A typical ccNUMA system design is implemented using several SMP “cells” that are connected via a cache-coherent switch, or “crossbar.” The crossbar supports access to globally shared memory (“GSM”) across all processors in the system.
There are many advantages to ccNUMA systems, including scalability, ease of management, and reduced maintenance costs. Another advantage of ccNUMA systems is that they support partitioning of the system for purposes of containing failures, facilitating management, and isolating workload. Each such partition includes one or more cells and has a hardware “firewall” around it that prevents external agents from crashing the partition.
When the system architecture of a ccNUMA system is fixed, individual processors within cells can be made aware of other elements in the system through a hardware architecture map. This map can be provided to a processor by storing it in the read-only memory (“ROM”) of the processor. In this configuration, the processor accesses the hardware architecture map stored in ROM to determine which other system components are available and communicates accordingly. Additionally, each processor within a cell maintains a protection domain set (“PDS”), which identifies the other cells within the same partition as the cell, and a coherency set (“CS”), which identifies the other cells from which the processors within the cell can read. In general, a cell is operable to read, via GSM, from cells that are outside its PDS but inside its CS.
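The relationship between the PDS and the CS can be illustrated with a minimal sketch. The class name `Cell`, its fields, its methods, and the four-cell layout are hypothetical and chosen only for illustration. The sketch also assumes that a cell can always read its own memory, and it models the two sets in software, which may not reflect how they are held in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical model of one cell and the two sets it maintains."""
    cell_id: int
    pds: frozenset = frozenset()  # other cells in the same partition
    cs: frozenset = frozenset()   # other cells this cell may read from

    def same_partition(self, other_id: int) -> bool:
        # A cell shares a partition with itself and with every cell in its PDS.
        return other_id == self.cell_id or other_id in self.pds

    def can_read(self, other_id: int) -> bool:
        # Assumption: a cell can always read its own memory;
        # any other cell must appear in the CS.
        return other_id == self.cell_id or other_id in self.cs

    def reads_across_partition(self, other_id: int) -> bool:
        # The case described above: outside the PDS but inside the CS.
        return other_id in self.cs and not self.same_partition(other_id)


# Illustrative layout: cells 0 and 1 form one partition, cells 2 and 3 another.
# Cell 0 may read from its partition peer (cell 1) and from cell 2.
cell0 = Cell(cell_id=0, pds=frozenset({1}), cs=frozenset({1, 2}))
```

In this layout, `cell0.reads_across_partition(2)` is true because cell 2 is in the CS of cell 0 but not in its PDS, whereas `cell0.can_read(3)` is false because cell 3 appears in neither set.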
In multi-partition systems that include GSM, a crash of one partition will typically slow down all accesses to that partition by other partitions. If these delays are excessive, they may cause an accessing partition to crash as well. Such a result is undesirable because it undermines the failure containment that partitioning is intended to provide.