Existing internet data centers (IDCs) pack hundreds of processors (e.g., servers and the like) into a single building for processing a large volume of data transactions. Generally, the compute density, or number of nodes per unit volume, defines the efficiency of the IDC. The compute density affects the amortization of the high cost of the IDC infrastructure (e.g., networking, power, cooling, maintenance, reliability, and availability support). Typically, the greater the compute density, the better the IDC is able to amortize the high cost of the IDC infrastructure. Accordingly, a large compute density may be preferred. However, space for locating the large number of processors necessary for maintaining a large compute density may be unavailable or costly.
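The amortization argument above can be illustrated with a short calculation. The following is a minimal sketch only; the function name, the cost figure, the building volume, and the density values are all hypothetical and are not taken from any actual IDC.

```python
def infrastructure_cost_per_node(infrastructure_cost: float,
                                 volume: float,
                                 compute_density: float) -> float:
    """Return the share of the fixed infrastructure cost carried by each node.

    compute_density is the number of nodes per unit volume, so the total
    node count is compute_density * volume.
    """
    nodes = compute_density * volume
    if nodes <= 0:
        raise ValueError("the building must hold at least one node")
    return infrastructure_cost / nodes


# Hypothetical figures, chosen only for illustration.
COST = 1_000_000.0   # fixed infrastructure cost
VOLUME = 1_000.0     # usable volume of the building

low_density = infrastructure_cost_per_node(COST, VOLUME, 0.5)   # 500 nodes
high_density = infrastructure_cost_per_node(COST, VOLUME, 2.0)  # 2000 nodes

print(low_density, high_density)
```

Under these assumed figures, quadrupling the compute density in the same building reduces the infrastructure cost carried by each node to one quarter.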
To provide increased compute density, multiprocessing schemes that utilize multiple processors have been developed. One conventional multiprocessing scheme (shown in FIG. 1) includes a computer system 100 having multiple processors 10-40, each on a separate die (i.e., chip) 50-80, and connected to a single operating system 90 stored in a memory 95. The system 100 may conserve space if the system is provided in a single housing.
A second conventional multiprocessing scheme (shown in FIG. 2) includes a computer system 200 having a chip multiprocessor 295. The chip multiprocessor includes multiple processors 210-240 on a single die 290. Similar to the processors 10-40 in system 100, the processors 210-240 are connected to a single operating system 250 stored in a memory 260. The processors 210-240 may communicate with the memory 260 via a bus 270. System 200 conserves space by providing multiple processors on a single die. However, the systems 100 and 200 suffer from well-known scalability problems.
Schemes that place multiple processors on a single chip typically utilize a single operating system for tying all the processors together. A well-known limitation of this scheme and other multiprocessing schemes utilizing a single operating system is that an operating system does not scale well to large numbers of processors. That is, as the number of processors managed by a single operating system increases, the efficiency of the operating system decreases dramatically. For example, an operating system typically includes internal data structures that may limit the number of processors that can be supported, and limited bandwidth on a bus may slow transactions. Thus, scaling becomes impractical above some small number of processors (e.g., currently about four to at most 64 processors, depending on the operating system in question).
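The loss of efficiency described above can be illustrated with a toy model. The sketch below assumes Amdahl's law, with a hypothetical fraction of the work serialized through shared operating system structures or the bus; neither the model nor the 5% figure comes from the systems described here, and real operating systems may behave quite differently.

```python
def per_processor_efficiency(num_processors: int,
                             serialized_fraction: float) -> float:
    """Per-processor efficiency under Amdahl's law.

    speedup    = 1 / (s + (1 - s) / n)
    efficiency = speedup / n = 1 / (1 + s * (n - 1))

    where n is the processor count and s is the (assumed) fraction of the
    work that must pass serially through shared kernel structures or the bus.
    """
    if num_processors < 1:
        raise ValueError("at least one processor is required")
    if not 0.0 <= serialized_fraction <= 1.0:
        raise ValueError("serialized_fraction must lie in [0, 1]")
    return 1.0 / (1.0 + serialized_fraction * (num_processors - 1))


# Hypothetical serialized fraction, chosen only for illustration.
S = 0.05
for n in (1, 4, 16, 64):
    print(n, round(per_processor_efficiency(n, S), 3))
```

In this assumed model, the efficiency of each processor falls steadily as processors are added under one operating system, which is consistent with the qualitative limitation noted above.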
Bugnion et al., in U.S. Pat. No. 6,075,938, disclose using a cache coherent non-uniform memory architecture (CC-NUMA) that supports multiple processors executing multiple operating systems. However, Bugnion et al. disclose multiple virtual processors, implemented in software on a single physical processor. This architecture fails to provide multiple physical processors, implemented in hardware, on a single die. Accordingly, this architecture suffers a performance penalty, because the single physical processor must task-switch among the multiple virtual processors (only one virtual processor can be running on the physical processor at any given time). Moreover, if this architecture were to support multiple physical processors, it would need space for providing multiple dies, and processing speed would consequently be sacrificed due to the input/output procedures needed to communicate among the multiple separate processors and the memory.
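The task-switching penalty noted above can be illustrated with a simple estimate. The following is a minimal sketch, assuming round-robin time slicing with a fixed switch cost; the function name and the timing values are hypothetical and are not drawn from Bugnion et al.

```python
def virtual_processor_share(num_virtual: int,
                            time_slice_us: float,
                            switch_cost_us: float) -> float:
    """Fraction of one physical processor's time that each virtual
    processor spends doing useful work under round-robin time slicing.

    Only one virtual processor runs at any given time, and each switch
    between virtual processors costs switch_cost_us of unusable time.
    """
    if num_virtual < 1:
        raise ValueError("at least one virtual processor is required")
    if time_slice_us <= 0 or switch_cost_us < 0:
        raise ValueError("time slice must be positive and switch cost non-negative")
    if num_virtual == 1:
        return 1.0  # a single virtual processor never needs to be switched out
    useful_fraction = time_slice_us / (time_slice_us + switch_cost_us)
    return useful_fraction / num_virtual


# Hypothetical timings, chosen only for illustration.
share = virtual_processor_share(num_virtual=4,
                                time_slice_us=1000.0,
                                switch_cost_us=10.0)
print(round(share, 4))
```

Under these assumed timings, each of four virtual processors receives slightly less than one quarter of the physical processor, whereas four physical processors could each run continuously.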