This invention relates to the field of shared memory based multiprocessor systems. More specifically, the invention relates to an apparatus that is capable of partitioning a shared memory based multiprocessor system into independent, fault contained computing domains.
Modern computer systems are increasingly built as shared memory based multiprocessor systems. At the same time, computing has witnessed rapid growth in the variety of applications, from user oriented desktop applications, such as word processing, to more enterprise oriented tasks such as web servers, databases and electronic mail services. Each type of application can carry significantly different importance and criticality as well as technical maturity. It therefore makes sense to group applications with similar reliability, availability and serviceability (RAS) requirements into an independent domain and to ensure that faults are contained within that domain, i.e. faults in one domain do not affect applications executing in another domain. Traditionally, such domains were located in physically separate computing systems, each executing its own distinct operating system image. With the availability of shared memory based multiprocessor systems, the necessity of assigning domains to physically separate computing systems seems to vanish. Instead, it is desirable to locate several domains on the same shared memory multiprocessor and have them share its resources. In order to present to each domain the illusion of an isolated dedicated machine, as well as for reasons of fault containment, the resources of the shared memory based multiprocessor system must then be partitioned among the several operating systems executing on these partitions.
Shown in FIG. 1 is the general architecture of a shared memory based multiprocessor system in its most common form, a symmetric multiprocessor system (SMP). The backbone of the system is the system bus (100) to which a set of CPUs (101) is attached. Also attached to the system bus is the memory controller (110) which interfaces the system to the memory subsystem (111). Furthermore, a set of I/O controllers (102) is attached to the system bus. The system controllers snoop for I/O requests on the bus and forward each request to the I/O subsystem (120, 130), which is attached to its associated I/O bus (121, 131), to which in turn the various I/O devices (122, 123, 132, 133) serviced by the system are attached.
In principle, one can identify four classes of resources: (a) memory space, (b) I/O space, (c) interrupts and (d) CPUs. For the reasons described below, it is very difficult to partition a shared memory based multiprocessor built from commodity components without modification of hardware and system software.
Simple memory partitioning requires that the physical memory range be separated into several memory partitions, each of which is assigned to one of the system partitions. A single memory partition does not necessarily have to be contiguous. It is even conceivable that there are memory ranges that are shared among partitions, for instance, for the implementation of communication channels that carry additional protocols such as TCP/IP (Transmission Control Protocol/Internet Protocol) or VIA (Virtual Interface Architecture). However, this model still relies on each operating system (OS) or its applications to address only the physical memory that was assigned to its partition. For example, a malicious or faulty OS can corrupt another partition by gaining access to the other partition's memory space. This can happen in two different ways:
The operating system executes in real non-translated mode and addresses memory that has not been assigned to its partition.
An application's translation table (i.e. page table), as prepared by the OS, is corrupted and refers to memory that has not been assigned to its partition.
Both of these cases must be prevented to ensure proper, secure, and fault contained partitioning of shared memory multiprocessors. In particular, a method is required that:
Restricts processors executing as part of a particular partition to access only the physical memory assigned to that partition.
Depending on whether changes to the operating system are possible, various solutions to this problem are known in the prior art. When OS kernel changes are possible, memory access problems can be limited, but not fully eliminated. For instance, code controlling the updates to the translation tables can be very carefully crafted and augmented with verification checks. In particular, changes to the translation tables can be required to be made in real mode, with the pages backing the translation tables never mapped anywhere in any translation table themselves. This can be accomplished by introducing a special privileged processor mode to update the page tables. This avoids accidental wild writes by an OS that otherwise operates in a translated mode. Alternatively, one can require that all updates to the translation table be made in a translated mode, and that all pages backing the page tables have their write protection enabled or are write protected by the memory controller. Then, the general protection fault that accompanies a translation table update can be analyzed, verified, and emulated. Unfortunately, wild writes can still cause a problem: in the case where special purpose registers are used to point to the current address table (e.g. Intel IA-32, DEC Alpha), one cannot catch illegally generated "update-translation-table" instructions which point to translation tables outside the designated translation table memory range. In contrast, when the OS kernel(s) cannot be modified, it is possible to execute the operating system at a lower privilege level, then trap on privileged instructions and emulate them. In particular, updates to translation tables must be verified. However, this case still requires that one trusts the emulation code. Furthermore, this solution comes at a price, namely, the emulation of privileged instructions can introduce a runtime overhead. Even worse, licensing issues of commodity operating systems often prohibit such deployment.
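The verification check applied to a trapped translation table update can be illustrated with a minimal sketch. It is not taken from any prior art system; the class name `GuardedPageTable`, the frame-number representation and the use of `PermissionError` are assumptions made purely for illustration.

```python
class GuardedPageTable:
    """Hypothetical model of a page table whose updates are verified.

    Every update is checked against the set of physical frames assigned
    to the partition before it is applied, mimicking the analyze, verify
    and emulate step described above.
    """

    def __init__(self, allowed_frames):
        # Physical frame numbers assigned to this partition (assumed input).
        self.allowed_frames = frozenset(allowed_frames)
        # Virtual page number -> physical frame number.
        self.entries = {}

    def update(self, virtual_page, frame):
        # Reject any mapping that would reach outside the partition.
        if frame not in self.allowed_frames:
            raise PermissionError(
                f"frame {frame:#x} is not assigned to this partition"
            )
        self.entries[virtual_page] = frame


# Usage: a partition owning frames 0x10..0x1f.
table = GuardedPageTable(range(0x10, 0x20))
table.update(virtual_page=0, frame=0x12)      # accepted
try:
    table.update(virtual_page=1, frame=0x40)  # outside the partition
except PermissionError as err:
    print("rejected:", err)
```

As the text notes, such a check only helps if every update actually passes through it, which is exactly what the special purpose register case cannot guarantee.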
Neither of the above solutions for secure memory partitioning is appropriate when commodity processors, commodity memory and commodity operating systems are used, which is the case for a large number of today's computing systems. For instance, many operating systems assume so called "0-based" real memory, where real memory is defined to be the range of memory addresses generated by the general translation mechanism of the processor architecture. In an un-partitioned system, one assumes that real addresses equal physical addresses, i.e., the addresses with which the memory is actually accessed. To fulfill the "0-based real memory" model (as required by many commodity operating systems) in a partitioned system, the real memory addresses cannot be semantically equal to the physical memory addresses. Hence, an additional mapping from real to physical addresses is required. Though the remapping idea is not new by itself (PowerPC), it has typically been provided as a part of the processor core, so that the address pushed onto the memory bus has already been translated into a physical address. What is needed in a system based on commodity processor technology is a method for remapping real addresses outside the processor core. Unfortunately, due to the very tight timing constraints that govern modern system buses, such remapping devices cannot be located between the processor and the bus but must be located close to or within the memory controller. The ID of the issuing processor, which can be identified by the bus grant signal, is then used to select the correct partition based remapping.
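A real-to-physical remapping selected by the issuing processor ID can be sketched as follows. This is an illustrative model only: the extent-based map layout, the names `Extent`, `RemapError` and `MemoryControllerRemapper`, and the example addresses are assumptions, not details given in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    real_base: int   # first real address covered by this extent
    phys_base: int   # physical address that real_base maps to
    size: int        # length of the extent in bytes


class RemapError(Exception):
    """Raised when a real address is not backed by the partition's memory."""


class MemoryControllerRemapper:
    """Hypothetical remapping stage located at the memory controller."""

    def __init__(self, cpu_to_partition, partition_maps):
        self.cpu_to_partition = cpu_to_partition  # processor ID -> partition
        self.partition_maps = partition_maps      # partition -> list of Extent

    def translate(self, cpu_id, real_addr):
        # The processor ID (e.g. derived from the bus grant signal)
        # selects which partition's map applies.
        partition = self.cpu_to_partition[cpu_id]
        for extent in self.partition_maps[partition]:
            offset = real_addr - extent.real_base
            if 0 <= offset < extent.size:
                return extent.phys_base + offset
        raise RemapError(f"real address {real_addr:#x} not in partition {partition}")


# Two partitions, each seeing 0-based real memory of 0x1000 bytes.
remapper = MemoryControllerRemapper(
    cpu_to_partition={0: "A", 1: "A", 2: "B"},
    partition_maps={
        "A": [Extent(real_base=0x0, phys_base=0x0000, size=0x1000)],
        "B": [Extent(real_base=0x0, phys_base=0x1000, size=0x1000)],
    },
)
print(hex(remapper.translate(0, 0x10)))  # CPU 0, partition A
print(hex(remapper.translate(2, 0x10)))  # CPU 2, partition B
```

Both partitions issue the same real address, yet the controller resolves it to different physical locations, and an address outside a partition's extents is refused rather than passed through.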
However, placing a remapping device with the memory controller, rather than in the processor core, creates a cache coherence problem: two partitions can put the same real memory address out on the system bus, yet for different partitions the same real address refers to different memory. In more detail, each processor typically snoops on the bus as part of its cache coherence protocol. If processor P1 puts real address (A) onto the memory bus, processor P2, though it belongs to a different partition, still provides to P1 its cached content of (A), rather than allowing the request to be filled by the memory controller, which follows the proper remapping. However, providing one's cache content to a processor in a different partition violates the memory consistency model and leads to faulty behavior, since the real address (A) in two different partitions is not backed by the same memory. Note that performing the translation from real to physical addresses in the processor core eliminates this cache coherence problem. Nevertheless, when utilizing commodity microprocessors, the establishment of proper distinct cache coherence domains poses, besides physical memory separation, another significant problem in memory partitioning.
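The coherence hazard can be reproduced in a small simulation. The model below is a deliberately simplified sketch: caches are keyed by real address, the names `Cpu`, `bus_read`, `PHYS_MEM` and `REAL_TO_PHYS_BASE` are hypothetical, and the `partition_aware` flag stands in for whatever mechanism confines snooping to one coherence domain.

```python
# Physical memory contents and per-partition base offsets (assumed values).
PHYS_MEM = {0x0000: "data of partition A", 0x1000: "data of partition B"}
REAL_TO_PHYS_BASE = {"A": 0x0000, "B": 0x1000}


class Cpu:
    def __init__(self, name, partition):
        self.name = name
        self.partition = partition
        self.cache = {}  # real address -> cached value


def bus_read(requester, real_addr, cpus, partition_aware):
    """Serve a read of real_addr issued by requester on a shared bus."""
    # Snoop phase: any other CPU caching this real address answers first.
    for cpu in cpus:
        if cpu is requester:
            continue
        if partition_aware and cpu.partition != requester.partition:
            continue  # snooping confined to the requester's partition
        if real_addr in cpu.cache:
            return cpu.cache[real_addr]
    # Otherwise the memory controller remaps the address and fills the request.
    value = PHYS_MEM[REAL_TO_PHYS_BASE[requester.partition] + real_addr]
    requester.cache[real_addr] = value
    return value


p1, p2 = Cpu("P1", "A"), Cpu("P2", "B")
cpus = [p1, p2]
bus_read(p2, 0x0, cpus, partition_aware=False)         # P2 caches its own (A)
print(bus_read(p1, 0x0, cpus, partition_aware=False))  # P1 receives B's data
print(bus_read(p1, 0x0, cpus, partition_aware=True))   # P1 receives A's data
```

In the first read by P1 the snoop hit in P2's cache short-circuits the remapping, so P1 observes memory belonging to the other partition; confining snooping to the requester's partition restores the expected result.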
In the I/O space, any of the simplistic memory protection solutions discussed above that are based on translation table protection can be circumvented. For instance, a DMA capable device can be instructed to write to a particular physical address in memory. This has to be prohibited; otherwise, malicious applications or a faulty OS could utilize a DMA engine to corrupt memory in different partitions. One of the approaches taken in the prior art is to partition the I/O at natural boundaries such as I/O controllers (a.k.a. I/O bridges or I/O buses). This requires that these controllers only listen when I/O or memory mapped I/O requests are issued by a processor in their assigned partition. Similarly, when DMA is initiated by a device of a particular I/O controller, the memory controller must know to which partition the device belongs. For instance, the I/O controller could have its own bus master ID, which, like the processor ID, can be used to select the proper remapping in the memory controller. In contrast, in the case that the real-to-physical translation takes place on the processor, the I/O controllers listen to physical addresses; however, all DMA requests still embody a real address. When the DMA is started, the real address must be remapped in the I/O controller, thus replicating the processor remapping functionality.
I/O devices generate physical interrupts that are typically captured by an interrupt controller. Depending on the configuration, the interrupt is then sent to a particular CPU (e.g. based on current priority settings) or to all CPUs. Interrupts must be contained within their domain. If they are not, a partition that receives a device interrupt belonging to another partition might not even have a handler registered for the interrupt, and a general machine check would result. In the more favorable case that the interrupt can be identified as belonging to another partition, the interrupt must be forwarded to the proper destination. Another situation that must be avoided is one where an OS picking up a general protection fault broadcasts a reset interrupt to all CPUs spanning the machine. Instead, CPUs and bus masters must be grouped according to their domain. A system providing this kind of functionality can be implemented by intercepting all interrupts and directing them to the proper partition. As in the memory partitioning discussion, this again requires the ability to lower the privilege level of the OS and to emulate privileged instructions, which is often not possible. Similar to the I/O problem, interrupt domains can be accomplished by replicating interrupt controllers for each I/O entity (e.g. I/O controller) that can be assigned to a partition.
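The grouping of interrupt sources and CPUs by domain can be sketched as a simple routing table. The class `InterruptRouter` and the device and CPU names are hypothetical and serve only to illustrate containment; the sketch does not model priorities or any real interrupt controller.

```python
class InterruptRouter:
    """Hypothetical router that confines interrupts to the source's domain."""

    def __init__(self, source_partition, cpu_partition):
        # Interrupt source (device or broadcasting CPU) -> partition.
        self.source_partition = dict(source_partition)
        # CPU -> partition.
        self.cpu_partition = dict(cpu_partition)

    def targets(self, source):
        # Only CPUs in the same partition as the source may receive it.
        partition = self.source_partition[source]
        return sorted(
            cpu for cpu, p in self.cpu_partition.items() if p == partition
        )


router = InterruptRouter(
    source_partition={"disk0": "A", "nic0": "B", "cpu2": "B"},
    cpu_partition={"cpu0": "A", "cpu1": "A", "cpu2": "B", "cpu3": "B"},
)
print(router.targets("disk0"))  # device interrupt stays in partition A
print(router.targets("cpu2"))   # a reset broadcast by cpu2 stays in partition B
```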
As the above discussion illustrates, providing fault contained, secure partitioning of shared memory based multiprocessor systems with techniques known in the prior art is quite cumbersome and impacts essentially every system interface component that attaches a commodity item (e.g. CPU, I/O device, memory controller) to the system bus. This ultimately eliminates the cost benefits of utilizing commodity items.
It is an object of this invention to provide an apparatus and method for flexible and secure partitioning of shared memory based multiprocessor systems utilizing commodity hardware such as central processing units, caches, memory banks, IO controllers, IO devices and interrupt controllers.
In order to achieve flexible and secure partitioning of shared memory based multiprocessors that utilize commodity hardware, the present system and method in its preferred embodiment is configured out of the following components:
(a) a set of internal system buses, replicating the entire standard system bus, and
(b) a configurable crossbar switch that links together bus attached components to particular internal system buses, and
(c) a memory controller interface, one attached to each internal system bus, which provides the real-to-physical address remapping for the partition defined by the internal system bus in case memory is accessed.
Rather than providing a single system bus, several complete internal system buses are provided. Each said internal system bus provides all bus signals including address, data, and control signals. The number of internal system buses determines the number of independent partitions that can be established. The internal system buses are part of a crossbar switch. In addition to the system buses, the crossbar switch provides a set of external connector buses. Single system components such as CPUs, I/O controllers and interrupt controllers, but not the memory controllers, can be attached to the external connector buses. Within the crossbar switch, the external connector buses can be coupled with the internal system buses. This coupling is set up via a partitioning control unit, also referred to as the coupling control unit. An external connector bus can only be attached to one internal bus at a time. All system components that are coupled to the same internal bus belong to the same partition. Memory access in this system is conceptually provided via a multi-port memory controller. A multi-port memory controller provides several ports to accept memory requests from different sources at a time. Associated with each port is a memory controller interface that implements the proper bus protocol by snooping on the bus for memory transactions and forwarding them to the memory controller to be realized (e.g. loaded or stored). The memory controller core arbitrates among the multiple ports to provide proper service of load and store requests. In the system of this invention, a memory controller interface is attached to each internal system bus. Furthermore, the memory controller interface, in addition to performing the functions described above, implements a real-to-physical remapping. Before memory requests are forwarded to the memory controller core, the real address as provided on the internal system bus is translated into a physical address. The real-to-physical map is set up and maintained by the partitioning control unit. If the maps of different partitions overlap, a non-cache-coherent inter-partition shared memory can be implemented. As described above, the system isolates partitions from each other and provides secure and flexible partitioning by means of a configurable "internal system bus to external connector bus coupling". The crossbar switches can be connected together along their internal buses to create larger systems.
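The interplay of bus coupling and per-bus remapping described above can be modelled in a few lines. The following is a behavioral sketch under stated assumptions, not the claimed apparatus: the class `CrossbarSwitch`, its method names, the tuple layout of the map entries and the component names are all hypothetical.

```python
class CrossbarSwitch:
    """Hypothetical model of the crossbar with per-bus memory interfaces."""

    def __init__(self, num_internal_buses):
        self.num_internal_buses = num_internal_buses
        # External connector bus -> internal system bus index. A dict holds
        # one value per key, so a connector is on one internal bus at a time.
        self.coupling = {}
        # Internal bus index -> list of (real_base, phys_base, size) entries.
        self.remap = {bus: [] for bus in range(num_internal_buses)}

    def couple(self, connector, bus):
        # Role of the partitioning control unit: set up the coupling.
        if not 0 <= bus < self.num_internal_buses:
            raise ValueError(f"no internal bus {bus}")
        self.coupling[connector] = bus

    def set_map(self, bus, entries):
        # Role of the partitioning control unit: maintain the remap table.
        self.remap[bus] = list(entries)

    def partition_members(self, bus):
        return sorted(c for c, b in self.coupling.items() if b == bus)

    def memory_request(self, connector, real_addr):
        # The memory controller interface on the requester's internal bus
        # translates the real address before the request reaches the core.
        bus = self.coupling[connector]
        for real_base, phys_base, size in self.remap[bus]:
            if real_base <= real_addr < real_base + size:
                return phys_base + (real_addr - real_base)
        raise LookupError(f"real address {real_addr:#x} unmapped on bus {bus}")


xbar = CrossbarSwitch(num_internal_buses=2)
xbar.couple("cpu0", 0)
xbar.couple("io0", 0)
xbar.couple("cpu1", 1)
xbar.set_map(0, [(0x0, 0x0000, 0x1000)])
xbar.set_map(1, [(0x0, 0x1000, 0x1000)])
print(xbar.partition_members(0))
print(hex(xbar.memory_request("cpu0", 0x20)))
print(hex(xbar.memory_request("cpu1", 0x20)))
```

Because each internal bus in this model carries only the components coupled to it, snooping and interrupts would stay within the partition without any change to the attached commodity components; moving a component to another partition amounts to calling `couple` again with a different bus index.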