Unix is a registered trademark of The Open Group. SCO and UnixWare are registered trademarks of The Santa Cruz Operation, Inc. Microsoft, Windows, Windows NT and/or other Microsoft products referenced herein are either trademarks or registered trademarks of Microsoft Corporation. Intel, Pentium, Pentium II Xeon, Pentium III Xeon, Merced and/or other Intel products referenced herein are either trademarks or registered trademarks of Intel Corporation.
This invention relates to multiprocessing data processing systems, and more particularly to symmetrical multiprocessor data processing systems that use a clustered multiprocessor architecture. More specifically, the present invention relates to methods and apparatus for routing interrupts within a clustered multiprocessor system.
Systems having multiple but coordinated processors were first developed and used in the context of mainframe computer systems. More recently, however, interest in multiprocessor systems has increased because of the relatively low cost and high performance of many microprocessors, with the objective of replicating mainframe performance through the parallel use of multiple microprocessors.
A variety of architectures have been developed, including a symmetrical multiprocessing (“SMP”) architecture, which is used in many of today's workstation and server markets. In SMP systems, the processors have symmetrical access to all system resources such as memory, mass storage and I/O.
The operating system typically handles the assignment and coordination of tasks between the processors. Preferably the operating system distributes the workload relatively evenly among all available processors. Accordingly, the performance of many SMP systems may increase, at least theoretically, as more processor units are added. This highly sought-after design goal is called scalability.
One of the most significant design challenges in many multiprocessor systems is the routing and processing of interrupts. An interrupt may generally be described as an event that indicates that a certain condition exists somewhere in the system that requires the attention of at least one processor. The action taken by a processor in response to an interrupt is commonly referred to as the “servicing” or “handling” of the interrupt.
In some multiprocessor systems, a central interrupt controller helps route interrupts from an interrupt source to an interrupt destination. In other systems, the interrupt control function is distributed throughout the system. In a distributed interrupt control architecture, one or more global interrupt controllers assume global, or system-level, functions such as I/O interrupt routing. A number of local interrupt controllers, each associated with a corresponding processing unit, control local functions such as inter-processor interrupts. Both classes of interrupt controllers typically communicate over a common interrupt bus and are collectively responsible for delivering interrupts from an interrupt source to an interrupt destination within the system.
Intel Corporation has published a Multiprocessor (MP) Specification (version 1.4) outlining the basic architecture of a standard multiprocessor system that uses Intel brand processors. Complying with the Intel MP Specification may be desirable, particularly when using Intel brand processors. According to the Intel MP Specification (version 1.4), interrupts are routed using one or more Intel Advanced Programmable Interrupt Controllers (APICs). The APICs are configured into a distributed interrupt control architecture, as described above, in which the interrupt control function is distributed between a number of local APIC and I/O APIC units. The local and I/O APIC units communicate over a bus called the Interrupt Controller Communications (ICC) bus. There is one local APIC per processor and, depending on the total number of interrupt lines in an Intel MP-compliant system, one or more I/O APICs. The APICs may be discrete components separate from the processors, or they may be integrated with the processors.
The destination of an interrupt can be one, all, or a subset of the processors in an Intel MP-compliant system. The sender specifies the destination of an interrupt in one of two destination modes: physical destination mode or logical destination mode. In physical destination mode, the destination processor is identified by a local APIC ID. Each local APIC compares this destination ID with its own physical ID, which is stored in the local APIC ID register within the local APIC. The local APIC ID register is loaded at power-up by sampling configuration data that is driven onto pins of the processor. For Intel P6 family processors, pins A11# and A12# and pins BR0# through BR3# are sampled. Up to 15 local APICs can be individually addressed in physical destination mode.
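To make the physical destination check concrete, here is a minimal Python sketch of the comparison each local APIC performs. It assumes 4-bit local APIC IDs, as on P6 family processors, with the all-ones ID 0xF reserved as the broadcast destination; that reservation is why only 15 of the 16 possible IDs can address an individual APIC. The names `physical_match`, `PHYS_BROADCAST` and `INDIVIDUAL_IDS` are hypothetical.

```python
PHYS_BROADCAST = 0xF  # assumption: the all-ones 4-bit ID means "all processors"


def physical_match(dest_id: int, local_apic_id: int) -> bool:
    """Return True if a local APIC with local_apic_id accepts a
    physical-mode message addressed to dest_id (4-bit IDs assumed)."""
    dest_id &= 0xF
    return dest_id == PHYS_BROADCAST or dest_id == (local_apic_id & 0xF)


# IDs 0x0 through 0xE remain available for individual addressing.
INDIVIDUAL_IDS = [i for i in range(16) if i != PHYS_BROADCAST]
```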
The logical destination mode can be used to increase the number of APICs that can be individually addressed by the system. In the logical destination mode, message destinations are identified using an 8-bit message destination address (MDA). The MDA is compared against the 8-bit logical APIC ID field of the APIC logical destination register (LDR).
A Destination Format Register (DFR) defines how the logical destination information is interpreted. The DFR can be programmed for a flat model or a cluster model interrupt delivery mode. In the flat model delivery mode, bits 28 through 31 of the DFR are programmed to 1111, and the MDA is interpreted as a decoded (one bit per APIC) address. Each local APIC is assigned its own bit in the logical APIC ID field of its LDR, so an arbitrary group of local APICs can be specified simply by setting the corresponding bits to 1 in the MDA. Broadcast to all APICs is achieved by setting all 8 bits of the MDA to 1. Because the MDA has only 8 bits, the flat model allows at most 8 local APICs to coexist in the system.
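The DFR decode and the flat model match can be sketched in Python as follows. This is an illustrative model only, assuming the LDR is a 32-bit value with its 8-bit logical APIC ID in bits 24 through 31; the helper names `dfr_model`, `flat_match` and `LDRS` are hypothetical.

```python
def dfr_model(dfr: int) -> str:
    """Decode the delivery model from DFR bits 28 through 31."""
    model = (dfr >> 28) & 0xF
    if model == 0b1111:
        return "flat"
    if model == 0b0000:
        return "cluster"
    raise ValueError(f"unsupported DFR model bits: {model:04b}")


def flat_match(mda: int, ldr: int) -> bool:
    """Flat model: accept if any MDA bit overlaps the APIC's logical ID bits."""
    logical_id = (ldr >> 24) & 0xFF
    return (mda & logical_id) != 0


# Eight APICs, each given one distinct bit in its LDR.
LDRS = [(1 << n) << 24 for n in range(8)]
```

For example, an MDA of 0b00000101 selects the APICs holding bits 0 and 2, and an MDA of 0xFF reaches all eight.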
For the cluster model delivery mode, DFR bits 28 through 31 are programmed to 0000. In this delivery mode, there are two basic connection schemes: a flat cluster scheme and a hierarchical cluster scheme. In the flat cluster scheme, all clusters are assumed to be connected to a single APIC bus (e.g., the ICC bus). Bits 28 through 31 of the MDA contain the encoded address of the destination cluster. These bits are compared with bits 28 through 31 of the LDR to determine whether the local APIC is part of the cluster. Bits 24 through 27 of the MDA are compared with bits 24 through 27 of the LDR to identify individual local APIC units within the selected cluster. Arbitrary sets of processors within a cluster can be specified by writing the target cluster address in bits 28 through 31 of the MDA and setting selected bits in bits 24 through 27 of the MDA, corresponding to the chosen members of the cluster. In this mode, 15 clusters (with cluster addresses 0 through 14), each having 4 processors, can be specified in a message. However, the APIC arbitration ID supports only 15 agents, so the total number of processors supported in the flat cluster scheme is limited to 15.
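A corresponding Python sketch of the flat cluster match follows. It assumes the 8-bit MDA lines up with LDR bits 24 through 31, so the MDA's high nibble carries the cluster address and its low nibble the member bitmask. `make_ldr` and `cluster_match` are hypothetical helpers, and broadcast handling is deliberately left out.

```python
def make_ldr(cluster: int, member: int) -> int:
    """Build a 32-bit LDR for member 0-3 of the given cluster (0-14).
    Members above 3 would spill into the cluster bits, so they are not allowed."""
    if not 0 <= member <= 3:
        raise ValueError("member must be 0 through 3")
    return (((cluster & 0xF) << 4) | (1 << member)) << 24


def cluster_match(mda: int, ldr: int) -> bool:
    """Cluster model: the cluster addresses must be equal, and at least
    one member bit in the MDA must match the APIC's member bit."""
    logical_id = (ldr >> 24) & 0xFF
    same_cluster = ((mda >> 4) & 0xF) == ((logical_id >> 4) & 0xF)
    return same_cluster and (mda & logical_id & 0xF) != 0
```

With `make_ldr(2, 1)`, an MDA of 0x22 or 0x2F reaches that APIC, while 0x21 (wrong member) and 0x32 (wrong cluster) do not.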
The hierarchical cluster scheme allows an arbitrary hierarchical cluster network to be created by connecting different flat clusters via independent APIC buses. This scheme requires a special cluster manager device within each cluster to handle the messages that are passed between clusters. The special cluster manager devices are not part of the local or I/O APIC units. Rather, they are separately provided. In the hierarchical cluster scheme, one cluster may contain up to 4 agents. Thus, when using 15 special cluster managers connected via a single APIC bus (e.g., ICC bus), each having 4 agents, a network of up to 60 APIC agents can be formed.
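The agent counts in the two cluster schemes follow from simple arithmetic, where the bound of 15 comes from the APIC arbitration ID and 4 is the maximum number of agents per cluster:

```latex
N_{\text{flat cluster}} = \min(15 \times 4,\ 15) = 15,
\qquad
N_{\text{hierarchical}} = 15 \times 4 = 60
```

In the hierarchical case, the 15-agent arbitration limit applies to the cluster managers sharing the single inter-cluster bus rather than to the processors, so each of the 15 managers can front a full cluster of 4 agents.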
A limitation of the hierarchical cluster scheme as defined in the Intel MP Specification is that a single independent APIC bus (e.g., ICC bus) may not provide sufficient bandwidth to service all inter-cluster interrupts effectively, particularly in larger systems that include, for example, up to 15 special cluster manager devices connected to the bus. Conventional APIC devices use a communication protocol for communicating over the ICC bus, and this protocol is relatively serial in nature. For example, APIC devices typically send three different types of messages over the ICC bus: EOI messages, which consume 14 ICC bus cycles; short messages, which consume 21 ICC bus cycles; and non-focused lowest priority messages, which consume up to 34 ICC bus cycles.
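As a rough illustration of why a single shared ICC bus becomes a bottleneck, the Python sketch below totals the bus cycles consumed when messages are sent one after another on one bus, using the per-message costs quoted above. Counting the lowest priority message at its 34-cycle worst case, and ignoring arbitration gaps and retries, are simplifying assumptions; `icc_bus_cycles` is a hypothetical helper.

```python
# Per-message ICC bus cycle costs quoted in the text.
ICC_CYCLES = {
    "eoi": 14,
    "short": 21,
    "lowest_priority": 34,  # "up to" 34 cycles; worst case assumed here
}


def icc_bus_cycles(messages: list[str]) -> int:
    """Total cycles to send the messages back to back on one shared bus."""
    return sum(ICC_CYCLES[kind] for kind in messages)


# One short message from each of 15 cluster managers, fully serialized.
print(icc_bus_cycles(["short"] * 15))  # 315
```

Under these assumptions, the last of the 15 messages finishes 315 bus cycles after the first one starts, which is the serialization the invention aims to reduce.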
If a single independent ICC bus is used to connect the various cluster manager devices, as suggested by the Intel MP specification, the independent ICC bus must handle all inter-cluster interrupts. Because an ICC bus is relatively serial in nature, the ICC bus may become a significant bottleneck for inter-cluster interrupts, thereby slowing system performance. To help reduce this bottleneck, multiple hierarchical ICC buses could be used to connect a number of hierarchically arranged special cluster manager devices. However, this approach would require significant overhead including additional cluster manager devices and additional ICC bus lines. What would be desirable, therefore, is a method and apparatus for increasing the routing bandwidth of interrupts between cluster manager devices in a clustered multiprocessor system without significantly increasing the overhead of the system.
The present invention overcomes many of the disadvantages of the prior art by providing a method and apparatus for increasing the routing bandwidth of interrupts between cluster manager devices in a clustered multiprocessor system without significantly increasing the overall overhead of the system. This can be accomplished by providing special cluster manager devices that can convert “N” serial messages received from a local APIC to “M” parallel messages, wherein M is less than N. The special cluster manager device then transfers the “M” parallel messages to a receiving cluster manager device. The receiving cluster manager device then converts the “M” parallel messages into the original “N” serial messages, and sends the “N” serial messages to the appropriate local APIC within the receiving cluster. By using this approach, the routing bandwidth between cluster manager devices may be significantly improved. Also, the conventional ICC bus protocol interface is maintained for all local APIC devices.
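The text does not define the parallel format, so the following Python sketch is only one possible interpretation: it treats a message as N serially transferred bits and packs them into M = ceil(N / W) words of an assumed width W, so that M is less than N whenever W is greater than 1; the receiving side reverses the packing. The function names and the MSB-first bit order are hypothetical choices.

```python
def serial_to_parallel(bits: list[int], width: int) -> list[int]:
    """Pack N serial bits into ceil(N / width) words, first bit in the MSB."""
    words = []
    for start in range(0, len(bits), width):
        word = 0
        chunk = bits[start:start + width]
        for bit in chunk:
            word = (word << 1) | (bit & 1)
        # Zero-pad a short final chunk so every word is left-aligned.
        word <<= width - len(chunk)
        words.append(word)
    return words


def parallel_to_serial(words: list[int], width: int, n_bits: int) -> list[int]:
    """Unpack the words back into the original n_bits serial bits."""
    bits = []
    for word in words:
        for shift in range(width - 1, -1, -1):
            bits.append((word >> shift) & 1)
    return bits[:n_bits]  # drop the padding added to the last word
```

For instance, a 21-bit message packed 8 bits at a time travels as 3 parallel transfers instead of 21 serial ones, and the receiver recovers the identical bit sequence.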
In one illustrative embodiment, the present invention is incorporated into a multiprocessor data processing system that has two or more processing clusters, wherein each cluster has one or more processors, and each processor has an interrupt controller associated therewith. Each cluster may further have a cluster manager, wherein the interrupt controllers associated with the processors in the cluster communicate with the corresponding hierarchical cluster manager using a first messaging format over a first bus. In a preferred embodiment, the first bus is an ICC bus, as described above. Each cluster manager then communicates with selected other cluster managers using a second messaging format, preferably over one or more second buses, a switching network, or other communication means.
Each cluster manager preferably has a first format converter for converting a message in the first messaging format into the second messaging format, wherein the second messaging format requires less transfer time than the first messaging format. In addition, each cluster manager preferably has a first transferring mechanism for transferring the message in the second messaging format to the appropriate receiving cluster manager(s). Each cluster manager also preferably includes a second format converter for converting a received message in the second messaging format to a message in the first messaging format. Finally, each cluster manager preferably includes a second transferring mechanism for transferring the message in the first messaging format to the appropriate interrupt controller(s) in the receiving cluster.
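The four cluster manager functions described above can be sketched as a small Python model. Everything here is hypothetical: the message fields, the 16-bit packed layout used as the second messaging format, and the dictionary that stands in for the second bus or switching network. It shows only the order of operations: convert, transfer, convert back, deliver.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IccMessage:
    """First messaging format (hypothetical fields)."""
    dest_cluster: int  # 4-bit destination cluster address
    dest_members: int  # 4-bit member bitmask within that cluster
    vector: int        # 8-bit interrupt vector


@dataclass(frozen=True)
class ParallelFrame:
    """Second messaging format: one packed 16-bit word (hypothetical)."""
    word: int


def to_parallel(msg: IccMessage) -> ParallelFrame:
    """First format converter: pack the message fields into one word."""
    return ParallelFrame((msg.dest_cluster << 12) | (msg.dest_members << 8) | msg.vector)


def to_icc(frame: ParallelFrame) -> IccMessage:
    """Second format converter: unpack the word into the first format."""
    w = frame.word
    return IccMessage((w >> 12) & 0xF, (w >> 8) & 0xF, w & 0xFF)


@dataclass(eq=False)
class ClusterManager:
    cluster_id: int
    network: dict = field(repr=False)  # cluster_id -> ClusterManager (stand-in link)
    delivered: list = field(default_factory=list)  # (member, vector) pairs

    def send(self, msg: IccMessage) -> None:
        """First transferring mechanism: forward the packed frame."""
        self.network[msg.dest_cluster].receive(to_parallel(msg))

    def receive(self, frame: ParallelFrame) -> None:
        """Second transferring mechanism: deliver to each selected local APIC."""
        msg = to_icc(frame)
        for member in range(4):
            if msg.dest_members & (1 << member):
                self.delivered.append((member, msg.vector))


network: dict = {}
for cid in range(3):
    network[cid] = ClusterManager(cid, network)
network[0].send(IccMessage(dest_cluster=2, dest_members=0b0101, vector=0x41))
```

In this model only the receiving manager talks to its local APICs, and it does so in the first messaging format, mirroring the point that the conventional ICC interface is kept for the local APICs.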