1. Field of the Invention
The present invention relates to multiprocessor arrangements in computer systems. More specifically, the invention relates to an extension of a known multiprocessor scheme to allow additional processors to share the same bus.
2. Related Art
In recent years, two-CPU (dual) and four-CPU (quad) processing x86 servers have become a staple of the corporate world. Powered first by Intel® i486 chips, then Pentium® processors, and finally Pentium® Pro chips, these systems have expanded the role of the PC server into fields once dominated by significantly more expensive minicomputers and mainframes. The emergence of Intel's® MP spec boosted the proliferation of these servers by providing a unified specification for hardware and software vendors.
It is recognized that multiprocessing is not the only option for handling an increased workload. Network supervisors could merely reduce the size of networks and increase the number of servers. This approach, however, would be far more costly. Additionally, company-wide exchange of data would be slowed as the various networks would have to constantly communicate with each other over relatively slow network lines.
A case could also be made for merely selecting a more powerful processor in the first place. Throughout the history of PC servers, there have often been RISC (reduced instruction set computer) processors available that have been more powerful than the current x86 leader. Multiprocessor x86 configurations, however, have usually been able to match or exceed the power of these single RISC chips at significantly less cost. These multiprocessor configurations have also maintained compatibility with a wider range of readily available hardware and software, further reducing the costs of operation and implementation.
A further argument for multiprocessor-ready servers is the efficient scalability afforded to the network. Companies can start with exactly the amount of processor power they need. They can utilize the fastest x86 processor available, or opt for a more affordable alternative. As the network grows, increased use can be accommodated simply by adding more processors and memory. This approach is substantially more cost-effective than replacing the entire server. It also spreads out a company's IS (information systems) investment over a longer period, reducing financing costs and better accommodating current budgets.
Accordingly, the advantages of multiprocessor arrangements are apparent, and the need in the art for optimal multiprocessor schemes has become increasingly important.
In the mid-to-late 1980s, the versatile x86-based personal computer was the subject of a major evolution in both form and function. Diverging from their roots as terminal emulators and single user workstations, many of these systems were reconfigured for use in network server configurations. At a department level, these original servers and their accompanying local area networks proved to be an effective, low cost alternative to traditional mini-computer/terminal environments.
But as networks grew in both size and importance, it became obvious that further expansion of storage and memory would need to be matched by a significant increase in processor power. To address this issue, a number of different multiprocessor configurations were developed. Some of these configurations emphasized power, often at the sacrifice of cost efficiency. There has been a need in the art to combine power and cost-efficiency.
Over the history of x86 server development, several configurations have evolved for implementing multiple processors. For example, concerning Pentium® Pro-based servers, three multiprocessing architectures are being used. They are the Parallel Bus, cc Numa, and Clustering architectures. While each of these configurations has its own advantages, each also involves certain difficulties.
The parallel bus architecture has been, by far, the most popular in the development of x86-based servers. In a parallel bus architecture, the various processors are said to be "closely coupled." Communication and interaction among the various processors is rapid and efficient, with all parts of the process operating at full external CPU speed. For example, one of Assignee's earliest multiprocessing 386 servers was based on a proprietary C-Bus architecture in which all processors and memory resided on an extended 32-bit bus. This implementation worked well for earlier processors with their relatively limited power requirements and low heat generation. The parallel bus architecture design offers easy and efficient scalability, and with Intel's® Orion chip set, this architecture supports a sustained data bandwidth exceeding 400 MB/s. Quad processor support is built into the Pentium® Pro chip, making four-CPU systems based on this architecture cost-efficient. And because of Intel's® MP spec v 1.1 and 1.4, operating systems and applications supporting such servers are both plentiful and economical.
Using more than four Pentium® Pro chips, however, the advantages of the conventional parallel bus architecture begin to fade. For signal strength, the processors must be in close physical proximity. But physically grouping the processors together is a design challenge in itself, and heat build-up becomes a critical problem.
The Pentium® Pro chip also offers its own logical challenges. To promote multiprocessing, the Pentium® Pro chip offers built-in support for only a two-bit processor APIC (Advanced Programmable Interrupt Controller) ID code. This results in four possible binary combinations: 00, 01, 10, and 11. Translated into decimal numbers, it supports processor IDs of 0, 1, 2, and 3. It is this processor identification that facilitates data bus arbitration in an SMP (symmetrical multiprocessing) system. While this design makes engineering two- and four-CPU systems easier and more cost-effective, it does not support a standard parallel upgrade path for more than four processors, thereby substantially limiting system performance.
Like the parallel bus architecture, the cc Numa architecture keeps processors closely coupled, resulting in efficient use of processor resources. The downside of this design, however, is that it requires a proprietary operating system and proprietary applications. This makes it an inappropriate design for most standard corporate environments. As a result, systems utilizing this architecture are primarily utilized for unique high-end applications requiring more than eight processors.
In a clustering environment, multiple servers are actually connected together to achieve extended multiprocessing. This use of duplicate servers makes this approach inherently fail-over ready. Up to now, this has also been the only easy way to harness the power of more than four Pentium® Pro chips. This approach, however, is plagued with limitations.
The most prominent limitation of the clustering approach is the physical link between the servers. With current Ethernet technology, the server-to-server connection is limited to a 100 MB/s bandwidth. This is less than a fifth of the potential bandwidth of a parallel bus architecture. Partitioning data storage between the two servers is also a problem that needs to be addressed in server-clustering arrangements.
Another major problem with server-clustering is a lack of SMP software support. To date, there is no industry-wide standard to support true symmetrical multiprocessing in a clustering configuration. As a result, clustering is currently application-dependent. Most clustered servers today are redundant servers, utilized for their fail-over capabilities rather than enhanced processor power. These configurations can, however, co-exist with parallel bus architecture systems.
In view of these various inadequate alternatives to multiprocessing expansion, there has long been a need in the art to develop a server supporting more than four processors such as the Pentium® Pro. To fulfill this need, several issues had to be addressed.
First, the logical limitations of the Pentium® Pro chip must be dealt with. As stated above, the basic implementation of the Pentium® Pro chip only supports a two-bit processor ID code. This particular convention limits the number of Pentium® Pro chips in a system to four. To overcome this limitation, the art has not yet tapped unused aspects of the chip to extend the identification process.
Second, bandwidth limitations have been a factor limiting progress in the art. Clustering would be the easiest way to implement multiple processors. But with clustering comes limited bandwidth. There has been a need in the art to tap the benefits of clustering without incurring the bandwidth limitations of a system-to-system link.
Third, physical challenges have hampered progress in this technology. Ideally, a Pentium® Pro chip-based system with more than four processors would offer the bandwidth of a closely-coupled parallel architecture. But the physical necessities of coupling more than four chips in a standard parallel architecture result in an intolerable level of heat build-up, as well as an unwieldy system board design. The unmet challenge has been to achieve the bandwidth of a closely-coupled parallel architecture without its physical limitations.
Fourth, electrical requirements, namely extensive power requirements of multiple Pentium® Pro chips, have not been dealt with in the art. A successful multiprocessor server needs to support a large number of Pentium® Pro chips without exceeding the electrical resources of a normal corporate environment. It would need to support these chips in a cost-effective manner.
Fifth, lack of software support has been a problem. A successful corporate server must support off-the-shelf versions of popular operating systems and applications. To comply with this requirement, those skilled in the art must develop a system that conforms to current MP specifications.
It is to meet the foregoing requirements, and to overcome the foregoing limitations, that the present invention is directed, as described in the Summary of the Invention and in the Detailed Description of the Preferred Embodiments thereof. As background for understanding the invention and its advance in the art, a known multiprocessor arrangement is now discussed in more detail.
As introduced above, multiprocessor arrangements, involving plural processors that share a common bus 100, are known in the art, as shown in FIG. 1. This sharing of busses commonly involves controlled sharing of data, address and control busses to allow the CPUs to have access to and control of peripheral devices (not shown).
APIC (Advanced Programmable Interrupt Controller) ID Expansion is a known scheme of identifying each of four processors that share a common bus. The scheme is described in the Pentium® Pro Family Developer's Manual, Volume 1 (Specifications), Chapter 4 (Bus Protocol), available from Intel® Corporation, P.O. Box 7641, Mt. Prospect, Ill., 60056-7641 (order no. 242690), January 1996. Reference is also made to U.S. Pat. No. 5,515,516 (Fisch et al., Intel® Corporation), which discloses an initialization mechanism for symmetric arbitration agents. These documents are incorporated herein by reference as if reproduced in full below.
In this known four-processor system, the APIC ID for each processor is assigned by taking the bus number assigned to each processor. FIG. 1 illustrates how the bus number is determined.
Referring to FIG. 1, each processor 0, 1, 2, and 3 has three inputs, BR1#, BR2# and BR3#, as well as one output, BR0#. All the BRn# lines of each CPU are interconnected among the four processors through four external signals, BREQ0#, BREQ1#, BREQ2# and BREQ3#. This connection associates the APIC ID with each CPU such that the processor can use it during multiprocessor applications.
Upon power-up, the system logic asserts the signal BREQ0# active, leaving the other three signals BREQ1#, BREQ2#, and BREQ3# inactive. This forces the three BR1#-BR3# inputs of each CPU to be as shown in the following chart:
______________________________________
CPU        BR1#   BR2#   BR3#   Binary Value
______________________________________
CPU #0:     0      0      0     = Binary 0
CPU #1:     0      0      1     = Binary 1
CPU #2:     0      1      0     = Binary 2
CPU #3:     0      1      1     = Binary 3
______________________________________
Each CPU uses the binary setting of the BRn# lines to determine its APIC ID.
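As an illustrative sketch only, the chart above can be read as a three-bit binary number in which the BR1# column is the most significant bit and the BR3# column the least significant. The function name `apic_id_from_br` and the `BR_CHART` dictionary are hypothetical names introduced here for illustration; the sketch mirrors the chart as printed and does not model the electrical behavior of the bus request lines.

```python
def apic_id_from_br(br1: int, br2: int, br3: int) -> int:
    """Combine the three chart columns into the binary value shown in the chart.

    Assumption: BR1# is treated as the most significant bit and BR3# as the
    least significant bit, which matches the "Binary Value" column.
    """
    for bit in (br1, br2, br3):
        if bit not in (0, 1):
            raise ValueError("each BRn# value must be 0 or 1")
    return (br1 << 2) | (br2 << 1) | br3


# Rows of the chart, keyed by CPU number (hypothetical encoding).
BR_CHART = {
    0: (0, 0, 0),
    1: (0, 0, 1),
    2: (0, 1, 0),
    3: (0, 1, 1),
}

if __name__ == "__main__":
    for cpu, inputs in BR_CHART.items():
        print(f"CPU #{cpu}: inputs {inputs} -> APIC ID {apic_id_from_br(*inputs)}")
```

Because BR1# is 0 in every row of the chart, only two bits ever vary, which is consistent with the two-bit processor ID described earlier.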
The Pentium® Pro processor also has a way to strap its APIC ID to belong to a different group of four processors called a cluster. Specifically, address lines A12# and A11# are used for this purpose. Upon reset, each CPU, besides determining its lower 2-bit APIC ID, samples the logic state of both the A12# and A11# lines to set the cluster ID according to the following chart.
______________________________________
A12#       A11#       Cluster ID   APIC ID Range
______________________________________
Logic 0    Logic 0    3            C-F
Logic 0    Logic 1    2            8-B
Logic 1    Logic 0    1            4-7
Logic 1    Logic 1    0            0-3
______________________________________
Since A12# and A11# are normally pulled to logic 1, the default cluster ID for a standard four-CPU system is 0.
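The cluster chart can be summarized arithmetically: the cluster ID is the two-bit value formed by inverting the sampled A12# and A11# logic levels, and the APIC ID range for a cluster begins at four times the cluster ID. The following is a minimal sketch under that reading of the chart; the function names `cluster_id`, `apic_id_range`, and `full_apic_id` are assumptions made for illustration and are not taken from the referenced manual.

```python
def cluster_id(a12: int, a11: int) -> int:
    """Return the cluster ID for the sampled logic levels of A12# and A11#.

    Per the chart, Logic 1 / Logic 1 gives cluster 0 and Logic 0 / Logic 0
    gives cluster 3, so each sampled level is inverted before being combined.
    """
    if a12 not in (0, 1) or a11 not in (0, 1):
        raise ValueError("logic levels must be 0 or 1")
    return ((a12 ^ 1) << 1) | (a11 ^ 1)


def apic_id_range(a12: int, a11: int) -> tuple[int, int]:
    """Return the (lowest, highest) APIC ID for the selected cluster."""
    base = cluster_id(a12, a11) << 2
    return base, base + 3


def full_apic_id(a12: int, a11: int, low_bits: int) -> int:
    """Combine the cluster ID with the lower 2-bit APIC ID (assumed layout)."""
    if not 0 <= low_bits <= 3:
        raise ValueError("lower APIC ID must fit in two bits")
    return (cluster_id(a12, a11) << 2) | low_bits


if __name__ == "__main__":
    for a12 in (0, 1):
        for a11 in (0, 1):
            lo, hi = apic_id_range(a12, a11)
            print(f"A12#={a12} A11#={a11}: cluster {cluster_id(a12, a11)}, "
                  f"APIC IDs {lo:X}-{hi:X}")
```

With both lines at their default logic 1, `cluster_id(1, 1)` is 0 and the range is 0-3, matching the standard four-CPU case.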
As mentioned above, this known arrangement provides multiprocessor ability for up to only four processors. Accordingly, in view of the foregoing performance demands and design challenges, there has been a need in the art to provide common-bus multiprocessor arrangements in which a larger number of processors are supported. The present invention is directed to meeting these performance demands and design challenges.