1. Technical Field
The present invention relates generally to data processing systems and in particular to multiprocessor data processing systems. Still more particularly, the present invention relates to a method and system for enabling post-manufacture configuration of a multiprocessor data processing system via an interconnect method and logic.
2. Description of the Related Art
The evolution of data processing systems for use in commercial applications has occurred at a very rapid pace. This development began with the design and utilization of single processor systems and has evolved to the design and utilization of more complex multiple processor systems (MPs). Most of the development has been driven by the increasing need in the industry for greater processing power and faster data operations.
Technical and commercial servers are two examples of systems that have benefited from the additional processing power and faster overall data operations. Notably, in order to provide the faster overall data operations, quicker access to data is required, and the systems are typically designed with distributed memory systems, with each processor having direct access to an affiliated memory block.
One extension of increasing the number of processors within the system is the creation of a multi-chip module (MCM), which provides higher overall frequency. In particular, the MCM configuration provides increased performance for commercial workloads. Multi-chip modules (MCMs) containing multiple chips, each having two or more individual processors, have replaced traditional single chip modules (SCMs), which include a single processor. In an MCM, two or more processor chips, each comprising multiple processors, are interconnected with buses having a particular bandwidth. Thus, for example, a four-processor MCM may be designed by interconnecting four single-processor chips with 16-byte buses.
FIG. 1 illustrates a conventional 4-processor MCM (also referred to as a 4-way SMP). As shown, MCM 100 includes four single-processor chips 101 interconnected by MCM buses 103 and MCM logic 107. Processor chips 101 of MCM 100 are interconnected to and communicate with each other via 16-byte MCM buses 103, with each chip 101 having a 16-byte MCM input bus and a 16-byte MCM output bus. Each processor chip is directly coupled to two other processor chips on MCM 100.
Each chip 101 contains internal MCM routing logic 107 that manages the inter-chip data transfers on the various buses. MCM routing logic 107 controls both routing to components within MCM 100 and routing to components connected externally to MCM 100. MCM routing logic 107 reads the destination address contained within the data component being routed and selects the appropriate bus on which to route the data component. For example, communication (collectively described herein as data communication, although instructions may also be routed between processor chips) from a processor on chip S to a processor on either of the adjacent processor chips, T or V, is sent by MCM routing logic 107 of chip S on the MCM buses 103 directly coupling the two chips. However, when communication is desired from a processor on chip S to one on chip U (i.e., the processor chip that is logically farthest away and not directly coupled to S), MCM routing logic 107 sends the communication to the processor on chip U via a hop across one of the two adjacent processor chips, T or V. Routing at each stage of the hop is controlled by MCM routing logic 107 on the particular chip. Each communication path between non-adjacent processors has a higher latency because of the extra hop that is required.
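The routing behavior described above can be illustrated with a minimal sketch. The chip labels S, T, U and V come from the description; the ring ordering, the function names `route` and `hop_count`, and the tie-breaking rule for the opposite chip are hypothetical and chosen only for illustration. The sketch does not represent the actual implementation of MCM routing logic 107.

```python
from typing import List

# Hypothetical model of the four-chip arrangement described above.
# Each chip is directly coupled to its two neighbors in this list
# (S to T and V, T to S and U, and so on), so U is opposite S.
RING = ["S", "T", "U", "V"]


def route(src: str, dst: str) -> List[str]:
    """Return the chips visited from src to dst along the shorter direction.

    When both directions are equally long (the opposite chip), this sketch
    arbitrarily goes through the next chip in RING order.
    """
    n = len(RING)
    start, end = RING.index(src), RING.index(dst)
    forward = (end - start) % n   # bus traversals going forward in RING
    backward = (start - end) % n  # bus traversals going backward in RING
    step = 1 if forward <= backward else -1
    path = [src]
    pos = start
    while pos != end:
        pos = (pos + step) % n
        path.append(RING[pos])
    return path


def hop_count(src: str, dst: str) -> int:
    """Number of bus traversals needed to reach dst from src."""
    return len(route(src, dst)) - 1
```

In this model, a transfer from S to T or from S to V uses a single directly coupled bus, while a transfer from S to U passes through one intermediate chip and therefore needs two bus traversals, which is the source of the higher latency noted above.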
Each chip within MCM 100 connects to other external components, including memory (not shown) and I/O devices (not shown), via additional buses (e.g., buses labeled 211, 113 and 115) connected directly to each die. The number of additional buses available for connecting external components (i.e., components other than the other processors) is a function of the size of the chip. Typically, only a fixed number of buses can be connected to each die, and thus the connectivity of each chip is limited by the fixed number of buses. Thus, although the 4-chip MCM has been efficiently designed, the 8-processor or 8-chip system with switch interconnect does not scale in performance or cost.
Recent trends in the development of data processing systems to handle large scale or complex tasks include the implementation of both large scale commercial and large scale technical multiprocessor systems (MPs), each of which has its own operating requirements and functional characteristics.
The primary differences between the system configuration and the required operating parameters of a technical SMP and those of a commercial SMP include the following:
    (1) technical SMPs typically have fewer processors than “comparable” commercial SMPs (e.g., a scalable commercial SMP may be a 32 or 64 way, while a comparable scalable technical SMP may be an 8 or 16 way); and
    (2) technical SMPs typically have very high memory bandwidth and low memory latency requirements, while commercial SMPs have lower bandwidth requirements (mostly due to usage of sophisticated caching mechanisms).
These significant differences in the processing requirements of technical SMPs from those of commercial SMPs have led to a different design and manufacturing process for systems being utilized for technical versus commercial workloads.
Typically, these SMPs comprise two or more processors manufactured on processor chips that are interconnected via a bus or switch to each other. These chips are also connected to other components such as memory and input/output (I/O) via respective buses.
Systems designed for commercial workloads are generally not optimized for handling technical workloads. Unlike commercial workloads, technical workloads utilize significantly fewer processor resources, but require much greater efficiency with respect to memory bandwidth and latency. Technical processing systems (i.e., SMPs designed to handle technical workloads) thus are typically configured differently from commercial ones.
Related patent application, Attorney Docket No. AUS920030001US1, provides a processor book that enables the development of large scale commercial systems. Specifically, that application teaches the creation of processor books and the utilization of the processor books as building blocks for a large scale commercial system. Owners of such a large scale system who have purchased these processor books may also wish to run some technical applications.
From the above it is clear that a processor book designed to handle technical workloads must be configured in a manner that enables fast and efficient MCM chip-to-MCM chip and processor-to-memory communication. The actual speed and efficiency of the MCM chip-to-MCM chip and processor chip-to-memory operations (and vice versa) are primarily dependent on the size of the bus interconnecting the components, the distance between the components (i.e., the length of the connecting bus), and the number of hops required to go from the first component to the next.
The latter factor is especially relevant when the destination chip is several hops away from the source chip. Each processor chip operates as a buffer at each hop and holds the communicated data until the data buses are available before forwarding the communication data to the next hop or the destination. Communication latency may thus be extremely long in the commercial SMP configuration depicted in FIGS. 2A and 2B.
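The combined effect of bus width and hop count can be illustrated with a toy estimate. The formula, the function name `transfer_cycles`, the per-hop overhead of 4 cycles, and the 128-byte payload used in the usage note are all illustrative assumptions and are not taken from this description; only the 16-byte bus width and the buffering at each hop come from the text above. Wire length is not modeled.

```python
import math


def transfer_cycles(payload_bytes: int, bus_width_bytes: int, hops: int,
                    per_hop_overhead_cycles: int = 4) -> int:
    """Toy store-and-forward estimate of transfer time in bus cycles.

    Assumes each hop buffers the entire payload before forwarding it, and
    that every hop adds a fixed (hypothetical) routing overhead. Bus
    contention and wire length are ignored.
    """
    if payload_bytes < 0 or bus_width_bytes <= 0 or hops < 0:
        raise ValueError("sizes must be positive and hops non-negative")
    # Bus cycles needed to move the payload across one bus
    beats = math.ceil(payload_bytes / bus_width_bytes)
    return hops * (beats + per_hop_overhead_cycles)
```

Under these assumptions, a 128-byte payload on a 16-byte bus costs 12 cycles for a directly coupled chip and 24 cycles when one intermediate chip is involved; halving the bus width to 8 bytes raises the single-hop cost to 20 cycles.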
Because of the differences in operating requirements between technical and commercial SMPs, the two SMP types are manufactured with different functional designs (connectivity of processors, size of buses, etc.). Absent the present invention, much of the current art degrades commercial or technical performance in order to support both workloads. Also, the higher requirements for technical workload systems result in higher manufacturing costs for a processor book configured for technical workloads than for a processor book configured for commercial workloads as illustrated and described above.
The present invention thus recognizes that there would be a significant performance gain and cost savings opportunity if a processor book designed for a commercial workload could be easily re-configured to also support a technical workload. A processing system designed for commercial workloads that could be wired post-manufacture for technical workloads without significant additional logic would be a welcome improvement. These and other benefits are provided by the invention described herein.