1. Field of the Invention
The present invention relates generally to multiprocessors, redundant circuits, and high-speed microprocessors. More particularly, the present invention is directed towards a multiprocessor chip with a redundant architecture having microprocessors fabricated on silicon-on-insulator and dynamic random access memory elements fabricated on bulk silicon.
2. Description of Background Art
Multiprocessing is the use of more than one microprocessor to perform parallel processing. An apparatus to perform multiprocessing is typically called a multiprocessor or a parallel processor. There are several common control topologies for coordinating the action of the microprocessors and coupling the multiprocessor to a network. The microprocessors typically reside on separate chips with the system of microprocessors and memory units residing on one or more printed circuit boards. A signal bus is used to couple the microprocessors to different levels of memory.
One common application of multiprocessing is transaction processing, such as a banking or financial transaction, in which it is desirable to process an entire transaction in parallel. A transaction processor preferably has a large number of high-speed microprocessors coupled to a network by high bandwidth signal buses.
Each microprocessor of a multiprocessor system typically has a multiple level memory hierarchy that includes a small, fast cache memory close to the microprocessor and a larger, slower main memory farther away from the microprocessor. The cache memory is typically a random access memory (RAM) that the microprocessor can access more rapidly than main memory. Each microprocessor looks first to its corresponding cache memory to find data and instructions. The cache memory has levels of closeness, size, and accessibility to the microprocessor. Each level of cache memory typically has more memory than its predecessor but at the cost of a longer access time. Level-1 (L1) cache memory resides on the same chip as its corresponding microprocessor and may have a size of about 32 kilobytes or more. In modern microprocessors, level-2 (L2) cache memory typically resides off chip, although some microprocessor chips include an L2 cache memory implemented as low-capacity static random access memory (SRAM). Typically the L2 cache memory is implemented as an SRAM or as a dynamic random access memory (DRAM) located on a different chip than the microprocessor. A popular off-chip L2 cache memory size is 1 megabyte. The level-3 (L3) cache memory always resides off-chip, and is often implemented as DRAM with a size of between about 4 megabytes and 32 megabytes. Each cache memory is often divided into separate data and instruction caches.
FIG. 1 is an illustrative block diagram of a conventional multiprocessor system 100 that includes a plurality of microprocessor chips 110. Each microprocessor chip 110 has its own L2 cache memory chip 120 and is coupled to other memory elements (e.g., an L3 cache memory chip 130) via a network signal bus 140. Conventional chip edge-pin I/O connections 150 and wires 155 are used to couple each microprocessor chip 110 to its corresponding L2 cache memory chip 120.
The speed of individual microprocessors continues to improve, with some silicon microprocessors having clock rates of about one GHz. However, the system performance of conventional multiprocessors is not keeping up with the improvements in microprocessor performance. This is because, as the speed of each microprocessor increases, the performance of the multiprocessor system tends to be increasingly determined by the rate at which data can be transferred between each microprocessor and its memory. This is commonly known as the memory bandwidth bottleneck. Memory bandwidth is defined as the data carrying capacity in bits per second. Memory bandwidth for random access memory (RAM) is a function of the rated speed of the RAM and the size of the data path to and from the RAM. In some multiprocessor systems, particularly systems having a large L2 cache and a microprocessor clock rate approaching one GHz, it can take ten to twenty clock cycles or more for data and instructions to be accessed from the off-chip L2 cache.
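By way of illustration only, the bandwidth relation described above may be sketched as the product of the rated transfer speed of the RAM and the width of its data path. The function names and the numeric values below are hypothetical and are not taken from any particular system; the product form is a simplifying assumption for peak bandwidth.

```python
def memory_bandwidth_bps(transfers_per_second: float, data_path_bits: int) -> float:
    """Peak bandwidth in bits per second: rated speed times data path width."""
    return transfers_per_second * data_path_bits


def access_latency_ns(clock_cycles: int, clock_hz: float) -> float:
    """Duration, in nanoseconds, of a given number of clock cycles."""
    return clock_cycles / clock_hz * 1e9


# Hypothetical figures chosen only for illustration.
peak = memory_bandwidth_bps(transfers_per_second=100e6, data_path_bits=64)
stall = access_latency_ns(clock_cycles=20, clock_hz=1e9)

print(f"peak bandwidth: {peak / 1e9:.1f} Gbit/s")
print(f"20-cycle off-chip L2 access at 1 GHz: {stall:.0f} ns")
```

Under these assumed values, doubling either the transfer rate or the data path width doubles the peak bandwidth, while the access latency in nanoseconds depends only on the cycle count and the clock rate.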
The memory bandwidth of a microprocessor having off-chip L2 cache memory is limited because conventional edge-pin I/O connections 150 and wires 155 have a limited ability to couple data between each microprocessor chip 110 and its associated L2 cache memory chip 120. At a clock frequency approaching one GHz, a single wire 155 may be modeled as a lossy transmission line having a significant resistance and capacitance. A single data pulse (bit) transmitted along a wire 155 will have a significant propagation delay (transit time) associated with the path length of wire 155. There is also a rise-time associated with the impedance of the wire 155 and the parasitic impedances of the edge-pin I/O connections 150. There is thus a significant inter-chip time delay to transmit data between each microprocessor chip 110 and its L2 cache memory chip 120 via a wire 155. There is also a maximum data rate (bandwidth) of each wire 155 in terms of the number of bits per second of data that it can transmit between microprocessor chip 110 and L2 cache memory chip 120. There is also a limited number of wires 155 that can be coupled to the edge-pin I/O connections 150. The combination of all of these effects limits the rate at which data words from an off-chip L2 cache memory may be communicated to a microprocessor and also results in a large latency (time delay) for communicating data words.
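The combined effect of wire count, per-wire data rate, and propagation delay may be illustrated with a simplified model. The sketch below assumes that every wire carries one bit per transfer, that all wires run at the same rate, and that the propagation delay is paid once per transfer; the function name and all numeric values are hypothetical.

```python
import math


def line_transfer_time_s(line_bits: int, wire_count: int,
                         bits_per_second_per_wire: float,
                         propagation_delay_s: float) -> float:
    """Time to move one block of data across an inter-chip link.

    Simplified model: one propagation delay plus the time needed to
    serialize the block over the available wires.
    """
    if wire_count <= 0 or bits_per_second_per_wire <= 0:
        raise ValueError("wire_count and per-wire rate must be positive")
    # Number of successive transfers needed when each wire carries one bit.
    beats = math.ceil(line_bits / wire_count)
    return propagation_delay_s + beats / bits_per_second_per_wire


# Hypothetical link parameters, for illustration only.
narrow = line_transfer_time_s(512, 64, 500e6, 2e-9)
wide = line_transfer_time_s(512, 128, 500e6, 2e-9)

print(f"64 wires:  {narrow * 1e9:.1f} ns")
print(f"128 wires: {wide * 1e9:.1f} ns")
```

In this model, adding wires shortens the serialization term but leaves the propagation delay untouched, which is why a limited pin count and a long wire path each impose a separate limit.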
As shown in FIG. 2A, a multiprocessor system 100 with off-chip L2 cache memory can be modeled as having each microprocessor 110 coupled to off-chip L2 cache memory 120 by a low bandwidth connection 160. Latency (transit time) of the connection is represented by the length of the arrow. The width of the arrow corresponds to the number of signals that it is capable of communicating. The small bandwidth of connection 160 is illustrated in FIG. 2A by the narrowness and length of the arrow 160. As shown in FIG. 2B, an on-chip L2 cache memory 175 disposed on the same chip 197 as the microprocessor 110 increases the bandwidth of the signal path, as indicated by the width and short length of arrow 195. However, conventional on-chip SRAM memory has a low density such that the total data size of the SRAM memory is comparatively small, as indicated by the small area of on-chip cache memory 175 in FIG. 2B. This is also undesirable, because a smaller L2 cache memory decreases system performance by increasing the frequency with which each microprocessor must access data and instructions from off-chip L3 cache memory.
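The trade-off between a large, slow off-chip L2 cache and a small, fast on-chip L2 cache may be sketched with the standard textbook average-access-time relation (hit time plus miss rate times miss penalty). This relation is not part of the systems described above; it is used here only as an illustrative model, and every hit time, miss rate, and penalty below is an assumed value.

```python
def average_access_cycles(l2_hit_cycles: float, l2_miss_rate: float,
                          l3_penalty_cycles: float) -> float:
    """Average cycles per L2 access: hit time plus miss rate times miss penalty."""
    if not 0.0 <= l2_miss_rate <= 1.0:
        raise ValueError("l2_miss_rate must be between 0 and 1")
    return l2_hit_cycles + l2_miss_rate * l3_penalty_cycles


# Hypothetical values: a large off-chip L2 is slow to reach but rarely misses,
# while a small on-chip L2 is fast to reach but misses more often.
off_chip_large = average_access_cycles(l2_hit_cycles=15, l2_miss_rate=0.02,
                                       l3_penalty_cycles=100)
on_chip_small = average_access_cycles(l2_hit_cycles=3, l2_miss_rate=0.15,
                                      l3_penalty_cycles=100)

print(f"large off-chip L2: {off_chip_large:.1f} cycles on average")
print(f"small on-chip L2:  {on_chip_small:.1f} cycles on average")
```

With these assumed numbers the faster on-chip cache gains nothing on average, because its higher miss rate sends more accesses to the off-chip L3 cache memory.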
Unfortunately, it is impractical with previously known techniques to integrate all of the microprocessors 110 and large capacity L2 cache memories 120 of a multiprocessor onto a single chip with acceptable yield, productivity (number of chips per wafer), and process compatibility. One factor that limits productivity is related to the size of the component units. Each microprocessor consumes a significant area, as do the caches. The process steps for fabricating static random access memory (SRAM) L2 cache memories are compatible with the process steps used to fabricate high-speed microprocessors, but SRAM L2 caches have a limited memory capacity because of the low data density of SRAM. It is thus impractical to include several conventional microprocessors and several large data capacity SRAM cache memories within a conventional die size. DRAM memory is typically ten times denser than SRAM, but commercial DRAM processes use fabrication steps that are often incompatible with the process steps used to fabricate high-speed microprocessors. For example, some of the process steps commonly used to fabricate dense L2 DRAM memory would degrade the speed of the microprocessors. In particular, the increased thermal budget from the added process steps required to fabricate the DRAM memory can degrade the transistors of the microprocessor. Also, some of the processing steps used to fabricate high-performance microprocessors are incompatible with commercial DRAM processes. For example, DRAM cannot be fabricated on a silicon-on-insulator structure because of excessive leakage currents in the DRAM. The leakage currents force the DRAM to be refreshed at an unacceptably high rate. Another consideration is chip yield, since a low chip yield may render a process uneconomical. Integrating all of the microprocessors and L2 cache memories of a multiprocessor onto one chip increases the total number of components, which tends to decrease chip yield according to well-known laws of probability.
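The effect of component count on chip yield may be illustrated with a simple probability sketch. The model below assumes that every unit on the die fails independently and with the same probability, and that the chip is good only if every unit is good; both assumptions, the function name, and the per-unit yield figure are hypothetical simplifications.

```python
def chip_yield(unit_yield: float, unit_count: int) -> float:
    """Probability that all units on a die work, assuming independent failures."""
    if not 0.0 <= unit_yield <= 1.0:
        raise ValueError("unit_yield must be between 0 and 1")
    if unit_count < 0:
        raise ValueError("unit_count must be non-negative")
    return unit_yield ** unit_count


# Hypothetical per-unit yield, for illustration only.
for units in (1, 4, 8, 16):
    print(f"{units:2d} units per chip: yield = {chip_yield(0.95, units):.3f}")
```

Under these assumptions the yield falls geometrically as more microprocessors and cache memories are placed on one die, which is the tendency noted above.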
Until recently, the approach of designing a multiprocessor to have separate microprocessor chips and separate large data capacity L2 cache memory chips has been successful because silicon-based microprocessors had clock rates that were comparatively low (e.g., a fraction of one GHz) and because semiconductor packaging engineers were able to make significant improvements in the bandwidth of inter-chip connections. However, the inter-chip signal bandwidth afforded by conventional packaging techniques has many physical limits and is not expected to increase at the same rate as microprocessor speed. The memory bandwidth limitations of multiprocessors having off-chip L2 cache memory are thus expected to become an increasingly severe bottleneck to achieving further improvements in the processing speed of multiprocessors, particularly transaction processors.
Therefore, there is a need for an improved method and architecture for forming a multiprocessor chip having high-speed microprocessors and dense L2 cache memories integrated on a single chip.