1. Technical Field
The present invention is directed generally toward a method of handling a memory error in a data processing system containing multiple processors. More specifically, the present invention is directed toward a method of de-allocating multiple processors sharing a failing bank of memory.
2. Description of Related Art
In its simplest form, a computer system is made up of a processor, memory, and input/output peripherals. The processor executes a linear sequence of instructions to operate on data. The data can be read in or written out using input/output peripherals. Both data and instructions may be stored in the memory for the use of the processor.
Although many early computers were no more complex than that just described, most computers manufactured today are more complex than this simple model. Modern computer systems often contain a number of features to enhance computing speed and efficiency.
Memory in modern computers is often arranged in a hierarchical fashion, using what are known as caches. Caches are like memory scratchpads. They are small banks of memory that can be read from or written to more quickly than the computer system's main memory bank.
A cache, then, functions as a temporary storage location for data that is currently being worked with by the processor. A processor can work quickly with data stored in a cache while other data is being transferred between the cache and main memory. Using a cache can often greatly increase the processing speed of a computer.
Many modern computer systems utilize multiple caches in cascade, where one cache is used as a temporary storage location for information stored in the next cache. In such systems, the cache closest in sequence to the processor is known as a "primary" or "level one" (L1) cache. The next is known as a "secondary" or "level two" (L2) cache, and so on.
Using multiple caches in cascade allows a computer system designer flexibility in balancing computing speed with cost. Because faster memories tend to be more expensive, one way to balance the conflicting objectives of low cost and high speed is to pair a small primary cache, built from expensive but very fast memory, with a larger secondary cache. The secondary cache uses memory that is less expensive and slower than that of the primary cache, but still faster than main memory.
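The cascade arrangement described above can be illustrated with a minimal sketch. The `Level` class, the latency figures, and the address used are all hypothetical and chosen only for illustration; cache capacity and eviction are deliberately not modeled.

```python
class Level:
    """One level of a memory hierarchy (hypothetical model for illustration)."""

    def __init__(self, name, latency, next_level=None, data=None):
        self.name = name
        self.latency = latency        # illustrative cost units, not real timings
        self.next_level = next_level  # next level in the cascade; None for main memory
        self.store = dict(data or {})

    def read(self, address):
        """Return (value, total_cost) for a read that starts at this level."""
        if address in self.store:
            return self.store[address], self.latency
        if self.next_level is None:
            raise KeyError(address)
        # Miss: fetch from the next level and keep a copy for later reads.
        value, cost = self.next_level.read(address)
        self.store[address] = value
        return value, self.latency + cost


main_memory = Level("main", latency=100, data={0x10: 42})
l2 = Level("L2", latency=10, next_level=main_memory)
l1 = Level("L1", latency=1, next_level=l2)

first = l1.read(0x10)   # misses L1 and L2, so main memory serves the read
second = l1.read(0x10)  # the copy now held in L1 serves the read
```

In this sketch the first read pays the cost of every level it passes through, while the repeated read is served by the fastest level alone.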
Another speed-up mechanism commonly employed in computers is to allow multiple instructions to be processed at once. Pipelined and superscalar processors allow portions of computer instructions to be processed simultaneously. That is, one processor may process portions of two or more instructions simultaneously.
Multiprocessor computers utilize multiple processors to execute complete instructions independently. This can allow for a large increase in computing speed, particularly when executing programs that operate on large quantities of data values, such as graphics programs.
Having multiple processors also has a significant advantage in that it provides a level of redundancy and fault tolerance. That is, if a portion of a computer system containing multiple processors fails, it is sometimes possible for the problem to be circumvented by disabling one or more of the processors associated with the failure. When this occurs, instructions that would have been executed by the disabled processors can be diverted to other processors still operating.
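The diversion of work away from disabled processors might be sketched as follows. The `assign_tasks` function and its round-robin policy are assumptions made for illustration; the text does not specify how work is redistributed.

```python
def assign_tasks(tasks, processors, disabled):
    """Distribute tasks round-robin over the processors that are still enabled.

    Hypothetical scheduling policy, used here only to show work being
    diverted away from disabled processors.
    """
    active = [p for p in processors if p not in disabled]
    if not active:
        raise RuntimeError("no operating processors remain")
    assignment = {p: [] for p in active}
    for i, task in enumerate(tasks):
        assignment[active[i % len(active)]].append(task)
    return assignment


tasks = ["t0", "t1", "t2", "t3"]
before = assign_tasks(tasks, processors=[0, 1, 2, 3], disabled=set())
after = assign_tasks(tasks, processors=[0, 1, 2, 3], disabled={1})
```

With processor 1 disabled, the task it would have received is picked up by one of the remaining processors, and no task is dropped.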
In the past, each processor in a multiple processor computer was generally fabricated on its own integrated circuit. Today that is not always the case. Advances in Very Large-Scale Integrated Circuit (VLSI) technology have made possible the fabrication of multiple processors on a single integrated circuit. It is even possible to fabricate primary and secondary cache memory on the same integrated circuit.
Problems can arise, however, in the migration from single-processor integrated circuit technology to multiple-processor integrated circuit technology. Existing supporting hardware and software may not be readily compatible with newer integrated circuit designs. This makes migration to the newer technology difficult, because new supporting hardware and software must be developed to interoperate with the newer technology. Development of an entire line of new supporting technology is both costly and slow.
One such scenario in which migration is difficult involves a change from single-processor chips, each with its own secondary cache memory, to dual-processor chips, each with a shared secondary (L2) cache memory. In the older technology, a failing secondary cache meant that only one processor (the one associated with that memory) was affected. Thus, when an L2 memory failed, the supporting hardware and software reporting the problem to operating system software would only report one processor experiencing the problem. That one processor could then be disabled to prevent further errors.
In the newer technology, however, when an L2 memory fails, both processors sharing the L2 memory are affected. Because the supporting hardware and software are designed to report a problem with only one processor, they do not allow for both processors to be disabled. Thus, in this scenario an incompatibility exists between the newer processor technology and the existing supporting hardware and software.
It would thus be beneficial if there were an error reporting method that would allow the newer multiple-processor integrated circuits to be compatible with existing supporting hardware and software designed to be used with single-processor integrated circuits.
Accordingly, the present invention provides a method by which existing supporting hardware and software may be made compatible with newer processor technology utilizing multiple processors with shared memory on a single integrated circuit. The present invention ensures that when a failure in the shared memory occurs, the failure is associated with all affected processors, so that all of the affected processors can be deactivated. In accordance with a preferred embodiment of the invention, the failure is reported multiple times, once for each of the affected processors. In this manner, multiple-processor integrated circuits with memory sharing may be utilized with existing error reporting technology that associates only one processor with a given failure.
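The reporting approach of the preferred embodiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `ErrorReport`, `LegacyErrorHandler`, `report_shared_memory_failure`, and the `L2_SHARERS` table are hypothetical names, and the processor numbering is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    """A legacy-style report that can name only one processor (hypothetical)."""
    processor_id: int
    description: str


class LegacyErrorHandler:
    """Stand-in for existing support software: disables one processor per report."""

    def __init__(self, processors):
        self.enabled = set(processors)

    def handle(self, report):
        self.enabled.discard(report.processor_id)


def report_shared_memory_failure(handler, sharing_processors, description):
    """Report the same failure once for each processor sharing the failing memory."""
    for cpu in sharing_processors:
        handler.handle(ErrorReport(cpu, description))


# Assumed topology: two dual-processor chips, each with one shared L2 cache.
L2_SHARERS = {"chip0_L2": (0, 1), "chip1_L2": (2, 3)}

# A single report leaves processor 1 running on the failing L2 cache.
single = LegacyErrorHandler(processors=[0, 1, 2, 3])
single.handle(ErrorReport(0, "L2 cache failure"))
left_after_single_report = sorted(single.enabled)

# Reporting once per sharing processor disables both affected processors.
repeated = LegacyErrorHandler(processors=[0, 1, 2, 3])
report_shared_memory_failure(repeated, L2_SHARERS["chip0_L2"], "L2 cache failure")
left_after_repeated_reports = sorted(repeated.enabled)
```

The handler itself is unchanged between the two cases. Only the number of reports differs, which is why existing error reporting technology can remain in use.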