1. Field of Invention
This invention relates generally to multiprocessor computer systems and specifically to cache memory of multiprocessor computer systems.
2. Description of Related Art
Some manufacturers combine two or more central processing units (CPUs) on a single chip and sell the chip as a multi-processor unit (MPU). The MPU takes advantage of parallel processing to increase performance over a single CPU. An MPU typically includes a cache memory to store data in anticipation of future use by the CPUs. The cache memory is smaller and faster than the MPU's main memory, and thus can transfer data to the CPUs in much less time than the main memory can. When data requested by the CPUs is in the cache memory, there is a cache hit, and CPU performance approaches the speed of the cache memory. Conversely, when there is a cache miss, the requested data must be retrieved from main memory, and CPU performance approaches the speed of main memory. Accordingly, increased performance may be achieved by maximizing the percentage of cache hits during operation.
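The relationship between hit percentage and performance described above can be illustrated with a simplified average-access-time model. The following is a minimal sketch, not part of the disclosure: the function name, the two-level model (a miss is charged only the main memory latency), and the latency values are all assumptions chosen for illustration.

```python
def effective_access_time(hit_rate, t_cache_ns, t_main_ns):
    """Average time per access under a simplified two-level model.

    Assumption: a hit costs t_cache_ns and a miss costs t_main_ns;
    the cache lookup time on a miss is ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * t_cache_ns + (1.0 - hit_rate) * t_main_ns


if __name__ == "__main__":
    # Hypothetical latencies: 2 ns for the cache, 100 ns for main memory.
    for rate in (0.50, 0.90, 0.99):
        print(rate, effective_access_time(rate, 2.0, 100.0))
```

Under this model, the average access time moves toward the cache latency as the hit rate approaches one and toward the main memory latency as the hit rate approaches zero.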
Some MPU architectures include a single cache memory that is shared by all of the CPUs on the chip. Since data stored in the shared cache memory is available to each CPU on the chip, it is not necessary to store duplicate sets of data, which increases cache efficiency. Further, if one of the CPUs on the chip becomes defective, or is otherwise not required for a particular operation, the other CPU(s) may still access the entire cache memory. However, since more than one CPU may access the same cache memory locations, chip-level snoop operations are required between the CPUs on each MPU. These snoop operations are in addition to any system-level snoop operations between MPUs on a common bus. The additional circuitry required to perform the chip-level snoop operations undesirably increases the size and complexity of the associated cache controllers.
Other MPU architectures include a dedicated cache memory for each CPU. Since only one CPU has access to any given cache memory location, snoop operations between the CPUs on the MPUs may be performed at the system level rather than the chip level. Accordingly, the cache controllers for dedicated cache memories are smaller and simpler than the cache controllers for a shared cache memory. However, if one of the CPUs becomes defective or is otherwise not required for a particular application, its dedicated cache memory is not accessible by the other CPU(s), thereby wasting cache resources.
Thus, there is a need for better management of cache resources on an MPU without requiring large and complicated cache controllers.
A method and apparatus are disclosed that overcome the problems in the art described above. In accordance with the present invention, the resources of a partitioned cache memory are dynamically allocated between two or more processors on a multi-processor unit (MPU) according to a desired system configuration or to the processing needs of the processors. In some embodiments, the MPU includes first and second processors, and the cache memory includes first and second partitions. In one embodiment, each cache memory partition is a 2-way associative cache memory. A cache access circuit provided between the cache memory and the processors selectively transfers addresses and data between the first and/or second processors and the first and/or second cache memory partitions to maximize use of cache resources.
In one mode, both processors are set as active, and may simultaneously execute separate instruction threads. In this two-thread mode, the cache access circuit allows each processor to use a corresponding cache memory partition as a dedicated cache. For example, during cache read operations, the cache access circuit provides addresses from the first processor to the first cache memory partition and addresses from the second processor to the second cache memory partition, and returns data from the first cache memory partition to the first processor and data from the second cache memory partition to the second processor. Similarly, during cache write operations, the cache access circuit routes addresses and data from the first processor to the first cache memory partition and routes addresses and data from the second processor to the second cache memory partition. Thus, the first and second processors may use the first and second cache memory partitions, respectively, as dedicated 2-way associative caches.
In another mode, one processor is set as the active processor, and the other processor is set as the inactive processor. In this one-thread mode, the cache access circuit allows the active processor to use both the first and second cache memory partitions. For example, during cache read operations, the cache access circuit provides addresses from the active processor to both the first and second cache memory partitions, and returns matching data from the first and second cache memory partitions to the active processor. Similarly, during cache write operations, the cache access circuit routes addresses and data from the active processor to the first and second cache memory partitions. In this manner, the active processor may collectively use the first and second cache memory partitions as a 4-way associative cache.
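The two-thread and one-thread routing behavior described above can be modeled in software. The following is a minimal behavioral sketch only, not the disclosed circuit: the class and method names, the number of sets, the FIFO eviction within a partition, and the rule for choosing which partition receives a new line in one-thread mode are all assumptions introduced for illustration. Coherence and flushing on a mode change are not modeled.

```python
class Partition:
    """One 2-way set-associative cache partition (hypothetical model)."""
    WAYS = 2

    def __init__(self, num_sets=4):
        self.num_sets = num_sets
        self.sets = [[] for _ in range(num_sets)]  # each set holds [tag, data] pairs

    def _locate(self, addr):
        # Low-order part of the address selects the set; the rest is the tag.
        return self.sets[addr % self.num_sets], addr // self.num_sets

    def lookup(self, addr):
        """Return the cached data, or None on a miss (data is assumed non-None)."""
        lines, tag = self._locate(addr)
        for entry in lines:
            if entry[0] == tag:
                return entry[1]
        return None

    def occupancy(self, addr):
        lines, _ = self._locate(addr)
        return len(lines)

    def store(self, addr, data):
        lines, tag = self._locate(addr)
        for entry in lines:
            if entry[0] == tag:
                entry[1] = data  # update a line that is already present
                return
        if len(lines) == self.WAYS:
            lines.pop(0)  # assumed FIFO eviction; the source names no policy
        lines.append([tag, data])


class CacheAccessCircuit:
    """Routes processor accesses to one or both partitions, depending on mode."""

    def __init__(self):
        self.partitions = [Partition(), Partition()]
        self.mode = "two_thread"
        self.active = None

    def set_two_thread(self):
        self.mode, self.active = "two_thread", None

    def set_one_thread(self, active_cpu):
        self.mode, self.active = "one_thread", active_cpu

    def _targets(self, cpu):
        if self.mode == "two_thread":
            return [self.partitions[cpu]]  # dedicated partition per processor
        if cpu != self.active:
            raise ValueError("inactive processor cannot access the cache")
        return self.partitions  # active processor sees both partitions

    def read(self, cpu, addr):
        for part in self._targets(cpu):
            data = part.lookup(addr)
            if data is not None:
                return data
        return None  # miss in every partition visible to this processor

    def write(self, cpu, addr, data):
        targets = self._targets(cpu)
        for part in targets:
            if part.lookup(addr) is not None:
                part.store(addr, data)
                return
        # Assumed placement rule: fill the partition whose set is emptier.
        min(targets, key=lambda p: p.occupancy(addr)).store(addr, data)


if __name__ == "__main__":
    circuit = CacheAccessCircuit()
    circuit.set_one_thread(0)
    for addr in (0, 4, 8, 12):  # all four addresses map to set 0
        circuit.write(0, addr, f"line{addr}")
    print([circuit.read(0, addr) for addr in (0, 4, 8, 12)])
```

In this model, four addresses that map to the same set can be resident at once in one-thread mode, whereas in two-thread mode each processor can hold only two such lines in its own partition.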
The ability to dynamically allocate cache resources between multiple processors advantageously allows the entire cache memory to be used, irrespective of whether one or both processors are currently active, thereby maximizing cache resources while allowing for both one-thread and two-thread execution modes. In addition, the present invention may be used to maximize cache resources when one of the on-board processors is defective. For example, if one processor is found to be defective during testing, it may be set as inactive, and the cache access circuit may allocate the entire cache memory to the other processor.
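The choice of execution mode, including the defective-processor case described above, can be sketched as a small configuration helper. This is an illustrative assumption rather than a disclosed mechanism: the function name, its parameters, the return format, and the default of selecting processor 0 when both processors are functional but only one thread is wanted are all hypothetical.

```python
def select_mode(cpu0_ok, cpu1_ok, want_two_threads=True):
    """Return (mode, active_cpu) for a two-processor MPU (hypothetical helper).

    cpu0_ok / cpu1_ok: whether each processor passed testing.
    want_two_threads: whether the desired configuration runs two threads.
    """
    if cpu0_ok and cpu1_ok and want_two_threads:
        return ("two_thread", None)  # each processor gets its own partition
    if cpu0_ok:
        return ("one_thread", 0)  # processor 0 gets the entire cache
    if cpu1_ok:
        return ("one_thread", 1)  # processor 1 gets the entire cache
    raise RuntimeError("no functional processor available")


if __name__ == "__main__":
    # Example: processor 0 failed testing, so processor 1 runs alone.
    print(select_mode(cpu0_ok=False, cpu1_ok=True))
```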