Improving the performance of data processing systems is a never-ending quest. One technique used to increase system performance is to perform certain operations in parallel, as opposed to serially. This allows more than one operation to be performed at any given time. Some processors contain a plurality of execution units so that more than one instruction can be executed at a given time. Superscalar computers are an example of this type of architecture, where multiple instructions are simultaneously dispatched to multiple execution units for parallel execution.
Caches are high-speed memory arrays used to store instructions or data that are required by a processor or by an execution unit within the processor. The cache resides between the processor and the slower main memory. By maintaining instructions and/or data in the cache, the processor is able to access such instructions and/or data faster than it could access them in main memory.
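The role of a cache described above can be illustrated with a minimal software sketch. The direct-mapped organization, the class name `DirectMappedCache`, and the parameters `num_lines` and `line_size` are assumptions made for illustration only; the text does not specify how the cache is organized.

```python
class DirectMappedCache:
    """Toy direct-mapped cache in front of a slower main memory.

    Hypothetical model: `memory` is a plain list of words, and the
    sizes below are illustrative, not taken from the text.
    """

    def __init__(self, memory, num_lines=4, line_size=4):
        self.memory = memory
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # which memory line each slot holds
        self.lines = [None] * num_lines  # cached copies of memory lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line_number = address // self.line_size
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1               # fast path: data already in the cache
        else:
            self.misses += 1             # slow path: reload the line from memory
            base = line_number * self.line_size
            self.lines[index] = self.memory[base:base + self.line_size]
            self.tags[index] = tag
        return self.lines[index][address % self.line_size]


memory = list(range(100, 164))           # 64 words of pretend main memory
cache = DirectMappedCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3)]
```

In this sketch the first read of a line is a miss that goes to main memory, and the following reads of the same line are served from the cache.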
The use of multiple execution units presents a problem of how to allow each execution unit access to a single cache. Although providing a separate cache to each execution unit would eliminate this problem, this solution increases system cost by requiring the addition of another cache array. It also results in added system complexity and redundancy, in that the execution units are no longer sharing a common cache or executing a common instruction/data stream. Rather, the execution units are operating autonomously from one another.
One approach to allowing multiple accesses to a cache in a system having two execution units is to design the cache array as a dual-port cache. Thus, each processor would have its own interface to the cache. However, there are distinct disadvantages with this approach, since a dual-port array requires 30-50% more physical area than a single-port array. The dual-port array also has slower access times.
Besides the processor interface requirements, another factor which must be considered in cache design is the interface to the main memory. A line of data, which consists of several cycles of data transfers, must be fetched from memory and written into the cache. However, this operation will interrupt accesses from the processor in either a single or dual port design. One approach to avoid interrupting the processor is to hold off one of the execution units until the reload is completed, but this hampers processor performance.
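The interruption described above can be sketched as a simple timing model. This is a hypothetical illustration: the function name `delayed_accesses`, the one-access-per-cycle assumption, and the example cycle numbers are not taken from the text.

```python
def delayed_accesses(request_cycles, reload_start, transfers_per_line):
    """Map each requested access cycle to the cycle it is actually served.

    Assumptions (illustrative only): the cache array serves one access per
    cycle, and a line reload occupies the array for `transfers_per_line`
    consecutive cycles starting at `reload_start`.
    """
    reload_end = reload_start + transfers_per_line  # first cycle after reload
    schedule = {}
    next_free = 0
    for cycle in sorted(request_cycles):
        start = max(cycle, next_free)
        if reload_start <= start < reload_end:
            start = reload_end           # access must wait for the reload
        schedule[cycle] = start
        next_free = start + 1
    return schedule


# Processor requests at cycles 0, 2, 3, 7; a 4-transfer reload starts at cycle 2.
schedule = delayed_accesses([0, 2, 3, 7], reload_start=2, transfers_per_line=4)
total_delay = sum(served - asked for asked, served in schedule.items())
```

Under these assumptions, every access that arrives during the reload window is pushed past it, and later accesses queue behind the delayed ones.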
An approach which allows for cache access by the processor and still allows for concurrent cache reloads is to add a cache reload buffer, outside of the cache array, to store the memory line. This technique is described in U.S. Pat. No. 4,905,188 entitled "Functional Cache Memory Chip Architecture for Improved Cache Access", and hereby incorporated by reference. However, this approach is very costly in terms of physical space and complexity, including: a line's worth of registers for data storage, muxes on the input and output paths, and the associated control logic to read and write a separate area outside of the array. A cache reload buffer also requires a cache cycle to transfer its contents into the cache array.
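The trade-off of a reload buffer can be sketched by counting how many cycles the cache array itself is occupied during a reload. This is a minimal model under stated assumptions: the function name `simulate_reload` is hypothetical, and the unbuffered case is assumed to occupy the array for one cycle per data transfer. The single transfer cycle in the buffered case follows the text above.

```python
def simulate_reload(memory_line, use_buffer):
    """Return (array_line, array_busy_cycles) for one line reload.

    Illustrative assumption: without a buffer, each data transfer from
    memory occupies the cache array for one cycle.
    """
    array_line = [None] * len(memory_line)
    if use_buffer:
        buffer = []
        for word in memory_line:
            buffer.append(word)          # lands outside the array; array stays free
        array_line[:] = buffer           # one cache cycle moves the whole line
        busy_cycles = 1
    else:
        busy_cycles = 0
        for i, word in enumerate(memory_line):
            array_line[i] = word         # each transfer ties up the array
            busy_cycles += 1
    return array_line, busy_cycles


line = [10, 11, 12, 13]
direct = simulate_reload(line, use_buffer=False)
buffered = simulate_reload(line, use_buffer=True)
```

The sketch captures only the cycle count. It does not model the registers, muxes, and control logic that make the buffer costly in physical space.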
It might be advantageous to design a cache array around a triple-ported array cell. A triple-ported cell could give each processor access via its own port, and the cache would also have a port for accessing main memory during a cache reload. However, a true triple-ported array cell would have inherent complexities and physical space requirements.