1. Field Of The Invention
The present invention relates generally to high-speed central processor unit (CPU) cache memory, and more particularly to a system and method for improving processor system performance by optimizing the cache memory write time.
2. Related Art
Computer and microprocessor systems typically employ random access memory (RAM) chips for storing instructions to be executed and data to be manipulated. To utilize this memory, the microprocessor must "access" the information stored in the memory. Accessing memory involves two steps. The first step is addressing the specific memory location from which the data is to be retrieved or to which the data is to be written. The second step is actually retrieving the data from, or writing data to, that specific memory location.
Accessing the memory for instruction fetching and for read and write operations involves a relatively significant amount of time. In fact, the time required to access computer memory is often a rate-determining factor that constrains the speed at which the computer system may operate. Even if the CPU can operate at greater speeds, operations can only be performed as quickly as the data can be transferred between memory and the CPU. In other words, computer and processor systems can operate only as fast as the instructions can be retrieved from memory, or as fast as the data required to execute those instructions can be written to or retrieved from memory.
To increase the overall speed at which the system performs its designated operations, conventional system designs have incorporated high-speed memory architectures. Such architectures employ high-speed cache memory to achieve rapid data transfer. Cache memories are typically built from bipolar or bipolar/CMOS (complementary metal oxide semiconductor) devices, which are faster than the traditional metal-oxide-semiconductor (MOS) devices. Cache memories are often designed using static RAM (SRAM) chips because SRAMs provide fast access times.
Bipolar cache memories are more costly than their slower MOS counterparts. Consequently, their application is typically limited to storing information most frequently used by the computer systems. Other information, not as frequently used, is stored in more cost effective, but slower, MOS DRAM chips. However, even faster cache memories have continued to limit the speed of conventional computer systems.
Market demands continue to require systems operating at high frequencies. Current demands are for systems operating in the range of 80 to 100 MHz. As a result, conventional system designs have begun using faster SRAMs in an attempt to operate at these frequencies. However, read and write timing limitations constrain these conventional systems to operate at frequencies somewhat less than the maximum cache access cycle frequencies.
Ideally, the maximum frequency, F.sub.c, of accessing the cache is the reciprocal of the cache access cycle time, T.sub.c:
F.sub.c = 1 / T.sub.c
Therefore, a 10 nanosecond cache RAM chip can theoretically operate at 100 MHz, while an 8 nanosecond chip can theoretically operate at 125 MHz.
Theoretically, the actual processor frequency, F.sub.a, could be as high as F.sub.c. In fact, with some conventional read techniques, read operations can be as fast as T.sub.c. However, due to write timing limitations and timing uncertainties, this maximum processor frequency may not be attained in conventional systems. Typically, the ratio of actual processor cycle time, T.sub.a, to cache access cycle time, T.sub.c, is approximately 1.25. In other words, conventional systems require a cycle time about 25% longer than the theoretical minimum. Thus, in conventional systems using 10 nanosecond SRAMs the actual processor cycle time, T.sub.a, is limited to approximately 12.5 nanoseconds, considerably greater than the cache access time T.sub.c:
T.sub.a = 1.25 × T.sub.c = 1.25 × 10 nanoseconds = 12.5 nanoseconds
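The arithmetic in the two preceding paragraphs can be checked with a short sketch. The function names are illustrative only and do not come from the original text; the factor of 1.25 is the approximate figure quoted above for conventional systems, not an exact constant.

```python
def max_frequency_mhz(cycle_time_ns: float) -> float:
    """Return the frequency in MHz for a cycle time in nanoseconds (F = 1 / T)."""
    # 1 / (1 ns) = 1000 MHz, so F[MHz] = 1000 / T[ns].
    return 1000.0 / cycle_time_ns


def actual_cycle_time_ns(cache_cycle_ns: float, ratio: float = 1.25) -> float:
    """Return T_a given T_c and the approximate T_a / T_c ratio cited in the text."""
    return cache_cycle_ns * ratio


print(max_frequency_mhz(10.0))                        # 100.0 (10 ns SRAM)
print(max_frequency_mhz(8.0))                         # 125.0 (8 ns SRAM)
print(actual_cycle_time_ns(10.0))                     # 12.5  (T_a for a 10 ns SRAM)
print(max_frequency_mhz(actual_cycle_time_ns(10.0)))  # 80.0  (resulting F_a)
```

With a 10 nanosecond SRAM, the 12.5 nanosecond processor cycle corresponds to 80 MHz, which is the lower end of the 80 to 100 MHz range discussed above.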
As mentioned above, write timing limitations are the reason the system must operate slower than the maximum SRAM speed in the frequency range of 80-100 MHz. There are several system characteristics that contribute to these write timing limitations. These characteristics include the delays associated with driving the addresses and control signals to the SRAMs. Additionally, uncertainty in these delays requires additional time be allowed for accessing the cache. Conventional write timing methods typically use two or three control signals to perform a write operation. These signals are a write control signal, a chip enable signal, and an output enable signal. These signals are set true (asserted) and reset false using control clock edges. Performing the write operation depends on asserting these signals before certain steps can be performed. However, there is imprecision associated with the temporal placement of the control clock edges and therefore imprecision associated with asserting and resetting each of the three control signals. As a result, additional time must be included in the write operation (i.e., T.sub.a must be increased) to account for the imprecision in placement of these three signals.
The write cycle timing algorithm must be designed to account for the worst-case placement of all three control signals. Therefore, a longer write-cycle time is needed. Consequently, either a longer processor cycle time T.sub.a is required, or an increased number of processor cycles is required to perform the write operation. A longer T.sub.a results in slower system operation. If an increased number of processor cycles is required to perform a write operation, write operations will impede system performance.
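The effect of worst-case signal placement on the write-cycle time can be illustrated with a deliberately simple model. It assumes that the placement uncertainties of the control signals add linearly to a nominal write time. That assumption, the class and function names, and every nanosecond figure below are hypothetical and are not taken from the original text.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ControlSignal:
    name: str
    edge_uncertainty_ns: float  # hypothetical worst-case error in edge placement


def worst_case_write_ns(nominal_write_ns: float,
                        signals: Sequence[ControlSignal]) -> float:
    """Nominal write time plus the summed edge uncertainty of every control signal.

    Assumes the uncertainties simply add, which is a pessimistic simplification.
    """
    return nominal_write_ns + sum(s.edge_uncertainty_ns for s in signals)


# The three control signals named in the text, with made-up uncertainties.
SIGNALS = [
    ControlSignal("write control", 1.0),
    ControlSignal("chip enable", 1.0),
    ControlSignal("output enable", 0.5),
]

print(worst_case_write_ns(10.0, SIGNALS))  # 12.5
```

Under these assumed numbers, a nominal 10 nanosecond write must be budgeted as 12.5 nanoseconds once the uncertainty of all three signals is included.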
To operate at frequencies of 80 to 100 MHz, designers must keep T.sub.a down to 10 to 12.5 nanoseconds. With a T.sub.a of 10 to 12.5 nanoseconds and a long write cycle time, conventional systems are forced to increase the number of processor cycles required to perform a write operation to more than two.
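The relationship between write-cycle time and the number of processor cycles per write can be sketched as follows. The sketch assumes that a write occupies a whole number of processor cycles, so the count is the write time divided by T.sub.a, rounded up. The function name and the 26 nanosecond write time are hypothetical and chosen only for illustration.

```python
import math


def write_cycles(write_time_ns: float, processor_cycle_ns: float) -> int:
    """Whole processor cycles needed to cover one write (assumed rounding model)."""
    if write_time_ns <= 0 or processor_cycle_ns <= 0:
        raise ValueError("times must be positive")
    return math.ceil(write_time_ns / processor_cycle_ns)


# A hypothetical 26 ns worst-case write at the two ends of the T_a range.
print(write_cycles(26.0, 12.5))  # 3 cycles at 80 MHz
print(write_cycles(26.0, 10.0))  # 3 cycles at 100 MHz
# A write that fits exactly in two cycles needs no third cycle.
print(write_cycles(25.0, 12.5))  # 2
```

Under this model, even a small overrun past two cycles costs a full additional processor cycle on every write.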
Designers of conventional systems have implemented a number of techniques in an attempt to minimize the actual processor cycle time, T.sub.a. In one technique, separate cache control units (CCUs) have been used to direct access to the cache RAM. However, this approach requires the addition of buffers and latches to facilitate addressing. This additional circuitry adds delay into the read and write operations. Also, there are additional costs associated with this circuitry, and it consumes power and space.
Another approach in conventional systems has been to customize SRAMs to incorporate latches and/or multiplexers. This is done in an attempt to minimize the uncertainty in the control signals by holding the data in these latches. However, the access time at which the SRAM can function is increased in these applications because of the additional circuitry required. Also, customized SRAMs cost more, require more control, and, because they have more levels of circuitry, have an inherently slower access time.
Each instruction in a microprocessor cache memory system requires that a read operation be performed to execute that instruction. Only a small percentage (typically 20-25%) of the instructions require that a write operation be performed. Therefore, microprocessor cache memory systems are optimized for the read cycle, and T.sub.a is chosen based on the read cycle.
What is needed is a system and method for minimizing the amount of time required to write data to an SRAM. The coordination of the write cycle time with the read cycle time is critical for minimizing T.sub.a. T.sub.a is at a minimum when the write sequence time is the same as the read sequence time (or an integer multiple thereof). If the write cycle takes longer than the read cycle (or multiple read cycles), T.sub.a must be longer than the read cycle and the increase in T.sub.a is wasted time. Additionally, the number of cycles required for write operations should be minimized.
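The coordination requirement stated above can be made concrete with a small sketch. It assumes that T.sub.a must be at least the read sequence time and at least the write sequence time divided by the number of cycles allotted to a write. The function names and the sample times are hypothetical and do not appear in the original text.

```python
def processor_cycle_ns(read_ns: float, write_ns: float, write_cycles: int) -> float:
    """Smallest T_a that fits one read per cycle and one write in `write_cycles` cycles."""
    if write_cycles < 1:
        raise ValueError("write_cycles must be at least 1")
    return max(read_ns, write_ns / write_cycles)


def wasted_read_time_ns(read_ns: float, write_ns: float, write_cycles: int) -> float:
    """Time per cycle by which T_a exceeds what the read sequence alone requires."""
    return processor_cycle_ns(read_ns, write_ns, write_cycles) - read_ns


# Write time is an integer multiple of the read time: T_a equals the read time.
print(processor_cycle_ns(10.0, 20.0, 2))   # 10.0
print(wasted_read_time_ns(10.0, 20.0, 2))  # 0.0

# Write time exceeds two read cycles: T_a is stretched and every read wastes time.
print(processor_cycle_ns(10.0, 25.0, 2))   # 12.5
print(wasted_read_time_ns(10.0, 25.0, 2))  # 2.5
```

In the second case, the 2.5 nanoseconds added to every cycle is also paid by read operations, which make up the majority of memory accesses.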