Data processing speed in computer systems continues to increase with technological progress in designing and manufacturing central processing units (CPUs). As the speed of processing increases, so does the amount of data processed. Therefore, the data needs to be stored in larger and more complex memory systems within the computer system. As the memory becomes larger, the required rate at which data is transferred to the CPU from the memory increases. However, as memories become larger they tend to become slower, and therefore limit the speed at which the CPU can work. A typical method of alleviating this problem is to employ a hierarchical memory system. In this type of memory system there is a small amount of fast, lower level, memory and a large amount of slow, higher level, memory. The smaller memory transfers data to and from the CPU very quickly. The higher level memory transfers data to and from the lower level memory at a slower rate, but it maintains in the lower level memory the correct data necessary for the CPU operations. The higher level memory is not required to run as fast as the lower level memory because the CPU typically asks for the same data many times. The combination of the lower and higher level memory provides the storage capacity and high data transfer rate required of modern computer systems.
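The benefit of the hierarchy described above can be illustrated with a simple average-access-time model. This is only a sketch; the cycle counts and hit rate below are assumed example values, not figures from the text:

```python
# Illustrative model of a two-level memory hierarchy: a small, fast
# lower level (cache) backed by a large, slow higher level memory.
# All latencies and the hit rate are assumed example values.

CACHE_LATENCY = 1      # cycles to access the fast lower level memory (assumed)
MAIN_LATENCY = 20      # cycles to access the slow higher level memory (assumed)
HIT_RATE = 0.95        # fraction of requests the cache satisfies (assumed)

def average_access_time(hit_rate, cache_latency, main_latency):
    """Average cycles per CPU access: hits are served by the cache;
    misses pay the higher level memory penalty on top of the lookup."""
    return hit_rate * cache_latency + (1 - hit_rate) * (cache_latency + main_latency)

print(average_access_time(HIT_RATE, CACHE_LATENCY, MAIN_LATENGTH if False else MAIN_LATENCY))  # ~2.0 cycles
```

Because the CPU typically asks for the same data many times, the hit rate is high and the average access time stays close to the fast memory's latency even though most of the storage is slow.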
The lower level memory, typically called the cache, is required to transfer data to the CPU at a high rate while simultaneously transferring data to and from the higher level memory. Cache memories meet this requirement by having their logic designed in such a manner that the memory cells of the cache can be written to and read from during one access cycle. That is, once the specific data address location is sent to the cache, the CPU can both write and read that location without sending the address to the cache a second time. This operation, known as a Write Through Read (WTR), impacts the entire system performance by affecting the performance-critical path, because the CPU can only operate as fast as it can access the memory, and the memory can only operate as fast as its slowest operation. The cycle time of the cache access is constant and must be long enough to accomplish the cache's slowest operation. The slowest operation of the cache is the WTR operation because it is a serial combination of a write operation followed by a read operation to the same address. A slower cache access cycle due to the WTR operation results in fewer instructions performed by the CPU in a given time, which degrades performance in the computer system.
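The way the WTR operation sets the cache cycle time can be sketched numerically. The nanosecond figures below are illustrative assumptions only:

```python
# Sketch: the cache access cycle is constant and must be long enough
# for the slowest operation. A write-through-read (WTR) is a write
# followed serially by a read to the same address, so its duration is
# the sum of the two. The timings below are assumed example values.

WRITE_TIME = 3.0   # ns, assumed
READ_TIME = 4.0    # ns, assumed

operations = {
    "read": READ_TIME,
    "write": WRITE_TIME,
    "wtr": WRITE_TIME + READ_TIME,   # serial: write, then read
}

cycle_time = max(operations.values())  # the cycle must fit the slowest op
print(cycle_time)  # 7.0 — the WTR operation dominates
```

Because every cycle must accommodate the WTR, even accesses that are plain reads or writes run at the slower WTR pace, which is exactly the performance-critical-path effect described above.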
The access time of the cache is linked to the cycle time of the CPU through the logic associated with the cache. This logic maintains the timing between the CPU and the cache. The CPU will only send the cache a valid address during a defined time of the machine cycle period. The machine cycle is derived from phase clocks, which define certain time intervals during which data can be transferred between parts of the computer system. Although there can be one or more phase clocks, there are typically two of them, and each logic part outputs data depending on the state of one of the clocks. This results in logic associated with the first clock outputting data when the first clock is in a first state (high or low). The second clock is directly out of phase with the first clock (i.e., the second clock is high when the first clock is low and vice versa), and logic associated with the second clock outputs data when the second clock is in a first state (high or low). Therefore, data from one group of logic (associated with the first clock) is shifted to a second group of logic (associated with the second clock) only when no data is being shifted from the second group of logic to the first group. This transfer timing guarantees that the data received by any CPU logic is the data that was meant to be received.
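The two-phase clocking scheme can be modeled with a few lines of code. This is a toy sketch of the behavior described above, not an implementation of any particular machine:

```python
# Toy model of two-phase clocking. Logic tied to the first clock
# drives its outputs while clock 1 is in its active state; logic tied
# to the second clock drives while clock 2 is active. Since the two
# clocks are directly out of phase, data moves in only one direction
# at any instant, so a receiver is never driven while it is sending.

def phases(step):
    """Return (clk1, clk2) for a half-cycle step; the clocks are complements."""
    clk1 = step % 2 == 0
    return clk1, not clk1

for step in range(4):
    clk1, clk2 = phases(step)
    direction = "group 1 -> group 2" if clk1 else "group 2 -> group 1"
    assert clk1 != clk2          # the two groups never drive simultaneously
    print(step, direction)
```

The assertion encodes the guarantee in the text: because the clocks are complements, transfers in the two directions can never overlap, so received data is always the data that was meant to be received.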
The cache logic receives and holds (or latches) encoded address bits during a second phase of one of the clocks. The data is latched during the second phase because the receiving logic is outputting data during the first phase of the clock. When the second phase of this clock (the first clock) is ended, the first phase of the first clock begins and sends the encoded address bits to decode logic. The results of the decode logic are used to address the cache memory cells. Waiting for the data to be latched during the second phase is wasted time. This wasted time is especially damaging to CPU performance during a read operation in comparison to a write operation. This is because the read operation is just the beginning of the data flow. Once data is read from memory cells, it must be sent through several operations (parity check, table look-up, etc.) before it is useful to the CPU. All these operations take time and must be completed before the second phase of the first clock is ended. By contrast, in the write operation the data must merely be written before the second phase of the first clock ends. Since the decode operation is the last operation required before the data is written, there is ample time left in the cycle to finish the write operation. Therefore, decreasing the read time is of critical importance in improving the CPU performance.
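The asymmetry between the read and write paths can be made concrete with a timing-budget sketch. The stage times and phase budget below are assumed example values, chosen only to illustrate the argument:

```python
# Why the read path is more time-critical than the write path: after
# decode, a read must still flow through parity check, table look-up,
# etc., before the phase ends, while a write only needs the cells
# written. All stage times are assumed example values (ns).

PHASE_BUDGET = 10.0   # time available before the phase ends (assumed)

read_path = {"decode": 2.0, "cell_read": 2.0,
             "parity_check": 1.5, "table_lookup": 2.5}
write_path = {"decode": 2.0, "cell_write": 2.0}

read_slack = PHASE_BUDGET - sum(read_path.values())
write_slack = PHASE_BUDGET - sum(write_path.values())
print(read_slack, write_slack)  # the write path has far more slack
```

With the same budget, the write path finishes with ample slack while the read path is nearly exhausted, which is why any wasted latching time hurts reads far more than writes.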
Decreasing the read time by eliminating the wasted time in the read operation does reduce the WTR operation time because of the serial nature of that operation: decreasing the read time decreases the total time of an operation made up of non-overlapping read and write operations. However, the WTR operation is still the longest operation associated with the cache access and therefore defines the cache access time. The serial nature of the operation limits the effectiveness of any improvement in the read or write operations on the overall cache access time.
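The limit described here can be shown with the same kind of illustrative numbers used above (all assumed, not from the text): shaving time off the read shortens the WTR, but the WTR remains the longest operation and still sets the cycle.

```python
# Even after improving the read, the WTR (write + read in series)
# remains the longest cache operation and still defines the cycle
# time. Figures are illustrative assumptions (ns).

WRITE = 3.0
READ_BEFORE, READ_AFTER = 4.0, 3.0   # read improved by 1.0 ns

for read in (READ_BEFORE, READ_AFTER):
    wtr = WRITE + read               # serial write-then-read
    cycle = max(read, WRITE, wtr)    # WTR still dominates the cycle
    print(read, wtr, cycle)
```

The cycle time tracks the WTR in both cases, so a one-unit read improvement buys only one unit of cycle time, and no improvement to the read or write alone can bring the cycle below the sum of the two.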