1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus and method for allocating lines within a data cache upon writes to memory.
2. Description of the Related Art
The architecture of a present day pipeline microprocessor consists of a path, or channel, or pipeline, that is divided into stages. Each of the pipeline stages performs specific tasks related to the accomplishment of an overall operation that is directed by a programmed instruction. Software application programs are composed of a number of programmed instructions. As an instruction enters the first stage of the pipeline, certain tasks are accomplished. The instruction is then passed to subsequent stages of the pipeline for the execution of subsequent tasks. Following completion of a final task, the instruction completes execution and exits the pipeline. Execution of programmed instructions by a pipeline microprocessor is very much analogous to the manufacture of items on an assembly line.
The efficiency of any assembly line is primarily a function of the following two factors: 1) the degree to which each stage of the assembly line can be occupied with productive work; and 2) the balance of the effort required to perform tasks within each individual stage as compared to that required to perform tasks in the other stages, in other words, the degree to which bottlenecks are avoided in the assembly line. These same factors can also be said to affect the efficiency of a pipeline microprocessor. Consequently, it is incumbent upon microprocessor designers 1) to provide logic within each of the stages that maximizes the probability that none of the stages in the pipeline will sit idle and 2) to distribute the tasks among the architected pipeline stages such that no one stage will be the source of a bottleneck in the pipeline. Bottlenecks, or pipeline stalls, cause delays in the execution of application programs.
A pipeline microprocessor receives its data inputs from and provides its results to the outside world through memory devices that are external to the microprocessor. These external memory devices, along with the microprocessor, are interconnected in parallel via a system bus. The system bus interconnects other devices as well within a computing system so that the other devices can access data in memory or communicate with the microprocessor.
The memory devices used within present day computing systems operate almost an order of magnitude slower than logic devices internal to the microprocessor. Hence, when the microprocessor has to access external memory to read or write data, the program instruction directing the memory access is stalled in the pipeline. And if other devices are accessing data over the system bus at the same time that the microprocessor wants to access memory, then the program instruction may experience more lengthy delays until the system bus becomes available.
For these two reasons, a present day microprocessor incorporates a smaller, yet significantly faster, memory device within the microprocessor itself. This memory device, referred to as a cache, retains a copy of frequently used data so that when the frequently used data is required by instructions within an application program, rather than experiencing the delays associated with accessing the system bus and external memory, the data can be quickly accessed from within the cache.
The management of data within a cache, however, is a very complex task involving algorithms and logic that identify frequently used data and predict when one block of data is to be cast out of the cache and another block of data is to be retrieved into the cache. The goal of an effective data cache is to minimize the number of external memory accesses by the microprocessor. And to minimize the number of accesses to the system bus, present day cache logic does not read data from memory one byte at a time. Rather, memory is read into a cache in multiple-byte bursts. The number of bytes accessed within a burst is called a cache line. Cache lines are typically on the order of tens of bytes. Many pipeline microprocessors today employ 32-byte cache lines. Thus, when the system bus is accessed to retrieve data from external memory, an entire cache line is read that contains the required data along with surrounding data. Reading in the surrounding data is beneficial as well because one of the characteristics of application programs is that they tend to use data that is adjacent to that which has just been accessed. Consequently, when a program instruction requires a data entity that is not within the cache, the cache line that contains the data entity is retrieved from memory and placed into the cache. As a result, if following instructions require access to the data entity or surrounding data entities, they can execute much faster because the cache line is already present in the cache.
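The relationship between a byte address and the cache line that contains it can be sketched as follows. This is a minimal illustration only, assuming the 32-byte line size mentioned above and a flat, byte-addressed memory; the names `line_base`, `line_offset`, and `fill_line` are hypothetical and are not taken from any particular microprocessor.

```python
LINE_SIZE = 32  # bytes per cache line (assumed, matching the 32-byte example above)

def line_base(addr: int) -> int:
    """Return the address of the first byte of the cache line containing addr."""
    return addr & ~(LINE_SIZE - 1)

def line_offset(addr: int) -> int:
    """Return the position of addr within its cache line."""
    return addr & (LINE_SIZE - 1)

def fill_line(memory: bytes, addr: int) -> bytes:
    """Read the whole line that contains addr, as a single burst read would."""
    base = line_base(addr)
    return memory[base:base + LINE_SIZE]
```

Under these assumptions, a request for the byte at address 0x47 brings in the 32 bytes starting at address 0x40, so a following access to any address from 0x40 through 0x5F finds its data already in the cache.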
But program instructions not only read data from memory; they also write data. And the attribute of application programs discussed above applies just as well to writing data to memory as it does to reading data from memory. More specifically, when a program instruction directs a write to a location in memory, it is also very probable that following program instructions will either want to read or write that location or other locations within the same cache line. Hence, when a program instruction is executed that directs a write to a memory location that is not in the cache, a present day microprocessor first reads the corresponding cache line into the cache and then writes the data to the cache line. This technique for writing data is commonly referred to as blocking write allocate because a cache line entry within a cache is reserved, or allocated, only in response to a read operation. Consequently, every time that a program instruction directs a write to a memory location whose corresponding cache line is not within the cache, a read of the cache line is performed prior to writing the data.
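The read-before-write sequence described above can be sketched with a toy model. This is an illustration under stated assumptions rather than a description of any actual cache controller: the class name `BlockingWriteAllocateCache`, the dictionary used to hold lines, and the `bus_reads` counter are all hypothetical, the line size is assumed to be 32 bytes, and a write is assumed not to cross a line boundary.

```python
LINE_SIZE = 32  # assumed line size in bytes

class BlockingWriteAllocateCache:
    """Toy cache: a write miss first reads the whole line, then writes the data."""

    def __init__(self, memory: bytearray):
        self.memory = memory   # stands in for external memory
        self.lines = {}        # line base address -> bytearray of LINE_SIZE bytes
        self.bus_reads = 0     # number of line fills performed over the bus

    def write(self, addr: int, data: bytes) -> None:
        base = addr & ~(LINE_SIZE - 1)
        if base not in self.lines:
            # Write miss: allocate by reading the full line from memory first.
            self.lines[base] = bytearray(self.memory[base:base + LINE_SIZE])
            self.bus_reads += 1
        offset = addr - base
        # Only after the line is present is the write data placed into it.
        # Assumes the write stays within one line.
        self.lines[base][offset:offset + len(data)] = data
```

In this model, the first write to a line costs one full line read over the bus, and later writes to the same line hit in the cache.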
For the program instruction directing the write to external memory, the above scenario is inconsequential because most microprocessors today provide store buffers within which memory write data can be buffered. Thus, the program instruction can continue to proceed through the pipeline without delay. Cache control logic within the microprocessor will complete the write to the allocated cache line within the cache after the cache line has been read.
But there is a problem associated with the blocking write allocate technique when viewed from the standpoint of following instructions. While the data associated with the first write to the cache line is retained within the store buffer, subsequent writes to the same cache line must be stalled. Only when the complete cache line has been retrieved from memory and updated in the data cache can the following write instructions be allowed to proceed. Consequently, application programs that exhibit a significant number of writes to external memory experience considerable delays when they are executed on present day microprocessors employing blocking write allocate techniques.
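The cost of this stall can be made concrete with a very small timing sketch. The cycle numbers used with it are purely hypothetical and chosen for illustration; the function `stall_cycles` and its parameters are assumptions, not measurements of any real device.

```python
def stall_cycles(issue_cycles, fill_done_cycle):
    """Cycles each following write waits for the line fill to complete.

    issue_cycles: cycle at which each following write is ready to execute
    fill_done_cycle: cycle at which the line fill finishes (hypothetical)
    """
    return [max(0, fill_done_cycle - t) for t in issue_cycles]
```

For example, if the fill were to finish at cycle 10, following writes ready at cycles 1, 2, and 3 would wait 9, 8, and 7 cycles respectively, while a write ready after cycle 10 would not wait at all.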
Therefore, what is needed is an apparatus for allocating a cache line within a data cache corresponding to a memory write that does not require that the cache line first be loaded into the cache from memory.
In addition, what is needed is a pipeline microprocessor that can execute multiple writes to the same cache line much faster than has heretofore been provided.
Furthermore, what is needed is a data cache apparatus in a pipeline microprocessor that allows subsequent writes to a cache line to proceed without delay while waiting for the cache line to be provided from memory.
Moreover, what is needed is a method for improving the processing speed of a pipeline microprocessor executing multiple writes to adjacent memory locations that are not presently within its cache.
To address the above-detailed deficiencies, it is an object of the present invention to provide a microprocessor that allocates cache lines on memory writes without first loading the required cache lines from memory.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide an apparatus in a pipeline microprocessor for allocating a first cache line within a data cache upon a write to an external memory location that is not presently within the data cache. The apparatus includes write allocate logic and a write buffer. The write allocate logic stores first bytes within the first cache line corresponding to the write, updates remaining bytes of the first cache line from memory, and queues a speculative write command directing an external bus to store the first bytes to the external memory location. The write buffer is coupled to the write allocate logic. The write buffer buffers the speculative write command and issues the speculative write command to the external bus when update of the remaining bytes is interrupted.
An advantage of the present invention is that subsequent writes to addresses within an allocated cache line are not held up waiting for the corresponding cache line to be retrieved over the system bus.
Another object of the present invention is to provide a data cache apparatus in a pipeline microprocessor for executing multiple writes to the same cache line, where the cache line corresponding to the multiple writes is not initially present within the data cache.
In another aspect, it is a feature of the present invention to provide a cache line allocation apparatus within a pipeline microprocessor, for allocating a selected cache line upon a write miss. The cache line allocation apparatus has a data cache and cache control logic. The data cache stores a plurality of cache lines retrieved from external memory, where the plurality of cache lines are 32 bytes. The cache control logic is coupled to the data cache. The cache control logic stores data corresponding to the write miss within the selected cache line, and updates the selected cache line from the external memory, where selected bytes within the selected cache line are not updated, the selected bytes being those within which the data are stored.
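The selective update described in this aspect, in which bytes already holding write-miss data are left untouched while the rest of the line is brought in from memory, can be sketched as follows. This is a software illustration under stated assumptions, not the cache control logic itself: the per-byte `written` list standing in for a byte mask, and the function names `store_on_miss` and `masked_fill`, are hypothetical.

```python
LINE_SIZE = 32  # line size in bytes, as stated in this aspect

def store_on_miss(line: bytearray, written: list, offset: int, data: bytes) -> None:
    """Place the write-miss data in the allocated line and mark those bytes."""
    for i, value in enumerate(data):
        line[offset + i] = value
        written[offset + i] = True

def masked_fill(line: bytearray, written: list, from_memory: bytes) -> None:
    """Update the line from memory, skipping bytes that hold write-miss data."""
    for i in range(LINE_SIZE):
        if not written[i]:
            line[i] = from_memory[i]
```

Because the mask is consulted byte by byte, the line data arriving from memory can be applied after the write data without overwriting it.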
Another advantage of the present invention is that back-to-back writes to locations within a cache line execute much faster than what has heretofore been provided.
A further object of the present invention is to provide a data cache apparatus in a pipeline microprocessor that allows subsequent writes to a cache line to proceed without delay while waiting for the cache line data to be provided from memory.
In a further aspect, it is a feature of the present invention to provide an apparatus for performing write allocation in a data cache when a write miss occurs. The apparatus includes write allocate logic and a write buffer. The write allocate logic updates a cache line within the data cache with data bytes corresponding to the write miss and with data from external memory, wherein the data bytes are updated prior to update of the data from the external memory, and wherein byte positions within the cache line corresponding to the data bytes are masked during update of the data, thereby preserving the data bytes within the cache line. The write allocate logic includes snoop/backoff control logic. The snoop/backoff control logic detects events that interrupt update of the cache line. The write buffer is coupled to the write allocate logic. The write buffer stores a speculative write command, the speculative write command directing that the data bytes be stored within said external memory.
Yet a further object of the present invention is to provide a method for improving the processing speed of a pipeline microprocessor executing multiple writes to adjacent memory locations that are not presently within its cache.
In yet a further aspect, it is a feature of the present invention to provide a method for allocating a cache line within a pipeline microprocessor. The method includes storing data bytes corresponding to a write miss to an allocated cache line within the data cache; queuing a speculative write command to external memory corresponding to the write miss; updating remaining bytes within the allocated cache line from external memory; and if the updating is interrupted, issuing the speculative write command, thereby storing the data bytes to the external memory.
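The four steps of the method can be walked through in a toy software model. This is a minimal sketch under several assumptions that are not stated in this summary: the flag `fill_interrupted` is a hypothetical stand-in for whatever event interrupts the update, the queued command is assumed to be discarded once the update completes, the partially updated line is assumed to be dropped from the cache when the update is interrupted, and a write is assumed not to cross a line boundary. The function name `allocate_on_write_miss` and its data structures are likewise hypothetical.

```python
LINE_SIZE = 32  # assumed line size in bytes

def allocate_on_write_miss(memory, cache, write_buffer, addr, data, fill_interrupted):
    """Toy model of the method steps for a single write miss.

    memory: bytearray standing in for external memory
    cache: dict mapping line base address -> bytearray line
    write_buffer: list of queued (address, data) write commands
    fill_interrupted: hypothetical flag standing in for an interrupting event
    Returns True if the line remains allocated, False otherwise.
    """
    base = addr & ~(LINE_SIZE - 1)
    offset = addr - base
    end = offset + len(data)  # assumes the write does not cross a line boundary

    # 1) Store the write-miss data bytes in the newly allocated line.
    line = bytearray(LINE_SIZE)
    line[offset:end] = data
    cache[base] = line

    # 2) Queue a speculative write of the same bytes to external memory.
    write_buffer.append((addr, bytes(data)))

    if not fill_interrupted:
        # 3) Update only the remaining bytes from external memory.
        for i in range(LINE_SIZE):
            if not offset <= i < end:
                line[i] = memory[base + i]
        write_buffer.pop()  # assumption: command is discarded after a complete fill
        return True

    # 4) Update interrupted: issue the speculative write so the data reaches memory.
    spec_addr, spec_data = write_buffer.pop()
    memory[spec_addr:spec_addr + len(spec_data)] = spec_data
    del cache[base]  # assumption: the partially updated line is dropped
    return False
```

In either outcome of this model the write data is not lost: it ends up in the allocated cache line when the update completes, or in external memory when the update is interrupted.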
Yet a further advantage of the present invention is that application programs exhibiting a significant number of external memory write operations execute more efficiently within a pipeline microprocessor according to the present invention.