1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus and method for allocating lines within a data cache upon writes to memory.
2. Description of the Related Art
The architecture of a present day pipeline microprocessor consists of a path, or channel, or pipeline, that is divided into stages. Each of the pipeline stages performs specific tasks related to the accomplishment of an overall operation directed by a programmed instruction. Software application programs are composed of a number of programmed instructions. As an instruction enters the first stage of the pipeline, certain tasks are accomplished. The instruction is then passed to subsequent stages of the pipeline for the execution of subsequent tasks. Following completion of a final task, the instruction completes execution and exits the pipeline. Execution of programmed instructions by a pipeline microprocessor is very much like the manufacture of products on an assembly line.
The efficiency of an assembly line depends upon the following two factors: 1) the degree to which each stage of the assembly line is idle; and 2) the balance of tasks performed within each individual stage as compared to the other stages, in other words, the degree to which bottlenecks are avoided in the assembly line. These same factors influence the efficiency of a pipeline microprocessor. Consequently, microprocessor designers 1) provide logic within each of the stages to maximize the probability that none of the stages in the pipeline will sit idle and 2) evenly distribute tasks among the stages so that no one stage will become a bottleneck in the pipeline. Bottlenecks, or pipeline stalls, cause delays in the execution of application programs.
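The effect of an unbalanced stage can be illustrated with a short Python sketch. It assumes an idealized, stall-free pipeline in which every stage is clocked at the pace of the slowest stage; the function name and the per-stage delays (in arbitrary time units) are hypothetical and chosen only for illustration.

```python
def pipeline_time(stage_delays, n_instructions):
    """Time units to complete n instructions in an ideal, stall-free pipeline.

    The clock period is set by the slowest stage (the bottleneck). The first
    instruction needs one period per stage to traverse the pipeline, and each
    later instruction completes one period after the previous one.
    """
    if n_instructions <= 0:
        return 0
    period = max(stage_delays)
    return (len(stage_delays) + n_instructions - 1) * period


balanced = pipeline_time([2, 2, 2, 2], 100)    # 8 units of work per instruction
unbalanced = pipeline_time([1, 1, 5, 1], 100)  # same 8 units, one slow stage
```

With 100 instructions the balanced pipeline finishes in 206 time units, while the unbalanced one, which performs the same total work per instruction, takes 515 because every stage must wait on the 5-unit bottleneck.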
A microprocessor receives its data inputs from, and provides its results to, the outside world through memory devices that are external to the microprocessor. These external memory devices, along with the microprocessor, are interconnected in parallel via a system bus. The system bus also interconnects other devices within a computing system so that those devices can access data in memory or communicate with the microprocessor.
The memory devices used within present day computing systems operate almost an order of magnitude slower than logic devices internal to the microprocessor. Hence, when the microprocessor has to access external memory to read or write data, the program instruction that directs the memory access is stalled in the pipeline. And if other devices are accessing data over the system bus at the same time that the microprocessor wants to access memory, then the program instruction may experience even longer delays until the system bus becomes available.
For the two reasons above, a present day microprocessor incorporates a smaller, yet significantly faster, memory device within the microprocessor itself. This memory device, referred to as a cache, retains a copy of frequently used data so that when that data is required by instructions within an application program, it can be accessed from within the cache without the delays associated with accessing the system bus and external memory.
The management of data within a cache, however, is a very complex task involving algorithms and logic that identify frequently used data and predict when one block of data is to be cast out of the cache and another block of data is to be retrieved into the cache. The goal of an effective data cache design is to minimize the number of external memory accesses by the microprocessor. And to minimize the number of accesses to the system bus, present day cache logic does not read data from memory one byte at a time. Rather, memory is read into a cache in multiple-byte bursts. The number of bytes accessed within a burst is called a cache line. Cache lines are typically on the order of tens of bytes. Many pipeline microprocessors today employ 32-byte cache lines. Thus, when the system bus is accessed to retrieve data from external memory, an entire cache line is read that contains the required data along with surrounding data. Reading in the surrounding data is beneficial as well because one of the characteristics of application programs is that they tend to use data that is adjacent to that which has just been accessed. Consequently, when a program instruction requires a data entity that is not within the cache, the cache line that contains the data entity is retrieved from memory and placed into the cache. Henceforth, if following instructions require access to the data entity or surrounding data entities, they can execute much faster because the cache line is already present in the cache.
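The grouping of memory into 32-byte lines can be illustrated with a brief Python sketch. Because the line size is a power of two, the line containing a given address is found by clearing the low five address bits, and those low bits give the byte's offset within the line. The function names are hypothetical; only the 32-byte line size comes from the text above.

```python
LINE_SIZE = 32  # bytes per cache line, as in the 32-byte lines described above


def line_base(address: int) -> int:
    """Address of the first byte of the cache line containing `address`."""
    return address & ~(LINE_SIZE - 1)


def line_offset(address: int) -> int:
    """Position of `address` within its cache line (0 .. LINE_SIZE - 1)."""
    return address & (LINE_SIZE - 1)


def lines_touched(address: int, length: int) -> list:
    """Base addresses of every cache line covered by an access of `length` bytes."""
    first = line_base(address)
    last = line_base(address + length - 1)
    return list(range(first, last + LINE_SIZE, LINE_SIZE))
```

For example, `line_base(0x1234)` is 0x1220 and `line_offset(0x1234)` is 20, while a 4-byte access at 0x121E straddles the lines at 0x1200 and 0x1220 and would therefore require both lines to be present in the cache.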
But program instructions not only read data from memory; they write data too. And the attribute of application programs discussed above applies as well to writing data to memory as it does to reading data from memory. More specifically, when a program instruction directs a write to a location in memory, it is also very probable that following program instructions will either want to read or write that location or other locations within the same cache line. Hence, when a program instruction is executed that directs a write to a memory location that is not in the cache, a present day microprocessor first reads the corresponding cache line into the cache and then writes the data to the cache line. This technique for writing data is commonly referred to as blocking write allocate because a cache line entry within a cache is reserved, or allocated, only in response to a read operation. Consequently, every time that a program instruction directs a write to a memory location whose corresponding cache line is not within the cache, a read of the cache line is performed prior to writing the data.
For the program instruction directing the write to external memory, the above scenario is inconsequential because most microprocessors today provide store buffers within which memory write data can be buffered. Thus, the program instruction can continue to proceed through the pipeline without delay. Cache control logic within the microprocessor will complete the write to the allocated cache line within the cache.
But there is a problem associated with the blocking write allocate technique when viewed from the standpoint of following instructions. While the data associated with the first write to the cache line is retained within the store buffer, subsequent writes to the same cache line must be stalled. Only when the complete cache line has been retrieved from memory and updated in the data cache can the following write instructions proceed. Consequently, application programs that exhibit a significant number of writes to external memory experience considerable delays when they are executed on present day microprocessors employing blocking write allocate techniques.
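To make the cost of blocking write allocate concrete, the following Python sketch counts cycles for a run of writes whose cache lines are initially absent. It is a simplified model under hypothetical assumptions: a fixed line fill latency of 20 cycles, one cycle per write, and a store buffer that absorbs the first write to a missing line. The names `run_writes`, `LINE_SIZE` and `MEM_LATENCY` are illustrative, and real fill latencies vary from system to system.

```python
LINE_SIZE = 32    # bytes per cache line
MEM_LATENCY = 20  # hypothetical cycles to fetch one line over the system bus


def run_writes(addresses, blocking):
    """Return the cycle at which the last write in `addresses` completes.

    Every line starts out absent from the cache. The first write to a line
    is absorbed by the store buffer while the line fill is started.
    """
    fill_done = {}  # line base address -> cycle at which its fill completes
    cycle = 0
    for addr in addresses:
        line = addr & ~(LINE_SIZE - 1)
        if line not in fill_done:
            # Write miss: buffer the write and start fetching the line.
            fill_done[line] = cycle + MEM_LATENCY
        elif blocking and cycle < fill_done[line]:
            # Blocking write allocate: a later write to a line that is still
            # being fetched stalls until the fill completes.
            cycle = fill_done[line]
        cycle += 1  # one cycle to perform (or buffer) this write
    return cycle
```

For eight 4-byte writes that fall within one 32-byte line, the blocking model completes the last write at cycle 27, because the second write waits for the fill, whereas a model in which writes to the allocated line proceed during the fill completes it at cycle 8.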
Therefore, what is needed is an apparatus for allocating a cache line within a data cache corresponding to a memory write that does not require that the cache line first be loaded into the cache from memory.
In addition, what is needed is a pipeline microprocessor that can execute multiple writes to the same cache line much faster than has heretofore been provided.
Furthermore, what is needed is a data cache apparatus in a pipeline microprocessor that allows subsequent writes to a cache line to proceed without delay while waiting for the cache line to be provided from memory.
Moreover, what is needed is a method for improving the processing speed of a pipeline microprocessor executing multiple writes to adjacent memory locations that are not presently within its cache.
To address the above-detailed deficiencies, it is an object of the present invention to provide a microprocessor that allocates cache lines on memory writes without first loading the required cache lines from memory.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide an apparatus in a pipeline microprocessor for allocating a first cache line within a data cache upon a write to an external memory location that is not presently within the data cache. The apparatus includes write allocate logic and a fill controller. The write allocate logic stores first bytes within the first cache line corresponding to the write, and updates remaining bytes of the first cache line from memory. The fill controller is coupled to the write allocate logic. The fill controller issues a fill command over an external bus directing the memory to provide the remaining bytes, wherein the fill command is issued in parallel with storage of the first bytes within the first cache line.
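The division of labor between the write allocate logic and the fill controller can be sketched in Python as a cache line that records which bytes a write has already supplied, so that the later fill from memory updates only the remaining bytes. This is a behavioral model under stated assumptions, not the patented circuit: the class `AllocatedLine`, its per-byte `written` flags, and the 32-byte line size are illustrative, and the parallel issue of the fill command over the external bus is not modeled.

```python
LINE_SIZE = 32  # bytes per cache line (assumed)


class AllocatedLine:
    """A cache line allocated on a write miss before its fill data arrives."""

    def __init__(self):
        self.data = bytearray(LINE_SIZE)
        self.written = [False] * LINE_SIZE  # True for bytes supplied by a write

    def write(self, offset: int, payload: bytes) -> None:
        """Store the write's bytes immediately and mark them as written."""
        assert 0 <= offset and offset + len(payload) <= LINE_SIZE
        for i, b in enumerate(payload):
            self.data[offset + i] = b
            self.written[offset + i] = True

    def fill(self, memory_line: bytes) -> None:
        """Update only the remaining bytes from the line returned by memory."""
        for i in range(LINE_SIZE):
            if not self.written[i]:
                self.data[i] = memory_line[i]
```

In a scenario where two bytes are written at offset 4 before the fill arrives, those bytes keep their written values while every other byte takes the value returned by memory.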
An advantage of the present invention is that subsequent writes to addresses within an allocated cache line are not held up waiting for the corresponding cache line to be retrieved over the system bus.
Another object of the present invention is to provide a data cache apparatus in a pipeline microprocessor for executing multiple writes to the same cache line, where the cache line corresponding to the multiple writes is not initially present within the data cache.
In another aspect, it is a feature of the present invention to provide a cache line allocation apparatus within a pipeline microprocessor, for allocating a selected cache line upon a write miss. The cache line allocation apparatus has a data cache and cache control logic. The data cache stores a plurality of cache lines retrieved from external memory. The cache control logic is coupled to the data cache. The cache control logic stores data corresponding to the write miss within the selected cache line, and updates the selected cache line from the external memory. The data is stored before the selected cache line is updated, and selected bytes within the selected cache line are not updated, the selected bytes being those within which the data are stored. The cache control logic includes a fill controller that detects a bus snoop during update of the selected cache line and repeats update of the selected cache line.
Another advantage of the present invention is that back-to-back writes to locations within a cache line execute much faster than has heretofore been provided.
A further object of the present invention is to provide a data cache apparatus in a pipeline microprocessor that allows subsequent writes to a cache line to proceed without delay while waiting for the cache line data to be provided from memory.
In a further aspect, it is a feature of the present invention to provide an apparatus for performing write allocation in a data cache when a write miss occurs. The apparatus includes write allocate logic and a write buffer. The write allocate logic updates a cache line within the data cache with data bytes corresponding to the write miss and with data from external memory, where the data bytes are updated prior to update of the data from the external memory, and where byte positions within the cache line corresponding to the data bytes are masked during update of the data, thereby preserving the data bytes within the cache line. The write allocate logic has fill control logic that terminates update of the data in response to a bus snoop, and repeats update of the data following the bus snoop. The write buffer is coupled to the write allocate logic. The write buffer stores a speculative write command, the speculative write command directing that the data bytes be stored within the external memory.
Yet a further object of the present invention is to provide a method for improving the processing speed of a pipeline microprocessor executing multiple writes to adjacent memory locations that are not presently within its cache.
In yet a further aspect, it is a feature of the present invention to provide a method for allocating a cache line within a pipeline microprocessor. The method includes storing data bytes corresponding to a write miss to an allocated cache line within the data cache; following the storing, updating remaining bytes within the allocated cache line from external memory; and if the updating is interrupted by a bus snoop to the allocated cache line, issuing a load command, thereby causing the updating to be performed again.
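The steps of this method, including the repeated update after a bus snoop, can be sketched as a Python function. The sketch assumes a fill that arrives in four hypothetical 8-byte beats, a `read_line` callable standing in for the load command issued over the external bus, and a `snoop_beats` set of (attempt, beat) pairs used only to inject snoops for illustration; none of these names come from the source. Bytes stored by the write miss are masked during every fill attempt, so they survive a repeated update.

```python
from typing import Callable, Set, Tuple

LINE_SIZE = 32  # bytes per cache line (assumed)
BEAT_SIZE = 8   # hypothetical bytes delivered per bus beat


def allocate_on_write_miss(read_line: Callable[[], bytes], offset: int,
                           payload: bytes,
                           snoop_beats: Set[Tuple[int, int]]) -> Tuple[bytes, int]:
    """Allocate a line on a write miss; return (final line bytes, fill attempts)."""
    assert 0 <= offset and offset + len(payload) <= LINE_SIZE
    line = bytearray(LINE_SIZE)
    written = [False] * LINE_SIZE
    # Step 1: store the write-miss bytes into the allocated line first.
    for i, b in enumerate(payload):
        line[offset + i] = b
        written[offset + i] = True
    attempt = 0
    while True:
        attempt += 1
        memory_line = read_line()  # Step 2: issue the load command for the line.
        completed = True
        for beat in range(LINE_SIZE // BEAT_SIZE):
            if (attempt, beat) in snoop_beats:
                completed = False  # Step 3: a snoop interrupts the update,
                break              # so the load command is issued again.
            for i in range(beat * BEAT_SIZE, (beat + 1) * BEAT_SIZE):
                if not written[i]:  # mask bytes supplied by the write
                    line[i] = memory_line[i]
        if completed:
            return bytes(line), attempt
```

Calling `read_line` again on each attempt reflects one plausible reason for repeating the update: after a snoop, the memory copy of the line may have changed, so bytes merged before the interruption cannot be relied upon.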
Yet a further advantage of the present invention is that application programs exhibiting a significant number of external memory write operations execute more efficiently within a pipeline microprocessor according to the present invention.