Central processing units (CPUs) and control processors execute two types of instructions to access memory. A load instruction fetches data from a memory location and places it into a CPU register, and a store instruction writes the data held in a register into memory. Storing data in a cache memory usually involves two steps, which together typically consume a number of processor clock cycles. The first step looks up a tag in the tag array of the cache memory to determine whether the corresponding data is currently stored in the cache. The second step writes new data (or updates existing data) in the data array (or cache line) identified by that tag. Unfortunately, new data cannot be written into the cache while a tag is still being looked up.
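The two-step store described above can be illustrated with a minimal sketch of a direct-mapped cache. The class name `DirectMappedCache`, the one-cycle cost per step, and the choice to skip the data write on a miss (write-no-allocate) are all hypothetical simplifications for illustration, not details taken from this application. Because the data write cannot overlap the tag lookup, each unbuffered store that hits costs both steps in sequence.

```python
# Hypothetical timing assumption: each step costs one processor clock cycle.
TAG_LOOKUP_CYCLES = 1
DATA_WRITE_CYCLES = 1


class DirectMappedCache:
    """Hypothetical direct-mapped cache with separate tag and data arrays."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # tag array
        self.data = [None] * num_lines  # data array (one word per line here)

    def _index_and_tag(self, addr):
        # Low-order bits select the line; the remaining bits form the tag.
        return addr % self.num_lines, addr // self.num_lines

    def fill(self, addr, value):
        """Install a line, e.g. after a miss has been serviced elsewhere."""
        index, tag = self._index_and_tag(addr)
        self.tags[index] = tag
        self.data[index] = value

    def store(self, addr, value):
        """Perform an unbuffered store; return (hit, cycles consumed)."""
        index, tag = self._index_and_tag(addr)
        cycles = TAG_LOOKUP_CYCLES  # step 1: look up the tag array
        hit = self.tags[index] == tag
        if hit:
            # Step 2: write the data array. This cannot begin until the
            # tag lookup has finished, so the cycle costs add up serially.
            self.data[index] = value
            cycles += DATA_WRITE_CYCLES
        # On a miss this sketch assumes write-no-allocate and writes nothing.
        return hit, cycles
```

For example, after `cache = DirectMappedCache(4)` and `cache.fill(5, 0)`, the call `cache.store(5, 42)` hits and consumes two cycles under these assumptions, so three back-to-back hitting stores would occupy six cycles.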
When multiple consecutive data stores are performed, a conventional approach is to implement a storage buffer, often termed a store buffer, which holds a number of entries waiting to be stored into the data array of the cache memory. If a data store instruction generates a cache hit, the data is set aside in the store buffer and subsequently written into the data array. Often, the store buffer becomes completely full, so an entry must be removed before a subsequent store instruction can be accommodated. For example, a store buffer may free space by writing one or more of its entries into the data array of the cache memory so that it can accept data from new store instructions. As long as a store buffer entry is available, a typical CPU pipeline is not held up by a store instruction.
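The store buffer behavior described above amounts to a bounded first-in, first-out queue in front of the data array. The following is a minimal sketch under that assumption; the class `StoreBuffer`, its methods, and the use of a plain dictionary to stand in for the cache data array are hypothetical names chosen for illustration. A `push` that returns `False` corresponds to the case in which the pipeline would have to wait for an entry to be removed.

```python
from collections import deque


class StoreBuffer:
    """Hypothetical FIFO store buffer sitting in front of a cache data array."""

    def __init__(self, capacity, data_array):
        self.capacity = capacity
        self.entries = deque()        # buffered (line index, value) pairs
        self.data_array = data_array  # stands in for the cache's data array

    def is_full(self):
        return len(self.entries) >= self.capacity

    def push(self, index, value):
        """Buffer a store that hit in the cache.

        Returns True if the store was accepted without holding up the
        pipeline, or False if the buffer is full and an entry must first
        be written into the data array.
        """
        if self.is_full():
            return False
        self.entries.append((index, value))
        return True

    def drain_one(self):
        """Write the oldest buffered entry into the data array, if any."""
        if not self.entries:
            return False
        index, value = self.entries.popleft()
        self.data_array[index] = value
        return True
```

With a two-entry buffer, the first two stores are accepted immediately, the third is refused until `drain_one()` writes the oldest entry into the data array, and only then can the third store be buffered.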
In many cases, however, a number of consecutive data store instructions may completely fill the store buffer. If the next CPU cycle issues an instruction that is neither a load nor a store, any outstanding entries in the store buffer may be cleared by writing them sequentially into the cache. Unfortunately, clearing the store buffer may require a number of CPU cycles to complete. As a consequence, the performance of a conventional pipelined processor may be significantly reduced when a number of consecutive data stores are performed.
There are other disadvantages to the store buffer approach. To save space, the store buffer is usually limited to between 4 and 8 data entries. In addition, store buffers with a large number of entries may take longer to access. Larger store buffers can hold more data, but only at the expense of higher manufacturing cost. Because the store buffer is limited to a less-than-optimal size, additional “penalty” CPU cycles are needed to clear data from the buffer when the number of consecutive store instructions exceeds the capacity (i.e., the number of entries) of the store buffer.
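The relationship between buffer capacity and penalty cycles can be estimated with a short sketch. The function `store_burst_penalty` and its assumptions are hypothetical: the buffer starts empty, every store in the burst hits the cache, the buffer cannot drain while stores keep arriving (each cycle is occupied by a tag lookup), and once the buffer is full each further store stalls while one entry is written into the data array. Under these assumptions the stall count is simply the number of stores in excess of the capacity.

```python
def store_burst_penalty(num_stores, capacity, write_cycles=1):
    """Estimate penalty cycles for a burst of consecutive cache-hit stores.

    Hypothetical model: the buffer starts empty and cannot drain during the
    burst; when it is full, the next store stalls for `write_cycles` while
    the oldest entry is written into the data array.

    Returns (stall_cycles, entries_left_to_drain_after_the_burst).
    """
    occupied = 0
    stall_cycles = 0
    for _ in range(num_stores):
        if occupied == capacity:
            stall_cycles += write_cycles  # penalty: free one entry first
            occupied -= 1
        occupied += 1                     # buffer the current store
    return stall_cycles, occupied
```

Under this model, a burst of 12 stores costs 8 stall cycles with a 4-entry buffer but only 4 with an 8-entry buffer, and in both cases the entries still held at the end of the burst must later be written into the cache, as described in the preceding paragraph.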
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.