1. Field of the Invention
The present invention relates generally to data processing systems and specifically to processing store operations within a processor chip. Still more particularly, the present invention relates to an improved system and method of dispatching operations within a processor chip for more efficient store queue usage.
2. Description of the Related Art
A queue is a data structure in which elements are removed in the same order in which they were entered. That is, elements of the queue are removed in a first-in, first-out (FIFO) arrangement. A queue serves as a buffer, holding entities such as data, objects, persons, or events until they can be processed. Queues are common in computer programs, where they are implemented as data structures coupled with access routines, as abstract data structures, or, in object-oriented languages, as classes. Common implementations are circular buffers and linked lists.
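As an illustration of the FIFO behavior described above, the following is a minimal sketch of a queue backed by a circular buffer. The class name RingQueue and its method names are hypothetical and chosen only for this example; they are not taken from any particular system.

```python
class RingQueue:
    """Fixed-capacity FIFO queue backed by a circular buffer (illustrative only)."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0    # index of the oldest stored element
        self._count = 0   # number of elements currently stored

    def enqueue(self, item):
        """Add an item at the tail; fail if every slot is occupied."""
        if self._count == len(self._slots):
            raise OverflowError("queue is full")
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = item
        self._count += 1

    def dequeue(self):
        """Remove and return the oldest item (first in, first out)."""
        if self._count == 0:
            raise IndexError("queue is empty")
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return item

    def __len__(self):
        return self._count
```

The head index wraps around the end of the fixed array, so freed slots are reused without moving any elements.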
A store queue in a level 2 (L2) cache operates on a set data granularity, with each entry able to gather up to a set data size, for example, 128 bytes. A level 1 (L1) cache is a memory bank built into the CPU chip. A level 2 (L2) cache is a secondary staging area that feeds the L1 cache. As operations are received by the store queue, they are gathered into a store queue entry for dispatch to a read-claim machine. A read-claim machine handles memory access requests issued from the processor. “Gathering” is the process of entering multiple operations into a single store queue entry. If an entry is not completely filled by gathering, the unused portion of the entry is wasted until the store queue entry is dispatched to the read-claim machine to be committed into the cache.
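The gathering behavior can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of actual hardware: it assumes each entry covers one 128-byte-aligned address range, that a store never straddles two entries, and that entries are dispatched oldest first. The names StoreQueue, store, and dispatch_oldest are hypothetical.

```python
class StoreQueue:
    """Simplified software model of a gathering store queue (illustrative only)."""

    def __init__(self, num_entries=8, entry_bytes=128):
        self.num_entries = num_entries
        self.entry_bytes = entry_bytes
        # Assumption: one entry per aligned address range.
        # Maps aligned base address -> set of byte offsets gathered so far.
        # A dict preserves insertion order, so the first key is the oldest entry.
        self.entries = {}

    def store(self, address, size=1):
        """Gather a store into a matching entry, or allocate a new entry.

        Returns False when no entry is free, modeling a processor stall.
        """
        base = address - (address % self.entry_bytes)
        offset = address - base
        if offset + size > self.entry_bytes:
            raise ValueError("store crosses an entry boundary (not modeled)")
        if base not in self.entries:
            if len(self.entries) == self.num_entries:
                return False  # queue full: the core would stall here
            self.entries[base] = set()
        self.entries[base].update(range(offset, offset + size))
        return True

    def dispatch_oldest(self):
        """Remove the oldest entry, as if handed to a read-claim machine."""
        base = next(iter(self.entries))
        return base, self.entries.pop(base)
```

In this model, two 4-byte stores to addresses 0x1000 and 0x1004 gather into a single entry, while a store to 0x2000 allocates a second entry.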
In a worst-case example, an eight (8) entry, 128-byte store queue receives eight random 1-byte stores such that none of the 8 stores is within 128 bytes of another store. The 8-entry, 128-byte store queue would allocate one (1) 128-byte entry for each store operation, wasting the other 127 bytes per entry. The processor core would then be stalled waiting for at least one (1) of the 8 used entries to be dispatched to a read-claim machine before sending the next store operation. Processor stalling of this type is common in business workloads.
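The arithmetic of this worst case can be checked with a short calculation. The function name utilization is hypothetical; the figures (8 entries, 128 bytes per entry, 1 byte per store) are those of the example above, and the calculation assumes every entry holds exactly one store.

```python
def utilization(num_entries, entry_bytes, bytes_per_store):
    """Return (bytes used, total capacity, fraction used) when each
    entry holds exactly one store of the given size."""
    capacity = num_entries * entry_bytes
    used = num_entries * bytes_per_store
    return used, capacity, used / capacity


used, capacity, fraction = utilization(8, 128, 1)
print(f"{used} of {capacity} bytes used ({fraction:.2%})")
# prints: 8 of 1024 bytes used (0.78%)
```

Under these assumptions, the queue is full and the core stalls even though less than 1% of the queue's data capacity holds useful data.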
One possible solution to the under-utilization of store queue entry space is to provide more store queue entries. Each store queue entry could then hold a smaller amount of data and be dispatched more frequently to a read-claim machine. Each store queue entry is dispatched individually to a read-claim machine in order to commit its portion of data. However, if the store stream from the processor core is highly ordered, multiple small store queue entries would have to be dispatched to multiple read-claim machines in order to commit the same amount of data as a single large store queue entry.
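The trade-off can be illustrated by counting how many entries, and therefore how many read-claim dispatches, a given store stream requires at different entry sizes. This is a rough sketch that assumes address-aligned entries, one dispatch per entry, and no limit on the number of entries; the function name dispatches_needed and the 32-byte entry size are hypothetical choices for illustration.

```python
def dispatches_needed(addresses, entry_bytes):
    """Count the distinct aligned entries touched by 1-byte stores,
    assuming each entry is dispatched exactly once."""
    return len({addr // entry_bytes for addr in addresses})


# A highly ordered stream: 128 consecutive 1-byte stores.
ordered = range(0x1000, 0x1000 + 128)
print(dispatches_needed(ordered, 128))  # prints: 1
print(dispatches_needed(ordered, 32))   # prints: 4

# A scattered stream: 8 stores, each at least 128 bytes from the others.
scattered = [0x1000 + i * 4096 for i in range(8)]
print(dispatches_needed(scattered, 128))  # prints: 8
print(dispatches_needed(scattered, 32))   # prints: 8
```

In this model, smaller entries do not reduce the dispatch count for the scattered stream, and they quadruple it for the ordered stream.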