1. Field of the Invention
The present invention relates generally to processor design, and more particularly to systems and methods for using multiple units to manage retirement of queue entries, enabling faster and more efficient retirement of the entries.
2. Related Art
Memory buffers, also referred to as queues, are often used in digital circuits. Memory buffers are temporary holding areas for data that is in transit from one device to another (or from one process to another). Communicating devices often process data at different rates, and as a result, communication between these devices may be difficult or even impossible without intermediate buffers. If one device is ready to transmit data and the receiving device is not yet ready to receive the data, the data can be stored temporarily in the buffer between the devices until the receiving device is ready to accept the data.
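By way of illustration only, the rate-matching role of a buffer described above can be sketched in software. The following Python sketch assumes a hypothetical sender that offers one item per cycle and a hypothetical receiver that accepts one item every second cycle; the names BoundedBuffer and simulate, the capacity, and the rates are assumptions chosen for the example and do not appear in any particular device.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO holding data in transit between two devices."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def put(self, item):
        """Sender side: store the item if there is room; return False if full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(item)
        return True

    def get(self):
        """Receiver side: return the oldest item, or None if the buffer is empty."""
        return self.entries.popleft() if self.entries else None


def simulate(cycles=8, capacity=4, receiver_period=2):
    """Fast sender (one item per cycle), slow receiver (one item per period)."""
    buf = BoundedBuffer(capacity)
    received, stalled, next_item = [], 0, 0
    for cycle in range(cycles):
        # The sender only advances to its next item when the buffer accepts one.
        if buf.put(next_item):
            next_item += 1
        else:
            stalled += 1
        # The receiver is ready only on the last cycle of each period.
        if cycle % receiver_period == receiver_period - 1:
            item = buf.get()
            if item is not None:
                received.append(item)
    return received, stalled, list(buf.entries)
```

In this sketch the sender stalls only once the buffer is full, which is the sense in which the buffer absorbs the difference in processing rates.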
Memory buffers can facilitate communication between devices such as processors, RAM, hard disks, etc. Most keyboards have memory buffers for the temporary storage of keystrokes. Most printers have buffers for queuing documents to be printed. Memory buffers can also be created for software programs by allocating a portion of the RAM of a computer system to act as a buffer to facilitate communication between a software program and the operating system. Within an operating system itself, buffers can facilitate the communication between processes and improve processor usage. For example, a destination process may be slower than a source process. Using a buffer between them to temporarily store the exchanged data can allow the source process to finish as soon as possible (also avoiding active waiting). Memory buffers can also exist between a software program and a hardware device. A CD-writing program, for example, creates a memory buffer in RAM during the writing process to temporarily store data before writing the data to the CD.
Typically, an entry is stored (registered) in the buffer and then removed (retired) if certain conditions are met. In very simple buffers, such as simple first-in-first-out queues (FIFOs), data entries can be retired from the buffer after the data has been transmitted successfully to the receiving device. In more complex devices, however, the number of conditions that must be met before a data entry can be retired from a buffer can significantly increase. For example, in complex multiprocessor systems, entries in a load or store queue can have many (e.g., ten or more) conditions that must be met before a data entry can be retired from the queue.
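The register-then-retire behavior described above may be sketched as follows. This is a minimal software model under stated assumptions, not a description of any actual retirement logic: the three condition names are hypothetical placeholders (a real load or store queue may have ten or more conditions, which are not enumerated here), and the sketch assumes that only the oldest entry is considered for retirement.

```python
from dataclasses import dataclass, field

# Hypothetical retirement conditions, named only for illustration.
CONDITIONS = ("address_translated", "data_ready", "older_entries_done")


@dataclass
class QueueEntry:
    tag: int
    met: set = field(default_factory=set)  # conditions satisfied so far

    def can_retire(self):
        """An entry may be retired only when every condition has been met."""
        return all(c in self.met for c in CONDITIONS)


def retire_head(queue):
    """Retire the oldest entry if all of its conditions are met.

    Returns the retired entry, or None if nothing could be retired.
    """
    if queue and queue[0].can_retire():
        return queue.pop(0)
    return None
```

For example, an entry for which only "data_ready" has been recorded remains in the queue, and it is removed by retire_head only after the remaining conditions have also been recorded.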
Traditionally, entries in a load or store queue of a processor have been retired according to read pointers which are incremented every cycle. In other words, a read pointer is incremented to one entry, which is retired in a first cycle, then the pointer is incremented to the next entry, which is retired in the next cycle. This was sufficient with longer cycle times and few entries (e.g., eight entries or fewer). It is not sufficient, however, for some current processors which operate at higher clock frequencies (and therefore shorter clock periods). Further, these processors may have more (and more complex) retirement conditions than earlier processors. To process these retirement conditions, complex logic which is many levels deep is required. In addition, if it is determined by the logic that a data entry is to be retired, additional logic must generate multiple outputs whose purpose is to facilitate the retirement of the data. For example, the transferring of the data entry may require write requests to be made, pointers to be updated, counters to be incremented, etc. These additional requirements further increase the complexity of the logic and the time required to retire each entry.
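The traditional read-pointer scheme described above can be modeled in software as a circular queue in which a single read pointer selects at most one entry per cycle. The following Python sketch is illustrative only: the class name RetireQueue, the single "ready" flag standing in for the full set of retirement conditions, and the returned data standing in for a write request are all assumptions made for the example.

```python
class RetireQueue:
    """Circular queue retired by one read pointer, at most one entry per cycle."""

    def __init__(self, size):
        self.slots = [None] * size
        self.read_ptr = 0       # next entry to retire
        self.write_ptr = 0      # next free slot
        self.count = 0          # entries currently registered
        self.retired_total = 0  # counter incremented on each retirement

    def register(self, data, ready=False):
        """Store a new entry; return False if the queue is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.write_ptr] = {"data": data, "ready": ready}
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)
        self.count += 1
        return True

    def step(self):
        """Model one clock cycle of retirement.

        If the entry at the read pointer is ready, retire it: emit its data
        (standing in for a write request), advance the read pointer, and
        update the counters. Otherwise return None.
        """
        if self.count == 0:
            return None
        entry = self.slots[self.read_ptr]
        if not entry["ready"]:
            return None
        self.slots[self.read_ptr] = None
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        self.count -= 1
        self.retired_total += 1
        return entry["data"]
```

In this model every action inside step (checking the conditions, emitting the output, and updating the pointer and counters) must complete within a single cycle, which is the work that becomes difficult to finish as clock periods shrink.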
Because the complexity of retirement logic is increasing while clock periods are decreasing, it is becoming very difficult to retire entries from processor load/store queues at the rate of one per clock cycle. As a result, the retirement logic can become a bottleneck, particularly in high-clock-frequency, high-performance systems. It would therefore be desirable to provide means to improve the speed and efficiency with which queue entries are retired, and to enable the retirement of one entry per clock cycle, even in high-performance systems.