1. Field of the Invention
Embodiments of the present invention generally relate to storing data in computing devices and, more specifically, to a system and method for fast hardware atomic queue allocation.
2. Description of the Related Art
Queues are convenient data exchange interfaces that allow clients within a computing system to submit commands and/or data for processing by a target engine or processor without having to wait for that engine or processor to become idle. Decoupling the insertion and processing steps increases the perceived processing speed in environments where clients submit requests in asynchronous bursts. A queue typically has a head pointer and a tail pointer. The tail pointer indicates the location of the last processed submission in the queue and is incremented as submissions are processed. The head pointer indicates the end of all currently unprocessed submissions in the queue; the target continues processing commands and data in the queue until the tail pointer equals the head pointer.
Generally, a software client (referred to herein as a “client”) submits commands and/or data (referred to herein as a “payload”) to a queue by performing three steps. In the first step, the amount of available queue space is determined by computing the difference between the head pointer and the end of the queue or, in the case of a circular queue, between the head pointer and the tail pointer, accounting for any wrap-around. If there is sufficient queue space to accommodate the client's payload, then, in the second step, the client obtains an insertion pointer indicating that the payload may be inserted into the queue starting at the location specified by the head pointer. These first two steps are commonly referred to as “queue allocation.” The location immediately following the inserted payload is termed the “horizon”; it becomes the new head pointer once the insertion is completed. Finally, in the third step, the client inserts the payload into the allocated space within the queue and updates the head pointer to the horizon, and the queue controller continues processing insertions until the tail pointer equals the new head pointer.
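The three-step submission process can be modeled in a few lines of Python. This is a minimal, single-client sketch under stated assumptions: the class and method names (`CircularQueue`, `free_space`, `allocate`, `insert`, `process`) are hypothetical, the tail is modeled as the next slot to be processed, and one slot is deliberately left unused so that equal head and tail pointers unambiguously mean an empty queue.

```python
class CircularQueue:
    """Hypothetical model of a circular queue with head/tail pointers.

    head: next free slot; payloads are inserted here (the insertion end).
    tail: next slot the target will process (assumption for this sketch).
    One slot is kept empty so that head == tail always means "empty".
    """

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.head = 0
        self.tail = 0

    def free_space(self):
        # Step 1: space between head and tail, accounting for wrap-around.
        return (self.tail - self.head - 1) % self.size

    def allocate(self, length):
        # Steps 1 and 2 ("queue allocation"): check space, then hand out
        # the current head as the insertion pointer.
        if length > self.free_space():
            return None
        return self.head

    def insert(self, ptr, payload):
        # Step 3: write the payload starting at ptr, then publish the
        # horizon (the slot just past the payload) as the new head.
        for i, item in enumerate(payload):
            self.slots[(ptr + i) % self.size] = item
        horizon = (ptr + len(payload)) % self.size
        self.head = horizon
        return horizon

    def process(self):
        # Target side: consume submissions until tail catches up with head.
        out = []
        while self.tail != self.head:
            out.append(self.slots[self.tail])
            self.tail = (self.tail + 1) % self.size
        return out
```

With a single client the sequence is safe: `allocate(3)` on an empty 8-slot queue returns insertion pointer 0, `insert` moves the head to the horizon 3, and `process` drains the three entries. Nothing here prevents two clients from receiving the same insertion pointer, which is the problem discussed next.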
One problem with this approach, however, is that in computing systems that include multiple asynchronous clients, the three-step process of submitting payloads can result in conflicts between different clients when those clients share a common queue. For example, consider the case where there are two asynchronous clients, A and B. Client A may perform a queue allocation operation and obtain an insertion pointer for its payload. In the time period before client A actually inserts or commits its payload, client B may perform its own queue allocation operation and obtain the same insertion pointer as the one granted to client A. Further, client B may insert its payload at the location designated by the insertion pointer ahead of client A and update the head pointer to a new value. As a result, an attempt by client A to insert its payload starting at the original head pointer either fails altogether or corrupts the payload inserted by client B. Thus, in a system where multiple clients may work in parallel at the same or different processing levels and may have the same or different privileges, the process of submitting a payload to the queue can become a race to commit the queue insertion and update the head pointer. Moreover, since different clients may share the same processing unit but do not necessarily share operating state or even the same operating system context, such as when operating within separate virtual machines, coordinating such a race between clients can be quite difficult.
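The race between clients A and B can be reproduced deterministically by replaying the interleaving described above. In this hypothetical sketch, the `Queue` class and its `allocate`/`commit` methods are illustrative names, and space checks and wrap-around are omitted for brevity; the key point is that both clients read the head pointer before either one commits.

```python
class Queue:
    """Hypothetical linear queue: allocation simply returns the current head."""

    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0

    def allocate(self):
        # Queue allocation without any atomicity: read the head pointer.
        return self.head

    def commit(self, ptr, payload):
        # Insert the payload at ptr, then move the head to the horizon.
        for i, item in enumerate(payload):
            self.slots[ptr + i] = item
        self.head = ptr + len(payload)


q = Queue(8)
ptr_a = q.allocate()           # client A allocates...
ptr_b = q.allocate()           # ...B allocates before A commits: same pointer
q.commit(ptr_b, ["B1", "B2"])  # B commits first; head moves to 2
q.commit(ptr_a, ["A1"])        # A overwrites B1 and moves head back to 1
```

After the replay, slot 0 holds `"A1"` instead of `"B1"`, and the head pointer has regressed to 1, so the target never processes B's second entry either.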
One approach to eliminating conflicts between different clients is to provide a separate queue for each client. However, this approach is expensive, since the additional queues take up valuable die area, and it does not solve the problem of client contention but simply moves it deeper into the processing system. A more common approach is to apply a mutual exclusion (commonly referred to as “mutex”) algorithm over the entire queue allocation and payload submission process so that only one client at a time can access the queue. Apart from the difficulty of creating a mutex that can effectively coordinate clients operating at different privilege levels or in different virtual machines, the key drawback of this approach is that each waiting client is delayed until the mutex is released, which forces a serialization of client requests and undermines the benefit of queues in decoupling multi-threaded, multi-processor systems. If the queue mutex is implemented in software, it forces serialization at the operating system level, which further reduces overall performance due to the software overhead.
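The mutex approach can be sketched as follows, assuming Python's `threading.Lock` stands in for whatever mutex the system actually provides; `MutexQueue` and `run_clients` are hypothetical names. Space check, insertion, and head update all happen inside one critical section, so concurrent clients never receive the same insertion pointer, but every other client is blocked for the entire duration of a submission.

```python
import threading


class MutexQueue:
    """Hypothetical circular queue guarded by a single lock."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.head = 0  # next free slot (insertion end)
        self.tail = 0  # next slot the target will process
        self.lock = threading.Lock()

    def submit(self, payload):
        with self.lock:  # all other clients wait here until release
            free = (self.tail - self.head - 1) % self.size
            if len(payload) > free:
                return False  # not enough space; caller would retry later
            ptr = self.head
            for i, item in enumerate(payload):
                self.slots[(ptr + i) % self.size] = item
            self.head = (ptr + len(payload)) % self.size
            return True


def run_clients(q, clients, per_client):
    # Each client thread submits two-slot payloads; the queue is assumed
    # large enough that no submission fails in this sketch.
    def worker(cid):
        for n in range(per_client):
            q.submit([(cid, n), (cid, n)])

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

For example, four clients each submitting ten two-slot payloads into a 128-slot queue leave the head at 80 with every payload's two slots adjacent. Correctness comes from the lock, and so does the cost: submissions from all threads are fully serialized, which is the drawback described above.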
As the foregoing illustrates, what is needed in the art is a queue allocation technique that avoids conflicts between multiple clients and is more efficient than prior art approaches.