1. Field of the Present Invention
The present invention relates generally to computer processing and, more particularly, to an evaluation circuit for memory allocation and deallocation requests for processing devices.
2. Description of the Related Art
Processing units are capable of executing processes or threads without regard to the order in which the processes or threads are dispatched. The out-of-order execution of processes or threads gives the processing units the ability to better utilize latency-hiding resources, to increase their efficiency, and to reduce their power and bandwidth consumption.
Environments in which numerous concurrently executing processes or threads cooperate to implement an application are found, for example, in graphics processing units (GPUs). GPUs are rapidly increasing in processing power due in part to their incorporation of multiple processing units, each of which is capable of executing an increasingly large number of threads. In many graphics applications, multiple processing units of a processor are utilized to perform parallel geometry computations, vertex calculations, pixel operations, and the like. For example, graphics applications can often be structured as single instruction multiple data (SIMD) processes. In SIMD processing, the same sequence of instructions is used to process multiple parallel data streams in order to yield substantial speedup of operations. Modern GPUs incorporate an increasingly large number of SIMD processors, where each SIMD processor is capable of executing an increasingly large number of threads.
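The SIMD model described above can be illustrated with a minimal sketch. It is a sequential simulation for illustration only, not a description of any particular hardware: the names `simd_execute`, `program`, and `lanes` are hypothetical, and real SIMD hardware would process the lanes in parallel rather than in a loop.

```python
from typing import Callable, List, Sequence


def simd_execute(program: Sequence[Callable[[int], int]],
                 lanes: Sequence[int]) -> List[int]:
    """Apply one instruction sequence to every data lane in lockstep.

    Hypothetical helper: each instruction is applied to all lanes
    before the next instruction is issued, mimicking SIMD execution.
    """
    results = list(lanes)
    for instruction in program:
        # One instruction, many data elements.
        results = [instruction(value) for value in results]
    return results


# A two-instruction program: double each value, then add one.
program = [lambda x: x * 2, lambda x: x + 1]

# Four parallel data streams, one element each.
pixels = [0, 1, 2, 3]
shaded = simd_execute(program, pixels)  # [1, 3, 5, 7]
```

In this sketch every lane follows the same instruction stream, which is the property that allows one instruction fetch to be shared across many data elements.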
When a GPU processes an image, for example, numerous threads may concurrently execute to process pixels from that image according to a single instruction stream. Each pixel or group of pixels can be processed by a separate thread. Some instructions cause threads to write to a memory, other instructions cause threads to read from the memory, and yet other instructions cause no thread interactions with memory. When instructions cause threads to, for example, write to the memory, it is important that a check mechanism be put in place to ensure that the memory area where the threads want to write has enough space. Current implementations poll memory locations to determine the age of the thread, then lock the memory location in order to write to the memory buffer. Typically, these implementations need to poll the memory again to ensure that enough space is available to write to the memory location. These implementations are inefficient and consume both power and bandwidth.
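The poll, lock, and re-poll pattern described above can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any actual implementation: the class `PolledBuffer` and its members are hypothetical, the thread-age check is omitted, and the `polls` counter exists only to make the cost of the extra memory reads visible.

```python
import threading
from typing import List


class PolledBuffer:
    """Hypothetical fixed-capacity buffer written under a lock."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: List[int] = []
        self.lock = threading.Lock()
        self.polls = 0  # illustration only: counts space-check reads

    def free_space(self) -> int:
        # Each call models one poll of the memory location.
        self.polls += 1
        return self.capacity - len(self.data)

    def try_write(self, items: List[int]) -> bool:
        # First poll: check for room before taking the lock.
        if self.free_space() < len(items):
            return False
        with self.lock:
            # Second poll: another thread may have written in the
            # meantime, so the space must be checked again.
            if self.free_space() < len(items):
                return False
            self.data.extend(items)
            return True


buf = PolledBuffer(capacity=4)
first = buf.try_write([1, 2, 3])   # succeeds after two polls
second = buf.try_write([4, 5])     # rejected on the first poll
third = buf.try_write([4])         # succeeds after two more polls
```

In this sketch every successful write costs two polls in addition to the write itself, which illustrates why such schemes are costly in bandwidth.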