In circuit design, a stream of tasks may be received whose execution should be spread among several processing entities. One of the processing entities may, for example, be specialized so that it performs a certain type of processing that the other processing entities do not perform. When an incoming task of that type is received, the task should be forwarded to the specialized processing entity. To perform this kind of allocation, an allocator should monitor incoming tasks and identify those that are suitable for forwarding to particular processing entities according to the capabilities of those entities. In other situations, several processing entities may be equally suited to executing the same types of tasks, but some of them may become overburdened. In that case, the allocator should perform a load balancing function so that the processing load is spread more evenly across the processing entities. Several schemes exist for implementing load balancing, including round-robin and weighted round-robin schemes. These schemes help ensure that the next task is allocated to a processing entity that has processing throughput available.
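
The two allocation modes described above can be illustrated with a minimal software sketch. It is not a circuit implementation and is not taken from the source: the class name `Allocator`, the entity names (`crypto_engine`, `core0`, `core1`), the task types, and the weights are all hypothetical. The sketch assumes each specialized task type maps to exactly one entity and that weights are small positive integers.

```python
from itertools import cycle


class Allocator:
    """Hypothetical allocator: route by task type, else weighted round-robin."""

    def __init__(self, specialists, weights):
        # specialists: task type -> name of the single entity that handles it (assumed).
        # weights: general-purpose entity name -> positive integer weight (assumed).
        self.specialists = dict(specialists)
        # Expand each entity into `weight` slots, then cycle over the slots forever.
        slots = [name for name, weight in weights.items() for _ in range(weight)]
        if not slots:
            raise ValueError("at least one general-purpose entity is required")
        self._rotation = cycle(slots)

    def allocate(self, task_type):
        # Capability-based routing takes priority over load balancing.
        if task_type in self.specialists:
            return self.specialists[task_type]
        # Otherwise hand the task to the next entity in the weighted rotation.
        return next(self._rotation)


allocator = Allocator(
    specialists={"crypto": "crypto_engine"},
    weights={"core0": 2, "core1": 1},
)
tasks = ["generic", "crypto", "generic", "generic", "generic"]
assignments = [allocator.allocate(task) for task in tasks]
print(assignments)
# ['core0', 'crypto_engine', 'core0', 'core1', 'core0']
```

In this sketch the weighted rotation is static: `core0` receives two of every three general tasks regardless of how busy it actually is. An allocator that must react to entities becoming overburdened would also need some feedback about current load, which is omitted here.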