A multi-strand out-of-order loop processor is an accelerator that can process multiple strands, or micro-threads, in parallel such that: (1) instructions of one strand or micro-thread may be fetched, issued, and executed out of program order with respect to instructions of other strands or micro-threads, and (2) all instructions except memory and interruptible instructions may be retired (committed) out of program order. A strand or micro-thread is a sequence of instructions arranged by a binary translator (e.g., at program compilation time, for identified hot loops); instructions belonging to the same strand or micro-thread are executed by the hardware in program order.
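The issue rule in point (1) can be pictured with a small software model: each strand is a queue that only ever issues from its head, while the choice of which strand issues next is unconstrained. The sketch below is purely illustrative and assumes a random pick among strands with instructions remaining. The names `interleave_strands` and `preserves_strand_order` are hypothetical and do not describe the processor's actual scheduler.

```python
import random
from collections import deque


def interleave_strands(strands, seed=0):
    """Hypothetical model: issue instructions from several strands.

    Each step picks any strand that still has instructions (randomly here,
    as a stand-in for 'whichever strand is ready'). Within a strand,
    instructions always issue from the head of its queue, i.e. in order.
    """
    rng = random.Random(seed)
    queues = {name: deque(instrs) for name, instrs in strands.items()}
    issued = []
    while any(queues.values()):
        ready = [name for name, q in queues.items() if q]
        name = rng.choice(ready)
        issued.append((name, queues[name].popleft()))
    return issued


def preserves_strand_order(strands, issued):
    """Check that the per-strand subsequence of the issue order is unchanged."""
    for name, instrs in strands.items():
        seen = [instr for s, instr in issued if s == name]
        if seen != list(instrs):
            return False
    return True


strands = {"A": ["a0", "a1", "a2"], "B": ["b0", "b1"]}
issued = interleave_strands(strands, seed=1)
print(issued)  # strands A and B interleave; a0<a1<a2 and b0<b1 still hold
```

Any seed produces a different global interleaving, but the check always holds, which mirrors the rule that ordering is relaxed across strands but not within one.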
In a multi-strand out-of-order loop processor, orderable instructions (e.g., instructions that access memory, or interruptible instructions) may be executed out of program order. However, orderable instructions are retired (committed) in program order so that their side-effects (e.g., memory state changes, interrupts, and faults) appear in program order, as encoded in the original instruction flow. An architecture that employs a multi-strand out-of-order loop processor may use dedicated resources, such as an ordering buffer, to ensure that orderable instructions are retired in program order. The ordering buffer stores an entry for each orderable instruction, preserving its result until the instruction is ready to be retired. Entries are inserted into the ordering buffer as the orderable instructions are executed (i.e., potentially out of program order), but they are removed from the ordering buffer for retirement in program order. The side-effects of the orderable instructions become visible only at the retirement stage.
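The insert-out-of-order, retire-in-order behavior can be sketched as follows, assuming each orderable instruction carries a program-order sequence number and that the buffer has a fixed number of entries. The class `OrderingBuffer` and its methods are hypothetical names for illustration; real hardware would typically use a circular structure rather than a dictionary.

```python
class OrderingBuffer:
    """Hypothetical model of an ordering buffer for orderable instructions.

    Entries are keyed by a program-order sequence number (an assumption of
    this sketch). They may be inserted in any order, but retire() only
    removes entries starting from the oldest not-yet-retired sequence number.
    """

    def __init__(self, capacity, first_seq=0):
        self.capacity = capacity
        self.entries = {}                 # seq -> preserved result
        self.next_to_retire = first_seq   # oldest instruction not yet retired

    def insert(self, seq, result):
        """Record an executed orderable instruction; False means 'buffer full, stall'."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries[seq] = result
        return True

    def retire(self):
        """Retire the longest in-order run starting at next_to_retire."""
        retired = []
        while self.next_to_retire in self.entries:
            result = self.entries.pop(self.next_to_retire)
            retired.append((self.next_to_retire, result))  # side-effects exposed here
            self.next_to_retire += 1
        return retired


buf = OrderingBuffer(capacity=4)
buf.insert(2, "r2")   # executed first, but youngest
print(buf.retire())   # [] : instruction 0 has not executed yet
buf.insert(0, "r0")
print(buf.retire())   # [(0, 'r0')]
buf.insert(1, "r1")
print(buf.retire())   # [(1, 'r1'), (2, 'r2')]
```

Instruction 2 executes first but sits in the buffer until instructions 0 and 1 have also been inserted, so its side-effects are exposed only after theirs.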
When the ordering buffer has enough free space, several strands being processed in parallel can insert entries into it and thus make progress concurrently, which benefits overall performance. However, when the ordering buffer is oversubscribed, strands that have an orderable instruction ready for execution are penalized: for a period of time, their progress may become serial rather than concurrent (e.g., only one orderable instruction can be executed at a time until space becomes available in the ordering buffer). The problem is worse when the ordering buffer is filled with young orderable instructions. Older orderable instructions must be retired before younger ones, yet entries for the older instructions cannot be inserted into the full buffer, so none of the buffered entries can retire and free space. As a result, the multi-strand out-of-order loop processor starves for some time.
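The starvation case can be reproduced with a toy simulation. It assumes orderable instructions are numbered 0, 1, 2, ... in program order and arrive at the buffer in a fixed execution order, and that retirement is attempted after every insertion. The function `simulate_ordering_buffer` is a hypothetical helper. It has no recovery mechanism, so it simply reports a stall when the buffer is full of entries that cannot retire.

```python
def simulate_ordering_buffer(capacity, exec_order):
    """Hypothetical model: feed orderable instructions (program-order seq numbers)
    into a bounded ordering buffer in exec_order, retiring in program order.

    Returns (retired, stalled). stalled is True when the buffer is full and
    its oldest missing entry cannot be inserted, so nothing can retire.
    """
    buffer = set()
    next_to_retire = 0
    retired = []
    pending = list(exec_order)
    while pending:
        if len(buffer) >= capacity:
            # The buffer was drained after the last insert, so next_to_retire
            # is not in it: no entry can retire and no new entry fits.
            return retired, True
        buffer.add(pending.pop(0))
        # Retire every entry that is now at the head of program order.
        while next_to_retire in buffer:
            buffer.remove(next_to_retire)
            retired.append(next_to_retire)
            next_to_retire += 1
    return retired, False


# Young instructions 3, 4, 5 fill a 3-entry buffer before 0, 1, 2 arrive.
print(simulate_ordering_buffer(3, [3, 4, 5, 0, 1, 2]))  # ([], True)
# The same arrival order with enough space drains completely.
print(simulate_ordering_buffer(6, [3, 4, 5, 0, 1, 2]))  # ([0, 1, 2, 3, 4, 5], False)
# With older instructions interleaved, 3 entries suffice.
print(simulate_ordering_buffer(3, [0, 3, 1, 4, 2, 5]))  # ([0, 1, 2, 3, 4, 5], False)
```

The first call shows the oversubscription effect: three young entries occupy every slot, so instruction 0 can never enter and nothing retires.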