This invention relates to integrated circuits and, more particularly, to integrated circuits with dynamic thread order modules which issue threads and groups of threads in a specific order to take advantage of data locality and caching on a given platform.
Every transition from one technology node to the next has resulted in smaller transistor geometries and thus potentially more functionality implemented per unit of integrated circuit area. Synchronous integrated circuits have further benefited from this development through reduced interconnect and cell delays, which have led to performance increases. However, more recent technology nodes have seen a significant slow-down in the reduction of delays and thus a slow-down in performance gains.
To further increase performance, solutions such as multithreading have been proposed, in which several threads that share processing and storage resources are grouped into subsets and each of the subsets is executed in parallel. Each thread may access different portions of the shared resources, and thus both the grouping into subsets and the order in which the threads in a subset are executed affect the overall performance.
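Some platforms, such as OpenCL, do not guarantee the order in which threads are executed. Without a defined thread order, threads that access the same portions of the shared resources may be executed far apart in time, which may lead to poor usage of the shared resources and a subsequent degradation of performance.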
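The effect of thread order on a shared storage resource can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions and does not describe any particular platform: it assumes a shared cache with least-recently-used replacement that holds two data blocks, sixteen threads, and a hypothetical mapping in which thread t reads data block t modulo 4. The function name count_cache_misses and all parameters are introduced here for illustration only.

```python
from collections import OrderedDict


def count_cache_misses(thread_order, block_of_thread, cache_capacity):
    """Count misses in a small LRU cache when threads run in the given order.

    Illustrative model only: each thread reads exactly one data block,
    and the cache holds at most `cache_capacity` blocks.
    """
    cache = OrderedDict()  # keys are cached blocks, oldest entry first
    misses = 0
    for thread_id in thread_order:
        block = block_of_thread(thread_id)
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1                    # miss: load the block
            cache[block] = True
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def block_of(thread_id):
    # Assumed access pattern: thread t reads block t % 4.
    return thread_id % 4


threads = list(range(16))

# Order 1: threads issued by increasing thread id (blocks 0,1,2,3,0,1,2,3,...).
naive_misses = count_cache_misses(threads, block_of, cache_capacity=2)

# Order 2: threads that read the same block are issued back to back.
grouped_order = sorted(threads, key=block_of)
grouped_misses = count_cache_misses(grouped_order, block_of, cache_capacity=2)

print(naive_misses, grouped_misses)  # prints "16 4"
```

In this toy model the same sixteen threads cause sixteen cache misses in one order and four in the other, which is the kind of difference that an unspecified thread order leaves to chance.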