Some conventional computers implement a single, monolithic pipelined processor that executes computer program instructions in serial (program) order. To maximize parallelism at the instruction level, known in the art as instruction-level parallelism (ILP), such a processor may rely on aggressive compiler optimizations. Another way to improve processor performance is to implement an out-of-order pipelined processor, which may execute instructions in an order other than program order. However, such implementations may increase complexity in both hardware and software. A further option is to design processors that operate at higher clock frequencies, but doing so may increase the latency, in cycles, of cache and/or memory accesses, and/or may reduce processing efficiency, which may be measured, for example, in instructions per cycle (IPC).
Recent research has examined clustering processor resources to design complexity-efficient microarchitectures. One study suggests replacing the monolithic pipeline with multiple clusters, each of which may be less complex than a single monolithic pipeline. In that approach, instructions may be dispatched to different clusters during execution, with the objective of minimizing inter-cluster communication and thereby reducing latency. Another study suggests that dependency-chain-based execution, which exploits the dependence relations among instructions, may reduce the complexity of processor design.
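As a rough illustration of why dependency-aware dispatch can reduce inter-cluster communication, the following Python sketch groups a straight-line instruction sequence into dependency chains, places each whole chain on one cluster, and counts the source operands whose producer sits on a different cluster. The `Instr` type, the chain-building rule (an instruction extends its first in-window producer's chain only if that producer is still the chain's tail), and the least-loaded placement are hypothetical assumptions chosen for illustration. They are not the method of any study referenced above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination register, zero or more sources."""
    dest: str
    srcs: tuple = ()


def build_chains(program):
    """Return chain_of[i], the dependency-chain id of instruction i.

    Assumed rule: an instruction extends the chain of the most recent producer
    of its first source operand, but only if that producer is still the tail
    of its chain; otherwise it starts a new chain.
    """
    last_writer = {}  # register -> index of the instruction that last wrote it
    chain_tail = {}   # chain id -> index of the chain's last instruction
    chain_of = []
    next_chain = 0
    for i, ins in enumerate(program):
        producer = next((last_writer[r] for r in ins.srcs if r in last_writer), None)
        if producer is not None and chain_tail[chain_of[producer]] == producer:
            chain = chain_of[producer]
        else:
            chain = next_chain
            next_chain += 1
        chain_of.append(chain)
        chain_tail[chain] = i
        last_writer[ins.dest] = i
    return chain_of


def assign_clusters(chain_of, num_clusters):
    """Place each whole chain on the currently least-loaded cluster."""
    sizes = {}
    for c in chain_of:
        sizes[c] = sizes.get(c, 0) + 1
    load = [0] * num_clusters
    cluster_of_chain = {}
    for c in sorted(sizes):
        k = min(range(num_clusters), key=lambda j: load[j])
        cluster_of_chain[c] = k
        load[k] += sizes[c]
    return [cluster_of_chain[c] for c in chain_of]


def inter_cluster_transfers(program, cluster_of):
    """Count source operands produced on a different cluster than the consumer."""
    last_writer = {}
    transfers = 0
    for i, ins in enumerate(program):
        for r in ins.srcs:
            p = last_writer.get(r)
            if p is not None and cluster_of[p] != cluster_of[i]:
                transfers += 1
        last_writer[ins.dest] = i
    return transfers


# Two independent three-instruction chains, e.g. r1 = load a; r2 = r1 + 1; r3 = r2 * 2.
program = [
    Instr("r1"), Instr("r2", ("r1",)), Instr("r3", ("r2",)),
    Instr("r4"), Instr("r5", ("r4",)), Instr("r6", ("r5",)),
]
chain_of = build_chains(program)
chain_clusters = assign_clusters(chain_of, 2)
round_robin = [i % 2 for i in range(len(program))]
print(inter_cluster_transfers(program, chain_clusters))  # 0
print(inter_cluster_transfers(program, round_robin))     # 4
```

In this toy case, keeping each chain on one cluster needs no cross-cluster operand transfers, whereas naive round-robin dispatch needs one transfer per dependence edge. Real designs must also weigh load balance, which this sketch handles only crudely by instruction count.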