The ongoing evolution of technology nodes drives the critical dimensions in semiconductor devices ever smaller. However, mere reduction in critical dimensions and transistor sizes no longer provides corresponding improvements in power, performance, and silicon area. With shrinking transistor sizes and multi-core computer processor architectures, more operations can be performed in silicon. In particular, multi-core architectures provide parallel processing capability for processing speed-up. However, power constraints arising from the aforementioned lack of corresponding improvements in power usage, together with power limitations in, for example, mobile devices, prevent powering all of the circuitry at the same time. This is especially true in computer processors having multi-core architectures.
Homogeneous multi-core architectures having a plurality of simple cores are ideal for applications with high thread-level parallelism, yet they lack support for instruction-level parallelism and therefore perform poorly on complex applications having large sequential fractions. Multi-core architectures can also be designed for high-performance handling of specific applications (e.g., graphics processors), but such architectures are not adaptable enough to handle other operations.
Computer processors, however, are being required to do more, even in mobile devices with limited power. For example, a smartphone today is expected to support a very dynamic and diverse landscape of software applications. As a result, the distinction between high-performance embedded architectures and general-purpose computing architectures is rapidly disappearing, especially in the consumer electronics domain. A traditional homogeneous multi-core architecture having a collection of identical simple cores is not suitable for today's workload. On the other hand, embedded systems having heterogeneous multi-core solutions are customized for a particular application domain and offer significant advantages in terms of performance, power, and silicon area, yet they lack the adaptability to execute the variety of applications in today's workload, such as the wide variety of general-purpose applications for which the workload is not known a priori.
Some previously proposed architectures have attempted to adapt multi-core processors to speed up sequential applications. One proposal is an asymmetric chip multiprocessor comprising cores of different sizes and performance levels. The advantage of this architecture is low power consumption and high performance, achieved by dynamically moving programs from one core to another in order to reach an optimal operating point. However, this architecture lacks flexibility and introduces a high degree of unpredictability for software applications. Another conventional architecture fuses homogeneous cores to improve single-thread performance, yet requires complex distributed hardware, leading to higher performance overhead. Another approach merges a pair of scalar cores to create 2-way out-of-order cores by modifying internal pipeline stages. However, conjoining in-order processors to form complex out-of-order cores introduces fundamental obstacles that limit achievable performance, and it does not increase performance because the cores lack even minimal out-of-order capabilities.
In yet another multi-core architecture proposal, multiple homogeneous cores can be adapted for single-threaded and multi-threaded applications, yet the architecture relies on a very complex compiler that extracts parallelism from serial code by partitioning the code into small threads, scheduling the instructions to the cores, and directing the communication among the cores. One proposed asymmetric chip multiprocessor alternative addresses the serialization effect of critical sections in multi-threaded applications, but its threads do not finish their jobs at the same time, so the synchronization mechanisms leave a considerable number of fast threads waiting for the slow threads to complete. Yet another proposal improves task-level parallelism by executing dynamically forked tasks in an out-of-order fashion, with the cores behaving as functional units. Disadvantageously, the dependency information between tasks must be explicitly defined by the programmer in a special programming model, and complex hardware is required to handle decoding, building the task graph, and scheduling tasks to the cores.
Thus, what is needed is a plurality of simple cores in a reconfigurable multi-core architecture that can dynamically adapt itself to support both multi-threaded code with explicit thread-level parallelism as well as sequential code with instruction-level parallelism. Furthermore, other desirable features and characteristics will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and this background of the disclosure.