Field
Embodiments relate to processors. In particular, embodiments relate to processors having multiple cores.
Background Information
FIG. 1 is a block diagram of a prior art processor 100. The processor has multiple cores 101. In particular, the illustrated processor has a core 0 101-0, a core 1 101-1, through a core M 101-M. By way of example, there may be two, four, seven, ten, sixteen, or any other appropriate number of cores. Each of the cores includes corresponding Single Instruction Multiple Data (SIMD) execution logic 102. In particular, core 0 includes SIMD execution logic 102-0, core 1 includes SIMD execution logic 102-1, and core M includes SIMD execution logic 102-M. That is, the SIMD execution logic is replicated per core. Each instance of the SIMD execution logic is operable to process SIMD, vector, or packed data operands. Each of the operands may have multiple smaller data elements, such as 8-bit, 16-bit, 32-bit, or 64-bit data elements, which are packed together in the operand and processed in parallel by the SIMD execution logic.
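By way of illustration only, the notion of a packed data operand may be sketched in software as follows. The sketch models a 128-bit operand holding four 32-bit data elements as a single integer and adds two such operands element by element. The function names pack, unpack, and packed_add are hypothetical, and the unsigned wraparound behavior on overflow is an assumption made for the example. The loop also processes the elements one at a time, whereas SIMD execution logic would process them in parallel.

```python
def pack(elements, element_bits):
    """Pack unsigned elements into one integer, element 0 in the lowest bits."""
    mask = (1 << element_bits) - 1
    operand = 0
    for i, value in enumerate(elements):
        operand |= (value & mask) << (i * element_bits)
    return operand


def unpack(operand, element_bits, count):
    """Split a packed operand back into a list of `count` unsigned elements."""
    mask = (1 << element_bits) - 1
    return [(operand >> (i * element_bits)) & mask for i in range(count)]


def packed_add(a, b, element_bits, operand_bits):
    """Add two packed operands element by element.

    Assumption for this sketch: each element wraps around on overflow and
    no carry propagates from one element into its neighbor.
    """
    count = operand_bits // element_bits
    mask = (1 << element_bits) - 1
    elements_a = unpack(a, element_bits, count)
    elements_b = unpack(b, element_bits, count)
    sums = [(x + y) & mask for x, y in zip(elements_a, elements_b)]
    return pack(sums, element_bits)


# A 128-bit operand viewed as four 32-bit data elements.
a = pack([1, 2, 3, 0xFFFFFFFF], 32)
b = pack([10, 20, 30, 1], 32)
result = unpack(packed_add(a, b, 32, 128), 32, 4)
print(result)  # [11, 22, 33, 0]
```

The last element shows that the overflow stays inside its own element rather than carrying into an adjacent one, which is what distinguishes a packed operation from an ordinary 128-bit addition.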
In some processors, each instance of the SIMD execution logic may represent a relatively large amount of logic. For example, this may be the case when the SIMD execution logic is to process wide SIMD operands. Some processors are able to process vector or packed data operands having relatively wide widths, such as, for example, 128-bit operands, 256-bit operands, 512-bit operands, 1024-bit operands, or the like. Commonly, the SIMD execution logic needed to process such wide operands tends to be relatively large, to consume a relatively large amount of die area, to increase the cost of manufacturing the processor, and to consume a relatively large amount of power during use. Replicating the relatively large SIMD execution logic per core tends to exacerbate such problems. Moreover, in many applications or workload scenarios, the SIMD execution logic replicated per core tends to be underutilized at least some of the time. If the number of cores continues to increase in the future, such problems may become even more significant.
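As a rough illustration of why wider operands imply more logic, the number of data elements processed in parallel can be computed for each of the operand widths and data element widths mentioned above. The names OPERAND_WIDTHS, ELEMENT_WIDTHS, and elements_per_operand are hypothetical. The element count is offered only as a simple indicator of datapath width, not as a model of die area or power.

```python
OPERAND_WIDTHS = (128, 256, 512, 1024)  # operand widths in bits
ELEMENT_WIDTHS = (8, 16, 32, 64)        # data element widths in bits


def elements_per_operand(operand_bits, element_bits):
    """Return how many data elements of the given width fit in one operand."""
    if operand_bits % element_bits != 0:
        raise ValueError("operand width must be a multiple of element width")
    return operand_bits // element_bits


# Element count for every combination of operand width and element width.
element_table = {
    width: {e: elements_per_operand(width, e) for e in ELEMENT_WIDTHS}
    for width in OPERAND_WIDTHS
}

for width, row in element_table.items():
    counts = ", ".join(f"{n} x {e}-bit" for e, n in row.items())
    print(f"{width}-bit operand: {counts}")
```

Under these assumptions, a 512-bit operand holds sixteen 32-bit data elements, four times as many as a 128-bit operand, and each core that replicates the SIMD execution logic carries that full width.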
Still further, in the prior art processor of FIG. 1, each of the cores also has conventional flow control logic. In particular, core 0 has flow control logic 103-0, core 1 has flow control logic 103-1, and core M has flow control logic 103-M. Commonly, the flow control logic may be designed or optimized to cover a wide range of usage models, for example, by introducing speculative execution. However, such flow control logic generally tends to provide a relatively small benefit for SIMD and various other high-throughput computations, while tending to be accompanied by relatively high power consumption.