Typical microprocessors include an execution unit, storage for data and instructions, and an arithmetic unit for performing mathematical operations. Much of the microprocessor development over the past two decades has been in speeding the operating clock and widening the operational datapath. Specialized techniques such as predictive branching and deeper staged execution pipelines have also added performance at the cost of increased complexity.
One emerging idea for gaining even more performance from processors is to include multiple “execution cores” within a single microprocessor. These new processors include on the order of 2-8 cores, each of which operates simultaneously and in parallel. Although multi-core processors appear to have higher composite performance than single-core processors, the overhead required to ensure that each core operates efficiently increases dramatically with each additional core. For instance, memory bottlenecks and synchronization must be explicitly managed in multi-core systems, which adds overhead in both design and operation. Because this complexity grows as more cores are added, it is doubtful that adding further execution cores to a single microprocessor can continue for long before the gains diminish substantially.
Newer microprocessor designs include arrays of processors, on the order of tens to thousands, implemented on a single integrated circuit and connected to one another through a compute fabric. Such a processor array is described in the above-referenced '036 application. Programming or configuring such a system is time consuming because of the large amount of state needed to set up a large number of processors, and startup is difficult to synchronize. Reconfiguring such a system while it is running is extremely difficult because the exact state of each processor is difficult or impossible to predict.
Embodiments of the invention address these and other limitations in the prior art.