A multi-core microprocessor is one that combines two or more independent (micro)processors or processing units, called cores, into a single package, often a single integrated circuit (IC). A core, or single processor, includes a CPU (Central Processing Unit) and sufficient associated memory units to be able to independently execute a program or thread, viz., registers, a translation lookaside buffer (TLB), Level-1 (L1) instruction and data caches, additional L2 caches, etc. For example, a dual-core device may include two independent microprocessors and a quad-core device may include four microprocessors. A multi-core microprocessor may implement multiprocessing in a single physical package. Cores in a multi-core device may share a single coherent cache and/or may have private (separate) caches. The processor cores may share the same interconnect to the rest of the system and to each other. Each "core" (a single microprocessor) may independently implement optimizations such as pipelining, superscalar execution, simultaneous multi-threading (SMT), multi-programming, etc. A multi-core processor system with N cores may be more effective when it is presented with N or more threads concurrently, so as to keep each core busy with work.
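As an illustration of presenting N threads to an N-core system, the following is a minimal sketch in C using POSIX threads. It is not taken from the disclosure: the names `parallel_sum`, `sum_slice`, and `slice_t` are hypothetical, and the use of `sysconf(_SC_NPROCESSORS_ONLN)` to query the number of online cores assumes a platform (such as Linux or macOS) that provides this common extension. The array is split into one contiguous slice per thread so that each core may be kept busy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical work item: one contiguous slice of the array per thread. */
typedef struct {
    const int *data;
    size_t begin, end;   /* half-open range [begin, end) */
    long partial;        /* result written by the worker */
} slice_t;

/* Thread entry point: sums the slice it was given. */
static void *sum_slice(void *arg) {
    slice_t *s = arg;
    long acc = 0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sums data[0..n) with nthreads workers. If nthreads <= 0, the number of
 * online cores is used instead (assumption: _SC_NPROCESSORS_ONLN exists).
 * Returns 0 on success and -1 on failure; *out is written only on success. */
int parallel_sum(const int *data, size_t n, long nthreads, long *out) {
    assert(data != NULL && out != NULL);
    if (nthreads <= 0) {
        nthreads = sysconf(_SC_NPROCESSORS_ONLN);
        if (nthreads < 1)
            nthreads = 1;
    }
    size_t k = (size_t)nthreads;
    pthread_t *tids = malloc(k * sizeof *tids);
    slice_t *slices = malloc(k * sizeof *slices);
    if (tids == NULL || slices == NULL) {
        free(tids);
        free(slices);
        return -1;
    }
    size_t started = 0;
    int rc = 0;
    for (size_t t = 0; t < k; t++) {
        slices[t].data = data;
        slices[t].begin = n * t / k;
        slices[t].end = n * (t + 1) / k;
        slices[t].partial = 0;
        if (pthread_create(&tids[t], NULL, sum_slice, &slices[t]) != 0) {
            rc = -1;   /* stop launching; still join the threads already running */
            break;
        }
        started++;
    }
    long total = 0;
    for (size_t t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += slices[t].partial;
    }
    free(tids);
    free(slices);
    if (rc == 0)
        *out = total;
    return rc;
}
```

A caller might invoke `parallel_sum(data, n, 0, &total)` to use one thread per online core. On many toolchains the program must be built with `-pthread`. A purely sequential program corresponds to the case of a single thread, in which the remaining cores may sit idle.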
Multi-core processors may pose a substantial performance challenge to sequential programs because sequential programs typically cannot utilize the multiple cores and may be restricted to executing on a single core. For certain workload classes, this limitation may also result in wasted hardware when there are not enough tasks (e.g., other sequential or parallel programs) to execute on the other cores or when the on-chip shared cache is unable to sustain other tasks.
Non-object-oriented programs, especially legacy programs such as C programs, may have less modular data organization than object-oriented programs with respect to their computation structures (such as procedures) and data access patterns. However, these programs may also go through multiple phases of repetitive data access patterns during execution, typically resulting in local cache misses across the phase transitions.
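The phase behavior described above can be sketched with a small C fragment. This is an illustrative assumption rather than code from the disclosure: the names `sweep` and `run_two_phases` are hypothetical, and the fragment only reproduces the access pattern; it does not measure cache misses.

```c
#include <assert.h>
#include <stddef.h>

/* One phase: sweep the same working set `passes` times. After the first
 * pass, the working set may be resident in the local cache (if it fits),
 * so later passes may mostly hit. */
static long sweep(const int *set, size_t n, int passes) {
    long acc = 0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++)
            acc += set[i];
    return acc;
}

/* Two phases with different working sets, executed sequentially. */
long run_two_phases(const int *a, size_t na,
                    const int *b, size_t nb, int passes) {
    assert(a != NULL && b != NULL);

    /* Phase 1: repetitive accesses to a[]. */
    long acc = sweep(a, na, passes);

    /* Phase transition: b[] has not been touched yet, so the first pass
     * over it is likely to miss in the local cache and may evict a[]. */
    acc += sweep(b, nb, passes);
    return acc;
}
```

Within each phase the accesses are repetitive and predictable; the misses tend to cluster at the point where the program switches from one working set to the other.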
The present disclosure appreciates the challenges in executing a sequential program on a computing device with a multi-core processor.