The present invention relates to register files and, more specifically, to a system and method for improving the performance of a central processing unit.
One key aspect of throughput-oriented applications is the presence of abundant parallelism. To exploit it, many-core, large-chip designs strive to provide a memory subsystem that can constantly feed the cores with data to keep them busy. Memory bandwidth pressure is further exacerbated by SIMD execution modes, because more bytes per cycle are required to sustain continuous operation.
Recent work has focused on optimizing bandwidth in Chip Multiprocessor (CMP) architectures in order to keep cores well utilized. Some of this work emphasized the off-chip memory interface and the associated bandwidth partitioning or management techniques. More recent throughput-oriented designs have focused on optimizing the on-chip cache hierarchy, with special attention to the last-level cache. There are also architectures that incorporate scratchpad memories close to the cores. Regardless of the strategy adopted, all of these approaches are intended to keep large amounts of data as close as possible to the processing units.