Central processing units (CPUs) typically include at least one execution data path having execution units and data registers. Data may be stored in the data registers, and the execution units perform arithmetic computations (e.g., add/subtract/compare) on the data. In an exemplary operation, data is issued from the data registers to the execution units for arithmetic computations, and the results of the arithmetic computations are returned to the data registers. These operations incur little delay in short execution data paths. However, CPUs are now commercially available that implement superscalar execution data paths.
Superscalar execution data paths enable more than one instruction to be executed in each clock cycle, thereby increasing throughput. However, the larger size of superscalar execution data paths increases the physical distance between at least some of the execution units and data registers. This physical distance may result in processing delays, e.g., as data and computation results are transferred between the execution units and data registers. In addition, data may become corrupted during transfer between the execution units and data registers.
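
By way of illustration only, the issue/execute/return cycle described above, and the throughput gain from issuing more than one instruction per clock cycle, can be sketched as a toy software model. The names `run`, `OPS`, and `issue_width`, the instruction tuple layout, and the register contents are hypothetical and are not taken from any actual CPU design. The sketch assumes that instructions issued together in a cycle are independent of one another.

```python
import operator

# Hypothetical toy model; names and structure are illustrative only.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "cmp": lambda a, b: (a > b) - (a < b),  # -1, 0, or 1
}


def run(program, registers, issue_width=1):
    """Execute `program` and return the number of clock cycles used.

    Each instruction is (op, dest, src_a, src_b), using register indices.
    Assumption: instructions issued in the same cycle are independent.
    """
    cycles = 0
    for start in range(0, len(program), issue_width):
        group = program[start:start + issue_width]
        # Issue: read operands from the data registers for the whole group.
        operands = [(registers[a], registers[b]) for _, _, a, b in group]
        # Execute, then return the results to the data registers.
        for (op, dest, _, _), (x, y) in zip(group, operands):
            registers[dest] = OPS[op](x, y)
        cycles += 1
    return cycles


program = [
    ("add", 4, 0, 1),
    ("sub", 5, 2, 3),
    ("cmp", 6, 0, 2),
    ("add", 7, 1, 3),
]

scalar_regs = [10, 3, 7, 2, 0, 0, 0, 0]
superscalar_regs = list(scalar_regs)

scalar_cycles = run(program, scalar_regs, issue_width=1)
superscalar_cycles = run(program, superscalar_regs, issue_width=2)
```

In this sketch, both runs leave the same values in the registers, while the two-wide run completes in half as many cycles. The model deliberately omits the transfer delays and data corruption discussed above, which arise from physical distance and are not captured by a cycle count alone.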