Processor performance is highly dependent on the memory bandwidth available for accessing the data to be processed. For example, the number of concurrent operations a processor can support is directly related to the rate at which data can be supplied through the processor's memory interface. Providing adequate memory bandwidth at low cost is a difficult problem that must be addressed in the design of any new processor, and it is typically even harder to address in the design of higher-performance next generations of existing processors.
In a load/store, register-file-centric processor, there is typically a fixed register file (RF) capacity, a fixed number of RF ports, and a maximum memory bandwidth available for use by the instruction set. For example, consider a processor with a fixed RF capacity of 16×64 bits that must support, concurrently and every cycle, up to a 64-bit load from and a 64-bit store to the local data memories. Providing this capability requires a dedicated 64-bit load port and a dedicated 64-bit store port on the RF, as well as an architecture that allows load and store instructions to issue concurrently. Extending beyond this capability in the same architectural manner would require either increasing the number of RF ports or increasing the data width, and both approaches require corresponding architectural changes. These changes can also be counterproductive. For example, increasing the number of ports on a single RF in an attempt to relieve the memory bandwidth limit increases the implementation size of the register file and tends to slow it down.
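The arithmetic behind this example can be made concrete with a small sketch. It assumes each RF port moves one full-width word per cycle and ignores caches, bank conflicts and issue restrictions. The function names and the 500 MHz / 1 GHz clock rates are hypothetical and chosen only for illustration. The sketch shows that the 64-bit load plus 64-bit store baseline moves 16 bytes per cycle, and that doubling this requires doubling either the port count or the port width, or else raising the clock rate.

```python
"""Sketch: peak register-file-to-memory bandwidth of a load/store processor.

Assumption (hypothetical model): each port moves one full-width word per
cycle; caches, bank conflicts and issue restrictions are ignored.
"""


def bytes_per_cycle(load_ports: int, store_ports: int, port_width_bits: int) -> int:
    """Peak bytes transferred per cycle when every RF load/store port is busy."""
    if port_width_bits % 8:
        raise ValueError("port width must be a whole number of bytes")
    return (load_ports + store_ports) * (port_width_bits // 8)


def bytes_per_second(load_ports: int, store_ports: int,
                     port_width_bits: int, clock_hz: float) -> float:
    """Peak bandwidth scales linearly with the clock rate."""
    return bytes_per_cycle(load_ports, store_ports, port_width_bits) * clock_hz


if __name__ == "__main__":
    base = bytes_per_cycle(1, 1, 64)          # one 64-bit load port + one 64-bit store port
    more_ports = bytes_per_cycle(2, 2, 64)    # option 1: add RF ports
    wider = bytes_per_cycle(1, 1, 128)        # option 2: widen the data path
    print(base, more_ports, wider)            # 16 32 32
    # Option 3: raise the clock (hypothetical 500 MHz vs. 1 GHz).
    print(bytes_per_second(1, 1, 64, 500e6), bytes_per_second(1, 1, 64, 1e9))
```

Each of the three options doubles peak bandwidth. However, the first two change the RF and the instruction set architecture, and the third raises the operating frequency.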
With the advent of low-power processors for mobile, battery-operated devices, increasing the processor's memory bandwidth by raising its clock rate is at odds with lowering the device's power requirements, because the device's power use can be highly dependent on the processor's operating frequency. In addition, a significant body of code and tools has usually been developed for an existing processor architecture, which makes it increasingly difficult to change that architecture to improve performance and reduce power use.