In prior systems, a processor was typically required to support a restricted set of data types in the execution units and in the storage units that supplied the execution units with data. The restriction of the data types represented a compromise among an application's real needs, performance requirements, and implementation costs. If requirements for a processor dictated better support for both single width and packed data type operations, a number of problems arose that had to be resolved in the design and implementation of the processor.
One approach to using a single register file design to support both single width and packed data types employs multiple read and write ports, where each port is of the lowest data type granularity required for single width accesses. For example, consider a processor required to support a 32 bit operand single width access and a dual 32 bit packed data access from a local storage unit, such as a register file. In this example, each register file access port is fixed to support 32 bit accesses, thereby optimizing operations for single width accesses. In order to support the reading of two 32 bit packed data elements in a single execution unit, two 32 bit data read ports are implemented in the single register file. This approach leads to increased complexity. For example, in a typical execution operation, such as a dual 32 bit packed data add operation in an arithmetic logic unit (ALU) requiring two source operands and one target operand, four read ports would be required in the register file to meet the source operand requirement and two write ports would be required for the target operand. Thus, a total of six 32 bit access ports would be required for this example.
The implementation of multiple read and write ports grows in complexity when considering very long instruction word (VLIW) architectures. In a VLIW processor using a single register file design to support multiple execution units, the access requirement for the single register file grows by multiples according to the number of instructions in the VLIW. For example, with three execution units, similar to the previously described ALU, each with four read and two write ports, the single register file would be required to support twelve read ports and six write ports. A total of eighteen access ports would be needed. The total number of access ports increases further when load and store ports, which may be required for independent operation, are considered. Placing these additional access ports in a single register file increases the complexity and implementation costs of the register file design and can severely affect the register file's performance, and thereby limit the overall processor performance.
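The port arithmetic in the two preceding examples can be restated as a short calculation. The following Python sketch is illustrative only: the function name `access_ports` and its parameters are assumptions introduced here and are not part of any described design. It assumes that every port has the width of the single width access and that each execution unit requires its own dedicated ports.

```python
def access_ports(packed_elements, source_operands, target_operands, execution_units=1):
    """Count narrow register file ports (hypothetical helper).

    Assumes each port is one element wide, so a packed operand made of
    `packed_elements` elements occupies that many ports.
    Returns (read_ports, write_ports, total_ports).
    """
    read_ports = packed_elements * source_operands * execution_units
    write_ports = packed_elements * target_operands * execution_units
    return read_ports, write_ports, read_ports + write_ports


# Single ALU performing a dual 32 bit packed add: two sources, one target.
print(access_ports(2, 2, 1))     # (4, 2, 6)
# Three such execution units issued from one VLIW.
print(access_ports(2, 2, 1, 3))  # (12, 6, 18)
```

Load and store ports are not counted in this sketch and would add to the totals.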
Attempting to resolve such complexity and performance problems by supporting only the larger packed data types of the architecture in the register file, and supporting smaller data type accesses external to the register file, leads to inefficient use of the register file and potential increases in power due to the inefficiency. For example, with a register file having only 64 bit read access ports, dual 32 bit packed data accesses are easily supported, but support for single width 32 bit accesses becomes complicated. A single width 32 bit access can be accomplished external to the register file, but at the expense of always reading 64 bits when only a single width 32 bit access is required. For single 32 bit accesses, only 32 bits of the 64 bits accessed are used, with a consequent increase in power associated with the unused portion of the access.
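The inefficiency of serving narrow accesses from wide ports can be estimated with a short calculation. The Python sketch below is illustrative only, and the function name `unused_fraction` is a hypothetical one introduced here. It assumes, as a simplification, that every access drives the full port width regardless of how many bits are consumed.

```python
def unused_fraction(port_width_bits, used_bits):
    """Fraction of the bits read from a port that go unused (hypothetical helper)."""
    if not 0 < used_bits <= port_width_bits:
        raise ValueError("used_bits must be between 1 and port_width_bits")
    return (port_width_bits - used_bits) / port_width_bits


print(unused_fraction(64, 32))  # 0.5: single width 32 bit access on a 64 bit port
print(unused_fraction(64, 64))  # 0.0: dual 32 bit packed access uses the whole port
```

How closely the unused fraction tracks actual power consumption depends on the implementation and is not specified here.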
When executing algorithms, it is desirable to have a storage file that can be organized to more advantageously support processing of the varying data types and formats that dynamically occur in a programming application. For example, a register file of large width for high precision operations can be required in one part of an application, while single and multiple parallel operations on lower precision data can be required in a different part of the same application. Flexibly satisfying these diverse data type requirements may result in high hardware costs, such as those typically associated with implementing a wider register file and additional read and write ports. The general problem concerns the optimization of a programmer's storage facility, such as a register file, for use in accessing diverse data types in a manner that is dynamically usable without setup overhead, improves processor performance, is cost efficient to implement, and is scalable.