Field of the Invention
The present disclosure generally relates to processor register files and, more specifically, to methods and apparatus for shaped register file reads.
Description of the Related Art
Parallel processors have multiple independent cores that enable multiple threads to be executed simultaneously using different hardware resources. SIMD (single instruction, multiple data) architecture processors execute the same instruction on each of the multiple cores, with each core operating on different input data. MIMD (multiple instruction, multiple data) architecture processors execute different instructions on different cores, with different input data supplied to each core. Parallel processors may also be multi-threaded, which enables two or more threads to execute substantially simultaneously using the resources of a single processing core (i.e., the different threads are executed on the core during different clock cycles).
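The distinction between the SIMD and MIMD models described above may be illustrated with a minimal software sketch. The function names `simd_step` and `mimd_step` are hypothetical and are introduced here for illustration only. The sketch models each core as one element of a list and evaluates the cores sequentially, whereas actual hardware would operate on them in parallel.

```python
from typing import Callable, List, Sequence


def simd_step(op: Callable[[int], int], lanes: Sequence[int]) -> List[int]:
    """SIMD model: one instruction (op) applied to every core's own input."""
    return [op(x) for x in lanes]


def mimd_step(ops: Sequence[Callable[[int], int]], lanes: Sequence[int]) -> List[int]:
    """MIMD model: each core receives its own instruction and its own input."""
    if len(ops) != len(lanes):
        raise ValueError("one instruction is required per core")
    return [op(x) for op, x in zip(ops, lanes)]


if __name__ == "__main__":
    # Same instruction, different data per core.
    print(simd_step(lambda x: x + 1, [1, 2, 3]))
    # Different instruction and different data per core.
    print(mimd_step([lambda x: x + 1, lambda x: x * 2], [10, 20]))
```

In the SIMD call a single operation is shared by all cores, while in the MIMD call the list of operations has one entry per core.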
When a processor schedules an instruction for execution by a processor core, the processor writes certain values into special registers in a register file coupled to the processor core. One register may store the opcode that specifies the operation to be performed by the processor core and additional registers may store operand values used as input to the processor core for executing the instruction. In order for an operation to be executed, each of the values must be written into the register file and then coupled to the inputs of the datapath via a crossbar or other data transmission means.
Often, an instruction for a thread refers to 32-bit, 64-bit, or even 128-bit operands that are to be read from the register file. Conventional register files, however, typically include a plurality of 32-bit slots. The processor must therefore transpose multiple 32-bit values read from those slots into the 64-bit or 128-bit values requested by the thread, which requires several clock cycles to complete. One solution to this problem is simply to implement register files that include larger slots, e.g., 64-bit slots. Unfortunately, such register files come at a much higher cost and increase the overall complexity of the processor in which they are included.
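The cost of assembling a wide operand from 32-bit slots may be illustrated with a minimal software sketch. The sketch is not a description of any particular hardware. The name `read_operand`, the ordering in which the lowest-numbered slot holds the least significant 32 bits, and the counting of one step per slot read are all assumptions made for illustration.

```python
SLOT_BITS = 32
SLOT_MASK = (1 << SLOT_BITS) - 1


def read_operand(slots, base, width_bits):
    """Assemble a width_bits operand from consecutive 32-bit slots.

    Returns (value, slot_reads). Assumes (hypothetically) that slot `base`
    holds the least significant 32 bits and that each slot is read separately.
    """
    if width_bits <= 0 or width_bits % SLOT_BITS != 0:
        raise ValueError("width must be a positive multiple of 32 bits")
    count = width_bits // SLOT_BITS
    value = 0
    slot_reads = 0
    for i in range(count):
        # Place each 32-bit slot value at its position within the wide operand.
        value |= (slots[base + i] & SLOT_MASK) << (i * SLOT_BITS)
        slot_reads += 1
    return value, slot_reads


if __name__ == "__main__":
    slots = [0xDDCCBBAA, 0x11223344, 0x55667788, 0x99AABBCC]
    for width in (32, 64, 128):
        value, reads = read_operand(slots, 0, width)
        print(f"{width}-bit operand: {value:#x} ({reads} slot reads)")
```

Under these assumptions, a 32-bit operand needs one slot read, a 64-bit operand needs two, and a 128-bit operand needs four, which reflects why wider operands take longer to obtain from a register file built of 32-bit slots.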
Accordingly, what is needed in the art is an improved technique for handling variable-sized data reads of a register file.