1. Field of the Invention
The present disclosure generally relates to processor registers, and more specifically to methods and apparatus for source operand collector caching.
2. Description of the Related Art
Parallel processors have multiple independent cores that enable multiple threads to be executed simultaneously using different hardware resources. SIMD (single instruction, multiple data) architecture processors execute the same instruction on each of the multiple cores, with each core operating on different input data. MIMD (multiple instruction, multiple data) architecture processors execute different instructions on different cores, with different input data supplied to each core. Parallel processors may also be multi-threaded, which enables two or more threads to execute substantially simultaneously using the resources of a single processing core (i.e., the different threads are executed on the core during different clock cycles).
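The three execution models described above can be contrasted with a minimal Python sketch. It is illustrative only: each "core" is modeled as one function application on one lane of data, the function names are hypothetical, and the round-robin issue policy used for the multi-threaded case is an assumption rather than a policy stated in this disclosure.

```python
import operator
from typing import Callable, List, Tuple

Op = Callable[[int, int], int]

def simd_execute(op: Op, a: List[int], b: List[int]) -> List[int]:
    # SIMD: one instruction (op) is broadcast to every core; core i operates on lane i of the data.
    return [op(x, y) for x, y in zip(a, b)]

def mimd_execute(ops: List[Op], a: List[int], b: List[int]) -> List[int]:
    # MIMD: core i executes its own instruction ops[i] on its own input data.
    return [op(x, y) for op, x, y in zip(ops, a, b)]

def round_robin(threads: List[List[str]]) -> List[Tuple[int, int, str]]:
    # Multithreading on a single core: each clock cycle issues one instruction from
    # the next thread that still has work (round-robin is an assumed, hypothetical policy).
    queues = [list(t) for t in threads]
    schedule: List[Tuple[int, int, str]] = []
    cycle = 0
    while any(queues):
        for tid, queue in enumerate(queues):
            if queue:
                schedule.append((cycle, tid, queue.pop(0)))
                cycle += 1
    return schedule

print(simd_execute(operator.add, [1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(mimd_execute([operator.add, operator.mul, operator.sub], [1, 2, 3], [10, 20, 30]))  # [11, 40, -27]
print(round_robin([["ld", "add"], ["mul"]]))  # [(0, 0, 'ld'), (1, 1, 'mul'), (2, 0, 'add')]
```

In the last line, the two threads share one core, so their instructions occupy distinct clock cycles rather than running in parallel.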
When a processor schedules an instruction for execution by a processor core, the processor writes certain values into special registers in a register file coupled to the processor core. One register may store the opcode that specifies the operation to be performed by the processor core, and additional registers may store the operand values used as inputs to the processor core for executing the instruction. Before the operation can be executed, each of these values must be written into the register file and then coupled to the inputs of the processor core's datapath via a crossbar or other data transmission means. Each instruction may require a different set of registers in the register file to be connected to the inputs at the top of the datapath.
One problem with the above architectures is that configuring the crossbar to couple register values stored in the register file to the inputs at the top of the datapath requires one or more clock cycles. The time required to load each operand introduces latency that reduces overall processing efficiency. Furthermore, the crossbar may be configured such that only one operand can be coupled to the inputs of the datapath during each clock cycle; in that case, an instruction with three source operands, such as a fused multiply-add, needs at least three clock cycles just to collect its operands.
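The latency described above can be estimated with a simple cycle-count model. This is a minimal sketch under stated assumptions: every source operand is read from the register file for every instruction (no reuse), the crossbar moves at most `operands_per_cycle` operands per clock cycle, and any fixed crossbar configuration overhead is ignored. The `Instruction` type, the function names, and the opcodes are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Instruction:
    opcode: str              # hypothetical opcode label
    source_regs: List[int]   # register-file indices of the source operands

def operand_load_cycles(instr: Instruction, operands_per_cycle: int = 1) -> int:
    # Cycles spent coupling this instruction's source operands through the crossbar,
    # assuming each operand is fetched from the register file every time.
    if operands_per_cycle < 1:
        raise ValueError("operands_per_cycle must be at least 1")
    return math.ceil(len(instr.source_regs) / operands_per_cycle)

def total_load_cycles(program: List[Instruction], operands_per_cycle: int = 1) -> int:
    # Total operand-collection cycles for instructions issued one after another.
    return sum(operand_load_cycles(i, operands_per_cycle) for i in program)

program = [
    Instruction("FMA", [1, 2, 3]),  # three source operands
    Instruction("ADD", [4, 1]),     # re-reads register 1
    Instruction("MUL", [4, 2]),     # re-reads registers 4 and 2
]
print(total_load_cycles(program))                        # 7
print(total_load_cycles(program, operands_per_cycle=3))  # 3
```

With one operand per cycle, the three instructions spend seven cycles on operand collection alone, and several of those cycles re-fetch values (registers 1, 2, and 4) that were already delivered to the datapath inputs by an earlier instruction.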
Accordingly, what is needed in the art is an improved technique for loading values from the register file into the inputs of a datapath of a processor core.