Field of the Invention
This invention relates to computing systems, and more particularly, to efficiently accessing operands in a processing datapath.
Description of the Relevant Art
Compilers may extract parallelized tasks from program code to execute in parallel on the system hardware. The parallelization of tasks within the program code is used to increase the throughput of computer systems. Computationally intensive parallel tasks may include cryptography, video graphics rendering, and garbage collection. Particular instructions for these tasks may use a disproportionate share of a shared resource, which delays deallocation of the shared resource. Due to this inefficient usage of shared resources, both single-core and multi-core general-purpose processors may not efficiently execute these parallel tasks. To overcome the performance limitations of conventional general-purpose cores, a computer system may utilize one or more cores with a single instruction multiple data (SIMD) parallel micro-architecture. Typically, general-purpose processors are designed to exploit parallelism in an instruction stream, whereas SIMD cores are designed to exploit parallelism in a data stream.
Specialized processor cores that utilize a SIMD parallel micro-architecture include digital signal processors (DSPs), graphics processing units (GPUs), and so forth. These SIMD cores may be found in video game consoles, smart phones, audio/video (A/V) editing workstations, portable tablet computers, portable media players, and so forth. A vital issue for modern integrated circuits (ICs) within portable computers, mobile communication devices, and desktop systems is power consumption. As power consumption increases, more costly cooling systems are utilized to remove excess heat. These cooling systems may include larger heat sinks and operation mode control logic, which increase design complexity and system cost. In addition, battery life for devices is reduced as energy consumption increases.
A parallel data path within a SIMD core may be pipelined. Ideally, every clock cycle produces useful execution of an instruction for each stage of the pipeline. In order to increase the probability of useful execution for each stage, a SIMD core may interleave instructions from different software threads within the pipeline. Parallel tasks such as those listed above typically include several software threads that may be scheduled on a SIMD core. Data operands for the multiple threads are located in an instruction stream register file, which may also be referred to as an operand storage area. This operand storage area is sufficiently large that a random access memory (RAM) is used to store the data, rather than registers. The RAM may be banked to provide pseudo-multi-porting that allows multiple read and write operations to occur concurrently. The large RAM and associated banking logic increase the energy consumed to perform read and write operations for data operands. In addition, when any two operations contend for the same bank, resolving the conflict stalls the pipeline.
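The bank conflict behavior described above may be illustrated with a minimal sketch. The sketch is not taken from any particular design; it assumes a hypothetical operand RAM with four single-ported banks, a simple modulo mapping from operand address to bank, and one access serviced per bank per clock cycle. The names NUM_BANKS, bank_of, cycles_to_service, and stall_cycles are illustrative only.

```python
from collections import Counter

NUM_BANKS = 4  # assumed bank count, chosen only for illustration


def bank_of(address, num_banks=NUM_BANKS):
    """Map an operand address to a bank (assumed modulo interleaving)."""
    return address % num_banks


def cycles_to_service(addresses, num_banks=NUM_BANKS):
    """Cycles needed to service a group of accesses issued together.

    Each bank is assumed to service one access per cycle, so the most
    heavily loaded bank determines the total.
    """
    if not addresses:
        return 0
    per_bank = Counter(bank_of(a, num_banks) for a in addresses)
    return max(per_bank.values())


def stall_cycles(addresses, num_banks=NUM_BANKS):
    """Extra cycles beyond the conflict-free single cycle."""
    return max(cycles_to_service(addresses, num_banks) - 1, 0)


if __name__ == "__main__":
    print(stall_cycles([0, 1, 2, 3]))   # each access maps to a distinct bank
    print(stall_cycles([0, 4, 1]))      # addresses 0 and 4 share bank 0
    print(stall_cycles([0, 4, 8, 12]))  # all four accesses map to bank 0
```

Under these assumptions, accesses spread across distinct banks complete together, while accesses that map to the same bank are serialized, and each serialized access corresponds to a pipeline stall cycle.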
In view of the above, efficient methods and mechanisms for accessing operands in a processor are desired.