Field of the Invention
This invention is related to the field of processors and, more particularly, to processors that execute predicated vector operations.
Description of the Related Art
Recent advances in processor design have led to the development of a number of different processor architectures. For example, processor designers have created superscalar processors that exploit instruction-level parallelism (ILP), multi-core processors that exploit thread-level parallelism (TLP), and vector processors that exploit data-level parallelism (DLP). Each of these processor architectures has unique advantages and disadvantages which have either encouraged or hampered the widespread adoption of the architecture. For example, because ILP processors can often operate on existing program code, these processors have achieved widespread adoption. However, TLP and DLP processors typically require applications to be manually re-coded to gain the benefit of the parallelism that they offer, a process that requires extensive effort. Consequently, TLP and DLP processors have not gained widespread adoption for general-purpose applications.
Vector memory operations can be used in a DLP processor to read vector data from memory into vector registers and to write vector data from vector registers to memory. In particular, one DLP architecture permits vector elements to be stored in non-consecutive memory locations (i.e. non-consecutive addresses). In such an architecture, vector reads can gather the vector elements from dispersed memory locations into a vector register, and vector writes can disperse the vector elements from the vector register to disparate memory locations. Vector reads are generated responsive to vector load instructions, and vector writes are generated responsive to vector store instructions.
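By way of illustration only, the gather and disperse behavior described above may be modeled in software as follows. This is a minimal sketch and not a description of any particular hardware: the function names vector_gather and vector_scatter, the use of a Python list as memory, and the example addresses are all hypothetical and chosen only for the example.

```python
# Software model of gathering and dispersing vector elements.
# All names and values here are hypothetical, for illustration only.

def vector_gather(memory, addresses):
    """Model a vector read: collect one element from each address,
    which need not be consecutive, into a vector register (a list)."""
    return [memory[addr] for addr in addresses]


def vector_scatter(memory, addresses, vector_register):
    """Model a vector write: store each vector element to its own
    address, which need not be consecutive."""
    for addr, value in zip(addresses, vector_register):
        memory[addr] = value


# A small memory in which location i initially holds 100 + i.
memory = [100 + i for i in range(32)]

# Gather four elements from dispersed locations into one register.
vreg = vector_gather(memory, [3, 17, 4, 30])

# Disperse the same four elements to a different set of locations.
vector_scatter(memory, [0, 8, 16, 24], vreg)
```

In this model the data is read from and written to its original locations directly, without first being copied into a contiguous buffer.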
Supporting the above vector loads and stores can simplify the transition to vector code, since data need not be moved from its original locations to be vectorized. However, the above vector loads and stores can present challenges to efficient instruction scheduling and execution. Generally, issue circuitry attempts to schedule a given dependent operation based on the completion of the previous operation on which the given dependent operation depends. That is, the given dependent operation is scheduled to arrive at a pipeline stage at which operands are forwarded at the same time that the result of the previous operation is forwarded. For non-vector loads and stores, the number of cache accesses, translation lookaside buffer (TLB) accesses, etc. is known, and thus the time at which the operation will complete is known. However, a variable number of cache accesses and/or translations may be used to execute a given vector load or store. Thus, the completion time is unknown at the time the given vector load/store is issued.
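The variability noted above may be illustrated, purely as a sketch, by counting the distinct cache lines and pages touched by the element addresses of a single vector load. The 64-byte cache line, 4096-byte page, 4-byte element size, and the function name access_counts are assumptions made for this example only. The counts are a simplified lower bound on the accesses and translations that might be needed, and do not model any specific implementation.

```python
# Illustrative model: how many distinct cache lines and pages does one
# vector load touch? All sizes below are assumed values, not taken
# from any particular processor.
CACHE_LINE_BYTES = 64
PAGE_BYTES = 4096


def access_counts(addresses, element_bytes=4):
    """Return (distinct cache lines, distinct pages) touched by the
    given element addresses, counting elements that straddle a
    line or page boundary against both sides."""
    lines = set()
    pages = set()
    for addr in addresses:
        last_byte = addr + element_bytes - 1
        lines.update(range(addr // CACHE_LINE_BYTES,
                           last_byte // CACHE_LINE_BYTES + 1))
        pages.update(range(addr // PAGE_BYTES,
                           last_byte // PAGE_BYTES + 1))
    return len(lines), len(pages)


# Eight consecutive 4-byte elements fall within one line of one page.
contiguous = [0x1000 + 4 * i for i in range(8)]

# Eight elements spaced one page apart each fall in their own line
# and their own page.
dispersed = [0x1000 + PAGE_BYTES * i for i in range(8)]

contiguous_counts = access_counts(contiguous)
dispersed_counts = access_counts(dispersed)
```

Under these assumptions the same eight-element load yields one line and one page for the consecutive addresses, but eight lines and eight pages for the dispersed addresses. Because the addresses are typically not available when the load is issued, neither count is available to the issue circuitry at that time.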