Vector processors are designed to operate simultaneously on a collection of data items arranged in a “vector” of a specific vector length (VL). A vector processor typically relies on an internal data path that may or may not be as wide as the vector length. Recently, processors with 256 bit (“b”) data paths have been designed to replace 128 b systems. In such processors, the execution data path may not match the maximum VL (e.g., a 256 b path for a maximum VL of 512 b).
To perform the operation specified by a full-vector-length instruction (VSSE), the instruction may be broken into a set of operations, each working on a subset of the data inputs. For instance, when a microprocessor fetches a VSSE instruction with a vector length of 512 b, it may decode the instruction into two micro operations (μops), each operating on 256 b of data.
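The splitting described above can be illustrated with a small model. This is a minimal sketch, not the processor's actual decoder: it assumes the vector length is expressed in bits, uses the 256 b data path and 512 b maximum VL from the example as fixed constants, and the names `MicroOp` and `crack_into_uops` are hypothetical. Each μop is modeled as the bit range of the full vector it covers.

```python
from dataclasses import dataclass

# Widths taken from the example in the text; treated here as assumptions.
DATAPATH_BITS = 256  # width of the execution data path
MAX_VL_BITS = 512    # maximum vector length


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical μop that processes bits [lo, hi) of the full vector."""
    index: int
    lo: int
    hi: int


def crack_into_uops(vl_bits: int = MAX_VL_BITS,
                    datapath_bits: int = DATAPATH_BITS) -> list[MicroOp]:
    """Split one vector instruction into μops no wider than the data path."""
    if not 0 < vl_bits <= MAX_VL_BITS:
        raise ValueError("vector length out of range")
    uops = []
    # Walk the vector in data-path-sized chunks; the last chunk may be short.
    for i, lo in enumerate(range(0, vl_bits, datapath_bits)):
        hi = min(lo + datapath_bits, vl_bits)
        uops.append(MicroOp(index=i, lo=lo, hi=hi))
    return uops


if __name__ == "__main__":
    for uop in crack_into_uops():
        print(uop)
```

For a full 512 b instruction this yields two μops covering bits 0 to 255 and 256 to 511, matching the two-μop decomposition in the example.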
However, not every VSSE operation is performed on the full 512 b vector length. When the VL is smaller than the maximum VL, a correspondingly smaller number of μops is executed. The instruction decoder within the processor determines how many μops will be executed.
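One way to picture the decoder's decision is as a ceiling division of the active vector length by the data path width. This is a hedged sketch under stated assumptions: VL is given in bits, the data path is 256 b and the maximum VL is 512 b as in the example, and `uop_count` is a hypothetical helper rather than a description of any specific decoder.

```python
# Assumed widths from the example in the text.
DATAPATH_BITS = 256
MAX_VL_BITS = 512


def uop_count(vl_bits: int,
              datapath_bits: int = DATAPATH_BITS,
              max_vl_bits: int = MAX_VL_BITS) -> int:
    """Hypothetical decoder rule: one μop per data-path-wide chunk of the VL."""
    if not 0 < vl_bits <= max_vl_bits:
        raise ValueError("vector length out of range")
    # Ceiling division: a partial chunk still needs its own μop.
    return -(-vl_bits // datapath_bits)


if __name__ == "__main__":
    for vl in (128, 256, 512):
        print(vl, uop_count(vl))
```

Under these assumptions, a VL of 128 b or 256 b needs a single μop, while the full 512 b VL needs two.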