Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple hardware threads, multiple cores, multiple devices, and/or complete systems on individual integrated circuits. Additionally, as the density of integrated circuits has grown, the power requirements for computing systems (from embedded systems to servers) have also escalated. Furthermore, software inefficiencies, and the demands that software places on hardware, have also caused an increase in computing device energy consumption. In fact, some studies indicate that computing devices consume a sizeable percentage of the entire electricity supply for a country, such as the United States of America. As a result, there is a vital need for energy efficiency and conservation associated with integrated circuits. These needs will increase as servers, desktop computers, notebooks, Ultrabooks™, tablets, mobile phones, processors, embedded systems, etc. become even more prevalent (from inclusion in the typical computer, automobiles, and televisions to biotechnology).
Modern processors are capable of executing instructions of multiple instruction sets. For example, the Intel™ 64-bit instruction set provides multiple vector interfaces to support vector instructions of varying width. From earlier instruction set architecture (ISA) extensions such as Streaming SIMD Extensions (SSE) to current Advanced Vector Extensions (AVX-512), vector width has been increasing, in some cases from 4 to 16 single-precision floating-point numbers per vector, with processors having different vector interfaces to handle these widths. This width variance makes it difficult for a programmer or compiler to choose which interface to use, because the information needed to determine the most efficient vector width, such as the input data distribution (e.g., for a sparse matrix) or the number of loop iterations (e.g., for an auto-vectorized loop), may not be known in advance. As a result, poor vector utilization (and the resulting wasted power/energy) and/or performance impacts are common problems.
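As an illustrative sketch only, the dependence of vector utilization on the loop iteration count can be shown with a small calculation. The function names below are hypothetical, and the model assumes that leftover iterations are handled by one padded or masked full-width vector, which is a simplification and not necessarily how any particular compiler or processor handles the remainder.

```python
import math

# Single-precision lanes per vector: 128-bit (SSE), 256-bit (AVX), 512-bit (AVX-512).
WIDTHS = (4, 8, 16)


def lane_utilization(trip_count: int, width: int) -> float:
    """Fraction of vector lanes doing useful work for a loop of trip_count iterations.

    Assumption: the remainder iterations occupy one full-width vector
    whose unused lanes are padded or masked off.
    """
    if trip_count <= 0:
        return 0.0
    vectors = math.ceil(trip_count / width)  # vector operations issued
    return trip_count / (vectors * width)    # useful lanes / total lanes


def utilization_by_width(trip_count: int) -> dict:
    """Lane utilization for each candidate vector width (hypothetical helper)."""
    return {w: lane_utilization(trip_count, w) for w in WIDTHS}
```

Under these assumptions, a loop of 20 iterations fills every lane at width 4, about 83% of the lanes at width 8, and 62.5% of the lanes at width 16. Because the iteration count may only be known at run time, the width that gives the best utilization may likewise only be determinable at run time.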