Power consumption is a major concern in modern electronics and processor-based devices. In recent years, the use of laptop and notebook computers for mobile computing has become commonplace. Mobile devices are also becoming a standard accessory for the busy professional. Moreover, many of today's mobile devices include, as standard features, functionality that allows the user to access email, play computer games, or access the internet while on the move. In all of these examples, the electronic device relies on a finite power source, such as a battery. Thus, reducing power consumption, and thereby increasing battery life, is an important factor in fielding a product that is attractive to the market and therefore economically profitable.
One approach to reducing power consumption in a processor-based device is to reduce the power consumption of the processor itself. A common technique is to disable unused functional blocks within the processor based on the operation or set of operations being performed at a given time. The procedure for enabling and disabling functional blocks is inherently dynamic: a block should be enabled early enough that no processing latency is introduced, and yet be disabled quickly once it is no longer needed, to minimize excess current use.
More advanced processors, such as those supporting Intel's Advanced Vector Extensions (AVX), can process single instruction multiple data (SIMD) instructions in a vector manner. Some of these instructions perform operations on multiple data elements based on a mask operand. FIG. 1 is a block diagram illustrating vector execution rules. For example, the instruction VADDPS ZMM1 {k1}, ZMM2, ZMM3 performs up to 16 single precision (SP) addition operations over 32-bit elements of data within 512-bit registers, all in parallel. The mask k1 is a 16-bit register in which each bit corresponds to one of the data elements to be processed by the vector instruction. If a mask bit is one, the corresponding data element is processed and the result is written into the destination. If a mask bit is zero, the processor either writes zero to the corresponding destination element or leaves it unchanged, depending on the write masking mode. In hardware, all 16 elements are always operated on inside the vector arithmetic logic unit (ALU) as a single entity. Unfortunately, this results in a loss of power efficiency, because some of the operations performed inside the vector ALU are discarded whenever their related mask bits are set to zero. FIG. 2 is pseudocode illustrating an example of mask driven AVX operations.
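By way of illustration only, the write-masking behavior described above may be modeled in software as in the following minimal sketch. It is not the pseudocode of FIG. 2 and does not represent any particular hardware implementation. The names masked_add, discarded_lanes, and VECTOR_LANES are hypothetical and chosen for this example, and Python floating point values stand in for the 32-bit single precision elements. The sketch assumes that bit 0 of the mask corresponds to element 0 of the vector.

```python
from typing import List, Sequence

# Assumption for this sketch: a 512-bit register holding 32-bit elements.
VECTOR_LANES = 16


def masked_add(dest: Sequence[float], src1: Sequence[float],
               src2: Sequence[float], mask: int,
               zeroing: bool = False) -> List[float]:
    """Scalar model of a masked vector add such as VADDPS ZMM1 {k1}, ZMM2, ZMM3.

    dest holds the prior contents of the destination register; it only
    matters when a mask bit is zero and zeroing is False.
    """
    result = []
    for lane in range(VECTOR_LANES):
        # The sum is computed for every lane, mirroring an ALU that
        # operates on all 16 elements regardless of the mask.
        total = src1[lane] + src2[lane]
        if (mask >> lane) & 1:
            result.append(total)        # mask bit one: result is written
        elif zeroing:
            result.append(0.0)          # mask bit zero: write zero
        else:
            result.append(dest[lane])   # mask bit zero: leave unchanged
    return result


def discarded_lanes(mask: int) -> int:
    """Count the lanes whose computed sum is thrown away for a given mask."""
    return VECTOR_LANES - bin(mask & 0xFFFF).count("1")
```

For instance, with a mask of 0x000F only the four lowest elements are written, so under this model 12 of the 16 additions are computed and then discarded, which is the source of the power inefficiency noted above.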