The present invention relates generally to an improved computer processing instruction set, and more particularly to an instruction set having a multiply-accumulate functionality.
Computer architecture designers are constantly trying to increase the speed and efficiency of computer processors. For example, designers have attempted to increase processing speeds by raising clock speeds and by applying latency-hiding techniques, such as data prefetching and cache memories. In addition, techniques that exploit instruction-level parallelism, such as VLIW, multiple-issue superscalar, speculative execution, scoreboarding, and pipelining, are used to further enhance performance and increase the number of instructions issued per clock cycle (IPC).
Architectures that attain their performance through instruction-level parallelism are a growing trend in the computer architecture field. Examples of architectures utilizing instruction-level parallelism include single instruction multiple data (SIMD) architecture, multiple instruction multiple data (MIMD) architecture, vector or array processing, and very long instruction word (VLIW) techniques. Of these, VLIW appears to be the most suitable for general purpose computing. However, there is a need to further improve these architectures to increase efficiency.
Some instruction sets in recent microprocessor designs include a multiply-accumulate instruction. Combining the multiply functionality with an accumulate, or sum, function provides efficiencies because two operations are performed by a single instruction. Multiply-accumulate instructions allow video manipulations such as fades and alpha-blends to be performed more efficiently. However, improved multiply-accumulate functions are needed to further increase the efficiency of video processing.
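As an illustration of why a combined multiply-accumulate step suits alpha-blending, the following is a minimal sketch in Python. The names mac and alpha_blend, the 8-bit pixel range, and the final division by 255 are assumptions made for this example only; they do not describe any particular instruction set.

```python
def mac(acc, a, b):
    """Hypothetical multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def alpha_blend(fg, bg, alpha):
    """Blend two 8-bit pixel values (assumed range 0..255).

    alpha is assumed to be 0..255, where 255 selects only the foreground.
    Each term of the weighted sum is produced by one multiply-accumulate step.
    """
    acc = 0
    acc = mac(acc, fg, alpha)        # acc = fg * alpha
    acc = mac(acc, bg, 255 - alpha)  # acc += bg * (255 - alpha)
    return acc // 255                # scale the sum back to the 8-bit range
```

In this sketch a blend of two pixels needs two multiply-accumulate steps, where separate multiply and add instructions would need four operations.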
With certain arithmetic operations, the result can be too large to fit in the output register. For example, multiplication of two sixteen-bit values potentially produces a thirty-two-bit result. Attempting to put the result in a sixteen-bit output register would cause an overrun. The lower sixteen bits of the result could be put into the output register, but without the high-order bits the stored value would appear to be a smaller, incorrect result. Accordingly, improved methods are needed to represent overrun situations.
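The overrun described above can be shown with a short Python sketch that models a sixteen-bit register by masking. The helper names and the choice of unsigned operands are assumptions for illustration only.

```python
MASK16 = 0xFFFF  # models a sixteen-bit register (unsigned values assumed)


def mul16_full(a, b):
    """Full product of two sixteen-bit values; it can need up to 32 bits."""
    return (a & MASK16) * (b & MASK16)


def mul16_truncated(a, b):
    """Product as it would appear if only the lower sixteen bits were kept."""
    return mul16_full(a, b) & MASK16


def overruns(a, b):
    """True when the full product does not fit in sixteen bits."""
    return mul16_full(a, b) > MASK16
```

For example, 1000 * 1000 is 1,000,000 (0xF4240), but keeping only the lower sixteen bits leaves 0x4240, or 16,960, which looks like a valid but much smaller result.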