The invention relates generally to computer architectures. More particularly, the invention relates to a computer architecture for processing matrix instructions that specify parallel and dependent operations.
Improving computer architecture performance is a difficult task. Improvements have been sought through frequency scaling, Single Instruction Multiple Data (SIMD), Very Long Instruction Word (VLIW), multi-threading and multiple processor techniques. These approaches mainly target improvements in the throughput of program execution, and many of them require software to explicitly expose parallelism. In contrast, frequency scaling improves both throughput and latency without requiring explicit software annotation of parallelism. Recently, however, frequency scaling has hit a power wall, so further improvements through frequency scaling are difficult. Thus, it is difficult to increase throughput unless massive parallelism is expressed explicitly in software.
In view of the foregoing, it would be desirable to improve computer architecture performance without reliance upon frequency scaling and massive explicit software parallelization.