The present invention relates generally to computer memory, and more particularly to predication in a vector processor.
Computer systems often require a considerable amount of high speed memory, such as random access memory (RAM), to hold information, such as data and programs, when a computer is powered and operational. Memory system demands have continued to grow as computer systems have increased in performance and complexity.
Communication from a main processor to locations on memory devices can involve relatively long data access times and latency. The time it takes for the main processor to access memory can be, for example, several hundred cycles. This includes the time to determine that the data is not in cache (for memory reads), the time to traverse from a processor core of the main processor to I/O and across a module or other packaging, the arbitration time to establish a channel to memory in a multi-processor/shared memory system, and the time to get the data into or out of a memory cell.
A vector processor may support multiple memory accesses in parallel. Supporting parallel memory accesses to multiple memory locations can increase bandwidth but also increases power consumption. The increased bandwidth may come at a cost of reduced efficiency, particularly where data accessed at one or more of the memory locations is not used in further processing.
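By way of illustration only, the cost of accessing unused data can be sketched in software. The following is a minimal model, not a description of any particular hardware: the function name `gather`, the per-lane `predicate` list, and the sample memory contents are all hypothetical and chosen for this example. It counts how many memory accesses a parallel load would issue with and without a per-lane predicate that suppresses lanes whose data is not needed.

```python
from typing import List, Optional, Sequence, Tuple


def gather(
    memory: Sequence[int],
    addresses: Sequence[int],
    predicate: Optional[Sequence[bool]] = None,
) -> Tuple[List[Optional[int]], int]:
    """Model a parallel load across vector lanes.

    Returns (lanes, accesses). lanes[i] is None when lane i was
    suppressed by the predicate; accesses is the number of memory
    locations actually read. All names here are illustrative.
    """
    if predicate is None:
        # Without predication, every lane issues a memory access.
        predicate = [True] * len(addresses)
    if len(predicate) != len(addresses):
        raise ValueError("predicate and addresses must have the same length")

    lanes: List[Optional[int]] = []
    accesses = 0
    for addr, active in zip(addresses, predicate):
        if active:
            lanes.append(memory[addr])
            accesses += 1
        else:
            # Inactive lane: no access is issued, no data is returned.
            lanes.append(None)
    return lanes, accesses


# Hypothetical 16-word memory and a four-lane access pattern.
memory = list(range(100, 116))
addresses = [0, 3, 5, 9]
predicate = [True, False, True, False]  # only lanes 0 and 2 are needed

unpredicated = gather(memory, addresses)
predicated = gather(memory, addresses, predicate)
```

In this sketch the unpredicated load reads all four locations, while the predicated load reads only the two locations whose lanes are active. The access count is used here as a rough stand-in for the bandwidth and power spent on data that would otherwise go unused.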