Many new applications being planned for mobile devices (multimedia, graphics, image compression/decompression, etc.) involve a high percentage of vector computations. One limitation on the computation rate of these applications is the speed of accessing vector or matrix data stored in memory.
One approach to accessing vector data is to specify the starting address of the data in memory, the size of each data element (in bits) and the separation between consecutive data elements (the “stride”). This approach allows sequential data to be accessed, but it cannot be used where the elements are not separated by a constant amount. For example, it cannot be used if parts of a data vector are stored in different memory partitions. Similarly, when a two-dimensional image is stored in consecutive memory locations, one row at a time, the memory addresses of a data vector representing a sub-block of the image are not separated by an equal amount.
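The limitation described above can be illustrated with a minimal sketch. It assumes addresses are counted in whole elements rather than bits, and the function names and the 8-element image width are hypothetical, chosen only for illustration.

```python
def strided_addresses(start, stride, count):
    """Addresses of `count` elements beginning at `start`, `stride` apart."""
    return [start + i * stride for i in range(count)]


def sub_block_addresses(base, image_width, row0, col0, rows, cols):
    """Addresses of a rows x cols sub-block of an image stored one row at a time."""
    return [base + (row0 + r) * image_width + (col0 + c)
            for r in range(rows)
            for c in range(cols)]


def gaps(addresses):
    """Separation between each pair of consecutive addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


# One column of an 8-element-wide image: a single constant stride suffices.
column = strided_addresses(start=2, stride=8, count=4)

# A 2 x 3 sub-block of the same image: the separation is not constant.
block = sub_block_addresses(base=0, image_width=8, row0=1, col0=2, rows=2, cols=3)

print(column, gaps(column))
print(block, gaps(block))
```

The column is reachable with a single stride, whereas the sub-block has a larger gap at the end of each row, so no single (start, size, stride) triple describes it.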
A further approach, which has application to the processing of sparse data matrices, is to generate vectors specifying the locations of the non-zero matrix elements in memory. While this method provides the flexibility required for specialized Finite Element calculations, it is more complex than required for most multimedia applications on portable devices.
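As a rough sketch of the index-vector idea, the following assumes memory is modelled as a flat Python list and that the location vector simply holds the positions of the non-zero elements. The names `nonzero_index_vector` and `gather` are hypothetical and do not come from any particular system.

```python
def nonzero_index_vector(memory):
    """Build a vector of the memory locations that hold non-zero elements."""
    return [addr for addr, value in enumerate(memory) if value != 0]


def gather(memory, index_vector):
    """Fetch the elements stored at the listed locations, in order."""
    return [memory[addr] for addr in index_vector]


# A sparse row stored in memory, mostly zeros.
memory = [0, 5, 0, 0, 7, 0, 3, 0]

locations = nonzero_index_vector(memory)
values = gather(memory, locations)

print(locations, values)
```

Every element needs its own entry in the location vector, which gives full flexibility but also accounts for the added complexity noted above.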
A still further approach uses L1 and L2 memory caches to speed memory access. The data is pre-fetched in blocks defined by a starting address, block size, block count, stride and stride modifier. The stride modifier allows diagonal elements of a data matrix to be accessed. However, the approach cannot be used unless the data elements are separated by a constant amount. Further, the approach does not allow data access to start part way through a block without modifying the block structure.
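The block parameters can be sketched as an address generator. The text does not define how the stride modifier is applied, so this sketch assumes it is simply added to the stride between consecutive blocks. That assumption, the function name `prefetch_addresses` and the element-unit addressing are illustrative only.

```python
def prefetch_addresses(start, block_size, block_count, stride, stride_modifier=0):
    """Addresses touched when pre-fetching `block_count` blocks of `block_size`
    consecutive elements.

    Assumption: consecutive blocks begin (stride + stride_modifier) apart.
    """
    step = stride + stride_modifier
    return [start + b * step + i
            for b in range(block_count)
            for i in range(block_size)]


# Diagonal of a 4 x 4 row-major matrix: stride is the row length, modifier is 1.
diagonal = prefetch_addresses(start=0, block_size=1, block_count=4,
                              stride=4, stride_modifier=1)

# Three blocks of two elements each, one block per row of the same matrix.
blocks = prefetch_addresses(start=1, block_size=2, block_count=3, stride=4)

print(diagonal)
print(blocks)
```

Under this assumption every block has the same size and the same separation from its predecessor, which is why starting part way through a block would require a different block description.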