Computer memory devices typically arrange stored data in a two-dimensional matrix comprising rows and columns of data. For example, referring to FIG. 1, memory device 110 comprises a 4×4 matrix. In FIG. 1, rows of data are indicated by letters (e.g., a, b, c, and d) and columns of data are indicated by numbers (e.g., 0, 1, 2, and 3). To access a particular instance of data, an address translation table is accessed to determine the row and column locations of that particular instance of data (e.g., row b, column 1).
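By way of illustration only, the lookup described above can be sketched as follows. The sketch assumes a linear address and a row-major ordering, neither of which is specified for memory device 110; the names `translation_table` and `locate` are hypothetical.

```python
# Illustrative sketch only: a 4x4 layout like that of FIG. 1.
ROWS = "abcd"  # row labels a, b, c, d
COLS = 4       # column indices 0, 1, 2, 3

# Hypothetical translation table: linear address -> (row label, column index).
# Row-major ordering is an assumption made for this example.
translation_table = {
    addr: (ROWS[addr // COLS], addr % COLS)
    for addr in range(len(ROWS) * COLS)
}


def locate(addr):
    """Return the (row, column) location recorded for a linear address."""
    return translation_table[addr]


print(locate(5))  # the instance of data at row b, column 1
```

Under these assumptions, address 5 resolves to row b, column 1, the location used in the example above.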
In many media processing applications (e.g., video processing), transpose operations are performed which necessitate accessing a two-dimensional array of data in both dimensions (e.g., horizontally and vertically). For example, a first operation may be performed upon the rows of data, the result stored in the memory device, and a second operation then performed upon the columns of the stored data. However, memory devices and the processors which access them are not typically configured to access the data in two dimensions. In other words, the data can be easily accessed either as rows of data or as columns of data, but accessing the data as both rows and columns with equal efficiency is difficult to implement.
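The two-pass access pattern described above can be sketched as follows. This is a minimal illustration, not a particular media processing algorithm: the function names are hypothetical, and a running sum stands in for whatever one-dimensional operation is applied in each pass.

```python
from itertools import accumulate


def running_sum(values):
    """Stand-in one-dimensional operation (an assumption for this sketch)."""
    return list(accumulate(values))


def row_pass(matrix, op):
    """Apply op to each row; rows are contiguous and easy to access."""
    return [op(row) for row in matrix]


def column_pass(matrix, op):
    """Apply op to each column; every column must first be gathered
    from all of the rows, which is the costly access pattern."""
    columns = [list(col) for col in zip(*matrix)]
    processed = [op(col) for col in columns]
    # Transpose back so the result is again stored row by row.
    return [list(row) for row in zip(*processed)]


def two_pass(matrix, op):
    """First operation on the rows, second operation on the stored columns."""
    stored = row_pass(matrix, op)
    return column_pass(stored, op)


print(two_pass([[1, 2], [3, 4]], running_sum))
```

Note that `column_pass` touches every row to assemble each column, whereas `row_pass` reads each row directly; this asymmetry is the difficulty described above.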
In one conventional solution to accessing the data in two dimensions, a scalar processor is used to access the data. As a result, each instance of data in a row or column of data is accessed sequentially. Thus, in the example of FIG. 1, to access row a, four sequential read operations are performed. However, to access column 1 of the data stored in memory device 110, each row of data must be read in its entirety. A filtering algorithm then selects the instances of data which are wanted (e.g., a1, b1, c1, and d1) and discards the others. As a result, 16 read operations are performed to retrieve the 4 instances of data which are actually wanted, and additional time is used both to retrieve the unwanted data and to perform the filtering algorithm itself.
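The read counts in this example can be reproduced with a small model. The sketch below is illustrative only: the class `ScalarMemory` and both helper functions are hypothetical, and each call to `read` is assumed to correspond to one sequential read operation.

```python
class ScalarMemory:
    """Toy model of a 4x4 memory that counts scalar read operations."""

    def __init__(self, row_labels="abcd", cols=4):
        self.row_labels = row_labels
        self.cols = cols
        # Each cell holds its own label, e.g. "b1" for row b, column 1.
        self.cells = {
            (r, c): f"{r}{c}" for r in row_labels for c in range(cols)
        }
        self.reads = 0

    def read(self, row, col):
        """One sequential read operation (assumed unit of cost)."""
        self.reads += 1
        return self.cells[(row, col)]


def read_row(mem, row):
    """Access a row: one read per instance of data in that row."""
    return [mem.read(row, c) for c in range(mem.cols)]


def read_column_by_filtering(mem, col):
    """Access a column by reading every row, then filtering."""
    everything = [
        (c, mem.read(r, c)) for r in mem.row_labels for c in range(mem.cols)
    ]
    # The filtering step keeps the wanted instances and discards the rest.
    return [value for c, value in everything if c == col]


row_mem = ScalarMemory()
print(read_row(row_mem, "a"), row_mem.reads)

col_mem = ScalarMemory()
print(read_column_by_filtering(col_mem, 1), col_mem.reads)
```

In this model, reading row a costs 4 read operations, while reading column 1 costs 16 read operations plus the filtering step, of which only 4 results are kept.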
Another solution would be to use 4 ports (e.g., one per row, or one per column) to speed data retrieval. This is advantageous because each port could be used to read an instance of data from its respective row or column. However, additional ports consume excessive amounts of space on the memory device and, therefore, are not a viable alternative for most memory devices.
Another solution would be to access the data in parallel. However, most parallel architectures are forced to avoid algorithms which use both rows and columns and typically access the data in rows only.
Thus, conventional methods for accessing data in two dimensions cannot efficiently access the data as both rows and columns of data.