Conventionally, baseband processing for radio communication and the like may require a large amount of matrix calculation. When a large amount of data is subjected to the same matrix calculation process, a stream arithmetic processing unit (APU) may suitably be used to read a series of data consecutively from a memory and to write a series of calculation results to consecutive addresses of the memory.
A two-dimensional array transposition circuit related to the matrix and the memory is known. In the two-dimensional array transposition circuit, an address conversion circuit may perform a first control operation to specify a row address by the lower bits of an address signal and to specify a column address by the upper bits of the address signal. In addition, the address conversion circuit may perform a second control operation to specify the row address by the upper bits of the address signal and to specify the column address by the lower bits of the address signal. The two-dimensional array transposition circuit performs either the first control operation or the second control operation in order to transpose and read two-dimensional array data written in the memory.
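The two control operations described above may be illustrated by the following minimal sketch. It is a software model only, not the circuit of the cited publication; the function names `convert_address` and `read_stream`, the 4×4 memory size, and the choice of which operation corresponds to a transposed read of row-major data are assumptions made for illustration.

```python
def convert_address(addr, row_bits, col_bits, first_operation):
    """Split a linear address into (row, column) under one control operation."""
    if first_operation:
        # First control operation: lower bits give the row, upper bits the column.
        row = addr & ((1 << row_bits) - 1)
        col = addr >> row_bits
    else:
        # Second control operation: upper bits give the row, lower bits the column.
        col = addr & ((1 << col_bits) - 1)
        row = addr >> col_bits
    return row, col


def read_stream(memory, row_bits, col_bits, first_operation):
    """Read every cell while the address signal counts up from zero."""
    out = []
    for addr in range(1 << (row_bits + col_bits)):
        row, col = convert_address(addr, row_bits, col_bits, first_operation)
        out.append(memory[row][col])
    return out


# Assumed example: a 4x4 array whose cell (r, c) holds the value 4*r + c.
memory = [[4 * r + c for c in range(4)] for r in range(4)]

normal = read_stream(memory, 2, 2, first_operation=False)
transposed = read_stream(memory, 2, 2, first_operation=True)

print(normal)      # [0, 1, 2, ..., 15]
print(transposed)  # [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

In this sketch the same counting address signal yields either the stored order or the transposed order, depending only on which bits are routed to the row address and which to the column address.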
An example of the two-dimensional array transposition circuit is proposed in Japanese Laid-Open Patent Publication No. 10-207868.
In the case of a processor that performs a stream process by accessing a large amount of data at consecutive addresses of the memory, the processing performance may greatly deteriorate when the data are not arranged at consecutive addresses.
However, in the case of a processor that performs both a 2×2 matrix calculation and a 4×4 matrix calculation, for example, a result of the 2×2 matrix calculation may be stored in the memory in a format suited to processes targeting the 2×2 matrix calculation. In this case, when the result of the 2×2 matrix calculation is to be used for processes targeting the 4×4 matrix calculation, those processes may be difficult to perform efficiently because the stored format may be unsuited to the 4×4 matrix calculation. Similarly, when a result of the 4×4 matrix calculation is to be used for processes targeting the 2×2 matrix calculation, those processes may be difficult to perform efficiently because the stored format may be unsuited to the 2×2 matrix calculation.
The conventional two-dimensional array transposition circuit described above merely transposes and reads the two-dimensional array data written in the memory. For this reason, it may be difficult to arrange the data in formats suited to matrix calculations of different matrix sizes.
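The difficulty may be illustrated by the following sketch, which rests on an assumed pair of storage formats that the description above does not specify: a hypothetical 2×2-oriented format in which a 4×4 matrix is stored as four consecutive 2×2 blocks, and a hypothetical 4×4-oriented format in which the matrix is stored row by row. The function names are likewise illustrative only.

```python
def block_addr(r, c):
    """Assumed 2x2-oriented format: four 2x2 blocks stored one after another."""
    block = 2 * (r // 2) + (c // 2)      # which 2x2 block holds element (r, c)
    return 4 * block + 2 * (r % 2) + (c % 2)


# Memory image of a 4x4 matrix stored in the assumed block format.
stored = [None] * 16
for r in range(4):
    for c in range(4):
        stored[block_addr(r, c)] = (r, c)

# Order in which an assumed 4x4-oriented (row-by-row) stream process wants the elements.
wanted = [(r, c) for r in range(4) for c in range(4)]

# Plain consecutive read, and a transposed read that swaps upper and lower address bits.
plain_read = list(stored)
transposed_read = [stored[4 * (a % 4) + (a // 4)] for a in range(16)]

# Addresses that hold row 0 of the 4x4 matrix in the block format.
row0_addrs = [block_addr(0, c) for c in range(4)]

print(row0_addrs)                 # [0, 1, 4, 5], not consecutive
print(plain_read == wanted)       # False
print(transposed_read == wanted)  # False
```

Under these assumptions, one row of the 4×4 matrix does not occupy consecutive addresses, and neither the plain read nor the transposed read delivers the elements in the order the 4×4 process expects.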