The present invention relates to a memory control technology and a data processing technology, and more particularly to a DMA (direct memory access) control technology and a data processing technology for an SIMD (single instruction multiple data) processor.
To increase data processing speed, semiconductor integrated circuits having a plurality of arithmetic processing mechanisms have come into use. Japanese Unexamined Patent Application Publication No. 2010-277429 discloses an SIMD processor in which a plurality of processing elements, which are processing modules, are connected by a one-way ring bus, which is an annular communication channel.
Each of the processing elements constituting the SIMD processor has an internal memory, and data stored in an external memory is transferred to the internal memory under data transfer control by a DMA (direct memory access) device. The DMA device includes an address generator circuit that generates read addresses while incrementing the address by one each time, so that data is read from the external memory in address order and then stored in the internal memory. The DMA device performs column transfer, which supplies one unit of data to every processing element, for a predesignated number of columns, and thereby stores two-dimensional data in the internal memory. The processing elements execute given arithmetic processing on the data stored in the internal memory under the control of a control processor connected to the ring bus.
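The sequential address generation and column transfer described above can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the circuit of the cited publication: the function name `dma_column_transfer`, the parameter names, and the modeling of the external memory as a flat list of words are all hypothetical.

```python
def dma_column_transfer(external, num_pe, num_columns, base=0):
    """Model a DMA read whose address is incremented by one per word.

    Hypothetical model: each column transfer hands one word to every
    processing element (PE), and the transfer repeats num_columns times.
    """
    # One internal memory (a list of words) per processing element.
    internal = [[] for _ in range(num_pe)]
    addr = base  # address generator starts at the base address
    for _ in range(num_columns):
        for pe in range(num_pe):
            internal[pe].append(external[addr])
            addr += 1  # addresses are generated strictly in order
    return internal
```

Under these assumptions, a row-major array of twelve words distributed over four processing elements in three column transfers leaves processing element 0 holding words 0, 4, and 8. Each internal memory therefore receives one column of the two-dimensional data, and the alignment is fixed by the address order.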
To perform the matrix arithmetic processing required for image data processing with the SIMD processor, data marshaling must be conducted before the processing. Consequently, after data has been transferred from the external memory to the internal memory of each processing element, the data must be transferred to other processing elements by executing a data marshaling instruction in order to realign the data.
However, because a large amount of data must be transferred among the processing elements when the data is initially aligned, a delay occurs due to the data transfer among the processing elements. This delay hinders the performance improvement expected from parallel computing.
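The cost of such marshaling can be estimated with a small sketch. It assumes, purely for illustration, that the required realignment is a transpose of an N x N tile, that processing element p initially holds column p, and that moving one word from element i to element j on the one-way ring bus takes (j - i) mod N hops. The function name `ring_transpose` and this hop-count model are hypothetical and are not taken from the cited publications.

```python
def ring_transpose(internal):
    """Transpose an N x N tile spread over N PEs and count ring-bus hops.

    Hypothetical cost model: the ring is one-way, so a word travelling
    from PE src to PE dst needs (dst - src) mod N hops.
    """
    n = len(internal)
    result = [[None] * n for _ in range(n)]
    hops = 0
    for src in range(n):
        for slot in range(n):
            dst = slot  # word in slot `slot` of PE src belongs to PE `slot`
            result[dst][src] = internal[src][slot]
            hops += (dst - src) % n
    return result, hops
```

With four processing elements holding the columns of a 4 x 4 row-major tile, this model yields 24 hops in total, and in general N * N * (N - 1) / 2 hops. The transfer volume among the processing elements thus grows faster than the amount of data itself.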
Under these circumstances, techniques have been proposed by which the data alignment is changed when data is read into the processing elements from the external memory. For example, Japanese Unexamined Patent Application Publication No. 2005-309499 discloses a technique by which data is marshaled by individually supplying read addresses to a plurality of memory banks to fetch data from the memory banks. Also, Japanese Unexamined Patent Application Publication No. 2010-170164 discloses a technique by which data read from the external memory is temporarily stored in a buffer, and the data sequence is realigned according to values in a plurality of tables.
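The general idea of realigning data at read time through a buffer and a table can be sketched as follows. This is a simplified illustration with a single permutation table, not the method of either cited publication; the names `transpose_table` and `read_with_realignment` are hypothetical, and the table shown is an assumed example that produces a transposed layout.

```python
def transpose_table(n):
    """Hypothetical table: output position col*n + pe takes input pe*n + col."""
    return [pe * n + col for col in range(n) for pe in range(n)]


def read_with_realignment(external, table, num_pe):
    """Buffer the words read in address order, reorder them by table,
    then distribute one word per PE per column transfer."""
    buffer = list(external[:len(table)])     # temporary buffer
    reordered = [buffer[i] for i in table]   # table-driven realignment
    # PE p receives every num_pe-th word starting at offset p.
    return [reordered[pe::num_pe] for pe in range(num_pe)]
```

Under these assumptions, reading a 4 x 4 row-major tile with `transpose_table(4)` leaves processing element 0 holding words 0, 1, 2, and 3, that is, a row rather than a column, without any subsequent transfer among the processing elements.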