The present invention relates to an image processing device, and more particularly, to a technique for controlling requests for access to a memory from a plurality of cores.
Prior art relating to this technique is described below, taking MPEG2 video encoding as an example.
In MPEG2 video encoding, an entire image is divided into a lattice of units called macro-blocks, each consisting of 16 pixels vertical × 16 pixels horizontal, and the image signal is encoded macro-block by macro-block. The encoding of one macro-block is independent of that of the others.
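As a minimal illustration of the macro-block division, the sketch below counts the 16 × 16 macro-blocks covering an image. The 720 × 480 frame size is an assumed example, not one given in the text, and `macroblock_grid` is a hypothetical helper; dimensions that are not multiples of 16 are rounded up to whole macro-blocks.

```python
from typing import Tuple

MB_SIZE = 16  # an MPEG2 macro-block is 16 x 16 pixels


def macroblock_grid(width: int, height: int) -> Tuple[int, int]:
    """Return (columns, rows) of macro-blocks covering a width x height image.

    Partial macro-blocks at the right and bottom edges are rounded up.
    """
    cols = -(-width // MB_SIZE)   # ceiling division
    rows = -(-height // MB_SIZE)
    return cols, rows


cols, rows = macroblock_grid(720, 480)
print(cols, rows, cols * rows)  # 45 30 1350
```

Because each macro-block is encoded independently, these 1350 units (for the assumed frame size) can be fed through the pipeline one after another.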
In such image encoding, pipeline processing per macro-block is employed to improve throughput. To perform the pipeline processing, an MPEG2 video encoding system incorporates dedicated arithmetic elements for the operations required for encoding, such as motion vector detection, DCT operation, and quantization operation. Hereinafter, such dedicated arithmetic elements are referred to as "cores".
Herein, the unit processing time of the pipeline processing is referred to as a "macro-block period", and the processing units obtained by dividing the pipeline processing into macro-block periods are referred to as "macro-block stages" or simply "stages".
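The relation between stages and macro-blocks can be sketched as follows, assuming that the cores form a simple chain in which core k runs k stages behind the first core. The core order in `CORES` and the helper `stage_assignment` are hypothetical; the text does not specify the actual pipeline order.

```python
from typing import Dict, Optional

# Hypothetical core order; the actual pipeline order is not specified here.
CORES = ["video input", "motion vector detection", "DCT", "quantization"]


def stage_assignment(stage: int) -> Dict[str, Optional[int]]:
    """Map each core to the macro-block index it handles at `stage`.

    Assumes core k works on macro-block (stage - k); None means the
    pipeline has not yet filled up to that core.
    """
    return {core: (stage - k if stage >= k else None)
            for k, core in enumerate(CORES)}


for s in range(5):
    print(s, stage_assignment(s))
```

From stage 3 onward every core is busy, each with a different macro-block, which is the parallel operation described in the next paragraph.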
At a given stage, the cores perform their respective types of processing on different macro-blocks in parallel, and during such processing each core issues requests for access to a memory independently. Therefore, to ensure normal execution of MPEG2 video encoding, a mechanism for exclusively controlling the memory access requests from the plurality of cores at each stage is required.
FIG. 12 is a view showing a method of controlling memory access employed by a conventional MPEG2 video encoding system (M. Mizuno et al., "A 1.5W Single-Chip MPEG2 MP@ML Encoder with Low-Power Motion Estimation and Clocking", ISSCC, 1997; and O. Ohnishi et al., "Memory Architecture in a 1-Chip MPEG-2 MP@ML Video Encoder LSI", The 1997 IEICE General Conference). In FIG. 12, the y-axis represents the types of cores of the conventional MPEG2 video encoding system and the x-axis represents time. FIG. 12 shows when, i.e., how many cycles after the start of a stage, data transfer for each core is to be started; that is, it shows the schedule of memory access of the cores in one macro-block period.
In the above conventional example, the access requests are exclusively controlled by fixedly scheduling which types of cores may access an external memory, and when, within one macro-block period. More specifically, a schedule is prepared in advance so that, for example, data is written into the memory from a video input section a given number of cycles after the start of a stage and read from the memory to a motion vector detection section a given number of cycles after the start of the stage.
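The fixed schedule of the conventional example can be modeled, as a minimal sketch, by a table of time slots that is identical in every stage. The core names, directions, and cycle counts in `FIXED_SCHEDULE` are hypothetical and are not read from FIG. 12; `owner_at` is likewise a hypothetical helper that reports which core the table grants the memory at a given cycle within the stage.

```python
from typing import NamedTuple, Optional


class Slot(NamedTuple):
    core: str
    direction: str  # "write" into memory or "read" from memory
    start: int      # cycles after the start of the stage
    length: int     # cycles reserved for this transfer


# Hypothetical fixed schedule, repeated unchanged in every macro-block period.
FIXED_SCHEDULE = [
    Slot("video input", "write", 0, 200),
    Slot("motion vector detection", "read", 200, 1200),
    Slot("local decode", "write", 1400, 300),
]


def owner_at(cycle: int) -> Optional[str]:
    """Return the core granted the memory at `cycle`, or None if no slot covers it."""
    for slot in FIXED_SCHEDULE:
        if slot.start <= cycle < slot.start + slot.length:
            return slot.core
    return None
```

Since the slots never overlap, no run-time arbitration is needed; the cost is that each slot must be long enough for the largest transfer that core might ever require.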
Because memory access at each stage is fixedly scheduled in this way, normal execution of pipeline processing can be ensured only if the number of cycles per stage is set for the case where each core's data transfer requires its maximum number of cycles, i.e., for the worst case.
The inventors of the present invention examined the conventional example and found that, if the number of cycles per stage is set based on the worst case, it exceeds the upper limit that allows normal execution of pipeline processing (estimated from the operating frequency specified for the conventional example).
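The worst-case reasoning can be made concrete with a small calculation. Everything numeric below is hypothetical: the per-macro-block transfer cycles, the 81 MHz clock, and the 720 × 480, 30 frames/s format are illustrative assumptions, not figures from the conventional example. The per-stage cycle budget is taken as the clock frequency divided by the number of macro-blocks processed per second.

```python
# Hypothetical transfer cycles per core for three consecutive macro-blocks.
TRANSFER_CYCLES = {
    "video input": [192, 192, 192],
    "motion vector detection": [900, 1550, 1100],
    "local decode": [300, 250, 280],
}

# Hypothetical operating point: 81 MHz, 1350 macro-blocks/frame at 30 frames/s.
CLOCK_HZ = 81_000_000
MACROBLOCKS_PER_SECOND = 1350 * 30
STAGE_LIMIT = CLOCK_HZ // MACROBLOCKS_PER_SECOND  # cycles per macro-block period


def worst_case_stage_cycles(samples):
    """A fixed schedule reserves each core's maximum, so the stage is the sum of maxima."""
    return sum(max(cycles) for cycles in samples.values())


def actual_stage_cycles(samples, mb):
    """Cycles needed for macro-block `mb`, assuming transfers run back to back."""
    return sum(cycles[mb] for cycles in samples.values())


print(STAGE_LIMIT)                               # 2000
print(worst_case_stage_cycles(TRANSFER_CYCLES))  # 2042
print([actual_stage_cycles(TRANSFER_CYCLES, mb) for mb in range(3)])  # [1392, 1992, 1572]
```

With these assumed numbers, no single macro-block actually needs more than 2000 cycles, yet the worst-case sum that a fixed schedule must reserve exceeds the budget, because each core's maximum occurs for a different macro-block.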
To solve this problem, a conventional encoding system is provided with a cache memory for motion vector detection. Reference image data used during the first search is stored in this cache memory and reused during the second search. This eliminates data transfer from the external memory during the second search and thus reduces the total number of cycles required for memory access (Y. Ooi et al., "Development of MPEG2 MP@ML-based Single-Chip Encoding LSI", Nikkei Electronics, April 1997).
However, providing such a cache memory to reduce the number of memory access cycles increases the power consumption and area of the entire system by those of the cache memory itself. This increase is particularly critical when the system is implemented as an LSI.
The object of the present invention is to achieve more efficient memory access than has conventionally been possible in an image processing device having a plurality of cores. In particular, for an MPEG2 video encoding system, the object is to ensure that the data transfer required for encoding can be executed at the same operating frequency as conventionally used, without providing a cache memory.
Specifically, the present invention is directed to an image processing device for processing an image signal by pipeline processing using an external memory, the device comprising: a plurality of cores each performing operation for image processing; and a memory access section for executing data transfer between the plurality of cores and the external memory, wherein the memory access section includes an access schedule storage portion for storing a type of data transfer for each stage which is a unit of the pipeline processing, and the data transfer between the plurality of cores and the external memory is executed in accordance with storage contents of the access schedule storage portion, and the access schedule storage portion is constructed so that the type of data transfer required at each stage can be set at a stage preceding each stage.
According to the present invention, since the type of data transfer required at each stage can be set at a preceding stage, the type of data transfer can be changed flexibly from stage to stage. This allows the memory access section to execute only the necessary types of data transfer at each stage without arbitration. In this way, efficient memory access is realized.
Preferably, the image processing device according to the present invention further includes a system control section for controlling the plurality of cores and the memory access section, wherein while the system control section instructs the memory access section to execute data transfer, the system control section sets the type of data transfer at a stage subsequent to the present stage in the access schedule storage portion.
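One way to picture the access schedule storage portion is as a double-buffered register: one entry holds the transfer types for the stage now executing, while the system control section writes the entry for the following stage. The double-buffered organization, the class `AccessScheduleStorage`, and the transfer-type names are assumptions made for this sketch only; the text requires only that the types for a stage can be set at a preceding stage.

```python
from typing import Iterable, List


class AccessScheduleStorage:
    """Hypothetical double-buffered model of the access schedule storage portion."""

    def __init__(self) -> None:
        self._current: List[str] = []  # types for the stage now executing
        self._next: List[str] = []     # types being prepared for the next stage

    def set_next(self, transfer_types: Iterable[str]) -> None:
        # Written by the system control section during the preceding stage.
        self._next = list(transfer_types)

    def advance(self) -> None:
        # At a stage boundary the prepared entry becomes the active one.
        self._current, self._next = self._next, []

    def current(self) -> List[str]:
        return list(self._current)


# Hypothetical per-stage plans; each stage lists only the transfers it needs.
PLANS = [
    ["video input write", "motion search read"],
    ["video input write", "motion search read", "local decode write"],
    ["video input write"],
]

storage = AccessScheduleStorage()
storage.set_next(PLANS[0])  # prepared before the first stage starts
executed = []
for stage in range(len(PLANS)):
    storage.advance()
    if stage + 1 < len(PLANS):
        storage.set_next(PLANS[stage + 1])  # set while the current stage runs
    executed.append(storage.current())     # what the memory access section executes
```

Each stage executes exactly the list prepared for it, so a stage that needs fewer transfers simply carries a shorter list instead of reserving slots for transfers it does not perform.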
In the image processing device, preferably, the memory access section outputs a stage transfer state signal indicating whether or not data transfer has been terminated at each stage, and the system control section instructs the memory access section to execute data transfer at the next stage when the stage transfer state signal indicates termination of data transfer.
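The handshake via the stage transfer state signal can be sketched as a cycle-level loop. It assumes, for illustration only, that every transfer type takes a fixed, known number of cycles and that stage length is governed by memory transfer alone (core computation time is ignored). `MemoryAccessSection`, `run_stages`, and the cycle counts are hypothetical names and values.

```python
from typing import Dict, List, Tuple


class MemoryAccessSection:
    """Hypothetical model: each transfer type takes a fixed number of cycles."""

    def __init__(self, cycles_per_type: Dict[str, int]) -> None:
        self.cycles_per_type = cycles_per_type
        self.remaining = 0

    def start_stage(self, transfer_types: List[str]) -> None:
        self.remaining = sum(self.cycles_per_type[t] for t in transfer_types)

    def tick(self) -> None:
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def stage_transfer_done(self) -> bool:
        # Models the stage transfer state signal.
        return self.remaining == 0


def run_stages(stages: List[List[str]], mas: MemoryAccessSection) -> Tuple[List[int], int]:
    """System control: start each stage only once the previous stage's transfers are done."""
    cycle = 0
    start_cycles = []
    for transfer_types in stages:
        start_cycles.append(cycle)
        mas.start_stage(transfer_types)
        while not mas.stage_transfer_done:
            mas.tick()
            cycle += 1
    return start_cycles, cycle


mas = MemoryAccessSection({"video input write": 192,
                           "motion search read": 900,
                           "local decode write": 300})
stages = [["video input write", "motion search read"],
          ["video input write"],
          ["video input write", "local decode write"]]
print(run_stages(stages, mas))  # ([0, 1092, 1284], 1776)
```

Under these assumptions, each stage lasts only as long as its own transfers require, rather than the fixed worst-case length.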
The memory access section of the image processing device according to the present invention preferably includes: an interface portion for executing data transfer of a designated type and outputting a local transfer state signal indicating whether or not the data transfer has been terminated; and an access control portion for designating a type of data transfer to activate the interface portion and, when the local transfer state signal indicates termination of that data transfer, designating a new type of data transfer to activate the interface portion again.
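The cooperation between the access control portion and the interface portion can be sketched as follows. The model assumes that the interface portion handles one transfer type at a time and that each type takes a fixed number of cycles; the class names, method names, and cycle counts are hypothetical. The `stage_done` property plays the role of the stage transfer state signal described earlier.

```python
from collections import deque
from typing import Dict, Iterable, List


class InterfacePortion:
    """Hypothetical model: executes one designated transfer type at a time."""

    def __init__(self, cycles_per_type: Dict[str, int]) -> None:
        self.cycles_per_type = cycles_per_type
        self.remaining = 0
        self.log: List[str] = []  # order in which types were activated

    def activate(self, transfer_type: str) -> None:
        self.log.append(transfer_type)
        self.remaining = self.cycles_per_type[transfer_type]

    def tick(self) -> None:
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def local_done(self) -> bool:
        # Models the local transfer state signal.
        return self.remaining == 0


class AccessControlPortion:
    """Designates the next transfer type whenever the interface reports completion."""

    def __init__(self, interface: InterfacePortion) -> None:
        self.interface = interface
        self.pending: deque = deque()

    def start(self, transfer_types: Iterable[str]) -> None:
        self.pending = deque(transfer_types)
        self._designate_next()

    def _designate_next(self) -> None:
        if self.pending:
            self.interface.activate(self.pending.popleft())

    def tick(self) -> None:
        self.interface.tick()
        if self.interface.local_done:
            self._designate_next()

    @property
    def stage_done(self) -> bool:
        return self.interface.local_done and not self.pending


iface = InterfacePortion({"video input write": 3, "motion search read": 5})
ctrl = AccessControlPortion(iface)
ctrl.start(["video input write", "motion search read"])
cycles = 0
while not ctrl.stage_done:
    ctrl.tick()
    cycles += 1
print(cycles, iface.log)  # 8 ['video input write', 'motion search read']
```

Because the next type is designated as soon as the previous transfer reports completion, the transfers of a stage run back to back with no idle slots between them in this model.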
The image processing device according to the present invention preferably performs encoding as the processing of an image signal.