The present invention relates to an image processing device and a semiconductor device, and is particularly suitable for a motion image decoding device that includes a cache memory temporarily holding a reference image.
In systems handling motion images, screen sizes continue to grow, as in 4K or Super High Vision. To hold the image data of decoded pictures (a picture being a frame in the case of progressive scanning or a field in the case of interlaced scanning), motion image decoding devices that handle a large volume of signals include a large-capacity memory. In motion prediction or motion compensation during the decoding process, the image data of a picture preceding or following the target picture to be decoded is referred to as a reference image. Accessing the memory for this purpose requires a high bandwidth, and the resulting power consumption and the high performance required increase the cost. A technique for reducing the bandwidth is therefore in demand, and using a cache memory to reduce the bandwidth is an important technique for decoding high-resolution motion images.
Japanese Unexamined Patent Publication No. 1999-215509 discloses a technique for reducing data cache misses in a motion compensating process included in an MPEG (Moving Picture Experts Group) video expansion process executed by software on a general-purpose processor. In the motion compensating process for a particular macro block, the address of the area adjacent to the right side of the reference area specified by the motion vector is given to a data cache controller, and a preload instruction is issued. The data of that area is then preloaded from the main memory to the data cache. Note that the macro block is a unit area including a plurality of pixels (for example, an area of 16 pixels*16 pixels) and is the target of the decoding process. One picture includes a plurality of macro blocks arranged two-dimensionally in the row direction and the column direction. The decoding process is executed on the target macro blocks sequentially, from the upper left macro block of the picture toward the right, and then row by row downward, from left to right within each row. The area adjacent to the right side of the reference area specified by the motion vector for a particular macro block has a high possibility of being the reference area specified by the motion vector of the macro block to be decoded next. Thus, preloading the image data of that area makes it possible to reduce data cache misses.
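The decoding order and the preload target described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the cited publication: the names `Area`, `raster_order`, `reference_area` and `preload_area` are hypothetical, the motion vector is assumed to be in integer pixels, and clipping at the picture boundary is ignored.

```python
from typing import Iterator, NamedTuple

MB_SIZE = 16  # macro block edge in pixels (the 16*16 example from the text)


class Area(NamedTuple):
    """Rectangular pixel area in the reference picture (hypothetical type)."""
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def raster_order(mb_cols: int, mb_rows: int) -> Iterator[tuple[int, int]]:
    """Yield (column, row) of macro blocks left to right, top row first."""
    for row in range(mb_rows):
        for col in range(mb_cols):
            yield col, row


def reference_area(mb_col: int, mb_row: int, mv_x: int, mv_y: int) -> Area:
    """Area pointed to by an integer-pixel motion vector (assumption)."""
    return Area(mb_col * MB_SIZE + mv_x, mb_row * MB_SIZE + mv_y,
                MB_SIZE, MB_SIZE)


def preload_area(ref: Area) -> Area:
    """Area adjacent to the right side of `ref`, the candidate for preloading."""
    return Area(ref.x + ref.w, ref.y, ref.w, ref.h)
```

Under these assumptions, when two horizontally adjacent macro blocks carry the same motion vector, the area preloaded for the first one coincides with the reference area of the second one, which is the case in which the preload is fully effective.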
Japanese Unexamined Patent Publication No. 2010-146205 discloses a technique for improving the cache hit ratio of a cache memory storing image data. The data items of the top field and the bottom field of an interlaced image are arranged so as not to be mixed together in any cache line. In the case of an interlaced image with a field configuration, the decoding processes for the top field and the bottom field are performed independently. Thus, if the data items of the two fields are mixed together in each cache line, both are read into the cache even when only one of them is necessary, which decreases the cache hit ratio. When each cache line stores only the top field or only the bottom field, the cache hit ratio does not decrease in this way. In addition, the number of ways and the number of entries of the cache are changed in accordance with changes of the pixel area in the processing units, such as MBAFF (Macroblock-Adaptive Frame/Field) in H.264, which is one standard for motion image encoding. When the access granularity for the image data is high, the number of ways is reduced, and wide-range data of an image is held in the cache. When the access granularity is low, the number of ways is increased, and data of a narrow-range image is switched. As a result, the cache memory is sufficiently used, and the cache hit ratio is improved.
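The effect of keeping the two fields in separate cache lines can be illustrated by counting the cache lines touched when a block of a single field is read. The sketch below is a simplified model under stated assumptions and is not taken from the cited publication: each cache line is assumed to hold a tile of `line_w` by `line_h` pixels (8 by 4 is an arbitrary choice), and the function name `lines_touched` and its parameters are hypothetical.

```python
def lines_touched(x: int, fy: int, w: int, h: int, *,
                  line_w: int = 8, line_h: int = 4,
                  mixed: bool = True, bottom: bool = False) -> int:
    """Count the cache lines read to fetch a w*h block of ONE field.

    (x, fy) is the upper-left pixel of the block in field coordinates.
    mixed=True : each cache line holds consecutive frame rows, so rows of
                 the top field and the bottom field alternate inside it.
    mixed=False: each cache line holds rows of a single field only.
    The tile geometry (line_w, line_h) is an assumption of this sketch.
    """
    parity = 1 if bottom else 0
    if mixed:
        # Field row r sits at frame row 2*r + parity.
        first_row = 2 * fy + parity
        last_row = 2 * (fy + h - 1) + parity
    else:
        first_row = fy
        last_row = fy + h - 1
    tile_rows = last_row // line_h - first_row // line_h + 1
    tile_cols = (x + w - 1) // line_w - x // line_w + 1
    return tile_rows * tile_cols
```

With these assumed tile dimensions, reading a 16*16 block of the top field touches 16 cache lines in the mixed layout and 8 in the separated layout, because half of every line fetched in the mixed layout holds rows of the field that is not needed.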
As a result of the inventors' examination of Japanese Unexamined Patent Publications No. 1999-215509 and No. 2010-146205, the following new problems have been found.
According to the technique disclosed in Japanese Unexamined Patent Publication No. 1999-215509, the effect of reducing data cache misses is maximized when the motion compensating process for the macro blocks to be decoded sequentially uses motion vectors of the same direction and the same size. However, as a result of the inventors' examination, it is found that the data preloaded to the data cache memory may not be referred to, depending on the features of the target stream to be decoded. A stream may include both inter macro blocks and intra macro blocks in a single picture. The inter macro block is a macro block for which the decoding process is performed with a motion compensating process, with reference to a reference image specified by the motion vector included in the stream. The intra macro block, on the other hand, is a macro block for which the decoding process is performed with reference to decoded image data in the target picture itself, without the motion compensating process. In the encoding process that generates a stream, the encoding efficiency may be improved by adaptively switching, for each macro block, between inter-prediction with motion compensation and intra-prediction without motion compensation. In this case, the stream includes inter macro blocks and intra macro blocks in each picture. When the target macro block to be decoded is an inter macro block, the image data of the reference area referred to by the motion vector is read into the data cache memory, and the address of the area adjacent to its right side is then always given to the data cache controller so that the area is preloaded. Even in this case, the macro block to be decoded next is not necessarily processed with reference to the preloaded image data.
When the macro block to be decoded next is an intra macro block, no reference area is necessary, because motion compensation is not performed. The data cache memory is therefore not accessed, and the preloaded image data may be wasted. Even if the macro block to be decoded next is an inter macro block, when the direction or size of its motion vector differs remarkably from that of the previous macro block, it refers to a reference area different from the one expected from the previous macro block, and it is found that there is a high possibility that the preloaded image data will be wasted.
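The two cases of wasted preloads described above can be made concrete with a small counting model. The following is a minimal sketch under stated assumptions, not a model of any actual decoder: one row of macro blocks is considered, `None` stands for an intra macro block, a motion vector is an integer-pixel pair, and a preload is counted as used whenever the next reference area overlaps the preloaded area at all. The names `ref_origin`, `overlaps` and `count_wasted_preloads` are hypothetical.

```python
from typing import Optional

MB = 16  # macro block edge in pixels

MotionVector = tuple[int, int]


def ref_origin(col: int, mv: MotionVector) -> tuple[int, int]:
    """Upper-left corner of the reference area of the macro block in `col`."""
    return col * MB + mv[0], mv[1]


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when two MB-sized areas with upper-left corners a and b overlap."""
    return abs(a[0] - b[0]) < MB and abs(a[1] - b[1]) < MB


def count_wasted_preloads(
        mvs: list[Optional[MotionVector]]) -> tuple[int, int]:
    """Return (preloads issued, preloads wasted) for one row of macro blocks.

    Only macro blocks that have a successor in the list issue a preload here.
    """
    issued = wasted = 0
    for col, (cur, nxt) in enumerate(zip(mvs, mvs[1:])):
        if cur is None:
            continue  # intra macro block: no reference area, no preload
        issued += 1
        ox, oy = ref_origin(col, cur)
        preloaded = (ox + MB, oy)  # area adjacent to the right side
        if nxt is None or not overlaps(preloaded, ref_origin(col + 1, nxt)):
            wasted += 1  # next block is intra, or refers elsewhere
    return issued, wasted
```

For example, under these assumptions a row with the motion vectors `[(3, -4), (3, -4), None, (3, -4), (-40, 20), (3, -4)]` issues four preloads and wastes three of them, whereas a row in which every macro block carries the same motion vector wastes none.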
With the technique disclosed in Japanese Unexamined Patent Publication No. 2010-146205, the cache configuration (the number of ways and the number of entries) is changed based only on fixed information in picture units. Specifically, the change is made between the top field and the bottom field in interlace processing, or in accordance with the pixel area in the processing units. The technique therefore does not follow the features of the stream, which change from picture to picture, and it does not improve the cache efficiency in that respect. For example, in the case of a frame including many intra macro blocks, it is found that the data read into the cache memory by cache fill is not used, that is, the reusability decreases, so that the cache fill frequently causes unnecessary data reads.
Accordingly, if the preloading is executed uniformly, independently of the features of the stream, the preloaded data may not be referred to. It is thus found that the bandwidth of the bus is wasted by the data reads caused by cache fill for the data cache memory. These unnecessary data reads also increase the power consumption. When the bus is shared with another functional module, as in the general-purpose processor disclosed in Japanese Unexamined Patent Publication No. 1999-215509, the unnecessary data reads put pressure on the bandwidth available to the other module. In this case, it is found that the performance of the system as a whole may deteriorate. Even if the cache configuration (the number of ways and the number of entries) is changed based on the fixed information in picture units, as disclosed in Japanese Unexamined Patent Publication No. 2010-146205, the technique does not follow the features of the stream, which change from picture to picture. Thus, the technique of that disclosure does not sufficiently contribute to the improvement of the cache efficiency.