Recently, apparatuses complying with a scheme such as MPEG (Moving Picture Experts Group), which handles image information digitally and compresses it by orthogonal transform such as the discrete cosine transform and by motion compensation, exploiting redundancy specific to image information so that the information can be transmitted and accumulated with high efficiency, have come into widespread use both for information distribution in broadcast stations and for information reception in standard homes.
In particular, MPEG-2 (ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 13818-2), defined as a general-purpose image coding scheme, is a standard covering both interlaced and progressive scan images as well as both standard-resolution and high-definition images, and is currently used in a wide range of professional and consumer applications. By using MPEG-2, a high compression ratio and excellent image quality can be realized by assigning a code amount (bit rate) of 4 to 8 Mbps to a standard-resolution interlaced image having 720×480 pixels and 18 to 22 Mbps to a high-resolution interlaced image having 1920×1088 pixels, for example.
A principal object of MPEG-2 is high-quality coding suitable for broadcasting, but it does not support coding at a lower code amount (bit rate), that is, at a higher compression ratio, than that of MPEG-1. The need for such a coding scheme was expected to grow with the spread of portable terminals, and MPEG-4 was standardized in response. As regards its image coding scheme, this standard was approved as the international standard ISO/IEC 14496-2 in December 1998.
Further, in recent years, a standard referred to as H.26L (ITU-T (International Telecommunication Union Telecommunication Standardization Sector) Q6/16 VCEG (Video Coding Expert Group)) has been under standardization, initially targeting image coding for video conferencing. H.26L is known to achieve higher coding efficiency, although it requires a larger amount of arithmetic operation for coding and decoding than conventional coding schemes such as MPEG-2 and MPEG-4. Currently, as part of the activities of MPEG-4, standardization that builds on H.26L and introduces functions not supported by H.26L to achieve still higher coding efficiency is being carried out as the Joint Model of Enhanced-Compression Video Coding.
As to the schedule of standardization, this became an international standard in March 2003 under the name H.264/MPEG-4 Part 10 (Advanced Video Coding, hereinafter referred to as AVC).
Further, as an extension thereof, standardization of FRExt (Fidelity Range Extension), which includes coding tools necessary for professional use such as RGB, 4:2:2, and 4:4:4, as well as the 8×8 DCT and quantization matrices defined in MPEG-2, was completed in February 2005. As a result, AVC has become a coding scheme capable of excellently representing even the film noise included in movies, and is used in a wide range of applications such as Blu-ray Disc.
However, recently, the need for coding with a still higher compression ratio has grown, for example, to compress a UHD (Ultra High Definition; 4000×2000 pixels) image having four times as many pixels as a high-definition image, or to distribute a high-definition image in an environment of limited transmission capacity such as the Internet. Therefore, improvement in coding efficiency is continuously studied by VCEG (Video Coding Expert Group) under the umbrella of ITU-T described above.
In order to further improve the coding efficiency as compared with AVC, standardization of a coding scheme referred to as HEVC (High Efficiency Video Coding) is under way by JCTVC (Joint Collaboration Team-Video Coding), a joint standardization organization of ITU-T and ISO/IEC (for example, refer to Patent Document 1).
In such AVC and HEVC, a motion compensation filter (MC filter) is used so that motion compensation with fractional accuracy can be performed at the time of inter prediction (inter-frame prediction). As the MC filter, in the case of HEVC, for example, an 8-tap FIR (Finite Impulse Response) filter is used for the luminance component and a 4-tap FIR filter is used for the chrominance component.
In inter prediction, a coded image is read from a frame memory as a reference image based on a block size, a prediction mode, a motion vector (MV), and the like, and the MC filter is applied based on the fractional part of the MV to generate an inter predicted image. The MV is obtained by motion estimation in an encoder and from stream information in a decoder.
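As a rough sketch of how the MC filter is applied to a fractional-pel position, the following uses the 8-tap half-sample luminance coefficients of HEVC; the single-stage rounding and 8-bit clipping are simplified assumptions, not the normative multi-stage arithmetic, and the function names are illustrative only.

```python
# Illustrative sketch: horizontal half-pel luma interpolation with an
# 8-tap FIR filter. The coefficients match the HEVC half-sample luma
# filter (sum = 64); rounding and clipping are simplified here.
HALF_PEL_TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]

def interp_half_pel(ref_row, x):
    """Interpolate the half-pel sample between ref_row[x] and ref_row[x+1].

    Requires 8 integer-position samples: ref_row[x-3 .. x+4].
    """
    acc = sum(c * ref_row[x - 3 + i] for i, c in enumerate(HALF_PEL_TAPS))
    # Normalize by 64 with rounding, then clip to the 8-bit sample range.
    return max(0, min(255, (acc + 32) >> 6))

# On a flat row, interpolation reproduces the constant value.
row = [100] * 16
print(interp_half_pel(row, 7))  # → 100
```

The point to note is that even one output sample consumes eight reference samples along the filtering direction, which is why fractional-accuracy MVs enlarge the reference region to be read.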
When the MV has fractional accuracy, the tap count of the MC filter for generating the predicted image comes into play; that is, the number of reference-image pixels required for generating the predicted image becomes larger than in the case where the MV is an integer. Therefore, more pixel data must be read from the frame memory.
A reference-image pixel value read for generating a certain pixel of the predicted image can also be used for generating neighboring pixels, so that, in general, the usage efficiency of the reference image is higher as the size of the predicted image to be generated is larger. That is, when converted to a count per pixel of the predicted image, the number of reference-image pixels required becomes smaller as the size of the predicted image becomes larger. Conversely, the smaller the predicted image, the lower the usage efficiency of the reference image, and the larger the number of reference-image pixels required per pixel of the predicted image.
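This relation can be checked with simple arithmetic: for a W×H prediction block and a T-tap filter applied in both dimensions, the reference region is (W+T-1)×(H+T-1) pixels. The helper below is illustrative (the 8-tap default follows the HEVC luminance case described above):

```python
def ref_pixels_per_pred_pixel(w, h, taps=8):
    """Reference pixels fetched per predicted pixel for a w x h block
    when a `taps`-tap MC filter is applied in both dimensions."""
    ref = (w + taps - 1) * (h + taps - 1)
    return ref / (w * h)

# Per-pixel cost falls as the block grows:
# 4 -> 7.5625, 8 -> 3.515625, 16 -> ~2.07, 64 -> ~1.23
for size in (4, 8, 16, 64):
    print(size, ref_pixels_per_pred_pixel(size, size))
```

A 4×4 block thus needs more than seven reference pixels per predicted pixel, roughly six times the per-pixel cost of a 64×64 block, which quantifies why small prediction sizes are inefficient in reference-image usage.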
Further, in the case of bidirectional prediction, the number of reference images to be read becomes larger than in unidirectional prediction.
In this manner, when the size of the reference image to be read becomes larger, it becomes necessary to secure a larger memory bandwidth between the frame memory and the inter prediction unit in order to maintain the processing speed of coding, and the load becomes larger.
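To see the scale of this load, a back-of-the-envelope worst case can be computed. The parameters below are hypothetical (1920×1080 luminance at 60 fps, every block 4×4, an 8-tap filter in both dimensions, bidirectional prediction with two reference pictures, one byte per sample); the function name is illustrative:

```python
def worst_case_luma_read_bytes(width, height, fps, block=4, taps=8, refs=2):
    """Bytes of luma reference data read per second in the worst case:
    every block is `block` x `block`, fractionally interpolated in both
    dimensions, and predicted from `refs` reference pictures."""
    blocks = (width // block) * (height // block)
    per_block = (block + taps - 1) ** 2  # reference region per block
    return blocks * per_block * refs * fps

gb = worst_case_luma_read_bytes(1920, 1080, 60) / 1e9
print(f"{gb:.1f} GB/s")  # → 1.9 GB/s
```

Reading each decoded frame once would cost only about 0.12 GB/s under the same assumptions, so the worst-case inter prediction traffic is an order of magnitude larger, which is the memory-bandwidth pressure described above.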
Therefore, in AVC, the resources used by a decoder can be limited according to a level. Specifically, there is an upper limit on the number of motion vectors that may be used in two consecutive macroblocks. This limitation prevents consecutive occurrences of inter prediction with a small block size. Bidirectional prediction for an 8×8 block size is also forbidden.