Conventionally, in international standard video encoding methods such as MPEG and ITU-T H.26x, each input video frame is divided into macro blocks, each consisting of a 16×16 pixel block, and is then subjected to a compression process.
On the other hand, in recent years there has been a demand for a technique of compression-encoding high-definition, high-quality video in formats such as a 4K×2K-pixel video format having a spatial resolution four times that of HDTV (High Definition Television, 1920×1080 pixels), an 8K×4K-pixel video format having a spatial resolution four times that of the 4K×2K-pixel video format, or a 4:4:4 video signal format that increases the number of sampled chrominance signals and thereby improves color reproducibility. When compression-encoding such high-definition, high-quality video, an encoding process cannot exploit the image signal correlation within a 16×16 pixel macro block to a sufficient degree, and it is therefore difficult to achieve a high compression ratio. To deal with this problem, techniques have been proposed such as extending the size of each conventional 16×16 pixel macro block to a 32×32 pixel block, as disclosed in nonpatent reference 1, and enlarging the unit to which a motion vector is allocated, thereby reducing the amount of encoded parameters required for prediction, or increasing the block size for the transform encoding of a prediction error signal, thereby removing the correlation between pixels of the prediction error signal effectively.
FIG. 21 is a block diagram showing the structure of the encoding device disclosed in nonpatent reference 1. In the encoding disclosed in nonpatent reference 1, a block dividing unit 1002 divides an inputted video signal 1001, which is the target to be encoded, into macro blocks (rectangular blocks of a luminance signal each having 32 pixels×32 lines), and each macro block is inputted to a predicting unit 1004 as an encoded video signal 1003.
The predicting unit 1004 predicts an image signal of each color component in each macro block, both within each frame and between frames, to acquire a prediction error signal 1005. In particular, when performing a motion-compensated prediction between frames, the predicting unit searches for a motion vector for each macro block itself or for each of the sub-blocks into which each macro block is further divided, creates a motion-compensated prediction image according to the motion vector, and acquires the prediction error signal 1005 by calculating the difference between the motion-compensated prediction image and the encoded video signal 1003.
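The motion-compensated prediction described above can be illustrated with a minimal sketch. It assumes an exhaustive (full) search with a sum-of-absolute-differences cost, integer-pel displacements, and a single luminance plane stored as a list of rows; none of these choices is stated in nonpatent reference 1, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def search_motion_vector(cur, ref, bx, by, size, search_range):
    """Assumed full search: test every integer displacement within
    +/-search_range and keep the one with the lowest SAD."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block would leave the frame.
            if bx + dx < 0 or bx + dx + size > width:
                continue
            if by + dy < 0 or by + dy + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def prediction_error(cur, ref, bx, by, mv, size):
    """Difference between the current block and its motion-compensated
    prediction taken from the reference frame."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(size)]
        for y in range(size)
    ]
```

Calling `search_motion_vector` with a larger `size` yields one vector for a larger region, which is the trade-off between vector overhead and prediction accuracy that the block-size discussion turns on.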
After performing a DCT (discrete cosine transform) process on the prediction error signal 1005 to remove a signal correlation from the prediction error signal 1005 while changing the block size according to the size of a unit area to which the motion vector is allocated, a compressing unit 1006 quantizes the prediction error signal to acquire compressed data 1007. While the compressed data 1007 is entropy-encoded and outputted as a bit stream 1009 by a variable length encoding unit 1008, the compressed data is also sent to a local decoding unit 1010 and a decoded prediction error signal 1011 is acquired by this local decoding unit.
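As a rough illustration of the transform and quantization steps performed by the compressing unit, the following sketch applies an orthonormal two-dimensional DCT-II to a square block of any size and then quantizes the coefficients with a uniform step. The uniform quantizer, the function names, and the `step` parameter are assumptions made for illustration; the rule that selects the transform block size from the motion vector allocation unit is not modeled here.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT-II of an n x n block given as a list of rows."""
    n = len(block)

    def scale(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    # basis[k][i] = cos((2i + 1) * k * pi / (2n))
    basis = [
        [math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
        for k in range(n)
    ]
    coeffs = [[0.0] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += block[y][x] * basis[v][y] * basis[u][x]
            coeffs[v][u] = scale(u) * scale(v) * acc
    return coeffs


def quantize(coeffs, step):
    """Assumed uniform quantizer: divide by the step and round to an integer."""
    return [[int(round(c / step)) for c in row] for row in coeffs]
```

For a perfectly flat block all of the energy lands in the single DC coefficient, which is the sense in which the transform removes the correlation between pixels of the prediction error signal.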
This decoded prediction error signal 1011 is added to the prediction signal 1012 that was used to create the prediction error signal 1005, thereby creating a decoded signal 1013, and this decoded signal is inputted to a loop filter 1014. After being subjected to a process of removing block distortion by the loop filter 1014, the decoded signal 1013 is stored in a memory 1016 as a reference image signal 1015 for creating a subsequent prediction signal 1012. A parameter 1017 used for the creation of the prediction signal, which is determined by the predicting unit 1004 in order to acquire the prediction signal 1012, is sent to the variable length encoding unit 1008 and multiplexed into the bit stream 1009, and this bit stream is outputted. The parameter 1017 used for the creation of the prediction signal includes, for example, information such as intra prediction mode information showing how to perform a spatial prediction within each frame, and a motion vector showing an amount of inter-frame movement.
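The local decoding path can be sketched as follows. This is a simplified illustration under stated assumptions: the transform stage is omitted so that the quantizer acts directly on residual samples, samples are 8-bit, and the loop filter is a pass-through placeholder rather than an actual block-distortion removal filter. All function names are hypothetical.

```python
def quantize(residual, step):
    """Assumed uniform quantizer applied directly to residual samples."""
    return [[int(round(r / step)) for r in row] for row in residual]


def dequantize(levels, step):
    """Inverse of the assumed uniform quantizer (up to rounding loss)."""
    return [[q * step for q in row] for row in levels]


def local_decode(levels, prediction, step, loop_filter=lambda block: block):
    """Rebuild the block the way a decoder would: decoded residual plus
    prediction, clipped to the 8-bit range, then passed through the loop
    filter. The default loop filter is a placeholder that changes nothing."""
    decoded_residual = dequantize(levels, step)
    decoded = [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, decoded_residual)
    ]
    return loop_filter(decoded)


# Example: the stored reference differs slightly from the source block,
# because it carries the same quantization loss a decoder would see.
prediction = [[98, 100], [100, 110]]
residual = [[1, 4], [-3, 9]]
reference = local_decode(quantize(residual, 4), prediction, 4)
```

Storing the decoded block rather than the original input as the reference keeps the encoder's prediction identical to the one a decoder can form from the bit stream alone.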
While a conventional international standard video encoding method, such as MPEG or ITU-T H.26x, uses 16×16 pixels as the macro block size, the encoding device disclosed in nonpatent reference 1 uses 32×32 pixels as the macro block size (super macro block: SMB). FIG. 22 shows the shapes of the divided regions to each of which a motion vector is allocated when a motion-compensated prediction is performed for each M×M pixel macro block. FIG. 22(a) shows each SMB disclosed in nonpatent reference 1, and FIG. 22(b) shows each macro block based on conventional MPEG-4 AVC/H.264 (refer to nonpatent reference 2). While each SMB, with the number of pixels M=32, has a large area for each motion prediction region covered by a single motion vector, each conventional macro block uses the number of pixels M/2=16. As a result, because the amount of motion vector information needed for the entire screen is smaller in the case of SMBs than in the case of conventional macro blocks having the number of pixels M/2=16, the amount of motion vector code which should be transmitted as a bit stream can be reduced.
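The reduction in motion vector information can be made concrete with a small calculation. The sketch below assumes exactly one motion vector per block, with no sub-block partitioning, and assumes that partial blocks at the frame edge are counted as whole blocks; the function name is hypothetical.

```python
def blocks_per_frame(width, height, block_size):
    """Number of block_size x block_size blocks needed to cover a frame,
    rounding up so that partial blocks at the edges are counted."""
    cols = -(-width // block_size)   # ceiling division
    rows = -(-height // block_size)
    return cols * rows


# HDTV frame (1920x1080), one motion vector per block (assumption).
vectors_16 = blocks_per_frame(1920, 1080, 16)  # 120 * 68 = 8160
vectors_32 = blocks_per_frame(1920, 1080, 32)  # 60 * 34 = 2040
```

Under these assumptions, doubling the block edge from 16 to 32 pixels cuts the number of motion vectors per frame to a quarter, from 8160 to 2040.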