In recent years, devices have come into widespread use that handle image information digitally and compress it with an encoding format that exploits the redundancy peculiar to image information, using an orthogonal transform such as the discrete cosine transform together with motion compensation, in order to transmit and store the information efficiently. Examples of such encoding formats include MPEG (Moving Picture Experts Group).
In particular, MPEG2 (ISO/IEC 13818-2) is defined as a general-purpose image encoding format, and is a standard encompassing both interlaced scanning images and sequential-scanning images, as well as both standard resolution images and high definition images. MPEG2 is now widely employed in a broad range of applications for professional and consumer use. With the MPEG2 compression format, a code amount (bit rate) of 4 through 8 Mbps is allocated to an interlaced scanning image of standard resolution having 720×480 pixels, for example, and a code amount (bit rate) of 18 through 22 Mbps is allocated to an interlaced scanning image of high resolution having 1920×1088 pixels, for example. Thus, a high compression rate and excellent image quality can be realized.
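As a rough check of the compression these bit rates imply, the following sketch compares them with the uncompressed rate. The frame rate of 30 frames per second and the 4:2:0, 8-bit sampling (12 bits per pixel on average) are assumptions made for illustration and are not stated above; the function names are likewise hypothetical.

```python
# Rough compression-ratio arithmetic for the MPEG2 bit rates quoted above.
# ASSUMPTIONS (not from the text): 30 frames/s and 4:2:0 sampling at 8 bits,
# i.e. 12 bits per pixel on average.

def raw_bit_rate(width, height, fps=30, bits_per_pixel=12):
    """Uncompressed bit rate in bits per second."""
    return width * height * bits_per_pixel * fps

def compression_ratio(width, height, coded_mbps, fps=30, bits_per_pixel=12):
    """Ratio of the uncompressed bit rate to the coded bit rate."""
    return raw_bit_rate(width, height, fps, bits_per_pixel) / (coded_mbps * 1_000_000)

# (width, height) -> (lowest, highest) coded bit rate in Mbps, from the text
cases = {(720, 480): (4, 8), (1920, 1088): (18, 22)}
for (w, h), (low, high) in cases.items():
    raw_mbps = raw_bit_rate(w, h) / 1e6
    print(f"{w}x{h}: raw {raw_mbps:.1f} Mbps, "
          f"ratio {compression_ratio(w, h, high):.1f}:1 "
          f"to {compression_ratio(w, h, low):.1f}:1")
```

Under these assumptions the standard resolution case corresponds to a reduction of roughly 16:1 to 31:1; a different frame rate or sampling format would change the figures proportionally.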
MPEG2 was principally aimed at high image quality encoding suited to broadcasting, and does not handle code amounts (bit rates) lower than those of MPEG1, that is, encoding formats having a higher compression rate. With the spread of personal digital assistants, demand for such an encoding format was expected to increase, and in response the MPEG4 encoding format was standardized. Its image encoding specification was approved as the international standard ISO/IEC 14496-2 in December 1998.
Further, in recent years, standardization of a standard called H.26L (ITU-T Q6/16 VCEG), originally intended for image encoding for videoconferencing, has progressed. H.26L is known to realize higher encoding efficiency than conventional encoding formats such as MPEG2 and MPEG4, although it requires a greater amount of computation for encoding and decoding. Thereafter, as part of the MPEG4 activities, standardization that takes H.26L as a base and also incorporates functions not supported by H.26L, to realize still higher encoding efficiency, was carried out as the Joint Model of Enhanced-Compression Video Coding. In terms of the standardization schedule, H.264 and MPEG-4 Part 10 (Advanced Video Coding, hereinafter referred to as H.264/AVC) became an international standard in March 2003.
FIG. 1 is a block diagram illustrating a configuration example of an image encoding device with a compressed image based on H.264/AVC as output.
With the example in FIG. 1, the image encoding device 1 has an A/D conversion unit 11, a screen rearranging buffer 12, a computing unit 13, an orthogonal transform unit 14, a quantization unit 15, a lossless encoding unit 16, a storage buffer 17, an inverse quantization unit 18, an inverse orthogonal transform unit 19, a computing unit 20, a deblocking filter 21, frame memory 22, a switch 23, an intra prediction unit 24, a motion prediction/compensation unit 25, a prediction image selecting unit 26, and a rate control unit 27.
The A/D conversion unit 11 performs A/D conversion of an input image, and outputs the result to the screen rearranging buffer 12 for storing. The screen rearranging buffer 12 rearranges the frames, which are stored in display order, into the order for encoding according to the GOP (Group of Pictures).
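The rearrangement from display order into encoding order can be sketched as follows. The GOP pattern IBBPBBP and the simple rule that each B picture waits for the next I or P picture are assumptions chosen for illustration; H.264/AVC permits more flexible reference structures, and the function names are hypothetical.

```python
def display_to_coding_order(frames):
    """Reorder (display_index, picture_type) pairs for encoding.

    B pictures are held back until the next I or P picture, which they
    reference, has been placed in the coding order.
    """
    coding, pending_b = [], []
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)
        else:
            coding.append(frame)       # anchor picture goes first
            coding.extend(pending_b)   # then the B pictures that precede it
            pending_b = []
    coding.extend(pending_b)           # trailing B pictures with no later anchor
    return coding

def coding_to_display_order(frames):
    """Inverse operation, as performed on the decoding side."""
    return sorted(frames, key=lambda frame: frame[0])

gop = list(enumerate("IBBPBBP"))
coded = display_to_coding_order(gop)
print(" ".join(f"{ptype}{index}" for index, ptype in coded))
# prints: I0 P3 B1 B2 P6 B4 B5
```

Sorting by display index restores the original order, which is the role the corresponding buffer plays in a decoder.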
The computing unit 13 subtracts, from the image read out from the screen rearranging buffer 12, the prediction image from the intra prediction unit 24 or the prediction image from the motion prediction/compensation unit 25, selected by the prediction image selecting unit 26, and outputs difference information thereof to the orthogonal transform unit 14. The orthogonal transform unit 14 subjects the difference information from the computing unit 13 to orthogonal transform, such as discrete cosine transform, Karhunen-Loève transform, or the like, and outputs a transform coefficient thereof. The quantization unit 15 quantizes the transform coefficient that the orthogonal transform unit 14 outputs.
The quantized transform coefficient serving as the output of the quantization unit 15 is input to the lossless encoding unit 16, and subjected to lossless encoding, such as variable length coding, arithmetic coding, or the like, and thus compressed.
The lossless encoding unit 16 obtains information indicating intra prediction from the intra prediction unit 24, and obtains information indicating an inter prediction mode, and so forth from the motion prediction/compensation unit 25. Note that the information indicating intra prediction and the information indicating inter prediction will also be referred to as intra prediction mode information and inter prediction mode information, respectively, hereinafter.
The lossless encoding unit 16 encodes the quantized transform coefficient, and also encodes the information indicating intra prediction, information indicating inter prediction mode, and so forth, and takes these as part of header information in a compressed image. The lossless encoding unit 16 supplies the encoded data to the storage buffer 17 for storing.
For example, with the lossless encoding unit 16, lossless encoding processing, such as variable length coding, arithmetic coding, or the like, is performed. Examples of the variable length coding include CAVLC (Context-Adaptive Variable Length Coding) stipulated by the H.264/AVC format. Examples of the arithmetic coding include CABAC (Context-Adaptive Binary Arithmetic Coding).
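CAVLC and CABAC are too involved to reproduce in a few lines. As a much simpler illustration of variable length coding, the sketch below implements the unsigned Exp-Golomb code, which H.264/AVC uses for many header syntax elements. It is a swapped-in example and not CAVLC itself; the function names and the bit-string representation are assumptions made for readability.

```python
def ue_encode(value):
    """Unsigned Exp-Golomb code word for a non-negative integer, as a bit string."""
    if value < 0:
        raise ValueError("value must be non-negative")
    info = bin(value + 1)[2:]              # binary form of value + 1
    return "0" * (len(info) - 1) + info    # prefix of leading zeros, then info

def ue_decode(bits, pos=0):
    """Decode one code word starting at pos; return (value, next_position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end

for v in range(5):
    print(v, ue_encode(v))

# Code words are self-delimiting, so they can be concatenated and parsed back.
stream = "".join(ue_encode(v) for v in (3, 0, 7))
pos, decoded = 0, []
while pos < len(stream):
    value, pos = ue_decode(stream, pos)
    decoded.append(value)
print(decoded)  # prints: [3, 0, 7]
```

Small values receive short code words, which is the property that makes variable length coding effective when small values are the most frequent.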
The storage buffer 17 outputs the data supplied from the lossless encoding unit 16, as a compressed image encoded by the H.264/AVC format, to the decoding side, for example via a downstream recording device or transmission path not shown in the drawing.
The quantized transform coefficient output from the quantization unit 15 is also input to the inverse quantization unit 18, inversely quantized, and then subjected to inverse orthogonal transform at the inverse orthogonal transform unit 19. The computing unit 20 adds the output subjected to inverse orthogonal transform to the prediction image supplied from the prediction image selecting unit 26, yielding a locally decoded image. The deblocking filter 21 removes block noise from the decoded image and supplies the result to the frame memory 22 for storing. The image prior to deblocking filter processing by the deblocking filter 21 is also supplied to the frame memory 22 for storing.
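The forward path (computing unit 13, orthogonal transform unit 14, quantization unit 15) and the local decoding path (inverse quantization unit 18, inverse orthogonal transform unit 19, computing unit 20) can be sketched together as below. This is a minimal sketch under stated assumptions: it uses a floating-point 4×4 discrete cosine transform and a single uniform quantization step, whereas H.264/AVC itself specifies an integer transform; all function names and sample values are hypothetical.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (row k is the k-th basis function)."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def encode_block(original, prediction, qstep):
    """Subtract the prediction, transform the residual, quantize the coefficients."""
    residual = [[o - p for o, p in zip(ro, rp)]
                for ro, rp in zip(original, prediction)]
    c = dct_matrix(len(original))
    coeff = matmul(matmul(c, residual), transpose(c))      # C * X * C^T
    return [[round(v / qstep) for v in row] for row in coeff]

def local_decode(levels, prediction, qstep):
    """Inverse quantize, inverse transform, and add the prediction back."""
    c = dct_matrix(len(levels))
    coeff = [[v * qstep for v in row] for row in levels]
    residual = matmul(matmul(transpose(c), coeff), c)      # C^T * Y * C
    return [[p + r for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, residual)]

prediction = [[100] * 4 for _ in range(4)]
original = [[100 + 3 * x + 5 * y for x in range(4)] for y in range(4)]
levels = encode_block(original, prediction, qstep=8)
reconstructed = local_decode(levels, prediction, qstep=8)
worst = max(abs(o - r) for ro, rr in zip(original, reconstructed)
            for o, r in zip(ro, rr))
print(levels)
print(f"largest reconstruction error: {worst:.2f}")
```

Because the encoder reconstructs the image from the quantized levels rather than from the original, its reference images match those a decoder obtains from the same levels, which keeps the two sides' predictions in step.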
The switch 23 outputs a reference image stored in the frame memory 22 to the motion prediction/compensation unit 25 or intra prediction unit 24.
With this image encoding device 1, for example, I pictures, B pictures, and P pictures from the screen rearranging buffer 12 are supplied to the intra prediction unit 24 as images to be subjected to intra prediction (also referred to as intra processing). Also, B pictures and P pictures read out from the screen rearranging buffer 12 are supplied to the motion prediction/compensation unit 25 as images to be subjected to inter prediction (also referred to as inter processing).
The intra prediction unit 24 performs intra prediction processing of all of the candidate intra prediction modes based on the image to be subjected to intra prediction read out from the screen rearranging buffer 12, and the reference image supplied from the frame memory 22, to generate a prediction image.
At that time, the intra prediction unit 24 calculates a cost function value as to all of the candidate intra prediction modes, and selects an intra prediction mode wherein the calculated cost function value provides the minimum value, as the optimal intra prediction mode.
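Selecting the mode with the minimum cost function value can be sketched as follows. The sketch assumes only three candidate modes (vertical, horizontal, and DC) for a 4×4 block and uses the sum of absolute differences (SAD) as the cost function; the text above does not specify the cost function, so SAD is an illustrative choice, and the function names and sample values are hypothetical.

```python
def predict(mode, top, left):
    """Build an n x n prediction from already decoded neighbouring pixels."""
    n = len(top)
    if mode == "vertical":        # copy the row above downwards
        return [list(top) for _ in range(n)]
    if mode == "horizontal":      # copy the left column rightwards
        return [[left[r]] * n for r in range(n)]
    if mode == "dc":              # rounded mean of the neighbours
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    raise ValueError(f"unknown mode: {mode}")

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_intra_mode(block, top, left, modes=("vertical", "horizontal", "dc")):
    """Evaluate every candidate mode and return the one with minimum cost."""
    costs = {mode: sad(block, predict(mode, top, left)) for mode in modes}
    best = min(costs, key=costs.get)
    return best, costs

top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]   # columns continue the row above
mode, costs = best_intra_mode(block, top, left)
print(mode, costs)
# prints: vertical {'vertical': 0, 'horizontal': 240, 'dc': 176}
```

A practical encoder would typically also account for the number of bits needed to signal each mode, which this sketch leaves out.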
The intra prediction unit 24 supplies the prediction image generated in the optimal intra prediction mode, and the cost function value thereof to the prediction image selecting unit 26. In the event that the prediction image generated in the optimal intra prediction mode has been selected by the prediction image selecting unit 26, the intra prediction unit 24 supplies information indicating the optimal intra prediction mode to the lossless encoding unit 16. The lossless encoding unit 16 encodes this information, and takes this as part of the header information in the compressed image.
The image to be subjected to inter processing, read out from the screen rearranging buffer 12, and the reference image, supplied from the frame memory 22 via the switch 23, are input to the motion prediction/compensation unit 25. The motion prediction/compensation unit 25 performs motion prediction of blocks in all of the candidate inter prediction modes to generate the motion vector of each block.
The motion prediction/compensation unit 25 uses the predicted motion vector of each block to calculate a cost function value as to all of the candidate inter prediction modes. The motion prediction/compensation unit 25 determines, of the calculated cost function values, the prediction mode of a block that provides the minimum value as the optimal inter prediction mode.
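Motion prediction for a single block can be sketched as an exhaustive search over candidate displacements. The sketch assumes integer-pixel accuracy, one fixed block size, a small search range, and SAD as the cost; these are simplifications for illustration, and the function names and test data are hypothetical.

```python
def block_at(frame, x, y, size):
    """Extract the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def motion_search(current, reference, x, y, size=4, search_range=2):
    """Return ((dx, dy), cost) of the best match inside the search window."""
    target = block_at(current, x, y, size)
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference image.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(target, block_at(reference, rx, ry, size))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Reference image with a pattern in which every block position is distinct.
reference = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
# Current image: its content at (x, y) is the reference content at (x + 2, y + 1).
current = [[reference[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
            for x in range(8)] for y in range(8)]
print(motion_search(current, reference, x=2, y=2))
# prints: ((2, 1), 0)
```

Repeating this search for each candidate inter prediction mode and comparing the resulting costs gives the kind of minimum-cost selection described above.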
The motion prediction/compensation unit 25 supplies the prediction image of a block to be processed of the determined optimal inter prediction mode, and the cost function value thereof to the prediction image selecting unit 26. In the event that the prediction image of the block to be processed of the optimal inter prediction mode has been selected by the prediction image selecting unit 26, the motion prediction/compensation unit 25 outputs information indicating the optimal inter prediction mode (inter prediction mode information) to the lossless encoding unit 16.
At this time, the motion vector information, reference frame information, and so forth are also output to the lossless encoding unit 16. The lossless encoding unit 16 also subjects the information from the motion prediction/compensation unit 25 to lossless encoding processing such as variable length coding, arithmetic coding, or the like, and inserts the result into the header portion of the compressed image.
The prediction image selecting unit 26 determines the optimal prediction mode out of the optimal intra prediction mode and the optimal inter prediction mode based on the cost function values output from the intra prediction unit 24 and the motion prediction/compensation unit 25. The prediction image selecting unit 26 then selects the prediction image of the determined optimal prediction mode, and supplies it to the computing units 13 and 20. At this time, the prediction image selecting unit 26 supplies selection information of the prediction image to the intra prediction unit 24 or the motion prediction/compensation unit 25.
The rate control unit 27 controls the rate of the quantization operation of the quantization unit 15 based on the compressed images stored in the storage buffer 17 so that neither overflow nor underflow occurs.
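One simple way to realize such buffer-based control is to adjust the quantization parameter according to how full the buffer is. The sketch below is only an illustration of the idea: the thresholds, the step of one per update, and the function name are assumptions and do not reproduce any particular rate control algorithm. The range 0 through 51 is the quantization parameter range of H.264/AVC for 8-bit video.

```python
def update_qp(qp, buffer_bits, buffer_size, low=0.3, high=0.7,
              qp_min=0, qp_max=51):
    """Return the next quantization parameter from the buffer fullness.

    low and high are hypothetical fullness thresholds (fractions of the
    buffer size) chosen for illustration.
    """
    fullness = buffer_bits / buffer_size
    if fullness > high:
        qp += 1    # coarser quantization, fewer bits: guards against overflow
    elif fullness < low:
        qp -= 1    # finer quantization, more bits: guards against underflow
    return max(qp_min, min(qp_max, qp))

qp = 30
for stored_bits in (200_000, 600_000, 800_000, 900_000, 500_000):
    qp = update_qp(qp, stored_bits, buffer_size=1_000_000)
    print(f"buffer {stored_bits:>7} bits -> QP {qp}")
```

Leaving the parameter unchanged between the two thresholds avoids oscillation when the buffer level is already in a comfortable range.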
FIG. 2 is a block diagram illustrating a configuration example of an image decoding device corresponding to the image encoding device in FIG. 1.
With the example in FIG. 2, the image decoding device 31 is configured of a storage buffer 41, a lossless decoding unit 42, an inverse quantization unit 43, an inverse orthogonal transform unit 44, a computing unit 45, a deblocking filter 46, a screen rearranging buffer 47, a D/A conversion unit 48, frame memory 49, a switch 50, an intra prediction unit 51, a motion compensation unit 52, and a switch 53.
The storage buffer 41 stores the transmitted compressed image. The lossless decoding unit 42 decodes the information supplied from the storage buffer 41, which was encoded by the lossless encoding unit 16 in FIG. 1, using a format corresponding to the encoding format of the lossless encoding unit 16. The inverse quantization unit 43 inversely quantizes the image decoded by the lossless decoding unit 42 using a format corresponding to the quantization format of the quantization unit 15 in FIG. 1. The inverse orthogonal transform unit 44 subjects the output of the inverse quantization unit 43 to inverse orthogonal transform using a format corresponding to the orthogonal transform format of the orthogonal transform unit 14 in FIG. 1.
The computing unit 45 adds the output subjected to inverse orthogonal transform to the prediction image supplied from the switch 53, thereby decoding the image. The deblocking filter 46 removes block noise from the decoded image, supplies the result to the frame memory 49 for storing, and also outputs it to the screen rearranging buffer 47.
The screen rearranging buffer 47 rearranges the images. Specifically, the frames rearranged into encoding order by the screen rearranging buffer 12 in FIG. 1 are rearranged into the original display order. The D/A conversion unit 48 subjects the image supplied from the screen rearranging buffer 47 to D/A conversion and outputs it to a display (not shown) for display.
The switch 50 reads out from the frame memory 49 an image to be subjected to inter processing and an image to be referenced, and outputs them to the motion compensation unit 52; it also reads out from the frame memory 49 an image to be subjected to intra prediction, and supplies it to the intra prediction unit 51.
Information indicating the intra prediction mode obtained by decoding the header information is supplied from the lossless decoding unit 42 to the intra prediction unit 51. The intra prediction unit 51 generates a prediction image based on this information, and outputs the generated prediction image to the switch 53.
Of the information obtained by decoding the header information, the inter prediction mode information, motion vector information, reference frame information, and so forth are supplied from the lossless decoding unit 42 to the motion compensation unit 52. The inter prediction mode information is transmitted for each macroblock. The motion vector information and reference frame information are transmitted for each block to be processed.
The motion compensation unit 52 uses the motion vector information, reference frame information, and so forth supplied from the lossless decoding unit 42 to generate pixel values of the prediction image corresponding to the block to be processed, in the prediction mode indicated by the inter prediction mode information supplied from the lossless decoding unit 42. The generated pixel values of the prediction image are supplied to the computing unit 45 via the switch 53.
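On the decoding side no search is needed, because the motion vector arrives in the header. A minimal sketch of generating the prediction block and adding the decoded residual (the role of the computing unit 45) is given below. Integer-pixel motion vectors, the absence of boundary padding, and clipping to an 8-bit range are assumptions made for illustration; the function names and sample values are hypothetical.

```python
def motion_compensate(reference, x, y, motion_vector, size=4):
    """Copy the block that the motion vector points to in the reference image."""
    dx, dy = motion_vector
    rx, ry = x + dx, y + dy
    if rx < 0 or ry < 0 or ry + size > len(reference) or rx + size > len(reference[0]):
        raise ValueError("motion vector points outside the reference image")
    return [row[rx:rx + size] for row in reference[ry:ry + size]]

def reconstruct(prediction, residual, bit_depth=8):
    """Add the decoded residual to the prediction and clip to the pixel range."""
    top = (1 << bit_depth) - 1
    return [[max(0, min(top, p + r)) for p, r in zip(rp, rr)]
            for rp, rr in zip(prediction, residual)]

reference = [[x + 10 * y for x in range(8)] for y in range(8)]
prediction = motion_compensate(reference, x=2, y=2, motion_vector=(1, -1), size=2)
print(prediction)                                          # prints: [[13, 14], [23, 24]]
print(reconstruct(prediction, [[1, -1], [300, -300]]))     # prints: [[14, 13], [255, 0]]
```

The second call deliberately uses an exaggerated residual to show the clipping at both ends of the pixel range.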
The switch 53 selects the prediction image generated by the motion compensation unit 52 or the intra prediction unit 51, and supplies it to the computing unit 45.
Further, as an extension of H.264/AVC, standardization of FRExt (Fidelity Range Extension), which includes coding tools necessary for business use such as RGB, 4:2:2, and 4:4:4, as well as the 8×8 DCT and quantization matrices stipulated by MPEG-2, was completed in February 2005. Thus, H.264/AVC can be used as an encoding format capable of suitably expressing even the film noise included in movies, and has come to be employed in a wide range of applications such as Blu-ray Disc (registered trademark).
However, demand for encoding at still higher compression rates has recently been increasing, for example to compress images of around 4000×2000 pixels, which is four times the number of pixels of a high-vision image, or to distribute high-vision images in an environment with limited transmission capacity such as the Internet. Accordingly, the above-mentioned VCEG (Video Coding Experts Group) under ITU-T has continued to study improvements in encoding efficiency.
As one technique for improving encoding efficiency in this way, an adaptive loop filter (ALF) has been proposed in NPL 1.
FIG. 3 is a block diagram illustrating a configuration example of an image encoding device to which an adaptive loop filter has been applied. Note that, with the example in FIG. 3, for convenience of description, the A/D conversion unit 11, screen rearranging buffer 12, storage buffer 17, switch 23, intra prediction unit 24, prediction image selecting unit 26, and rate control unit 27 in FIG. 1 are omitted. Arrows and so forth are also omitted. Accordingly, in the case of the example in FIG. 3, the reference image from the frame memory 22 is directly input to the motion prediction/compensation unit 25, and the prediction image from the motion prediction/compensation unit 25 is directly output to the computing units 13 and 20.
Specifically, the image encoding device 61 in FIG. 3 differs from the image encoding device 1 in FIG. 1 only in that an adaptive loop filter 71 is added between the deblocking filter 21 and frame memory 22.
The adaptive loop filter 71 calculates an adaptive loop filter coefficient so as to minimize the residual error with respect to the original image from the screen rearranging buffer 12 (omitted from the drawing), and uses this adaptive loop filter coefficient to perform filter processing on the decoded image from the deblocking filter 21. A Wiener filter is employed as this filter, for example.
Also, the adaptive loop filter 71 transmits the calculated adaptive loop filter coefficient to the lossless encoding unit 16. The lossless encoding unit 16 performs lossless encoding processing such as variable length coding, arithmetic coding, or the like on this adaptive loop filter coefficient, and inserts the result into the header portion of the compressed image.
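The coefficient calculation performed by the adaptive loop filter 71 amounts to a least-squares problem: find the filter taps that minimize the squared error between the filtered decoded image and the original image. The sketch below solves the corresponding normal equations for a one-dimensional 3-tap filter. The one-dimensional signal, the tap count, and all names and sample values are assumptions made to keep the example short; the filter shape used in NPL 1 is not reproduced here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (true for the sample data below).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def windows(signal, taps):
    """All length-taps neighbourhoods centred on interior samples."""
    half = taps // 2
    return [signal[i - half:i + half + 1] for i in range(half, len(signal) - half)]

def wiener_coefficients(decoded, original, taps=3):
    """Least-squares taps: solve R h = p (autocorrelation and cross-correlation)."""
    half = taps // 2
    rows = windows(decoded, taps)
    targets = original[half:len(original) - half]
    r = [[sum(w[i] * w[j] for w in rows) for j in range(taps)] for i in range(taps)]
    p = [sum(w[i] * t for w, t in zip(rows, targets)) for i in range(taps)]
    return solve(r, p)

def apply_filter(decoded, coeffs):
    """Filter interior samples; border samples are left unfiltered."""
    half = len(coeffs) // 2
    out = [float(v) for v in decoded]
    for i, w in zip(range(half, len(decoded) - half), windows(decoded, len(coeffs))):
        out[i] = sum(c * s for c, s in zip(coeffs, w))
    return out

def interior_mse(a, b, half=1):
    pairs = list(zip(a, b))[half:len(a) - half]
    return sum((x - y) ** 2 for x, y in pairs) / len(pairs)

original = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64, 75, 87]
noise = [2, -3, 1, -2, 3, -1, 2, -3, 1, -2, 3, -1]   # stand-in for coding error
decoded = [o + e for o, e in zip(original, noise)]
coeffs = wiener_coefficients(decoded, original)
filtered = apply_filter(decoded, coeffs)
print("coefficients:", [round(c, 3) for c in coeffs])
print("MSE before:", interior_mse(decoded, original))
print("MSE after: ", interior_mse(filtered, original))
```

Because leaving the image unfiltered corresponds to the taps (0, 1, 0), which is one of the candidates in the least-squares problem, the error after filtering cannot exceed the error before filtering on the samples used for the fit.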
FIG. 4 is a block diagram illustrating a configuration example of an image decoding device corresponding to the image encoding device in FIG. 3. Note that, with the example in FIG. 4, for convenience of description, the storage buffer 41, screen rearranging buffer 47, D/A conversion unit 48, switch 50, intra prediction unit 51, and switch 53 in FIG. 2 are omitted. Arrows and so forth are also omitted. Accordingly, in the case of the example in FIG. 4, the reference image from the frame memory 49 is directly input to the motion compensation unit 52, and the prediction image from the motion compensation unit 52 is directly output to the computing unit 45.
Specifically, the image decoding device 81 in FIG. 4 differs from the image decoding device 31 in FIG. 2 only in that an adaptive loop filter 91 is added between the deblocking filter 46 and frame memory 49.
An adaptive loop filter coefficient decoded at the lossless decoding unit 42 and extracted from the header is supplied to the adaptive loop filter 91. The adaptive loop filter 91 uses the supplied filter coefficient to perform filter processing on the decoded image from the deblocking filter 46. A Wiener filter is employed as this filter, for example.
Thus, the image quality of a decoded image can be improved, and further the image quality of a reference image can also be improved.
Now, with the above H.264/AVC format, the macroblock size is 16×16 pixels. However, the macroblock size of 16×16 pixels is not optimal for large image frames such as UHD (Ultra High Definition; 4000×2000 pixels) which will be handled by next-generation encoding formats.
Therefore, NPL 2 and others have proposed enlarging the macroblock size to a size such as 32×32 pixels, for example.
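To give a sense of scale, the number of macroblocks per frame can be counted for each macroblock size. The sketch below assumes that frame dimensions which are not a multiple of the macroblock size are rounded up to whole macroblocks; the function name is hypothetical.

```python
def macroblock_count(width, height, mb_size):
    """Number of macroblocks needed to cover a frame (partial ones rounded up)."""
    across = -(-width // mb_size)    # ceiling division
    down = -(-height // mb_size)
    return across * down

for mb_size in (16, 32):
    print(f"4000x2000 with {mb_size}x{mb_size} macroblocks:",
          macroblock_count(4000, 2000, mb_size))
# prints:
# 4000x2000 with 16x16 macroblocks: 31250
# 4000x2000 with 32x32 macroblocks: 7875
```

With 16×16 macroblocks, per-macroblock information such as the prediction mode has to be signalled about four times as often as with 32×32 macroblocks for the same frame.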
Note that, whereas NPL 2 proposes applying extended macroblocks to inter slices, NPL 3 proposes applying extended macroblocks to intra slices.