In recent years, systems for storage, reproduction, or transmission of image signals have become popular, including photo CD, video CD, DVD video (Digital Versatile Disk-Video), TV telephones, TV conferences, digital TV broadcasting, VOD (Video On Demand), and so on. In an image transmission system, as shown in FIG. 29A, on the transmitter side image encoder 1 encodes an input image into a bit stream, and bit stream transmitter 2 transmits the bit stream through a network to a receiver; on the receiver side bit stream receiver 3 receives the bit stream and image decoder 4 decodes the received bit stream to output an image. In an image storage system, as shown in FIG. 29B, image encoder 1 encodes an input image into a bit stream and the bit stream is stored in bit stream storage 5 such as a storage medium. In an image reproduction system, as shown in FIG. 29C, image decoder 4 decodes a bit stream stored in bit stream storage 5 and outputs a reproduced image.
In these systems the transmission band or the storage capacity is limited, and thus there are demands for effectively utilizing these resources by use of an image encoding system with a compression rate as high as possible. The image encoding systems can be classified into still picture encoding systems and moving picture encoding systems; the moving picture encoding systems will be described below as an example.
The conventionally known encoding systems for moving picture signals include the International Video Coding Standards such as ITU-T Recommendation H.263, ISO/IEC International Standard 14496-2 (MPEG-4 Visual), and so on.
In these moving picture encoding systems, generally, a motion compensated interframe prediction is performed for a frame image forming a moving picture signal supplied as a coding target, with respect to another frame image to reduce temporal redundancy in the moving picture signal; orthogonal transform and quantization operations are carried out for a difference image signal as a result of the interframe prediction or for an image signal not subjected to the interframe prediction to reduce spatial redundancy in the moving picture signal; and information source coding is further carried out for prediction and transform data such as obtained motion vector, orthogonal transform coefficients, and so on to reduce redundancy of data representation. These processes eliminate the redundancies in the moving picture signal, thus achieving efficient coding of moving picture.
In these moving picture encoding systems, the processes as described above are carried out for each of segments in a frame image, called macroblocks. In general one macroblock consists of 16 pixels×16 pixels, and a set of such macroblocks constitutes a slice or a frame. On the other hand, one macroblock contains units of blocks resulting from subdivision thereof, and the aforementioned processes including the motion compensation, the orthogonal transform, etc. are carried out in units of macroblocks at the largest and, if necessary, in units of smaller blocks.
FIG. 1 is a flowchart schematically showing an example of a moving picture encoding method. This encoding method is an image encoding method of effecting predetermined transform and encoding operations on an input image D1 being a frame image in a moving picture or the like to generate transmissible or storable, data-compressed, coded data D9 in the image transmission system or in the image storage system.
In the image encoding method shown in FIG. 1, the data processing operation is carried out in units of macroblocks obtained by dividing the input image D1 into a predetermined size (a predetermined number of pixels). First, the input image D1 is subjected to the predetermined data processing operation to transform the image data, thereby outputting a predictive residual image D5 expressed by space coordinates, coding mode information D3 indicating information about the data processing operation, and motion vector information D2 (step S101). The motion vector information D2 herein may contain, for example, a value of a motion vector itself or a motion vector difference value being a difference between a motion vector in a target block and a motion vector in a neighboring block.
Specifically, with reference to a predetermined image region in a local decoded image D12 described later, a search is made for an image area resembling an image of the coding target macroblock (motion search), and a motion vector is determined by an amount of spatial movement of the coding target macroblock relative to the image area resembling the coding target macroblock, detected as a result of the search (motion compensated prediction). Also generated as a predictive residual image D5 is difference data of pixel values between the coding target macroblock and the image area resembling the image data of the coding target macroblock, detected as a result of the search. Then, based on the motion vector and difference data of pixel values obtained as a result of the motion search, a macroblock coding mode to be applied to the image data is selected from a plurality of macroblock coding modes prepared.
The macroblock coding modes are generally classified under inter coding modes using the motion compensation and intra coding modes not using the motion compensation. In the inter coding modes, the motion compensation is applied to a macroblock, and predictive residuals of pixel values obtained as a result thereof are outputted as a predictive residual image D5. In the intra coding modes, predicted values of pixel values in the macroblock are set to 0, whereby the input image D1 is directly outputted as a predictive residual image D5. Information indicating a selected coding mode and a quantization parameter is outputted as coding mode information D3, and information about the motion vector as motion vector information D2.
Then the predictive residual image D5 is subjected to an orthogonal transform operation to generate a plurality of orthogonal transform coefficients D6 which are image data expressed by space frequencies (frequency image data) (step S102). This orthogonal transform is carried out for each of blocks resulting from subdivision of each macroblock, to yield orthogonal transform coefficients of each block.
The orthogonal transform coefficients are quantized by use of a predetermined quantization parameter to obtain quantized orthogonal transform coefficients D7 (step S103).
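The quantization of step S103 can be illustrated with a minimal sketch. The text only states that a predetermined quantization parameter is used, so the uniform step size of 2×QP and the truncation toward zero below are assumptions chosen for illustration, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(coeffs, qp):
    """Quantize a block of transform coefficients.

    Assumption: uniform step size of 2 * qp, truncating toward zero.
    """
    step = 2 * qp
    return [[int(c / step) for c in row] for row in coeffs]


def dequantize(levels, qp):
    """Scale quantized levels back to coefficient values (same assumed step)."""
    step = 2 * qp
    return [[level * step for level in row] for row in levels]


# Example: small coefficients collapse to zero and precision is lost.
coeffs = [[100, -37], [7, 0]]
levels = quantize(coeffs, qp=8)
restored = dequantize(levels, qp=8)
```

Under these assumptions, the coefficient 7 is quantized to 0 and cannot be recovered, which is the information loss that the quantization and dequantization processes introduce.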
In passing, the quantization may result in all the orthogonal transform coefficients in a certain block being zero. There is no need for performing the coding of information about orthogonal transform coefficients for such an ineffective block with all the orthogonal transform coefficients of zero. Then the coding of coefficient information of an ineffective block is omitted by use of coded block pattern information (which will be referred to hereinafter as CBP representing Coded Block Pattern) indicating whether there is a significant quantized orthogonal transform coefficient in the block, thereby increasing efficiency of coding.
There are also cases where the result of the quantization process is that all the orthogonal transform coefficients are zero in all the blocks in a macroblock and each component of a motion vector is also zero. For such an ineffective macroblock with all the orthogonal transform coefficients of zero, there is no need for performing the coding of information about the macroblock. Such ineffective macroblocks frequently appear in stationary background portions and the like, and thus a macroblock coding flag (COD flag) is used for each macroblock to discriminate whether the macroblock is effective or ineffective.
The CBP and the COD flag are outputted as coding supplementary information D8.
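How the CBP and the COD flag could be derived from the quantized coefficients is sketched below. The bit order of the CBP (first block in the most significant bit) and the polarity of the COD flag (1 for an ineffective macroblock) are assumptions, as the text does not fix them; the function names are hypothetical.

```python
def coded_block_pattern(blocks):
    """Return one bit per block, first block in the most significant bit.

    A bit is 1 when the block holds at least one nonzero quantized coefficient.
    """
    cbp = 0
    for block in blocks:
        significant = any(c != 0 for row in block for c in row)
        cbp = (cbp << 1) | int(significant)
    return cbp


def cod_flag(blocks, motion_vector):
    """Return 1 for an ineffective macroblock, 0 otherwise (assumed polarity).

    Ineffective means: every coefficient is zero and both motion vector
    components are zero.
    """
    no_coefficients = coded_block_pattern(blocks) == 0
    no_motion = tuple(motion_vector) == (0, 0)
    return int(no_coefficients and no_motion)


# Example with six tiny 2x2 blocks standing in for four luma and two chroma blocks.
zero = [[0, 0], [0, 0]]
nonzero = [[0, 3], [0, 0]]
cbp = coded_block_pattern([nonzero, zero, zero, nonzero, zero, zero])
```

With this bit order, the example yields the pattern 100100 in binary: only the first and fourth blocks need their coefficients coded.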
Subsequently, variable-length coding and multiplexing operations are carried out for the motion vector information D2, coding mode information D3, quantized orthogonal transform coefficients D7, and coding supplementary information D8 to generate coded data D9 being compressed data (step S104).
Specifically, the variable-length coding using a variable-length coding table is effected on each of coding symbols included in the motion vector information D2, coding mode information D3, quantized orthogonal transform coefficients D7, and coding supplementary information D8 to generate the coded data D9.
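Variable-length coding with a table can be sketched as follows. The table `VLC_TABLE` is a hypothetical prefix-free code invented for this illustration and is not taken from H.263 or MPEG-4; it simply assigns shorter codewords to the symbols assumed to be more frequent.

```python
# Hypothetical prefix-free table: shorter codewords for (assumed) frequent symbols.
VLC_TABLE = {0: "1", 1: "010", -1: "011", 2: "00100", -2: "00101"}


def vlc_encode(symbols, table=VLC_TABLE):
    """Concatenate the codeword of each coding symbol into a bit string."""
    return "".join(table[s] for s in symbols)


def vlc_decode(bits, table=VLC_TABLE):
    """Recover symbols by matching codewords bit by bit.

    This works because no codeword is a prefix of another codeword.
    """
    inverse = {code: symbol for symbol, code in table.items()}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols


bitstream = vlc_encode([0, 1, 0, -2, 0])
decoded = vlc_decode(bitstream)
```

In this sketch five symbols occupy 11 bits; a decoder holding the same table recovers them exactly, which mirrors the role of the variable-length decoding table on the receiving side.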
FIG. 2 is a block diagram showing an example of a configuration of an image encoding apparatus. The image encoding method shown in FIG. 1 will be further described below with reference to the image encoding apparatus shown in FIG. 2.
With an input image D1 supplied as a coding target, first, a luma (luminance) signal image frame is divided into macroblocks of square image blocks in the size of 16 pixels×16 lines, and a chroma (color-difference) signal image frame into macroblocks of square image blocks in the size of 8 pixels×8 lines. These macroblocks are image blocks used as units of the data processing including the motion compensation and others. In the DCT (an orthogonal transform) described later, e.g., in the MPEG-4 coding system, the blocks used are DCT blocks of the size of 8 pixels×8 lines. In this case, one macroblock has four luma (Y) blocks and two chroma (Cb, Cr) blocks in DCT. Image coding is carried out for each of these blocks.
The input image D1 is fed into a motion compensation means consisting of motion detector 11 and motion compensator 12. First, the input image D1 is fed into motion detector 11 and a motion of image is detected for each macroblock. The motion detector 11 compares image data of the coding target macroblock with image data in an image region of the same size as the macroblock, in a local decoded image to detect an image area resembling the image of the coding target macroblock and generate a motion vector D2 indicating a motion of the image.
Specifically, the motion detector 11 searches for an image area resembling the macroblock as a coding target in the input image D1, with reference to a predetermined image region in a local decoded image D12 stored as a previously coded frame image in frame memory 20. Then motion vector information D2 is determined by an amount of spatial movement between the coding target macroblock and the image area resembling the image data of the coding target macroblock, detected as a result of the search.
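The motion search can be sketched as an exhaustive block matching over a small window. The text does not name a matching criterion, so the sum of absolute differences (SAD), the search range, the tie-break toward shorter vectors, and the sign convention (the vector points from the target block position to the matching area in the reference image) are all assumptions; `motion_search` and `sad` are hypothetical names.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the target block and a displaced area."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, size=16, search_range=7):
    """Full search around (bx, by); returns ((dx, dy), best SAD).

    Candidates that would fall outside the reference image are skipped.
    Ties are broken in favor of the shorter vector.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            candidate = (cost, abs(dx) + abs(dy), dx, dy)
            if best is None or candidate < best:
                best = candidate
    cost, _, dx, dy = best
    return (dx, dy), cost


# Example: the current frame is the reference shifted by a known amount.
H = W = 12
ref = [[x * x + 3 * y * y + x * y for x in range(W)] for y in range(H)]
cur = [[ref[(y - 1) % H][(x + 2) % W] for x in range(W)] for y in range(H)]
vector, cost = motion_search(cur, ref, bx=4, by=4, size=4, search_range=3)
```

Here the search finds the displaced area exactly, so the residual for that block would be all zero. Practical encoders commonly use faster search strategies than the exhaustive one shown.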
At this time, a coding mode applied to the coding target macroblock is also selected out of a plurality of coding modes prepared in advance. FIGS. 3A-3C are schematic diagrams showing an example of the coding modes prepared for the motion compensation. The coding mode exemplified in FIG. 3A is an inter coding mode 0, the coding mode in FIG. 3B an inter coding mode 1, and the coding mode in FIG. 3C an intra coding mode 2.
The inter coding modes 0-1 are modes of carrying out interframe coding, using mutually different block segmentation ways into motion compensation blocks. Concerning the motion compensation blocks in each mode, as shown in FIG. 3A, the inter coding mode 0 uses one block in the size of 16 pixels×16 lines for a luma component image. As shown in FIG. 3B, the inter coding mode 1 uses four blocks in the size of 8 pixels×8 lines for a luma component image.
The aforementioned motion vector information D2 is given to each motion compensation block segmented in the selected inter coding mode. Therefore, each macroblock is given as many pieces of motion vector information D2 as there are segment blocks. The motion vector information D2 is assigned to each motion compensation block, for example, according to the order indicated by numbers in each coding mode in FIGS. 3A-3C. In both inter coding modes, a block in the size of 8 pixels×8 lines is used for a chroma component image, and the motion vector assigned thereto has a length equal to half of that of the motion vector for a luma component image.
An example of the coding mode selecting method is as follows: for example, a variance value is first determined of pixel values in a predictive residual image after the motion compensation in a macroblock; where the variance value is larger than a preset threshold or larger than a variance value of pixel values in the macroblock in the input image, the intra coding mode is selected; the inter coding mode is selected in the other cases. This means that the intra coding mode is selected where the image data of the macroblock is complex.
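The variance-based selection just described can be sketched as follows. The threshold value is a free parameter that the text leaves open, and the function names are hypothetical.

```python
def variance(values):
    """Population variance of a flat list of pixel values."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def select_intra_or_inter(input_pixels, residual_pixels, threshold):
    """Choose intra when the residual variance exceeds the threshold
    or exceeds the variance of the input macroblock; otherwise inter."""
    residual_variance = variance(residual_pixels)
    if residual_variance > threshold or residual_variance > variance(input_pixels):
        return "intra"
    return "inter"


# Example: a well-predicted macroblock leaves a low-variance residual.
mode = select_intra_or_inter([10, 20, 30, 40], [1, -1, 1, -1], threshold=50)
```

In the example the residual variance (1) is below both the threshold and the input variance (125), so the inter coding mode is chosen.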
When the inter coding mode is selected, the motion search is carried out for each of the four segment blocks of a macroblock to generate motion vectors and difference values of image data corresponding to the respective blocks. Then calculated are a code amount M(MV) for one motion vector in the inter coding mode 0 and a total M(4MV) of code amounts for four motion vectors in the inter coding mode 1. Further calculated are difference values D(MV) of image data in the inter coding mode 0 and a total D(4MV) of difference values of image data in the inter coding mode 1. Then, using a preset coefficient α, a comparison is made between values of M(MV)+α·D(MV) and M(4MV)+α·D(4MV); for example, when the former is smaller than or equal to the latter, the inter coding mode 0 is selected; when the latter is smaller, the inter coding mode 1 is selected.
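The comparison between the two inter coding modes amounts to a weighted cost test, sketched below. The argument names mirror M(MV), D(MV), M(4MV), and D(4MV) from the text; the numeric values in the example are made up for illustration.

```python
def select_inter_mode(m_mv, d_mv, m_4mv, d_4mv, alpha):
    """Return 0 (one 16x16 block) or 1 (four 8x8 blocks).

    Mode 0 wins when M(MV) + alpha*D(MV) <= M(4MV) + alpha*D(4MV).
    """
    cost_mode0 = m_mv + alpha * d_mv
    cost_mode1 = m_4mv + alpha * d_4mv
    return 0 if cost_mode0 <= cost_mode1 else 1


# Four vectors cost more bits; they pay off only if the residual drops enough.
small_gain = select_inter_mode(m_mv=6, d_mv=400, m_4mv=20, d_4mv=380, alpha=0.5)
large_gain = select_inter_mode(m_mv=6, d_mv=400, m_4mv=20, d_4mv=300, alpha=0.5)
```

The coefficient α sets the exchange rate between motion vector bits and residual magnitude: a larger α favors the mode with the smaller difference values even when its motion vectors cost more bits.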
Where the coding mode is the inter coding mode, after a motion vector is obtained for each motion compensation block, motion compensator 12 generates a predicted image D4, using the motion vector information D2 from motion detector 11 and the local decoded image D12 from frame memory 20. Subsequently, subtracter 13 calculates the difference (predictive residual) between input image D1 and predicted image D4 to generate a predictive residual image D5.
Where the coding mode is the intra coding mode, the pixel data of predicted image D4 is set to 0, whereby the input image D1 is directly outputted as a predictive residual image D5.
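The role of the subtracter 13 in both cases can be sketched in a few lines. Passing `None` as the prediction to represent the intra coding mode is a convention of this sketch only, and the function name is hypothetical.

```python
def predictive_residual(input_block, predicted_block=None):
    """Residual = input - prediction.

    In the intra case the prediction is all zeros, so the input block
    passes through unchanged (returned as a copy).
    """
    if predicted_block is None:
        return [row[:] for row in input_block]
    return [[i - p for i, p in zip(input_row, predicted_row)]
            for input_row, predicted_row in zip(input_block, predicted_block)]


inter_residual = predictive_residual([[52, 60], [61, 59]], [[50, 61], [61, 57]])
intra_residual = predictive_residual([[52, 60], [61, 59]])
```

A good prediction leaves a residual of small values near zero, which is what makes the subsequent transform and quantization effective.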
The information indicating the selected coding mode, and the quantization parameter are outputted as coding mode information D3, and the information about the motion vector as motion vector information D2.
The image data of predictive residual image D5 is fed into orthogonal transform part (orthogonal transform means) 14. The orthogonal transform part 14 performs the orthogonal transform for each orthogonal transform block included in a macroblock of the predictive residual image D5, to generate orthogonal transform coefficients D6 being frequency image data. For example, in MPEG-4, a macroblock includes four blocks of the size of 8 pixels×8 lines for each luma component image, and a macroblock includes one orthogonal transform block of the size of 8 pixels×8 lines for each chroma component image.
FIGS. 4A and 4B are diagrams showing the orthogonal transform of image data. Image data of each block resulting from the division for the orthogonal transform, in the predictive residual image D5 is space image data and, as exemplified by image components of 8 pixels×8 lines in FIG. 4A, it is represented by space image components a11-a88 of 8 pixels×8 lines defined by horizontal coordinates and vertical coordinates. The orthogonal transform part 14 performs the orthogonal transform of this space image data by a predetermined transformation method to transform it into frequency image data as shown in FIG. 4B. This frequency image data is represented by orthogonal transform coefficients f11-f88 being frequency image components of 8 pixels×8 lines defined by horizontal frequencies and vertical frequencies.
A specific orthogonal transform applicable herein is, for example, the Discrete Cosine Transform (DCT). The DCT is an orthogonal transform using the cosine terms of the Fourier transform and is often used in image coding. The DCT over the space image data generates the DCT coefficients f11-f88 being frequency image data. In the DCT, for example, in the MPEG-4 coding system, DCT blocks of 8 pixels×8 lines as shown in FIGS. 4A, 4B are used as blocks for the orthogonal transform.
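A direct, unoptimized sketch of a two-dimensional DCT and its inverse is given below. It uses the orthonormal DCT-II scaling, which is an assumption made for illustration; the exact scaling and precision rules of a given coding standard may differ. Row index corresponds to vertical frequency and column index to horizontal frequency.

```python
import math


def _scale(k, n):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def _basis(i, k, n):
    """Cosine basis value for sample position i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct_2d(block):
    """Forward 2-D DCT of an n x n block (space data to frequency data)."""
    n = len(block)
    return [[_scale(u, n) * _scale(v, n) * sum(
                block[y][x] * _basis(x, u, n) * _basis(y, v, n)
                for y in range(n) for x in range(n))
             for u in range(n)] for v in range(n)]


def idct_2d(coeffs):
    """Inverse 2-D DCT of an n x n block (frequency data to space data)."""
    n = len(coeffs)
    return [[sum(_scale(u, n) * _scale(v, n) * coeffs[v][u]
                 * _basis(x, u, n) * _basis(y, v, n)
                 for v in range(n) for u in range(n))
             for x in range(n)] for y in range(n)]


# Example: a flat 8x8 block has energy only in the lowest-frequency coefficient.
flat = [[10] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat)
```

For the flat block, only the coefficient at the lowest horizontal and vertical frequency is nonzero, which illustrates how the transform concentrates the energy of smooth image data into few coefficients.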
Quantizer 15 quantizes the orthogonal transform coefficients D6 generated in this way, by a predetermined quantization parameter to yield quantized orthogonal transform coefficients D7. The quantizer 15 also generates CBPs indicating, in units of blocks, whether there is a significant orthogonal transform coefficient, and a COD flag indicating whether a macroblock contains a significant orthogonal transform coefficient, and outputs them as coding supplementary information D8.
The quantized orthogonal transform coefficients D7 and coding supplementary information D8 generated by quantizer 15 are subjected to variable-length coding at variable-length encoder 16, which generates coded data D9 being compressed data of the input image D1. The variable-length encoder 16 further receives input of the motion vector information D2 detected by the motion detector 11 and the coding mode information D3 indicating the coding mode selected at motion detector 11 and the quantization parameter. The motion vector information D2 and coding mode information D3 are also subjected to the variable-length coding at variable-length encoder 16, and the resultant data is multiplexed into the coded data D9.
In the present image encoding apparatus, the quantized orthogonal transform coefficients D7 generated at quantizer 15 are dequantized by dequantizer 17 to yield dequantized orthogonal transform coefficients D10, and they are further subjected to an inverse orthogonal transform at inverse orthogonal transform part 18 to yield a local decoded residual image D11. Then the local decoded residual image D11 and the predicted image D4 are added at adder 19 to generate a local decoded image D12. This local decoded image D12 is stored into frame memory 20 to be utilized for the motion compensation of another frame image.
An example of a moving picture decoding method and moving picture decoding apparatus will be described below.
FIG. 5 is a flowchart schematically showing an example of an image decoding method. The present decoding method is an image decoding method of effecting predetermined decoding and transformation operations on coded data D9 generated by the image encoding method shown in FIG. 1, to restore a decoded image D12 as an image identical to the local decoded image D12.
In the image decoding method shown in FIG. 5, first, the coded data D9 is subjected to variable-length decoding using a variable-length decoding table to generate quantized orthogonal transform coefficients D7 (step S201). The motion vector information D2, coding mode information D3, and coding supplementary information D8 are also variable-length decoded similarly from the coded data D9 by use of the variable-length decoding table.
Specifically, a variable-length decoding table is first set as a table to be applied to the coded data D9, and the coded data is variable-length decoded by use of the variable-length decoding table to generate coding symbols of the respective information items.
Then the quantized orthogonal transform coefficients D7 are subjected to a dequantization operation to generate dequantized orthogonal transform coefficients D10 (step S202) and an inverse orthogonal transform operation is further carried out to generate a local decoded residual image D11 (step S203). Then, using the local decoded residual image D11 and a previously decoded frame, the motion compensation is carried out by applying a coding mode indicated by the coding mode information D3, to restore the decoded image D12 (S204).
FIG. 6 is a block diagram schematically showing a configuration of an example of a moving picture decoding apparatus.
Coded data D9 supplied as a decoding target is fed into variable-length decoder 21 which performs the variable-length decoding using a predetermined variable-length decoding table to generate decoding symbols of the motion vector information D2, coding mode information D3, quantized orthogonal transform coefficients D7, and coding supplementary information D8. Specifically, for data-compressed coded data D9, the variable-length decoder 21 retrieves each data included for each macroblock in the coded data D9, from a bit stream while starting from the head of the frame image, and it variable-length decodes the data to generate the motion vector information D2, coding mode information D3, quantized orthogonal transform coefficients D7, and coding supplementary information D8. The variable-length decoding table used for the variable-length decoding is switched to another according to each symbol as occasion demands, as described above.
The quantized orthogonal transform coefficients D7 are subjected to dequantization and inverse orthogonal transform at dequantizer 22 and inverse orthogonal transformer 23. This results in generating a local decoded residual image D11. This local decoded residual image D11 is an image corresponding to the predictive residual image D5 before the coding, but some information is lost through the quantization and dequantization processes.
On the other hand, the motion vector information D2 and coding mode information D3 are fed into motion compensator 24. The motion compensator 24 performs the motion compensation for the image by the coding mode indicated by the coding mode information D3 to generate a predicted image D4, using the motion vector information D2 from the variable-length decoder 21 and a decoded image stored in frame memory 25. Then adder 26 adds the local decoded residual image D11 and the predicted image D4 to output a recovered frame image as a decoded image D12.
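The reconstruction performed by motion compensator 24 and adder 26 can be sketched for a single block as follows. Integer-pel motion vectors, the sign convention of the vector, and clipping of the result to the 8-bit range 0 to 255 are assumptions of this sketch; the function names are hypothetical.

```python
def motion_compensate(reference, bx, by, motion_vector, size):
    """Fetch the predicted block from the previously decoded frame.

    Assumption: the vector points from the block position (bx, by)
    to the matching area in the reference frame, in whole pixels.
    """
    dx, dy = motion_vector
    return [[reference[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


def reconstruct_block(reference, bx, by, motion_vector, residual, size):
    """Add the decoded residual to the prediction and clip to 0..255."""
    predicted = motion_compensate(reference, bx, by, motion_vector, size)
    return [[max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(predicted, residual)]


reference = [[0, 10, 20, 30],
             [40, 50, 60, 70],
             [80, 90, 100, 110],
             [120, 130, 140, 250]]
decoded_block = reconstruct_block(reference, bx=1, by=1, motion_vector=(1, -1),
                                  residual=[[1, -40], [0, 200]], size=2)
```

Because the encoder builds its local decoded image D12 by the same addition of residual and prediction, both sides predict from identical reference data and the quantization loss does not accumulate differently at the decoder.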