The present invention relates to a motion picture coding and decoding apparatus for coding and decoding motion picture or image data represented digitally. More specifically, the present invention relates to a motion picture coding and decoding apparatus free of image degradation.
In image coding, methods of superimposing different motion picture sequences have been studied. An article entitled “An Image Coding Scheme Using Layered Representation and Multiple Templates” (Technical Report of IEICE, IE94-159, pp. 99-106 (1995)) discloses a method of forming a new sequence by superimposing a motion picture sequence serving as a background and a motion picture sequence of a component motion picture or image serving as a foreground (for example, a video image of a character or fish cut out by the chromakey technique).
An article entitled “Temporal Scalability Based on Image Content”, ISO/IEC/JTC1/SC29/WG11 MPEG95/211 (1995), discloses a method of forming a new sequence by superimposing a motion picture sequence of component motion images having a high frame rate on a motion picture sequence having a low frame rate.
According to this method, referring to FIG. 27, prediction coding is performed at a low frame rate in a lower layer, and prediction coding is performed at a high frame rate only in a selected area (hatched portion) of an upper layer. A frame coded in the lower layer, however, is not coded in the upper layer; instead, the decoded image of the lower layer is copied and used as it is. It is assumed that a portion to which a viewer pays attention, such as a figure or a character, is selected as the selected area.
FIG. 26 is a block diagram showing a main portion of a conventional motion picture coding and decoding apparatus. Referring to the left side of FIG. 26, in the coding apparatus of the conventional motion picture coding and decoding apparatus, first and second skipping units 801 and 802 thin out frames of input motion picture data. The input image data thus comes to have a lower frame rate and is input to upper layer coding unit 803 and lower layer coding unit 804, respectively. It is assumed that the frame rate for the upper layer is not lower than the frame rate of the lower layer.
The input motion picture as a whole is coded in lower layer coding unit 804. An internationally standardized method of motion picture coding such as MPEG or H.261 is used as the coding method. A decoded image of the lower layer is formed in lower layer coding unit 804; this image is utilized for prediction coding and, at the same time, input to a superimposing unit 805.
Only the selected area of the input motion picture is coded in upper layer coding unit 803 of FIG. 26. The internationally standardized method of motion picture coding such as MPEG or H.261 is also used here. Only the selected area is coded, however, based on area shape information. A frame which has already been coded in the lower layer is not coded in the upper layer. The area shape information represents shape of the selected area such as a figure portion, and is a binary image assuming the value 1 at the position of the selected area and the value 0 at other positions. Only the selected area of the motion picture is coded in upper layer coding unit 803, and input to superimposing unit 805.
The area shape is coded utilizing 8 directional quantizing code in an area shape coding unit 806. FIG. 25 depicts the 8 directional quantizing code. As can be seen from the figure, the 8 directional quantizing code represents a direction to a next point by a numerical value, which is generally used for representing a digital figure.
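By way of illustration only, an 8 directional quantizing code may be sketched in Python as follows. The numbering of the directions (0 for the positive x direction, increasing counter-clockwise in 45 degree steps) and the function names `chain_encode` and `chain_decode` are assumptions made for this sketch; FIG. 25 may assign the numerical values differently.

```python
# Assumed numbering: code 0 points in +x, codes increase counter-clockwise.
# Each entry is the (dx, dy) step to the next contour point.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_encode(points):
    """Turn a list of 8-connected contour points into direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Raises ValueError if two consecutive points are not neighbours.
        codes.append(DIRECTIONS.index((x1 - x0, y1 - y0)))
    return codes


def chain_decode(start, codes):
    """Rebuild the contour points from a start point and direction codes."""
    x, y = start
    points = [start]
    for code in codes:
        dx, dy = DIRECTIONS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_encode(square)
```

Only the start point and one small numerical value per contour point need to be stored, which is why such a code is suited to representing the outline of a binary area shape.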
At a frame position where a lower layer frame has been coded, superimposing unit 805 outputs a decoded image of the lower layer. At a frame position where the lower layer frame has not been coded, the superimposing unit forms an image by using the decoded images of the two coded lower layer frames preceding and succeeding the frame of interest and one upper layer decoded image of the same time point, and outputs the formed image. The image formed here is input to upper layer coding unit 803 and utilized for prediction coding. The method of forming the image in superimposing unit 805 is as follows.
First, an interpolated image of two lower layers is formed. A decoded image of a lower layer at a time point t is represented as B(x, y, t). Here, x and y are coordinates representing pixel position in a space. When we represent time points of the two lower layers as t1 and t2 and the time point for the upper layer as t3 (where t1 < t3 < t2), the interpolated image I(x, y, t3) at time point t3 is calculated as follows.
I(x, y, t3) = [(t2 − t3) B(x, y, t1) + (t3 − t1) B(x, y, t2)] / (t2 − t1)    (1)
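The interpolation of equation (1) may be sketched as follows. This is a minimal illustration, assuming that images are held as lists of rows of luminance values; the function name `interpolate` is introduced only for this sketch.

```python
def interpolate(b1, b2, t1, t2, t3):
    """Equation (1): blend lower layer images B(., t1) and B(., t2) at time t3.

    b1, b2: images of equal size, given as lists of rows of numbers.
    """
    if not t1 < t3 < t2:
        raise ValueError("t3 must lie strictly between t1 and t2")
    w1 = (t2 - t3) / (t2 - t1)  # weight of the preceding frame
    w2 = (t3 - t1) / (t2 - t1)  # weight of the succeeding frame
    return [[w1 * p1 + w2 * p2 for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(b1, b2)]


# t3 = 1 is closer to t1 = 0 than to t2 = 4, so B(., t1) dominates.
result = interpolate([[0, 100]], [[100, 0]], 0, 4, 1)
```

The two weights sum to 1, and the frame nearer in time to t3 receives the larger weight.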
Thereafter, a decoded image E of the upper layer is superimposed on the interpolated image I calculated as above. For this purpose, weight information W(x, y, t) for superimposing is formed from area shape information M(x, y, t), and a superimposed image S is obtained in accordance with the following equation.
S(x, y, t) = [1 − W(x, y, t)] I(x, y, t) + E(x, y, t) W(x, y, t)    (2)
The area shape information M(x, y, t) is a binary image which assumes the value 1 in the selected area and the value 0 outside the selected area. Passing this image through a low pass filter a plurality of times provides the weight information W(x, y, t).
More specifically, the weight information W(x, y, t) assumes the value 1 in the selected area, 0 outside the selected area, and a value between 0 and 1 at a boundary of the selected area. The operation of superimposing unit 805 is as described above.
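The formation of the weight information and the superimposition of equation (2) may be sketched as follows for a single row of pixels. The 3-tap averaging filter, the default of two passes and the function names are assumptions made for this sketch; the description above specifies only that a low pass filter is applied a plurality of times.

```python
def box_filter(row):
    """One pass of a 3-tap averaging filter, repeating the edge samples."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3
            for i in range(n)]


def weights_from_mask(mask_row, passes=2):
    """Form W from the binary area shape M by repeated low pass filtering."""
    w = [float(m) for m in mask_row]
    for _ in range(passes):
        w = box_filter(w)
    return w


def superimpose(interp_row, upper_row, w_row):
    """Equation (2): S = (1 - W) * I + W * E, pixel by pixel."""
    return [(1 - w) * i + w * e
            for i, e, w in zip(interp_row, upper_row, w_row)]


mask = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
w = weights_from_mask(mask, passes=1)
```

With this filter the weight stays at 1 well inside the selected area, stays at 0 well outside it, and takes intermediate values near the boundary, so that the upper layer image blends smoothly into the interpolated image.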
The coded data coded by lower layer coding unit 804, upper layer coding unit 803 and area shape coding unit 806 are integrated by a coded data integrating unit, not shown, and transmitted or stored.
The method of decoding in the conventional apparatus will be described in the following. Referring to the right side of FIG. 26, in the decoding apparatus, coded data are decomposed by a coded data decomposing unit, not shown, into coded data for the lower layer, coded data for the upper layer and coded data for the area shape. The coded data are decoded by a lower layer decoding unit 808, an upper layer decoding unit 807 and an area shape decoding unit 809, as shown in FIG. 26. A superimposing unit 810 of the decoding apparatus is similar to superimposing unit 805 of the coding apparatus. Using the lower layer decoded image and the upper layer decoded image, images are superimposed by the same method as described with respect to the coding side. The superimposed motion picture is displayed on a display, and input to upper layer decoding unit 807 to be used for prediction of the upper layer.
Though a decoding apparatus for decoding both the lower and upper layers has been described, in a decoding apparatus having only a unit for decoding the lower layer, upper layer decoding unit 807 and superimposing unit 810 are unnecessary. As a result, part of the coded data can be reproduced in a smaller hardware scale.
In the conventional art, as represented by the equation (1), when an output image is to be obtained from two lower layer decoded images and one upper layer decoded image, interpolation between two lower layers is performed. Accordingly, when a position of the selected area changes with time, there would be a considerable distortion around the selected area, much degrading the image quality.
FIGS. 28A to 28C are illustrations of the problem. Referring to FIG. 28A, images A and C represent two decoded images of the lower layer, and image B is a decoded image of the upper layer, and the time of display is in the order of A, B and C. Here, selected areas are hatched. In the upper layer, only the selected area is coded, and hence areas outside the selected area are represented by dotted lines. As the selected area moves, an interpolated image obtained from images A and C has two selected areas superimposed as shown by the screened portion of FIG. 28B.
When image B is superimposed using weight information, the output image has three selected areas superimposed as shown in FIG. 28C. In particular, around (outside) the selected area of the upper layer, the selected areas of the lower layers appear like afterimages, which significantly degrades the image quality. When only the lower layer is displayed, the motion picture as a whole is free of this distortion, whereas when the superimposed image of the upper and lower layers is displayed, the distortion appears. The distortion therefore comes and goes from frame to frame, producing flicker in the motion picture and causing extremely severe degradation of image quality.
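The afterimage problem can be reproduced with a one-dimensional toy example. The pixel values, the object positions and the use of a hard (unfiltered) mask are assumptions chosen only to keep the sketch short.

```python
BACKGROUND, OBJECT = 0, 100
WIDTH = 7

# Lower layer frames A and C: the selected area moves from x = 1 to x = 5.
frame_a = [OBJECT if x == 1 else BACKGROUND for x in range(WIDTH)]
frame_c = [OBJECT if x == 5 else BACKGROUND for x in range(WIDTH)]

# Equation (1) with t3 midway between t1 and t2: plain averaging.
interp = [(a + c) / 2 for a, c in zip(frame_a, frame_c)]

# Upper layer frame B: the selected area is at x = 3 at the intermediate time.
mask_b = [1 if x == 3 else 0 for x in range(WIDTH)]
output = [OBJECT if m else i for m, i in zip(mask_b, interp)]

# Positions that are neither clean background nor the true object.
ghosts = [x for x, v in enumerate(output) if v not in (BACKGROUND, OBJECT)]
```

In this example the output contains half-intensity copies of the object at its positions in frames A and C, on either side of the true object taken from frame B. These correspond to the three superimposed selected areas of FIG. 28C.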
International standardization (ISO/IEC MPEG4) of the motion picture coding method proposes coding, decoding and synthesizing of images having a plurality of component parts by a coding apparatus and a decoding apparatus having hierarchical structures such as shown in FIG. 29. Here, a component image refers to an image cut out as a component, such as a character or an object in the motion picture. An ordinary motion picture itself is also treated as one of the component images. Generally, identification numbers of respective component images are coded among the coded data and, on the decoding side, the identification numbers are decoded and, based on the decoded identification numbers, coded data corresponding to the desired component images are selected.
FIGS. 30A to 30E schematically depict component images and the manner of synthesizing the images. Component image 1 of FIG. 30A is a common motion picture representing background, and component image 2 of FIG. 30B is a motion picture obtained by cutting out a figure only. Component image 3 of FIG. 30C is a motion picture obtained by cutting out a car only. When only the component image 1 is decoded among the coded data, an image of background only corresponding to FIG. 30A is obtained. When component images 1 and 2 are decoded and synthesized, an image such as shown in FIG. 30D is reproduced. When component image 3 is decoded and these three component images are synthesized, an image such as shown in FIG. 30E is reproduced. Here, such a hierarchical nature is referred to as hierarchy of component images.
The conventional coding and decoding apparatuses having hierarchical structure as described above do not have the function of hierarchically coding and decoding image quality of each component image. Here, the image quality refers to spatial resolution of the component image, number of quantization levels, frame rate and so on.
Therefore, an object of the present invention is to prevent degradation of image quality in a motion picture coding and decoding apparatus.
Another object of the present invention is to perform editing process with a desired image quality as needed, in a motion picture coding and decoding apparatus.
A still further object of the present invention is to perform rough editing with images of low quality, and thereafter perform editing using image data of high quality, in a motion picture coding and decoding apparatus.
A still further object of the present invention is to make it possible, in a motion picture coding and decoding apparatus, that a component image of low quality is reproduced when part of coded data are decoded, and that a component image is reproduced with high quality when all coded data are decoded.
A further object of the present invention is to provide a motion picture coding and decoding apparatus having both component image hierarchy and image quality hierarchy.
In the motion picture coding and decoding apparatus in accordance with the present invention, lower layer coding, in which a motion picture sequence is coded at a first frame rate, and upper layer coding in which the motion picture sequence is coded at a second frame rate higher than the first rate, are performed. In decoding the lower layer, only the lower layer of the first frame rate is decoded, and in decoding the upper layer, the lower layer and the upper layer of the second frame rate are decoded, and the upper and lower layers are superimposed. The picture coding and decoding apparatus includes a synthesizing unit for synthesizing, when there is not a lower layer corresponding to a frame position same as that of an upper layer in decoding, the non-existing lower layer frame by using first and second lower layers preceding and succeeding the frame position. The synthesizing unit includes an encoder for encoding, in an upper layer, a first area shape preceding in time of the lower layer and a second area shape succeeding in time, and a synthesizer for synthesizing using the first and second area shapes.
At the time of synthesizing the lower layer frame which has not been coded, the first area shape of the lower layer preceding in time and the second area shape of the lower layer succeeding in time are decoded in the upper layer, and synthesizing is performed using the first and second area shapes. Therefore, even when the area shape changes with time, there is not a distortion in the superimposed image of the lower and upper layers, and hence an image of good quality can be obtained.
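The synthesizing rule itself is not spelled out in this passage. The following sketch therefore shows only one plausible reading, assumed here for illustration: where a pixel belongs to the first area shape alone, the background is taken from the succeeding lower layer frame; where it belongs to the second area shape alone, the background is taken from the preceding frame; elsewhere the two frames are interpolated as in equation (1). The function name and the one-dimensional data are likewise assumptions.

```python
def synthesize_lower(b1, b2, m1, m2, t1, t2, t3):
    """Synthesize a missing lower layer row at time t3 (assumed rule).

    b1, b2: pixel rows of the preceding and succeeding lower layer frames.
    m1, m2: binary area shapes (1 = selected area) at times t1 and t2.
    """
    w1 = (t2 - t3) / (t2 - t1)
    w2 = (t3 - t1) / (t2 - t1)
    out = []
    for p1, p2, s1, s2 in zip(b1, b2, m1, m2):
        if s1 and not s2:
            out.append(p2)  # object covered this pixel only at t1
        elif s2 and not s1:
            out.append(p1)  # object covers this pixel only at t2
        else:
            out.append(w1 * p1 + w2 * p2)  # interpolate as in equation (1)
    return out


frame_a = [0, 100, 0, 0, 0, 0, 0]   # object at x = 1
frame_c = [0, 0, 0, 0, 0, 100, 0]   # object at x = 5
shape_a = [0, 1, 0, 0, 0, 0, 0]
shape_c = [0, 0, 0, 0, 0, 1, 0]
row = synthesize_lower(frame_a, frame_c, shape_a, shape_c, 0, 2, 1)
```

Under this assumed rule the synthesized row contains background only, so that no afterimage of the moving selected area remains when the upper layer is superimposed.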
Preferably, when there is not a lower layer frame corresponding to the same frame position as the upper layer at the time of decoding, coding of the first and second area shapes is not performed, and the first and second area shapes are extracted from coded data of one of or both of the lower and upper layers.
In synthesizing the lower layer frame, the first area shape of the lower layer preceding in time and the second area shape of the lower layer succeeding in time are not coded, but the first and second area shapes are extracted from the decoded data of one of or both of the lower and upper layers. Accordingly, encoding of the area shape of the upper layer is unnecessary, and hence the number of bits can be reduced.
Preferably, a first flag indicating whether pixel information of an upper layer is to be coded or not at the time of coding the upper layer is provided, and a situation where only the area shape is coded in the upper layer and a situation where both the area shape and pixel information are coded can be identified by the decoding apparatus based on the first flag. As a result, it can be readily known by the decoding apparatus how the coding was performed.
More preferably, when there is not a lower layer frame at a frame position corresponding to that of an upper layer and area shapes of lower layers preceding and succeeding in time are to be extracted, a lower layer decoded image is divided and, utilizing the result of division, the area shapes are extracted.
As a result, the area shapes can be obtained accurately without increasing the number of bits.
More preferably, when there is not a lower layer frame at a frame position corresponding to that of an upper layer and area shapes of lower layers preceding and succeeding in time are to be extracted, the area shapes are presumed and extracted using an area shape obtained at the time of decoding the upper layer.
Therefore, the area shapes can be obtained readily without increasing the number of bits.
More preferably, there is provided a second flag indicating, when there is not a lower layer frame corresponding to the frame position of the upper layer at the time of decoding, whether the lower layer frame is to be synthesized using preceding and succeeding lower layers, and if synthesis of the lower layer frame is not performed, the preceding or the succeeding lower layer frame is used as the synthesized lower layer frame. This enables reduction of processing necessary for synthesizing.
More preferably, a third flag indicating whether a first area shape of a lower layer preceding in time is to be coded or not, and a fourth flag indicating whether a second area shape of a lower layer succeeding in time is to be coded or not, in synthesizing the lower layer frame, are provided.
When neither the first area shape nor the second area shape is coded, the area shapes used for the previous synthesis are reused for the present synthesis. When only the second area shape is coded, the second area shape used for the previous synthesis is used as the first area shape for the present synthesis. A situation where only the first area shape is coded does not occur.
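The handling of the third and fourth flags on the decoding side may be sketched as follows. The class name, the `read_shape` callback standing in for the area shape decoder, and the order in which the two shapes are read are assumptions made for this sketch.

```python
class AreaShapeState:
    """Keeps the pair of area shapes used for the most recent synthesis."""

    def __init__(self):
        self.first = None
        self.second = None

    def update(self, third_flag, fourth_flag, read_shape):
        """Apply the flags; read_shape() returns the next decoded shape."""
        if third_flag and fourth_flag:
            # Both shapes are coded (assumed order: first, then second).
            self.first = read_shape()
            self.second = read_shape()
        elif fourth_flag:
            # Only the second shape is coded: the old second becomes first.
            self.first = self.second
            self.second = read_shape()
        elif third_flag:
            # Coding only the first shape is excluded by the scheme.
            raise ValueError("only the first area shape coded")
        # Neither flag set: the shapes of the previous synthesis are reused.
        return self.first, self.second


stream = iter(["shape_t0", "shape_t4", "shape_t8"])
state = AreaShapeState()
step1 = state.update(True, True, lambda: next(stream))
step2 = state.update(False, False, lambda: next(stream))
step3 = state.update(False, True, lambda: next(stream))
```

The case of only the second shape being coded arises naturally when synthesis moves on to the next pair of lower layer frames: the frame that was succeeding becomes the preceding one, so its shape need not be sent again.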
Since the area shape is not coded in the lower layer, a large number of bits are never generated in the lower layer. Accordingly, a large distortion is not generated even when the lower layer is transmitted or stored at a relatively low bit rate, and a good lower layer image can be transmitted or stored.
According to another aspect of the present invention, in the motion picture coding apparatus for coding the motion picture, the motion picture includes a plurality of component motion pictures or images for constituting the motion picture. The motion picture coding apparatus includes a reference image identification number coding unit for coding an identification number of a reference component image used for prediction coding, a reference image selecting unit for selecting a reference image out of a plurality of component images in accordance with the identification number, and an image quality improving unit for improving image quality of the coded component image indicated by the identification number.
In the motion picture coding apparatus for coding a plurality of component motion pictures, a reference component image identification number used for prediction coding is coded, a reference image is selected out of a plurality of component images in accordance with the identification number, and image quality of the coded component image indicated by the identification number can be improved. Therefore, coded data with hierarchy of image quality can be formed.
Preferably, the reference image identification number coding unit sets a flag off when the identification number indicates a component image which is being coded, sets the flag on when the identification number indicates a component image which is different from a component image which is being coded, codes the flag only when the flag is off, and codes the flag and the identification number when the flag is on. As a result, the number of bits necessary for coding the identification number can be reduced.
Preferably, the reference image identification number coding unit sets a flag off when the identification number is not changed from a previous frame, sets the flag on when the identification number is changed from the previous frame, codes the flag only when the flag is off, and codes the flag and identification number when the flag is on. Therefore, the number of bits necessary for coding the identification number can be reduced.
Preferably, the motion picture coding apparatus includes a comparing unit for comparing an identification number of a reference image with an identification number of a component image which is being coded, a flag generating unit for generating an off flag when the identification number of the reference image is the same as the identification number of the component image which is being coded and generating an on flag when the identification numbers are different from each other, and a flag coding and reference image identification number coding unit for coding the flag only when the flag is off and coding both the flag and the identification number of the reference image when the flag is on. Therefore, the number of bits necessary for coding the identification number can be reduced.
More preferably, the motion picture coding apparatus includes a memory for storing a reference image identification number of a preceding frame, a comparing unit for comparing a reference image identification number of the present frame with the reference image identification number of the preceding frame read from the memory, a flag generating unit for generating an off flag when reference image identification numbers of the preceding frame and present frame are the same and generating an on flag when the numbers are different, and a flag coding and reference image identification number coding unit for coding the flag only when the flag is off and for coding both the flag and reference image identification number of the present frame when the flag is on. As a result, the number of bits necessary for coding the identification number can be reduced.
More preferably, the flag is a 1 bit signal. Since coding determination is possible by only one bit of signal, a motion picture coding apparatus having simple structure can be provided.
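The flag-based coding of the reference image identification number may be sketched as follows. The output is represented as a list of symbols rather than an actual bit stream, and the function names and the initial content of the memory are assumptions made for this sketch.

```python
def encode_reference_id(ref_id, baseline_id):
    """Code a reference image identification number with a 1 bit flag.

    baseline_id is the number of the component image being coded (first
    variant) or the reference number of the preceding frame (second variant).
    """
    if ref_id == baseline_id:
        return [0]          # flag off: the flag alone is coded
    return [1, ref_id]      # flag on: the flag and the number are coded


def encode_sequence(ref_ids, initial_ref_id):
    """Second variant: compare each frame with the preceding frame's number."""
    symbols = []
    previous = initial_ref_id  # content of the memory (assumed initial value)
    for ref_id in ref_ids:
        symbols.extend(encode_reference_id(ref_id, previous))
        previous = ref_id      # store for comparison with the next frame
    return symbols


symbols = encode_sequence([1, 3, 3, 3, 2], initial_ref_id=1)
```

In this example five frames are coded with seven symbols, since the identification number is transmitted only for the two frames in which it changes.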
According to a still further aspect of the present invention, the motion picture decoding apparatus for decoding the data coded by the motion picture coding apparatus described above includes a reference image identification number decoding unit for decoding an identification number of a reference component image for prediction coding, a reference image selecting unit for selecting a reference image out of a plurality of component images in accordance with the identification number, and an image quality improving unit for improving image quality of an already decoded component image. Since the motion picture decoding apparatus includes the above described components, hierarchical decoding can be implemented. Therefore, it is possible to perform editing of component images efficiently by using low quality component images only, or to hierarchically improve image quality of a selected area of the motion picture, for example.
Preferably, the reference image identification number decoding unit decodes a flag among coded data of the identification number, regards the number of component image being decoded as the identification number when the flag is off, and decodes coded data of the identification number when the flag is on. Therefore, the data coded by the above-described motion picture coding apparatus can be decoded.
Preferably, the reference image identification number decoding unit of the motion picture decoding apparatus decodes the flag among the coded data of the identification number, regards the reference image identification number used in a preceding frame as the present reference image identification number when the flag is off, and decodes coded data of the identification number when the flag is on. As a result, the data coded by the above-described motion picture coding apparatus can be decoded.
More preferably, the motion picture decoding apparatus includes a flag decoding unit for decoding a flag among coded data and a reference image identification number decoding unit for decoding the reference image identification number among the coded data, regards the identification number of the component image which is being decoded as the identification number of the reference image when the decoded flag is off, and regards the result of decoding by the reference image identification number decoding unit as the reference image identification number when the flag is on.
More preferably, the motion picture decoding apparatus includes a flag decoding unit for decoding a flag among the coded data, a memory for storing a reference image identification number of a frame, and a reference image identification number decoding unit for decoding the reference image identification number among the coded data, regards the reference image identification number read from the memory as the identification number of the reference image of the present frame when the decoded flag is off, and regards the result of decoding by the reference image identification number decoding unit as the reference image identification number of the present frame when the flag is on.
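The decoding of the flag and the reference image identification number using the memory may be sketched as follows. The symbol list format, the function name and the initial content of the memory are assumptions made for this sketch and are not a definition of the coded data syntax.

```python
def decode_reference_ids(symbols, initial_ref_id):
    """Recover one reference image identification number per frame.

    symbols: flat list in which each frame contributes either [0]
    (flag off) or [1, identification number] (flag on).
    """
    ref_ids = []
    memory = initial_ref_id  # number of the preceding frame (assumed start)
    pos = 0
    while pos < len(symbols):
        flag = symbols[pos]
        pos += 1
        if flag:
            # Flag on: the identification number follows in the coded data.
            memory = symbols[pos]
            pos += 1
        # Flag off: the number read from the memory is used unchanged.
        ref_ids.append(memory)
    return ref_ids


decoded = decode_reference_ids([0, 1, 3, 0, 0, 1, 2], initial_ref_id=1)
```

For the variant that compares against the component image being decoded rather than the preceding frame, the same flag test applies, with the number of the component image being decoded taking the place of the memory content.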