The present invention relates to the technical field of digital image processing, and especially relates to a moving image encoding apparatus for efficiently encoding image data and a moving image decoding apparatus for decoding the encoded data produced by the moving image encoding apparatus.
In image encoding, methods of synthesizing different moving image sequences have been studied.
The literature “Image Encoding Using a Hierarchical Expression and Multiple Templates” (Shingaku Giho IE94-159, pp. 99-106 (1995)) describes a method in which a moving image sequence serving as a background and a moving image sequence of a partial moving image serving as a foreground (for example, an image of a person or a fish cut out by a chroma key technique) are superimposed to prepare a new sequence.
In addition, the literature “Temporal Scalability based on Image Content” (ISO/IEC/JTC1/SC29/WG11 MPEG95/211 (1995)) describes a method in which a moving image sequence of a partial moving image having a high frame rate is superimposed on a moving image sequence having a low frame rate to prepare a new sequence.
With this method, as shown in FIG. 15, prediction-encoding is performed at a low frame rate in a lower layer, and prediction-encoding is performed at a high frame rate only for a selected area (dotted portion) in an upper layer. In this case, an image frame decoded up to the upper layer is obtained by superimposing an image frame decoded by the lower layer and an area decoded by the upper layer. Moreover, a frame encoded by the lower layer is not encoded in the upper layer, and the decoded image of the lower layer is directly copied. In addition, it is assumed that a portion which attracts the audience's attention, such as a portion showing a person, is selected as the selected area.
FIG. 11 shows a block diagram of the conventional art. On the encoding side in the conventional art, the input moving image is thinned out between frames by a first thinning-out section 1101 and a second thinning-out section 1102 so that its frame rate becomes equal to or lower than that of the input image, and is then input to an upper layer encoding section 1103 and a lower layer encoding section 1104. Here, the frame rate of the upper layer is assumed to be higher than the frame rate of the lower layer.
In the lower layer encoding section 1104, the entire input moving image is encoded. As the encoding method, an international standard method for encoding moving images, for example, MPEG or H.261, is used. In the lower layer encoding section 1104, a decoded image of the lower layer is prepared; it is utilized for prediction-encoding and is also input to a superimposing section 1105.
In the upper layer encoding section 1103, only a selected area of the input moving image is encoded. Here, an international standard method for encoding moving images such as MPEG or H.261 is again used, but only the selected area is encoded based on the area information. However, a frame encoded in the lower layer is not encoded in the upper layer. The area information is information showing the selected area, for example, a portion showing a person, and is a binarized image which takes the value 1 at a position in the selected area and the value 0 at a position outside the selected area. Also in the upper layer encoding section 1103, only the selected area of the moving image is decoded, and the result is input to the superimposing section 1105.
In an area information encoding section 1106, the area information is encoded by utilizing a chain code or the like.
The superimposing section 1105 outputs a decoded image of the lower layer when a lower layer frame has been encoded at the time of the frame to be superimposed. When no lower layer frame has been encoded at that time, the superimposing section 1105 outputs a moving image by using two decoded images of the lower layer, one before and one after the frame to be superimposed, and one decoded image of the upper layer. The moving image prepared here is input to the upper layer encoding section 1103 and utilized for the prediction-encoding. The image forming method in the superimposing section 1105 is as described below.
First, an interpolated image is prepared from the two frames of the lower layer. If it is assumed that the decoded image of the lower layer at time “t” is B(x, y, t) (provided that x and y are coordinates representing a position of a pixel in the space), that the times of the two frames of the lower layer are t1 and t2, respectively, and that the time of the upper layer is t3 (provided that t1 < t3 < t2), the interpolated image I(x, y, t3) at time t3 can be calculated by the following expression (1):
I(x, y, t3) = [(t2 − t3) B(x, y, t1) + (t3 − t1) B(x, y, t2)] / (t2 − t1)    (1)
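As an illustration only, expression (1) can be sketched in Python as follows. The function name `interpolate_frame` and the representation of a frame as a list of rows of pixel values are assumptions made for this sketch and do not appear in the conventional art described here.

```python
def interpolate_frame(b1, b2, t1, t2, t3):
    """Linearly interpolate two lower-layer frames per expression (1).

    b1, b2: decoded lower-layer frames B(x, y, t1) and B(x, y, t2),
            given as lists of rows (an assumed representation).
    t3:     time of the upper-layer frame, with t1 < t3 < t2.
    """
    if not (t1 < t3 < t2):
        raise ValueError("expected t1 < t3 < t2")
    w1 = (t2 - t3) / (t2 - t1)  # weight of the earlier frame
    w2 = (t3 - t1) / (t2 - t1)  # weight of the later frame
    return [
        [w1 * p1 + w2 * p2 for p1, p2 in zip(row1, row2)]
        for row1, row2 in zip(b1, b2)
    ]


# Hypothetical 2x2 frames at t1 = 0 and t2 = 4; the upper-layer frame is at t3 = 1.
frame_a = [[0, 100], [50, 200]]
frame_c = [[100, 100], [150, 0]]
interpolated = interpolate_frame(frame_a, frame_c, t1=0, t2=4, t3=1)
```

Because t3 is closer to t1 in this example, the earlier frame receives the larger weight (3/4 against 1/4).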
Then, a decoded image E of the upper layer is superimposed on the interpolated image I determined by the above expression (1). For this purpose, weight information W (x, y, t) for interpolation is prepared from the area information M (x, y, t), to obtain a superimposed image S by the following expression (2):
S(x, y, t) = [1 − W(x, y, t)] I(x, y, t) + E(x, y, t) W(x, y, t)    (2)
Here, the area information M (x, y, t) is a binarized image which takes 1 within the selected area and takes 0 outside the selected area, and the weight information W (x, y, t) can be obtained by applying a low-pass filter to this image a plurality of times. That is to say, the weight information W (x, y, t) takes 1 within the selected area, takes 0 outside the selected area, and takes a value between 0 and 1 at the boundary of the selected area. The above is the image forming method in the superimposing section 1105. The encoded data produced in the lower layer encoding section, the upper layer encoding section, and the area information encoding section are integrated in an encoded data-integrating section (not shown) and transmitted or accumulated.
Then, on the decoding side in the conventional art, the encoded data is separated into encoded data of the lower layer, encoded data of the upper layer and encoded data of the area information by an encoded data-disintegrating section (not shown). These encoded data are decoded by a lower layer decoding section 1108, an upper layer decoding section 1107 and an area information decoding section 1109, respectively, as shown in FIG. 11.
A superimposing section 1110 on the decoding side comprises the same apparatus as the superimposing section 1105 on the encoding side, and an image is superimposed by the same method as described for the encoding side, using a lower-layer decoded image and an upper-layer decoded image. The moving image superimposed here is displayed on a display, and is also input to the upper layer decoding section 1107 and utilized for the prediction of the upper layer. Though a decoding apparatus for decoding both the lower layer and the upper layer has been described here, a decoding apparatus having only a decoding section for the lower layer does not require the upper layer decoding section 1107 and the superimposing section 1110, hence a part of the encoded data can be reproduced with a small hardware scale.
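The derivation of the weight information W from the area information M and the superimposition of expression (2) can be sketched as below. The text does not specify the low-pass filter, so the 3x3 box filter with edge replication, the default of two passes, and the names `low_pass`, `weight_from_mask` and `superimpose` are all assumptions made for illustration.

```python
def low_pass(img):
    """One pass of a 3x3 box filter with edge replication (assumed filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the frame
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def weight_from_mask(mask, passes=2):
    """Turn binary area information M into weight information W."""
    weight = [[float(v) for v in row] for row in mask]
    for _ in range(passes):  # "a plurality of times"; the count is assumed
        weight = low_pass(weight)
    return weight


def superimpose(interp, upper, weight):
    """Expression (2): S = (1 - W) * I + W * E, evaluated per pixel."""
    return [
        [(1.0 - wv) * iv + wv * ev for iv, ev, wv in zip(ri, re, rw)]
        for ri, re, rw in zip(interp, upper, weight)
    ]
```

With this particular filter, pixels well inside the selected area keep the weight 1, pixels far outside keep 0, and pixels near the boundary receive intermediate values, which matches the behavior described for W.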
At this time, since the frame rates of the lower layer and the upper layer are different, it is necessary to synthesize the lower layer frame corresponding to an upper layer frame from the lower layer frames temporally before and after it. However, when an output image is obtained from two lower-layer decoded images and one upper-layer decoded image, the output image is synthesized by the interpolation of two lower layer frames. Therefore, when the position of the selected area changes with time, a large distortion is caused in the periphery of the selected area, resulting in a large deterioration of the image quality.
This problem can be solved by using a method such as the one described in the literature “Temporal Scalability algorithm based on image content”, ISO/IEC/JTC1/SC29/WG11 MPEG96/0277 (1996). FIG. 14 illustrates the method for solving this problem shown in the above-mentioned literature. In FIG. 14a, images A and C are two encoded images of the lower layer and an image B is an encoded image of the upper layer, and the temporal order of display is A, B and C. The selected area is shown by hatching.
Moreover, since only the selected area is encoded in the upper layer, the outside of the selected area is shown by a broken line. Since the selected area moves in the direction of the arrow in the figure, the interpolated image obtained from the image A and the image C becomes one in which two selected areas are superposed, as shown by the meshed portion in FIG. 14b. Furthermore, when the image B is superimposed by using the expression (2), the output image becomes an image in which three selected areas are superposed, as shown in FIG. 14c.
Particularly in the periphery (outside) of the selected area of the upper layer, the selected area of the lower layer appears like an afterimage and deteriorates the image quality widely. As for the entire moving image, the above-mentioned distortion does not appear when only the lower layer is displayed, but appears when the superimposed image of the upper layer and the lower layer is displayed; hence a flicker-like distortion appears, resulting in a large deterioration in the image quality. However, since the meshed portion on the left side of FIG. 14c can be obtained from the image C, and the meshed portion on the right side of FIG. 14c can be obtained from the image A, the above-mentioned distortion can be eliminated by using the lower layer synthesized in this way.
FIG. 12 shows a block diagram of a conventional image superimposing apparatus shown in the above-mentioned literature. A first area-extracting section 1201 in FIG. 12 extracts an area which is in the first area and is not in the second area, from the first area information of the lower layer and the second area information of the lower layer. In FIG. 13a, if it is assumed that the first area information is expressed by a dotted line (it is assumed that the inside of the dotted line has a value 0 and the outside of the dotted line has a value 1), and the second area information is similarly expressed by a broken line, the area extracted by the first area-extracting section 1201 becomes the hatched portion of FIG. 13a.
The second area-extracting section 1202 in FIG. 12 extracts an area which is the second area and is not the first area, from the first area information of the lower layer and the second area information of the lower layer. In the case of FIG. 13a, the meshed portion is extracted.
A controller 1203 in FIG. 12 is a section for controlling a switch 1204 with an output of the first area-extracting section 1201 and the second area-extracting section 1202. That is to say, when the position of a target pixel is only in the first area, the switch 1204 is connected to the second decoded image side, and when the position of the target pixel is only in the second area, the switch 1204 is connected to the first decoded image side, and when the position of the target pixel is in other areas, the switch 1204 is connected to the output from the interpolated image-forming section 1205.
The interpolated image-forming section 1205 in FIG. 12 calculates the interpolated image of the first decoded image of the lower layer and the second decoded image of the lower layer according to the expression (1). Here, in the expression (1), B (x, y, t1) is the first decoded image, B (x, y, t2) is the second decoded image, and I (x, y, t3) is the interpolated image, where t1, t2 and t3 are the times of the first decoded image, the second decoded image and the interpolated image, respectively.
An image is formed as described above. Therefore, in the case of FIG. 13a, for example, the second decoded image is used in the hatched portion, so a background pixel outside of the selected area appears there; the first decoded image is used in the meshed portion, so a background pixel outside of the selected area appears there as well; and in other portions, the interpolated image of the first decoded image and the second decoded image appears.
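The switching performed by the controller 1203 and the switch 1204 can be sketched per pixel as follows. This is a minimal illustration, not the apparatus itself: the function name `synthesize_background` is hypothetical, frames are assumed to be lists of rows, and a truthy mask value is assumed to mean "inside the area".

```python
def synthesize_background(first, second, mask1, mask2, t1, t2, t3):
    """Form the lower-layer image at time t3 from two decoded frames.

    first, second: decoded lower-layer frames at times t1 and t2.
    mask1, mask2:  first and second area information (truthy = inside,
                   an assumed convention for this sketch).
    """
    w1 = (t2 - t3) / (t2 - t1)
    w2 = (t3 - t1) / (t2 - t1)
    out = []
    for y in range(len(first)):
        row = []
        for x in range(len(first[0])):
            in1, in2 = bool(mask1[y][x]), bool(mask2[y][x])
            if in1 and not in2:
                # Only in the first area: background is visible in the second frame.
                row.append(second[y][x])
            elif in2 and not in1:
                # Only in the second area: background is visible in the first frame.
                row.append(first[y][x])
            else:
                # Both areas or neither: interpolate per expression (1).
                row.append(w1 * first[y][x] + w2 * second[y][x])
        out.append(row)
    return out


# Hypothetical one-row frames: an object (value 100) moves one pixel to the right
# over a background of value 0.
first = [[100, 100, 0, 0]]
mask1 = [[1, 1, 0, 0]]
second = [[0, 100, 100, 0]]
mask2 = [[0, 1, 1, 0]]
background = synthesize_background(first, second, mask1, mask2, t1=0, t2=2, t3=1)
```

In this example the pixels covered by only one of the two areas take the background value 0 from the other frame, so no afterimage remains there; the pixel covered by both areas still shows the object, which is the situation addressed later as the second problem.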
A decoded image of the upper layer is superimposed on the thus formed image by a weighted average section 1206 in FIG. 12, hence the superimposed image does not have an afterimage in the selected area (meshed portion) as shown in FIG. 13b, and an image having little distortion can be obtained. The weighted average section 1206 in FIG. 12 superimposes the above-mentioned synthesized image and the decoded image of the upper layer by a weighted average.
However, with the conventional apparatus, there are problems as described below.
First, when the degree of shape change due to the movement of the parts area is small, a large improvement in the image quality cannot be expected with the conventional art. Moreover, since two pieces of shape information, one before and one after the parts shape of the upper layer, must be encoded, the amount of codes which can be used for encoding the texture information decreases relatively, hence there is such a problem that the image quality deteriorates (the first problem).
Secondly, the conventional art is effective when the parts area moves in one direction, but when the parts image makes a reciprocating movement, the background information of the parts cannot be obtained in principle, hence there is such a problem that the image quality is not improved (the second problem).
FIG. 8 and FIG. 10 are diagrams for explaining this problem. For example, the background image of the area where the parts image areas in images A and C of FIG. 10 overlap (hatched area in FIG. 8) cannot be obtained from the image A and the image C.
Furthermore, the conventional art requires lower layer frames temporally before and after the image of the upper layer, but there may be a case in which one of these lower layer frames does not exist, such as at the beginning or at the end of the image sequence, or before and after a scene change. Therefore, there is such a problem that the image quality is not improved in the vicinity of the parts image (the third problem).
Furthermore, the conventional art requires the interpolation processing to be switched selectively for each of four areas, thus there is such a problem that the processing becomes complicated (the fourth problem).
It is an object of the present invention to solve these problems and to provide a moving image encoding apparatus and a moving image decoding apparatus which reduce data quantity after encoding without deteriorating the quality of the decoded image.
In view of the above situation, it is an object of the present invention to provide a moving image encoding apparatus and a moving image decoding apparatus which can reduce data quantity after encoding without deteriorating the quality of the decoded image.
With a view to solving the above problems, the gist of the present invention is as follows.
The first gist of the present invention is a moving image encoding apparatus which separates one moving image sequence into a lower layer having a low frame rate and an upper layer having a high frame rate, encodes a shape of a parts area for synthesizing the lower layer, on the condition that there is no frame corresponding to the upper layer, and encodes the upper layer by prediction, wherein
when the parts area of the lower layer appearing as a background is larger than a predetermined threshold, the moving image encoding apparatus encodes the shape of the parts area, and synthesizes a frame obtained by taking the average by weighting the lower layer and a frame of the lower layer to generate image information, and when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the moving image encoding apparatus does not encode the shape of the parts area and generates image information by the weighted average of the lower layer.
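The threshold decision of the first gist can be sketched as follows. How the size of "the parts area of the lower layer appearing as a background" is measured is not stated at this point, so the measure used here (pixels that belong to a lower-layer parts area but not to the upper-layer parts area) is an assumption, as are the function names and the returned labels. Treating a size equal to the threshold like the "smaller" case is also an assumption.

```python
def exposed_background_size(lower_mask1, lower_mask2, upper_mask):
    """Count pixels in a lower-layer parts area that the upper-layer parts
    area does not cover (an assumed measure of the exposed background)."""
    count = 0
    for r1, r2, ru in zip(lower_mask1, lower_mask2, upper_mask):
        for m1, m2, mu in zip(r1, r2, ru):
            if (m1 or m2) and not mu:
                count += 1
    return count


def choose_synthesis_mode(lower_mask1, lower_mask2, upper_mask, threshold):
    """Return a hypothetical label for the processing the encoder would select."""
    size = exposed_background_size(lower_mask1, lower_mask2, upper_mask)
    if size > threshold:
        # Encode the parts shape and synthesize using the lower-layer frames.
        return "encode_shape_and_synthesize"
    # Skip shape encoding and use only the weighted average of the lower layer.
    return "weighted_average_only"


# Hypothetical one-row masks.
m1 = [[1, 1, 0, 0]]
m2 = [[0, 1, 1, 0]]
mu = [[0, 1, 0, 0]]
mode = choose_synthesis_mode(m1, m2, mu, threshold=1)
```

The point of the decision is that shape information is spent only when enough background is exposed for the synthesis to pay off, which addresses the first problem described above.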
The second gist of the present invention is a moving image decoding apparatus which synthesizes a lower layer having a low frame rate on the condition that there is no frame corresponding to an upper layer having a high frame rate, decodes the upper layer by prediction, and superimposes the prediction-decoded upper layer on the lower layer to decode them into one moving image sequence, wherein
when the shape of parts area has been encoded, the moving image decoding apparatus decodes the shape of the parts area, and synthesizes a frame obtained by taking the average by weighting the lower layer and a frame of the lower layer to generate image information, and when the shape of parts area has not been encoded, the moving image decoding apparatus generates image information by the weighted average of the lower layer.
The third gist of the present invention is a moving image encoding apparatus which separates one moving image sequence into a lower layer having a low frame rate and an upper layer having a high frame rate, encodes a shape of a parts area for synthesizing the lower layer on the condition that there is no frame of the lower layer corresponding to the upper layer, and encodes the upper layer by prediction, wherein
the moving image encoding apparatus interpolates a pixel value within the overlapping area of the lower layer, using a pixel value in the periphery of the area appearing as a background, to generate image information.
The 4th gist of the present invention is a moving image decoding apparatus which synthesizes a lower layer having a low frame rate on the condition that there is no lower layer frame corresponding to an upper layer having a high frame rate, decodes the upper layer by prediction, and superimposes the prediction-decoded upper layer on the lower layer to decode them into one moving image sequence, wherein
the moving image decoding apparatus interpolates a pixel value within the overlapping area, using a pixel value in the periphery of the overlapping area of the parts area of the lower layer appearing as a background, to generate image information.
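Interpolating the pixel values inside the overlapping area from the pixel values in its periphery can be sketched as below. The third and fourth gists state only that peripheral pixel values are used; the specific rule here (repeatedly filling each unknown pixel with the average of its already known 4-neighbours, working inward from the boundary) and the name `fill_from_periphery` are assumptions chosen for illustration.

```python
def fill_from_periphery(frame, hole_mask):
    """Fill pixels where hole_mask is truthy using surrounding known pixels.

    Assumed rule: in each round, every unknown pixel that has at least one
    known 4-neighbour takes the average of those neighbours.
    """
    h, w = len(frame), len(frame[0])
    out = [[float(v) for v in row] for row in frame]
    unknown = {(y, x) for y in range(h) for x in range(w) if hole_mask[y][x]}
    while unknown:
        updates = {}
        for (y, x) in unknown:
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            known = [
                out[ny][nx]
                for ny, nx in neighbours
                if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in unknown
            ]
            if known:
                updates[(y, x)] = sum(known) / len(known)
        if not updates:
            break  # no known pixel anywhere; nothing to propagate
        for (y, x), value in updates.items():
            out[y][x] = value
        unknown.difference_update(updates)
    return out


# Hypothetical one-row frame: the three middle pixels lie in the overlapping area.
frame = [[10, 99, 99, 99, 30]]
hole = [[0, 1, 1, 1, 0]]
filled = fill_from_periphery(frame, hole)
```

Such a fill needs no second lower-layer frame for the hole, so it remains usable where the parts areas of the preceding and following frames overlap.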
The 5th gist of the present invention is a moving image encoding apparatus according to the first gist, wherein the pixel value within the parts area is interpolated by using a pixel value in the periphery of the parts area of the lower layer, and image information is generated by using the interpolated lower layer frame.
The 6th gist of the present invention is a moving image decoding apparatus according to the second gist, wherein the pixel value within the parts area is interpolated by using a pixel value in the periphery of the parts area of the lower layer, and image information is generated by using the interpolated lower layer frame.
The 7th gist of the present invention is a moving image encoding apparatus according to the first gist, wherein in the case where the number of frames of the lower layer required for the synthesis of the lower layer is not satisfied,
the image information is generated by using a frame obtained by interpolating the parts area of the lower layer.
The 8th gist of the present invention is a moving image decoding apparatus according to the second gist, wherein in the case where the number of frames of the lower layer required for the synthesis of the lower layer is not satisfied,
the image information is generated by using a frame obtained by interpolating the parts area of the lower layer.
The 9th gist of the present invention is a moving image encoding apparatus according to the first gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the parts area of the lower layer appearing as a background of any one of the plurality of frames of the upper layer is larger than the predetermined threshold, the shape of the parts area for synthesizing the lower layer frame is encoded with respect to the plurality of the upper layer frames.
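The rule of the ninth gist for a plurality of upper layer frames lying between two adjacent lower layer frames can be sketched as follows. The function name `shape_coding_flags` and the list of per-frame exposed-background sizes are hypothetical; how each size is obtained is outside this sketch.

```python
def shape_coding_flags(exposed_sizes, threshold):
    """Decide, for each upper-layer frame between two adjacent lower-layer
    frames, whether the parts shape for synthesis is encoded.

    exposed_sizes: one assumed size value per upper-layer frame.
    If any frame exceeds the threshold, the shape is encoded for all of them.
    """
    encode = any(size > threshold for size in exposed_sizes)
    return [encode] * len(exposed_sizes)


flags = shape_coding_flags([3, 12, 5], threshold=10)
```

Making the decision jointly keeps the synthesis method from switching between upper layer frames that share the same pair of lower layer frames.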
The 10th gist of the present invention is a moving image encoding apparatus according to the third gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the parts area of the lower layer appearing as a background of any one of the plurality of frames of the upper layer is larger than the predetermined threshold, the shape of the parts area for synthesizing the lower layer frame is encoded with respect to the plurality of the upper layer frames.
The 11th gist of the present invention is a moving image encoding apparatus according to the 5th gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the parts area of the lower layer appearing as a background of any one of the plurality of frames of the upper layer is larger than the predetermined threshold, the shape of the parts area for synthesizing the lower layer frame is encoded with respect to the plurality of the upper layer frames.
The 12th gist of the present invention is a moving image encoding apparatus according to the 7th gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the parts area of the lower layer appearing as a background of any one of the plurality of frames of the upper layer is larger than the predetermined threshold, the shape of the parts area for synthesizing the lower layer frame is encoded with respect to the plurality of the upper layer frames.
The 13th gist of the present invention is a moving image decoding apparatus according to the second gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the shape of the parts area for synthesizing the frames of the lower layer is encoded with respect to any one of the plurality of frames of the upper layer, a frame obtained by taking the average by weighting the lower layer and a frame of the lower layer are synthesized with respect to all of the plurality of frames of the upper layer to generate image information.
The 14th gist of the present invention is a moving image decoding apparatus according to the 4th gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the shape of the parts area for synthesizing the frames of the lower layer is encoded with respect to any one of the plurality of frames of the upper layer, a frame obtained by taking the average by weighting the lower layer and a frame of the lower layer are synthesized with respect to all of the plurality of frames of the upper layer to generate image information.
The 15th gist of the present invention is a moving image decoding apparatus according to the 6th gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the shape of the parts area for synthesizing the frames of the lower layer is encoded with respect to any one of the plurality of frames of the upper layer, a frame obtained by taking the average by weighting the lower layer and a frame of the lower layer are synthesized with respect to all of the plurality of frames of the upper layer to generate image information.
The 16th gist of the present invention is a moving image decoding apparatus according to the 8th gist, wherein in the case where there are a plurality of frames of the upper layer between two adjacent frames of the lower layer,
when the shape of the parts area for synthesizing the frames of the lower layer is encoded with respect to any one of the plurality of frames of the upper layer, a frame obtained by taking the average by weighting the lower layer and a frame of the lower layer are synthesized with respect to all of the plurality of frames of the upper layer to generate image information.
The 17th gist of the present invention is a moving image encoding apparatus according to the first gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
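One possible reading of the expansion in the 17th gist is that the upper layer parts area is enlarged to also cover the lower layer parts areas. The sketch below implements that reading as a per-pixel union of binary masks; the union rule and the name `expand_parts_area` are assumptions, since the exact expansion is not specified at this point.

```python
def expand_parts_area(upper_mask, lower_mask1, lower_mask2):
    """Expand the upper-layer parts area with the lower-layer parts areas.

    Assumed rule: a pixel belongs to the expanded area if it belongs to the
    upper-layer area or to either lower-layer area.
    """
    return [
        [1 if (mu or m1 or m2) else 0 for mu, m1, m2 in zip(ru, r1, r2)]
        for ru, r1, r2 in zip(upper_mask, lower_mask1, lower_mask2)
    ]


# Hypothetical one-row masks.
expanded = expand_parts_area([[0, 1, 0, 0]], [[1, 1, 0, 0]], [[0, 1, 1, 0]])
```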
The 18th gist of the present invention is a moving image encoding apparatus according to the third gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 19th gist of the present invention is a moving image encoding apparatus according to the 5th gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 20th gist of the present invention is a moving image encoding apparatus according to the 7th gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 21st gist of the present invention is a moving image encoding apparatus according to the 9th gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 22nd gist of the present invention is a moving image encoding apparatus according to the 10th gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 23rd gist of the present invention is a moving image encoding apparatus according to the 11th gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 24th gist of the present invention is a moving image encoding apparatus according to the 12th gist, wherein
when the parts area of the lower layer appearing as a background is smaller than the predetermined threshold, the parts area of the upper layer is expanded by using the parts area of the lower layer to generate a parts area of image information.
The 25th gist of the present invention is a moving image decoding apparatus according to the second gist, wherein when the frames of the lower layer are synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area.
The 26th gist of the present invention is a moving image decoding apparatus according to the 6th gist, wherein when the lower layer frames are synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area.
The 27th gist of the present invention is a moving image decoding apparatus according to the second gist, wherein when the lower layer frame is synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area, and at the time of the synthesis, interpolation is performed by using a pixel value of a frame of the lower layer with respect to a pixel value outside of one parts area of the lower layer, and using a pixel value in the periphery of the parts area with respect to a pixel value inside of the one parts image of the lower layer.
The 28th gist of the present invention is a moving image decoding apparatus according to the 4th gist, wherein when the lower layer frame is synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area, and at the time of the synthesis, interpolation is performed by using a pixel value of a frame of the lower layer with respect to a pixel value outside of one parts area of the lower layer, and using a pixel value in the periphery of the parts area with respect to a pixel value inside of the one parts image of the lower layer.
The 29th gist of the present invention is a moving image decoding apparatus according to the 6th gist, wherein when the lower layer frame is synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area, and at the time of the synthesis, interpolation is performed by using a pixel value of a frame of the lower layer with respect to a pixel value outside of one parts area of the lower layer, and using a pixel value in the periphery of the parts area with respect to a pixel value inside of the one parts image of the lower layer.
The 30th gist of the present invention is a moving image decoding apparatus according to the 25th gist, wherein when the lower layer frame is synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area, and at the time of the synthesis, interpolation is performed by using a pixel value of a frame of the lower layer with respect to a pixel value outside of one parts area of the lower layer, and using a pixel value in the periphery of the parts area with respect to a pixel value inside of the one parts image of the lower layer.
The 31st gist of the present invention is a moving image decoding apparatus according to the 26th gist, wherein when the lower layer frame is synthesized, a pixel value of one of the lower layer frames which exist before and after a frame of the lower layer is used, with respect to an area where the first parts area and the second parts area overlap, or an area which is neither the first parts area nor the second parts area, and at the time of the synthesis, interpolation is performed by using a pixel value of a frame of the lower layer with respect to a pixel value outside of one parts area of the lower layer, and using a pixel value in the periphery of the parts area with respect to a pixel value inside of the one parts image of the lower layer.