The present invention relates to a video image decoding and composing method and a video image decoding and composing apparatus which enable interactive operation by a user, and also to a video image composition information coding method.
Up to now, video image compression methods such as MPEG-1 and MPEG-2 have been utilized in the coding of natural video images. Further, as a new coding method, there is an object coding method which divides a moving video image into the objects it contains and encodes the video image of each object and of the background separately.
FIG. 7 is a conceptual view showing the composition of object video images. In the figure, numeral 701 represents a background video image, and numerals 702 to 705 designate video images of objects in the video image. Numerals 706 and 707 designate composed video images. The objects include rectangular shape objects, such as the background video image 701, and arbitrary shape objects, which have shapes other than a rectangle, such as those of the video images 702 to 705. An arbitrary shape object is constituted by a texture video image representing a color signal and a shape video image representing the shape. The object of the video images 704 and 705 is assumed to be located in the foreground relative to the object of the video images 702 and 703.
First, the background video image 701 and the texture video image 702 of the object closest to the background are composed using the shape video image 703, thereby outputting the composed video image 706. Next, the texture video image 704 is composed with the composed video image 706 using the shape video image 705, thereby outputting the composed video image 707. By these operations, the final composed video image is produced. Herein, a shape video image is either a two-value video image, which only indicates whether each pixel is inside or outside the object, or a multi-value video image, which indicates the ratio in which the pixel values of the background and of the object are mixed, thereby enabling semi-transparent composition. In the object coding method, the rectangular shape object video image (701), the arbitrary shape object video image (702, 703), and the arbitrary shape object video image (704, 705) can be coded individually, object by object. The MPEG-4 video image coding method makes it possible to encode such object video images having arbitrary shapes other than a rectangular shape.
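The two composition steps described above can be sketched as follows. This is only an illustrative sketch, not the method of any particular decoder: the function name `compose`, the 8-bit range given by `max_alpha`, the rounding rule, and the tiny grey-level sample images are all assumptions introduced here. A two-value shape uses only 0 and `max_alpha`, while a multi-value shape uses intermediate values for semi-transparent composition.

```python
def compose(background, texture, shape, max_alpha=255):
    """Blend `texture` over `background`, weighting each pixel by `shape`.

    All three arguments are equally sized 2-D lists of integer grey levels
    (an assumption made to keep the sketch short; real images carry color).
    """
    out = []
    for bg_row, tx_row, sh_row in zip(background, texture, shape):
        row = []
        for bg, tx, a in zip(bg_row, tx_row, sh_row):
            # Weighted average of object and background, rounded to nearest.
            mixed = a * tx + (max_alpha - a) * bg
            row.append((mixed + max_alpha // 2) // max_alpha)
        out.append(row)
    return out


# Hypothetical 3x2 sample data, named after the numerals of FIG. 7.
background_701 = [[10, 10, 10], [10, 10, 10]]
texture_702 = [[200, 200, 200], [200, 200, 200]]
shape_703 = [[0, 255, 255], [0, 0, 255]]   # two-value shape
texture_704 = [[100, 100, 100], [100, 100, 100]]
shape_705 = [[0, 0, 128], [0, 0, 0]]       # multi-value shape

# Step 1: background 701 + texture 702, using shape 703.
composed_706 = compose(background_701, texture_702, shape_703)
# Step 2: composed image 706 + texture 704, using shape 705.
composed_707 = compose(composed_706, texture_704, shape_705)
```

With this sample data, the pixel covered by the multi-value shape ends up between the value of the foreground texture and that of the image already composed behind it.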
On the other hand, the standardization of coding methods has also been advancing for computer graphics data. One standard coding method is the Virtual Reality Modeling Language (VRML). In this coding method, information on vertices, lines, and surfaces, and on their materials (such as color and light reflection parameters), can be coded, and a decoding apparatus can reconstruct the computer graphics scene by decoding the coded signal of the virtual reality modeling language.
Recently, a coding method combining the object coding method and the computer graphics data coding method has also attracted attention. When the computer graphics data coding method is extended to cover the composition of object video images, the composition positions of the object video images of the object coding method can be changed, and those object video images can be composed with computer graphics images.
In MPEG-4, a coding method combining the above-described object coding method and the computer graphics data coding method is also realized. By extending the computer graphics data coding method to the composition of object video images, it is possible to compose the object video images of the object coding method with computer graphics video images. Thereby, it is possible to realize computer graphics with greater expressive power than in the prior art.
FIG. 8 shows an example of a video image decoding and composing apparatus based on a combination of the object coding method and the computer graphics data coding method. Hereinafter, a coded signal which describes the composition information of the object video images in the above-described extended computer graphics data coding format is called a composition information coded signal.
Numeral 801 designates a composition information coded signal, numeral 802 designates a composition information coded signal decoding means for analyzing the composition information coded signal 801 and outputting the result as composition information, numeral 803 designates a composition information memory storing the composition information as the output of the composition information coded signal decoding means 802, numeral 804 designates a coded signal of the arbitrary shape object video image, numeral 805 designates an arbitrary shape object decoding means for decoding the coded signal 804, and numeral 806 designates a shape memory storing a shape video image signal which is decoded by the arbitrary shape object decoding means 805. Numeral 807 designates a texture memory for storing a texture video image signal which is decoded by the arbitrary shape object decoding means 805. Numeral 808 designates a coded signal of the rectangular shape object video image and numeral 809 designates a rectangular shape object decoding means for decoding the coded signal 808. Numeral 810 designates a video image memory for storing the video image signal which is decoded by the rectangular shape object decoding means 809. Numeral 811 designates a composing means for composing the shape signal stored in the shape memory 806, the texture signal stored in the texture memory 807, and the video image signal stored in the video image memory 810 in accordance with the composition information stored in the composition information memory 803. Numeral 812 designates a composition video image signal which is output from the composing means 811.
The operation of the video image decoding and composing apparatus constructed as described above will be described with reference to the drawings and the tables which follow.
An example of the composition information coded signal 801 is shown in Table 4. This is described in a format similar to that of the virtual reality modeling language. For the detail of the format of the virtual reality modeling language, please see "VRML2.0 -3D cyber space structuring language-" by Kenjiro Miura, Asakura Shoten, 1996. In the format, a node and a field accompanying the node are included. In this example, Group, Shape, Appearance, MovieTexture, and Rectangle objects are nodes. The "children" is a field of Group node, the "appearance" and "geometry" are fields of Shape node, the "texture" is a field of Appearance node, and url is a field of MovieTexture node. The Group node represents a group of nodes, and describes the collection of nodes at the "children" field. The "MovieTexture" node represents a moving video image which is to be texture mapped to the object (in this example, Rectangle node) which is represented by the "geometry" field of the Shape node, and the location of the coded video image signal corresponding to the moving video image is described in the url field.
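The node-and-field structure described above can be modeled, purely for illustration, as nested Python dictionaries. This sketch does not reproduce Table 4: the `"node"` key, the helper `movie_urls`, and the url values `"video1"` and `"video2"` are hypothetical names introduced here. Only the node names and the field names come from the description.

```python
# A Group node whose "children" field collects two Shape nodes.
scene = {
    "node": "Group",
    "children": [
        {
            "node": "Shape",
            "appearance": {
                "node": "Appearance",
                # "video1" is a hypothetical location of a coded video signal.
                "texture": {"node": "MovieTexture", "url": "video1"},
            },
            "geometry": {"node": "Rectangle"},
        },
        {
            "node": "Shape",
            "appearance": {
                "node": "Appearance",
                # "video2" is likewise hypothetical.
                "texture": {"node": "MovieTexture", "url": "video2"},
            },
            "geometry": {"node": "Rectangle"},
        },
    ],
}


def movie_urls(node):
    """Collect the url field of every MovieTexture node, depth first."""
    urls = []
    if node.get("node") == "MovieTexture":
        urls.append(node["url"])
    for value in node.values():
        if isinstance(value, dict):
            # A field holding a single node, e.g. "appearance" or "texture".
            urls.extend(movie_urls(value))
        elif isinstance(value, list):
            # A field holding a collection of nodes, e.g. "children".
            for child in value:
                if isinstance(child, dict):
                    urls.extend(movie_urls(child))
    return urls


urls = movie_urls(scene)
```

Walking the tree in this way yields the locations of the coded video image signals that a decoder would have to fetch, in the order in which the Shape nodes appear in the "children" field.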
The composition information coded signal may be described in the text format as shown in Table 4, or may be compressed into a binary format as in MPEG-4.
The arbitrary shape object decoding means 805 receives and decodes the arbitrary shape coded signal 804, and stores the decoded shape video image in the shape memory 806 and the decoded texture video image in the texture memory 807. The rectangular shape object decoding means 809 receives and decodes the rectangular shape object coded signal 808, and stores the decoded video image in the video image memory 810. The composing means 811 composes the texture video image of the arbitrary shape object stored in the texture memory 807 and the rectangular shape video image stored in the video image memory 810 in accordance with the composition information stored in the composition information memory 803. When the arbitrary shape video image is composed, the shape video image in the shape memory 806 is used. The composing means 811 outputs the result as the composed video image signal 812. By using the composition information coded signal, the creator of the coded data can freely specify how the decoded video image objects are composed.
On the other hand, attempts have been made to allow a user of the display apparatus to perform interactive operations on objects displayed by computer graphics. In the above-described virtual reality modeling language, interactive operation on computer graphics objects is realized.
However, while interactive operation on computer graphics objects has been devised, interactive operation by the user on the arbitrary shape object video images of the object coding has not been conceived up to now. For example, in the case shown in Table 4, when the user intends to select an arbitrary shape object video image with a pointing device such as a mouse, the shape information of the object video image is not considered. Therefore, the object video image may be erroneously selected even though the indicated position lies outside its shape.
It is an object of the present invention to easily realize an interactive operation by the user in a decoding and composing apparatus of an arbitrary shape object.
According to a first aspect of the present invention, a video image decoding and composing method for decoding a video image coded signal having shape information in shape units and composing the decoded signals together to decode an object video image, uses a shape video image obtained by decoding the video image coded signal in detecting an object video image relating to the object existing at a predetermined position on a video screen.
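The idea of using the decoded shape video image to detect the object at a given screen position can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class `PlacedObject`, the functions `in_bounding_box`, `in_shape`, and `pick`, the two-value shape representation (nonzero means inside), and the sample objects are all hypothetical. The bounding-box test is included only to show the erroneous selection that arises when shape information is ignored.

```python
from dataclasses import dataclass


@dataclass
class PlacedObject:
    name: str
    x: int       # screen column of the object's left edge
    y: int       # screen row of the object's top edge
    shape: list  # decoded two-value shape image; nonzero means inside


def in_bounding_box(obj, px, py):
    """Test that ignores the shape: true anywhere in the enclosing rectangle."""
    height, width = len(obj.shape), len(obj.shape[0])
    return obj.x <= px < obj.x + width and obj.y <= py < obj.y + height


def in_shape(obj, px, py):
    """Test that consults the shape image at the indicated position."""
    if not in_bounding_box(obj, px, py):
        return False
    return obj.shape[py - obj.y][px - obj.x] != 0


def pick(objects, px, py):
    """Return the name of the frontmost object whose shape covers (px, py).

    `objects` is assumed to be ordered from back to front.
    """
    for obj in reversed(objects):
        if in_shape(obj, px, py):
            return obj.name
    return None


# Hypothetical scene: a full 3x3 rear object and a cross-shaped front object.
rear = PlacedObject("rear", 0, 0, [[1, 1, 1], [1, 1, 1], [1, 1, 1]])
front = PlacedObject("front", 1, 1, [[0, 1, 0], [1, 1, 1], [0, 1, 0]])
objects = [rear, front]

corner_hit = pick(objects, 1, 1)   # corner of front's rectangle, outside its shape
center_hit = pick(objects, 2, 2)   # center of the cross
```

At position (1, 1) the bounding-box test alone would select the front object, whereas the shape-aware test passes through its empty corner and selects the rear object instead.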
According to a second aspect of the present invention, a video image decoding and composing method for decoding a video image coded signal having shape information in shape units and composing the decoded signals together to decode an object video image, composes together, as a video image signal, only the respective shape video images obtained by decoding the object video images.
According to a third aspect of the present invention, in the video image decoding and composing method as defined in the second aspect of the invention, said shape video images are composed after being geometrically transformed.
According to a fourth aspect of the present invention, in the video image decoding and composing method as defined in the second aspect of the invention, said shape video images are composed after being assigned colors in the shapes.
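One possible reading of composing only the shape video images, with a color assigned to each shape, is sketched below. This is an assumption-laden illustration rather than the claimed method: the function `compose_shapes`, the tuple layout `(color, x0, y0, shape)`, the use of small integers as colors, and the sample shapes are all hypothetical.

```python
def compose_shapes(width, height, placed_shapes, background=0):
    """Paint each shape in its assigned color, from back to front.

    `placed_shapes` is a list of (color, x0, y0, shape) tuples, where
    `shape` is a two-value 2-D list (nonzero means inside the object).
    Texture signals are not used at all; only the shapes are composed.
    """
    canvas = [[background] * width for _ in range(height)]
    for color, x0, y0, shape in placed_shapes:
        for dy, row in enumerate(shape):
            for dx, inside in enumerate(row):
                x, y = x0 + dx, y0 + dy
                # Skip pixels outside the shape or outside the screen.
                if inside and 0 <= x < width and 0 <= y < height:
                    canvas[y][x] = color
    return canvas


# Hypothetical example: a 2x2 square in color 1 behind an L-shape in color 2.
silhouettes = compose_shapes(
    4, 3,
    [
        (1, 0, 0, [[1, 1], [1, 1]]),
        (2, 1, 1, [[0, 1], [1, 1]]),
    ],
)
```

The result is a picture of flat-colored silhouettes in which each pixel shows the color of the frontmost shape that covers it.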
According to a fifth aspect of the present invention, in the video image decoding and composing method as defined in the second aspect of the invention, said shape video images are composed after textures are mapped to the shape video images.
According to a sixth aspect of the present invention, a video image decoding and composing method for decoding a video image coded signal having shape information thereby to decode object video images in shape units, comprises inputting a composition information coded signal indicating the shape video images which are obtained when the video images are decoded, and composing a plurality of the video images in accordance with said composition information coded signal.
According to a seventh aspect of the present invention, a video image decoding and composing apparatus comprises a composition information coded signal decoding means for decoding a composition information coded signal; a composition information memory for storing composition information as the output of said composition information coded signal decoding means; an arbitrary shape object decoding means for decoding a coded signal of an object video image having an arbitrary shape other than a rectangular shape; a shape memory for storing a shape signal which is decoded by the arbitrary shape object decoding means; a texture memory for storing a texture signal which is decoded by the arbitrary shape object decoding means; a rectangular shape object decoding means for decoding a coded signal of a rectangular shape object video image; a video image memory for storing a video image signal which is decoded by the rectangular shape object decoding means; a composing means for composing the shape signal stored in the shape memory, the texture signal stored in the texture memory, and the video image signal stored in the video image memory in accordance with the composition information stored in the composition information memory; a position indicating means for indicating a position in a composed image which is output from the composing means; and a shape selecting means for selecting only the shape signal according to the instruction from the composing means.