1. Field of the Invention
The present invention relates to scene description generating apparatuses and methods for placing static image signals, moving image signals, and graphic data in a screen and for describing a new scene, to object extracting methods, and to recording media.
2. Description of the Related Art
FIG. 19 shows conventional scene description technology for placing static image signals, moving image signals, and graphic data in a screen and for describing a new scene. When input images and graphic data are to be displayed as a scene combining one or more input data, it is necessary to provide additional information for designating what the constructed scene will be. This additional information is referred to as a scene description (information). The scene description (information) is used to place a part (referred to as an "object") to be input in a scene. Referring to FIG. 19, an object A02 and an object A03 are displayed based on a scene description (information) A00, thus obtaining a scene A04. Although a two-dimensional scene description is illustrated by way of example in FIG. 19, there are cases in which a three-dimensional scene is displayed on a two-dimensional display device by describing the three-dimensional scene and projecting it onto a two-dimensional plane. When a scene combining one or more objects is represented based on a scene description, an entire screen A01 displaying an input static image or a moving image may be used. Alternatively, a desired portion of the scene may be separated as an object A02. This separation is referred to as segmentation.
FIG. 20 shows the structure of a conventional editing system for performing segmentation and generating a scene description. Image processing of an input image or graphic data is performed independently of generating the scene description. In an image processor B00, graphic data B01 is transformed into an object B04 by a segmentation unit B02. Segmentation may be performed by various methods, including a chroma-key method for separating a background with a specific color component, a method for cutting the contour of an object based on the luminance level gradient, and a method for designating the contour by manual operation. A segmented object may be encoded by an encoder B03, indicated by a dotted line, using, for example, an encoding system conforming to the ISO14496-2 standard. Meanwhile, a scene description processor B05 generates a scene description B07 based on a designation of what the constructed scene will be.
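Of the segmentation methods just listed, the chroma-key method is the simplest to sketch. The following is a minimal illustration only; the RGB-tuple image representation, the function name, and the per-channel tolerance test are assumptions for this sketch, not the patent's implementation:

```python
def chroma_key_mask(image, key_color, tolerance):
    """Chroma-key segmentation sketch: a pixel belongs to the object (1)
    when any of its color components differs from the background key
    color by more than the tolerance; otherwise it is background (0)."""
    return [[1 if max(abs(c - k) for c, k in zip(pixel, key_color)) > tolerance
             else 0
             for pixel in row]
            for row in image]

# A one-row image: a green-screen pixel followed by a foreground pixel.
image = [[(0, 255, 0), (200, 30, 40)]]
mask = chroma_key_mask(image, key_color=(0, 255, 0), tolerance=50)
```

The resulting binary mask is the segmented object shape; the contour-gradient and manual methods mentioned above would produce the same kind of mask by other means.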
There are various types of scene description, including the ISO14496-1 standard MPEG-4 scene description, virtual reality modeling language (VRML) conforming to the ISO14772-1 standard, hypertext markup language (HTML) widely used in the Internet, and multimedia and hypermedia information coding expert group (MHEG) conforming to the ISO13522-5 standard.
Referring to FIGS. 21 to 23, the ISO14496-1 standard MPEG-4 scene description is illustrated by way of example to describe the structure, the contents, and an example of a scene description. FIG. 21 shows the structure of a scene description, FIG. 22 shows the contents of a scene description, and FIG. 23 shows an example of a scene. A scene description is represented by basic description units referred to as nodes. A node is a unit for describing an object, a light source, or an object's surface characteristics, and includes data referred to as fields for designating node characteristics and attributes. For example, referring to FIG. 21, a "Transform2D" node is a node capable of designating two-dimensional coordinate transformation, and includes a "translation" field, shown in FIG. 22, designating placement, such as translation. There are fields that can designate other nodes. Hence, a scene description has a tree structure. When an object is to be placed in a scene, the scene description is grouped into a node representing the object and a node representing its attributes, as shown in FIG. 22. The scene description is further grouped into a node representing placement. The contents of the scene description shown in FIG. 22 are described below. First, "Group{" is the grouping node of the entire scene, and "children" indicates the start of a description of child nodes. The text "Transform2D" is a grouping node for designating coordinate transformation, and "translation x1 y1" designates the placement position. The text "children[" indicates the start of a description of the child nodes to be placed, and "Shape{" designates incorporation of an object into the scene.
The text "geometry Bitmap{}" indicates a scene object on which a texture image is to be displayed, "appearance Appearance{" designates a surface characteristic of the scene object, and "texture ImageTexture{url}" designates an image object used as a texture. In accordance with the contents of the scene description, an image object is placed as shown in FIG. 23. An object indicated by the "Shape" node is designated by its parent node, i.e., the "Transform2D" node, to be translated. FIG. 23 shows an example of this. Referring to FIG. 23, an object in an input image is segmented, along with a rectangular region containing the object, by the segmentation unit B02 shown in FIG. 20. The object B04 is then placed in the scene based on a designation in the scene description B07 generated by the scene description generator B06.
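Assembled from the node and field names quoted above, the FIG. 22 fragment can be emitted by a small generator. The indentation, the concrete placement values, and the url string below are illustrative only:

```python
def scene_description(x1, y1, url):
    """Emit the MPEG-4 scene description fragment of FIG. 22: a Shape
    carrying an image texture, placed at (x1, y1) by a Transform2D node."""
    return (
        "Group{\n"
        "  children[\n"
        "    Transform2D{\n"
        f"      translation {x1} {y1}\n"
        "      children[\n"
        "        Shape{\n"
        "          geometry Bitmap{}\n"
        "          appearance Appearance{\n"
        f'            texture ImageTexture{{url "{url}"}}\n'
        "          }\n"
        "        }\n"
        "      ]\n"
        "    }\n"
        "  ]\n"
        "}\n"
    )

print(scene_description(120, 80, "object.jpg"))
```

Changing the "translation" field values moves the whole Shape subtree, which is exactly the grouping behavior the tree structure described above provides.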
Next, an image object encoding system is described using ISO14496-2 standard MPEG-4 Video by way of example. Referring to FIG. 24, an elliptical object D01 in an input image D00 is segmented from a background object D03, and the object D01 is encoded. When encoding the object D01, a region D02 including the object D01 is set. In MPEG-4 Video, a rectangular region is used, and the area outside the rectangular region is not encoded. Encoding is performed in small block units, hereinafter referred to as encoding blocks. When an encoding block, such as the encoding block D05, does not include object data, the encoding block need only encode a flag representing "there is no data to be encoded". When an encoding block, such as the encoding block D06, includes both an object region and a region without an object, the pixel level of the region outside the object can be set to an arbitrary value and then encoded. This is because the form (contour) of the object D01 is encoded separately, and data outside the object is ignored when decoding. The background D03 is also an object. When encoding the background object D03, a rectangular region D04 including the object D03 is set. This rectangular region D04 covers the entire frame of the input image and is encoded in the same manner as the object D01. Specifically, a shaded portion indicates an object to be encoded. When the encoding block D07 includes data both inside and outside the object, the region outside the object can be set to an arbitrary value and then encoded. When the encoding block D08 does not include object data, only a flag representing "there is no data to be encoded" is encoded.
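The per-block cases described for FIG. 24 can be sketched as a classification over a binary object mask. This is a simplified illustration; actual MPEG-4 Video operates on 16x16 macroblocks with separately coded shape information, and the mask representation and names here are assumptions:

```python
def classify_blocks(mask, block):
    """Classify encoding blocks of a binary object mask:
    'empty'    -> no object data; only a "no data to be encoded" flag is coded,
    'full'     -> every pixel belongs to the object,
    'boundary' -> pixels outside the object may be set to arbitrary values,
                  since the object's contour is encoded separately."""
    h, w = len(mask), len(mask[0])
    out = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            pix = [mask[y][x]
                   for y in range(by, min(by + block, h))
                   for x in range(bx, min(bx + block, w))]
            if not any(pix):
                out[(bx, by)] = "empty"
            elif all(pix):
                out[(bx, by)] = "full"
            else:
                out[(bx, by)] = "boundary"
    return out
```

Applied to a mask of the background object, whose rectangular region spans the whole frame, the same three cases arise, matching the treatment of blocks D07 and D08 above.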
Referring to FIG. 25, when an image object, such as an MPEG-4 Video object, is placed in a scene, a placement position of the object in scene coordinates is designated. The placement position is described in the scene description. The placement position can be designated in two-dimensional coordinates or in three-dimensional coordinates. Alternatively, the placement position can be designated based on alignment constraints, such as "placing an object at the lower left of the screen". In FIG. 25, the center of a rectangular region containing the object is used as the positional reference of the object. Alternatively, the centroid of the object or the upper left of the object can be used as the positional reference. The object is placed according to this reference position.
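The three positional references just mentioned can all be computed from a binary object mask. A minimal sketch; the mask representation and function name are assumptions for illustration:

```python
def object_references(mask):
    """Return the three candidate positional references for an object given
    as a binary mask: center of the bounding rectangle, centroid of the
    object pixels, and upper left of the bounding rectangle."""
    pts = [(x, y) for y, row in enumerate(mask)
                  for x, v in enumerate(row) if v]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    bbox_center = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
    centroid = (sum(xs) / len(pts), sum(ys) / len(pts))
    upper_left = (min(xs), min(ys))
    return bbox_center, centroid, upper_left
```

Note that when the object deforms, the bounding-rectangle center can move even though most object pixels stay put, which is precisely the source of the shifting problem described next.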
When an object in an input moving image or graphic data is deformed, the object placed based on the scene description is shifted in the scene. In frame 1 in FIG. 25, an object segmented from an input image is placed based on a scene description designating that the center of the rectangular region containing the object be placed at a placement position a. In frame 2, the object is deformed, and the rectangular region containing it is deformed accordingly. Hence, an object which does not move in the original input image or graphic data is undesirably shifted in the described scene. It is thus desired that a part which does not move in the original input image or graphic data not be shifted in the described scene. Conversely, when an object does move in the input image or graphic data, the conventional art cannot reflect that movement when placing the object in the scene described by the scene description. Specifically, the conventional art fails to change the placement position of the object to the desired placement position b in the described scene.
In a scene description, an image or graphic data is not always treated as an object. Sometimes such an image or graphic data is employed as a texture to be pasted on a surface of another object in the scene. FIG. 26 shows an example of pasting an image object on a surface of a cube. In the ISO14496-1 standard MPEG-4 scene description, an image employed as a texture is regarded as occupying the range from 0 to 1 in an s-t coordinate system, that is, a two-dimensional texture coordinate system. This is referred to as a texture map. When a texture is pasted on a surface of an object, the part of the texture map to be used is designated by texture coordinates. When a texture is to be pasted on a cube or a rectangular prism, as in FIG. 26, the region corresponding to 0 to 1 in both the s and t directions of the texture map is pasted on each separate side of the cube or the rectangular prism. When a segmented object is employed and the object is deformed, as in frame 2 in FIG. 26, the region containing the object is also deformed. Hence, the picture frame of the texture image is deformed. Despite this deformed picture frame, the entire picture frame of the texture map, ranging from 0 to 1, is still employed. Thus the pasted texture is distorted in a manner different from the transformation of the original object in the input image. It is thus desired that such an object be displayed in the described scene in the same manner as in the original input image or graphic data.
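The 0-to-1 s-t mapping described above can be written out explicitly for a region occupying part of a full picture frame. A sketch under two stated assumptions: image rows are counted downward from the top-left while t increases upward, and the region is an axis-aligned rectangle:

```python
def texture_coords(region_x, region_y, region_w, region_h, frame_w, frame_h):
    """s-t texture coordinates (s0, t0, s1, t1) selecting the part of a
    frame-sized 0..1 texture map that corresponds to a rectangular region
    given in pixel coordinates (origin at the top-left of the frame)."""
    s0 = region_x / frame_w
    s1 = (region_x + region_w) / frame_w
    # t is flipped because image rows grow downward but t grows upward.
    t0 = 1.0 - (region_y + region_h) / frame_h
    t1 = 1.0 - region_y / frame_h
    return (s0, t0, s1, t1)
```

When the region containing the object deforms, recomputing these coordinates from the region's positional information keeps the pasted texture consistent with the input image, instead of always stretching the deformed picture frame over the whole 0-to-1 range.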
When an object obtained by segmenting a static image signal, a moving image signal, or graphic data is placed in a screen, and a new scene is described, the following problems occur due to deformation of the object in the image or the graphic data.
First, when the object is deformed, and a region containing the object is also deformed, the object is undesirably shifted in a scene described by a scene description. In addition, movement of the object in the input image or in the graphic data is not reflected in movement of the object in the scene.
Second, when a segmented image or graphic data is employed as a texture in a scene description, and when the object is deformed and a region containing the object is also deformed, the texture to be pasted is distorted in a scene described by the scene description. In addition, movement of the object in the input image or the graphic data is not reflected in movement of the texture.
Accordingly, it is an object of the present invention to provide a scene description generating apparatus and method and an object extracting method for solving the above problems, that is, for preventing generation of undesirable shifting or distortion in a scene described by a scene description even when an object in an input image or graphic data is deformed, and for reflecting movement of the object in the input image or the graphic data in movement of the object or in movement of the texture in the scene.
According to an aspect of the present invention, the foregoing objects are achieved through provision of a scene description generating apparatus and method including an object extracting step of extracting an object from an input image and outputting positional information on the extracted object. Based on the positional information output in the object extracting step, scene description information about a placement position of the object in a scene is generated in a scene description generating step. When the object is deformed, the positional information is referred to in the scene description generating step and the scene description information in which the object deformation is reflected is generated.
According to another aspect of the present invention, the foregoing objects are achieved through provision of a scene description generating apparatus and method including an object extracting step of extracting an object from an input image. In a positional information detecting step, positional information on the object extracted in the object extracting step is detected. Based on the positional information detected in the positional information detecting step, scene description information about a placement position of the object in a scene is generated in a scene description generating step. When the object is deformed, the positional information is referred to in the scene description generating step and the scene description information in which the object deformation is reflected is generated.
According to another aspect of the present invention, the foregoing objects are achieved through provision of a recording medium for causing a scene description generating apparatus for generating scene description information on an object to execute a computer-readable program. The program includes an object extracting step of extracting the object from an input image and outputting positional information on the extracted object. Based on the positional information output in the object extracting step, the scene description information about a placement position of the object in a scene is generated in a scene description generating step. When the object is deformed, the positional information is referred to in the scene description generating step and the scene description information in which the object deformation is reflected is generated.
According to the present invention, when placing an object segmented from a static image signal, a moving image signal, or graphic data by an object extracting unit/step in a screen and describing a new scene, the object extracting unit, i.e., a segmentation unit, outputs positional information on a region containing the object in the input image or the graphic data. Based on the output positional information, a scene description generating unit/step determines a placement position of the object. Accordingly, even when the region containing the object is deformed or shifted, the object is placed at a desirable position in the scene described by the scene description. When the segmented object is used as a texture in the scene description, the scene description is generated in which texture coordinates are transformed based on the positional information output from the segmentation unit. Therefore, distortion of a texture pasted in the scene is prevented, and shifting of the object is reflected in the texture. Alternatively, texture distortion is prevented by changing the size of a scene object on which the texture is to be pasted or by changing the position of the texture.
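One way to realize the placement rule described above can be sketched as follows. This is an illustrative sketch, not the claimed implementation: the described position is translated by exactly the displacement of the region's reference point reported by the segmentation unit, so a part that does not move in the input does not shift in the scene, while genuine object movement is carried into the scene:

```python
def compensated_placement(region_center, initial_region_center, initial_placement):
    """Scene placement derived from the segmentation unit's positional
    information: shift the initially described position by the displacement
    of the region's reference point in the input image coordinates."""
    dx = region_center[0] - initial_region_center[0]
    dy = region_center[1] - initial_region_center[1]
    return (initial_placement[0] + dx, initial_placement[1] + dy)
```

If the bounding region's center moves only because the region deformed, the placement follows it and the object pixels stay aligned in the scene; if the object itself moves, the same update reproduces that movement in the described scene.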
When the positional information on the region containing the object in the image or the graphic data is included in the data of the segmented object, the positional information is made equally available by a positional information detector which receives the object data and detects the positional information from it. Hence, undesirable shifting or distortion in the scene is prevented.
When the region is determined so as to contain objects in frames of a plurality of images or graphic data and is segmented, the number of changes of the placement position is reduced, or changes are not necessary at all. In particular, when the region containing the object is set as a picture frame of the input image or the graphic data, it is not necessary to change the placement position.