1. Field of the Invention
The present invention relates to a method and apparatus for synthesizing computer-graphics images and actual scene images.
2. Description of Related Art
In recent years, synthesizing computer graphics (hereinafter abbreviated CG) images with actual scene images has become a popular technique in the production of motion pictures and commercial films. For example, by synthesizing a prerecorded image of a human with an image of a virtual building created by a CG technique, it is possible to produce the visual illusion that the human is walking in the building. The image produced by synthesizing a CG image and an actual scene image is realistic and has a great visual impact on a viewer, and such a technique is indispensable, particularly for scene simulation.
Generally, in a CG technique, the shape of an object to be drawn is defined by using simple shapes (shape primitives) such as a plane surface or a quadratic curved surface, and processing is performed to apply a desired color or to paste image data onto the surface. However, when a natural object such as a tree or a river is drawn by this method, the resulting image looks static and fixed in position. Therefore, to produce animation that looks more natural, a moving scene of a tree swaying in the wind or of a river with flowing water is shot in advance, and the captured image is then synthesized with a scene created by the CG technique. Previously, a synthetic image of an actual scene image and a CG image has been produced by selecting an image from the captured actual scene images and pasting it onto a simple shape primitive such as a flat plate. This image synthesis processing is repeated for each frame to produce animation from a sequence of successive synthetic images.
A publicly known reference concerning static image synthesis is “A Technique for 2.5-Dimensional Simple Scenic Model Construction for Scene Simulation,” Symposium on Image Recognition and Understanding (MIRU '92), July 1992.
According to this technique, an actual scene is first shot with the completed synthetic image in mind, and then a portion to be synthesized is clipped from the image of the actual scene and superimposed on a CG-generated scene to produce a synthetic image of CG and actual scenes. Shooting by this technique, however, requires large-scale preparations, such as camera shooting in a studio using a blue background and measurement of the camera position for shooting.
A computer-aided method is also proposed in the MIRU '92 reference. According to the proposed method, viewpoint information is extracted from an image of an actual scene, and the object in the image is approximated by a two-dimensional model, which is then synthesized with a CG image. However, since this model does not have complete three-dimensional information, there are limitations on the image synthesizing process, such as the inability to change the viewpoint when synthesizing the images.
In Japanese Patent Application Laid-Open No. 3-138784 (1991), there is proposed a technique in which, in order to treat an object in a static image as a three-dimensional object, the object in the static image is reconstructed on the basis of a three-dimensional model, and the image portion corresponding to the three-dimensional object is mapped as a surface texture onto the three-dimensional object model to be displayed. This technique also proposes synthesizing a surface texture from a plurality of input image frames for one three-dimensional portion. However, in the case of a video image, the surface texture may change from moment to moment, and when a plurality of textures are synthesized, the resulting texture may be smoothed along the time axis, which makes the technique unsuitable for video.
No method has yet been established that can perfectly extract the shape of a three-dimensional object in an image. Methods proposed in the prior art include one in which the reflection characteristics of an object surface are assumed and the inclinations of the object surfaces are obtained from the observed color values, and one in which models of objects observed in an image are prestored and the appearance of an object observed in an image is checked against the prestored models (Japanese Patent Application Laid-Open Nos. 62-162173 (1987) and 3-244005 (1991)). These methods have been developed along with the progress of image understanding research.
However, neither method can be applied unless its applicability condition is satisfied. For example, the former method requires a prior assumption about the reflection characteristics of the object, and the latter requires that models be prestored for the object to be observed.
Electrical image synthesis is performed as shown in FIG. 1. For example, an image of a human on a blue background is captured by an image input section A, and an image of a landscape is captured by an image input section B. Then, the blue component of the image in the image input section A is detected and fed to an inverting amplifier, which inverts and amplifies it and provides appropriate control of the mixing ratio. The inverted and amplified signal is then fed into a mixing amplifier, where it is mixed with the signals from the image input sections A and B, and the output of the mixing amplifier is fed to an image output section. As a result of this processing, the background in the image from the image input section A vanishes, and an image of the human with the image from the image input section B as the background is produced.
The above conventional method requires an extra facility to provide a blue background. Furthermore, the method lacks versatility since it can be used only for images originally shot for the purpose of image synthesis. Moreover, setting the mixing parameters in the mixing amplifier is difficult, and the operation is intricate.
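The blue-background mixing described above can be illustrated by a rough per-pixel digital analogue. This is only a sketch under stated assumptions, not the circuit of FIG. 1: the function names (`blue_key`, `mix_pixel`, `chroma_key`), the blue detector, and the `gain` parameter are all hypothetical, and images are assumed to be equally sized lists of rows of 8-bit `(r, g, b)` tuples.

```python
def blue_key(pixel, gain=4.0):
    """Return a key in [0, 1]: 1 for background blue, 0 for foreground.

    Hypothetical detector: blue excess over the stronger of red and green.
    """
    r, g, b = pixel
    excess = (b - max(r, g)) / 255.0
    return min(1.0, max(0.0, gain * excess))


def mix_pixel(fg, bg, gain=4.0):
    """Mix one pixel of input A (fg) with one pixel of input B (bg)."""
    k = blue_key(fg, gain)
    # (1 - k) plays the role of the inverted key: it suppresses input A
    # wherever blue is detected, while k lets input B through there.
    return tuple(round((1.0 - k) * f + k * b) for f, b in zip(fg, bg))


def chroma_key(image_a, image_b, gain=4.0):
    """Apply mix_pixel to every pixel pair of two equally sized images."""
    return [
        [mix_pixel(pa, pb, gain) for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
```

In this sketch `gain` stands in for the mixing-ratio control. Choosing it is a trade-off between blue spill around the subject and holes in bluish foreground regions, which reflects the parameter-setting difficulty noted above.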
Next, the prior art concerned with the presentation of three-dimensional shape models will be described.
As the operating speeds of computers increase, it has become possible to display a three-dimensional shape model by rotating, scaling, or translating it in real time, and there has been a demand for a function that enables a human to operate a three-dimensional shape model interactively and that can re-display the result of the operation. This demands the establishment of an operation method that can rotate, scale, and translate the three-dimensional shape model without interrupting the human's thinking process.
For transformations of a three-dimensional shape model in three-dimensional space, a total of six degrees of freedom is required, i.e., three degrees of freedom for rotation and three degrees of freedom for translation. When a three-dimensional shape model is displayed on a two-dimensional display screen, movement in the depth direction with respect to the display can be represented by scaling. Therefore, in this case, transformations can be achieved by a total of six degrees of freedom: three degrees of rotation freedom, one degree of scaling freedom, and two degrees of translation freedom. In a three-dimensional model operation method of the prior art, the above operations were assigned to 12 keys on a keyboard, one key for each of the positive and negative directions of each of the six degrees of freedom. In a three-dimensional model operation method using a pointing device such as a mouse, mode switching was performed to enable the pointing device, which has only two degrees of freedom, to handle transformations of six degrees of freedom. In a method intermediate between the above two, two degrees of freedom are operated by a pointing device, while the other four degrees of freedom are operated by a keyboard.
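The six display degrees of freedom named above (three rotations, one scaling, two in-plane translations) can be written out as a small sketch. The function names, the X-then-Y-then-Z rotation order, and the orthographic projection that simply drops depth are assumptions made for illustration; the text does not specify them.

```python
import math


def rotate(point, rx, ry, rz):
    """Rotate a 3D point about the X, then Y, then Z axis (angles in radians)."""
    x, y, z = point
    c, s = math.cos(rx), math.sin(rx)
    y, z = c * y - s * z, s * y + c * z
    c, s = math.cos(ry), math.sin(ry)
    x, z = c * x + s * z, -s * x + c * z
    c, s = math.cos(rz), math.sin(rz)
    x, y = c * x - s * y, s * x + c * y
    return x, y, z


def to_screen(point, rx=0.0, ry=0.0, rz=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Map a model point to 2D screen coordinates using six parameters.

    Depth motion is represented by `scale` instead of a third translation,
    so the rotated z coordinate is discarded (assumed orthographic view).
    """
    x, y, _ = rotate(point, rx, ry, rz)
    return scale * x + tx, scale * y + ty
```

For example, `to_screen((1.0, 0.0, 0.0), rz=math.pi / 2, scale=2.0, tx=3.0, ty=4.0)` rotates the point onto the Y axis, doubles it, and shifts it, giving approximately `(3.0, 6.0)`.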
In the operation method by a keyboard, since two keys, one for the positive direction and the other for the negative direction, are assigned to each axis, a transformation is possible along only one axis direction at a time. For example, when horizontal and vertical axes are provided for translation in a plane, a translation in an oblique direction requires a two-step operation, first moving vertically and then moving horizontally (or first moving horizontally and then moving vertically). Furthermore, in the case of rotation, the problem becomes more serious, since with this method it is extremely difficult to decompose the intended transformation into components along the axial directions.
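The keyboard limitation just described can be made concrete with a minimal sketch. The key names, the additive step, and the state layout below are hypothetical and chosen only to show that each key press changes exactly one degree of freedom.

```python
DOFS = ("rot_x", "rot_y", "rot_z", "scale", "trans_x", "trans_y")

# Twelve hypothetical keys: one "+" and one "-" key per degree of freedom.
KEY_MAP = {
    f"{dof}{sign}": (dof, direction)
    for dof in DOFS
    for sign, direction in (("+", 1.0), ("-", -1.0))
}


def initial_state():
    """Start with no rotation or translation and unit scale."""
    state = {dof: 0.0 for dof in DOFS}
    state["scale"] = 1.0
    return state


def press(state, key, step=1.0):
    """Return a new state after one key press; only one parameter changes."""
    dof, direction = KEY_MAP[key]
    new_state = dict(state)
    new_state[dof] += direction * step
    return new_state


# An oblique move to (+1, +1) cannot be entered in one step:
state = initial_state()
state = press(state, "trans_x+")  # first step: horizontal only
state = press(state, "trans_y+")  # second step: vertical only
```

The model passes through the intermediate position (1, 0) before reaching (1, 1), which is the two-step behavior described above.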
In the operation method by a pointing device, transformations in directions oblique to the axes can be performed, but there are problems in that mode switching requires an intricate operation and the three degrees of rotation freedom cannot be operated satisfactorily.
In the operation method by a keyboard in combination with a pointing device, inputs from two different devices only add to the difficulty of operation, and it cannot be said that this method compensates for the shortcomings of the above two methods.
There is a further problem that, on the display, the operator cannot tell about which point or axis the model will rotate until it is actually rotated.
When an image of an actual scene recorded by a video tape recorder (VTR) is to be synthesized with a CG image, since the number of frames in the VTR image is fixed, CG drawing needs to be performed in synchronization with the VTR image frames. That is, CG drawing has to be synchronized frame by frame, by manual operation, with the reproduction of the VTR image.
This requires an enormous amount of manual work to produce an image sequence consisting of a large number of frames.
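The frame-by-frame pairing that has to be maintained can be sketched as a simple loop. This is an illustration only: the 30 frames-per-second default, the function names, and the `render_cg` and `composite` callbacks are assumptions and do not describe any particular VTR or CG system.

```python
def frame_time(index, fps=30.0):
    """Time in seconds of the frame with the given index at a fixed frame rate."""
    return index / fps


def synthesize_sequence(video_frames, render_cg, composite, fps=30.0):
    """Pair every recorded frame with a CG image drawn for the same instant.

    render_cg(t) and composite(frame, cg_image) are caller-supplied
    (hypothetical) callbacks; this loop only keeps the two in step.
    """
    output = []
    for index, frame in enumerate(video_frames):
        cg_image = render_cg(frame_time(index, fps))
        output.append(composite(frame, cg_image))
    return output
```

Each pass of the loop corresponds to one of the per-frame synchronization steps that, in the conventional approach described above, are carried out by manual operation.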
From the above description, the objectives for techniques for synthesizing CG images and actual scene images are summarized as follows.
    (1) To enable an image of an actual scene to be transformed into a three-dimensional shape CG model by a simple process.
    (2) To permit the intervention of an operator in the transformation.
    (3) To enable a video image of an actual scene to be synthesized with a CG image.
    (4) To enhance ease of operation and operation efficiency when extracting a desired portion from an image of an actual scene.
    (5) To enhance ease of operation when applying rotation, scaling, and translation transformations to a CG model.
    (6) To achieve easy synchronization between CG and actual scene images.