1. Field of the Invention
This invention relates to a method, apparatus and program for compositing images, particularly a computer-graphic image and a picture taken by a camera, and a method, apparatus and program for rendering a three-dimensional model created by computer graphics into a two-dimensional image to be superposed on a picture taken by a camera to form a composite image.
2. Discussion of Background Art
A two-dimensional representation (for on-screen presentation or the like) of a three-dimensional object modeled on a computer (hereinafter referred to as a “3D model”) is created by a “rendering” process. Among the conventional methods for rendering a 3D model (i.e., generating a two-dimensional image therefrom) is ray tracing, which is disclosed, for example, in Japanese Laid-Open Patent Application, Publication No. 2000-207576 A. Ray tracing is, as shown in FIG. 5, a method in which a 3D model 101 created in a virtual space on a computer is converted into a two-dimensional image on the assumption that the object represented by the 3D model 101 is viewed from a specific viewpoint 103. More specifically, a plane of projection 102 is defined at a specific position in the virtual space, on the side of the viewpoint 103 facing the direction in which the 3D model 101 can be seen from the viewpoint 103, for example, between the viewpoint 103 and the 3D model 101; in addition, a light source 104 is set at an appropriate place in the virtual space. Pixels 105, such as those arranged on a screen, are defined in the plane of projection 102. For each pixel 105, a separate light ray 106, which is transmitted from the pixel 105 to the viewpoint 103, is traced backward from the viewpoint 103 through the pixel 105 to its origin (3D model 101), or through the 3D model 101 to the pixel 105, so that the color (attributes thereof) of the corresponding portion of the 3D model 101 is assigned to the pixel 105. This operation is performed for every pixel 105, so that the 3D model 101 is eventually projected two-dimensionally onto the plane of projection 102.
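The conventional ray tracing described above may be sketched as follows. This is a minimal illustration only, not the method of the cited publication: the 3D model is assumed to be a single sphere, the shading is assumed to be simple diffuse (Lambertian) shading from a point light source, and all function names, parameter names and default values are hypothetical.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3D vectors."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def hit_sphere(origin, direction, center, radius):
    """Distance along a unit-direction ray to the nearest hit, or None."""
    oc = sub(origin, center)
    b = 2.0 * dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None


def render(width, height, viewpoint=(0.0, 0.0, 0.0), plane_z=1.0,
           half_size=1.0, center=(0.0, 0.0, 3.0), radius=1.0,
           light=(5.0, 5.0, -5.0)):
    """Trace one ray per pixel from a single fixed viewpoint.

    The plane of projection is assumed to lie at z = plane_z, between the
    viewpoint and the sphere. Returns rows of brightness values in [0, 1].
    """
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            # Center of this pixel on the plane of projection.
            px = (2.0 * (col + 0.5) / width - 1.0) * half_size
            py = (1.0 - 2.0 * (row + 0.5) / height) * half_size
            # Ray traced backward from the viewpoint through the pixel.
            direction = normalize(sub((px, py, plane_z), viewpoint))
            t = hit_sphere(viewpoint, direction, center, radius)
            if t is None:
                line.append(0.0)  # ray misses the model: background
                continue
            point = tuple(o + t * d for o, d in zip(viewpoint, direction))
            normal = normalize(sub(point, center))
            to_light = normalize(sub(light, point))
            # Assign the shaded brightness of the hit point to the pixel.
            line.append(max(0.0, dot(normal, to_light)))
        image.append(line)
    return image
```

Every ray in this sketch starts from the same viewpoint, which is the property that makes the procedure equivalent to imaging with a pinhole camera.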
Performance improvements in computers in recent years have made it possible to composite a picture (typically a digitized image) taken from life by a camera with an image formed using computer graphics (CG), such as characters, packaged goods, etc., and have thus encouraged new forms of visual expression, particularly in the making of movies and TV programs.
In order to create a composite image in the manner described above, a 3D model is first generated in the virtual space on a computer. Next, the 3D model is rendered by ray tracing to generate a two-dimensional image. In the process of ray tracing, the viewpoint, the plane of projection and the pixels thereon are defined in such a manner as to simulate the picture-taking conditions of the real-world camera, such as the shooting angle (tilt angle, etc.) and the angle of view, under which the camera took the picture to be used in forming the composite image. The two-dimensional image formed on the plane of projection through the process of ray tracing is superposed on the picture (film or digital image) taken by the camera, thereby forming the composite image.
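The superposing step may be sketched as follows. The text above states only that the rendered image is superposed on the picture; the per-pixel coverage mask (`alpha`) and the "over" blending rule used here are assumptions introduced for illustration, as are the function and parameter names.

```python
def composite_over(cg, alpha, photo):
    """Superpose a rendered CG image on a camera picture.

    cg, photo: rows of brightness values in [0, 1] of identical size.
    alpha: rows of coverage values in [0, 1]; 1 where the CG image fully
    covers the picture, 0 where the picture shows through (assumed mask).
    """
    if not (len(cg) == len(alpha) == len(photo)):
        raise ValueError("images must have the same number of rows")
    out = []
    for cg_row, a_row, ph_row in zip(cg, alpha, photo):
        if not (len(cg_row) == len(a_row) == len(ph_row)):
            raise ValueError("rows must have the same number of pixels")
        # Blend each pixel: CG weighted by coverage, picture by the rest.
        out.append([a * c + (1.0 - a) * p
                    for c, a, p in zip(cg_row, a_row, ph_row)])
    return out
```

Because the blend is performed pixel by pixel, any positional error in the rendered image carries over unchanged into the composite image.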
The above-described method for compositing images is, however, based upon the premise that the camera used to take the picture operates on the principle of a pinhole camera (i.e., according to the “pinhole camera model”). Therefore, a minute displacement occurs when the image (picture) taken by the camera and the CG image rendered in accordance with the pinhole camera model are superposed.
The difference between an image according to the pinhole camera model, which has no lens, and an image (picture) taken by a real-world camera having a lens system will be described below.
According to the pinhole camera model, as shown in FIG. 6, only rays of light traveling through a base position (pinhole H) strike an image plane, so that visible aspects of a three-dimensional space are mapped into a two-dimensional space on the image plane. In other words, the pinhole camera model is premised on a single imaginary pinhole through which rays of incident light travel and strike the image plane to form an image thereon. In contrast, in a real-world camera having a lens system, unlike the pinhole camera, the rays of incident light do not converge at a single point. Thus, an image taken by a camera having a lens system contains nonlinear distortion, which is greater in peripheral areas.
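The pinhole mapping, and the way a nonlinear distortion grows toward the periphery, may be illustrated as follows. The one-coefficient radial polynomial used here is a commonly used simplified lens model and is an assumption, not a model taken from this description; the coefficient `k1`, the focal length and all names are hypothetical.

```python
import math


def project_pinhole(point, focal_length=1.0):
    """Map a 3D point (x, y, z), z > 0, to the image plane through pinhole H."""
    x, y, z = point
    if z <= 0.0:
        raise ValueError("point must lie in front of the pinhole")
    return (focal_length * x / z, focal_length * y / z)


def apply_radial_distortion(uv, k1=-0.1):
    """Assumed lens model: scale each image point by 1 + k1 * r^2."""
    u, v = uv
    scale = 1.0 + k1 * (u * u + v * v)
    return (u * scale, v * scale)


def displacement(point, focal_length=1.0, k1=-0.1):
    """Distance on the image plane between the ideal and distorted positions."""
    ideal = project_pinhole(point, focal_length)
    real = apply_radial_distortion(ideal, k1)
    return math.hypot(real[0] - ideal[0], real[1] - ideal[1])
```

Under this assumed model the displacement equals |k1| multiplied by the cube of the distance from the image center, so it vanishes on the optical axis and increases rapidly toward the edges of the image.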
On the other hand, the process of tracing each ray of light backward from one fixed viewpoint when rendering a 3D model by ray tracing is analogous to the process of taking a picture of a 3D object using a pinhole camera. Accordingly, each ray of light computed by ray tracing strikes the image plane in a direction subtly different from that in which the corresponding ray of light incident on the image plane of the real-world camera would travel.
Consequently, according to the above conventional method of compositing images, a CG image created as described above appears slightly displaced relative to the picture taken from life by the camera. Such displacement, if it occurs in a still image or frozen frame, may not be obtrusive enough to annoy a viewer; if it occurs in each image (frame) of a moving picture, however, it causes the CG image to shake slightly, producing an unnatural impression.
Assume, for example, that a scene viewed from the driver's seat of an automobile is shot by a camera so that the camera takes pictures of the instrument panel and of the views seen through the windshield. The pictures taken by the camera are then combined with a CG image of an array of various gauges and accessories to be arranged on the instrument panel. If the camera pans while recording the scene, the CG image of the gauges, etc. would, in the sequence of composite images produced by the aforementioned conventional method, disadvantageously shake relative to the instrument panel, although the CG image should move together with the instrument panel without any change in their relative positions.
This phenomenon becomes non-negligible when the object distance in the pictures taken by the camera varies broadly from a long range to a close range and the distance of the 3D object from the viewpoint used for creating the CG images is small. For this reason, the conventional method of compositing images as described above requires an extra manual operation of correcting the position of the CG image relative to the pictures on which the CG image is superposed.
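The dependence on object distance may be illustrated by a simple geometric sketch. It assumes, purely for illustration, that the incident ray of the real-world camera passes through a point shifted along the optical axis by `origin_shift` from the pinhole assumed in rendering; this shift, its value and all names are hypothetical and are not taken from this description.

```python
import math


def angular_error(lateral, depth, origin_shift):
    """Angle in radians between two rays toward the same object point.

    One ray passes through the assumed pinhole at the origin; the other
    passes through a point shifted by origin_shift along the optical axis.
    lateral and depth locate the object point relative to the pinhole.
    """
    if depth <= origin_shift:
        raise ValueError("object must lie beyond the shifted ray origin")
    ideal = math.atan2(lateral, depth)
    actual = math.atan2(lateral, depth - origin_shift)
    return actual - ideal


def errors_at_fixed_view_angle(depths, view_angle_deg=30.0, origin_shift=0.005):
    """Angular error for objects seen at the same angle but different depths."""
    tan_angle = math.tan(math.radians(view_angle_deg))
    return [angular_error(d * tan_angle, d, origin_shift) for d in depths]
```

For example, `errors_at_fixed_view_angle([0.3, 1.0, 10.0])` yields errors that decrease as the depth increases, which is consistent with the statement above that the displacement matters most for objects close to the viewpoint.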
Accordingly, there is an increasing demand for a method, apparatus and program for compositing images, and for a method, apparatus and program for rendering a three-dimensional model, in which errors derived from the pinhole camera model utilized in rendering a 3D object to be combined with a picture taken from life by a camera can be removed to obtain a natural composite image.