Many schemes have been proposed for describing a virtual space based not on a three-dimensional geometric shape but on a photo image. Such schemes are called Image Based Rendering (to be abbreviated as IBR hereinafter), and can express a virtual space with a degree of realism that cannot be obtained by a scheme based on a three-dimensional geometric shape.
Attempts have been made to describe a virtual space on the basis of the ray space theory, which is one IBR scheme. See, for example, “Implementation of Virtual Environment by Mixing CG model and Ray Space Data”, IEICE Journal D-II, Vol. J80-D-II No. 11, pp. 3048–3057, November 1997, or “Mutual Conversion between Hologram and Ray Space Aiming at 3D Integrated Image Communication”, 3D Image Conference, and the like.
The ray space theory will be explained below.
As shown in FIG. 1, a coordinate system O-X-Y-Z is defined in a real space. A light ray that passes through a reference plane P (Z=z) perpendicular to the Z-axis is defined by a position (x, y) where the light ray crosses P, and variables θ and φ that indicate the direction of the light ray. More specifically, a single light ray is uniquely defined by five variables (x, y, z, θ, φ). If a function that represents the light intensity of this light ray is defined as f, light ray group data in this space can be expressed by f(x, y, z, θ, φ). This five-dimensional space is called a “ray space”.
If the reference plane P is set at z=0, and disparity information of a light ray in the vertical direction, i.e., the degree of freedom in the φ direction, is omitted, the degree of freedom of the light ray can be reduced to two dimensions (x, θ). This x-θ two-dimensional space is a subspace of the ray space. If u=tan θ, a light ray (FIG. 2) which passes through a point (X, Z) in the real space is mapped, as shown in FIG. 3, onto a line in the x-u space given by:

X = x + uZ  (1)
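The mapping of equation (1) can be checked numerically. The following is a minimal sketch, not part of the original disclosure: the function name `ray_to_xu` and the sample point are hypothetical, and θ is assumed to be measured from the Z-axis, which is the convention that makes u=tan θ consistent with equation (1).

```python
import math

def ray_to_xu(X, Z, theta):
    """Map a ray through real-space point (X, Z) to a point (x, u) in x-u space.

    theta is assumed to be the ray's angle from the Z-axis (illustrative
    convention), so that u = tan(theta) and X = x + u*Z.
    """
    u = math.tan(theta)
    x = X - u * Z  # equation (1) solved for x
    return x, u

# Hypothetical sample point and ray directions (degrees from the Z-axis).
point = (2.0, 4.0)
samples = [ray_to_xu(point[0], point[1], math.radians(d))
           for d in (-30, -10, 0, 10, 30)]

# Every (x, u) sample should satisfy X = x + u*Z, i.e. lie on one line.
residuals = [abs(x + u * point[1] - point[0]) for x, u in samples]
```

All rays through the same point differ only in θ, so their (x, u) samples fall on a single straight line whose slope is set by Z and whose intercept at u=0 is X.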
Image sensing by a camera corresponds to registering on an imaging plane the light rays that pass through the lens focal point of the camera, and the intensity and color of those rays are represented as an image. In other words, the set of light rays that pass through one point in the real space, i.e., the focal point position, corresponds to the set of captured pixels. Here, since the degree of freedom in the φ direction is omitted, and the behavior of a light ray is examined in only the X-Z plane, only pixels on a line segment that intersects a plane perpendicular to the Y-axis need be considered. In this manner, by sensing an image, light rays that pass through one point can be collected, and data on a single line segment in the x-u space can be captured by single image sensing.
When an image is sensed a large number of times by changing the viewpoint position, light ray groups which pass through a large number of points can be captured. When the real space is sensed using N cameras, as shown in FIG. 4, data on a line given by:

x + Z_n u = X_n  (2)

can be input in correspondence with a focal point position (X_n, Z_n) of the n-th camera (n=1, 2, . . . , N), as shown in FIG. 5. In this way, when an image is sensed from a sufficiently large number of viewpoints, the x-u space can be densely filled with data.
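As an illustration of how several cameras populate the x-u space, the sketch below generates the (x, u) samples contributed by each camera according to equation (2). The camera layout, the u sampling, and the helper name `camera_samples` are assumptions made for this example only.

```python
def camera_samples(cam_x, cam_z, u_values):
    """Return the (x, u) samples one camera at focal point (cam_x, cam_z) records.

    Each u stands for one pixel column; x follows from equation (2),
    x + Z_n*u = X_n.
    """
    return [(cam_x - cam_z * u, u) for u in u_values]

# Hypothetical rig: five cameras on the line Z = 2, spaced 0.5 apart in X.
cameras = [(0.5 * n, 2.0) for n in range(5)]
u_values = [-0.5, -0.25, 0.0, 0.25, 0.5]

ray_space_samples = []
for cam_x, cam_z in cameras:
    ray_space_samples.extend(camera_samples(cam_x, cam_z, u_values))
```

Each camera adds one line segment of samples; moving the camera in X shifts that segment, and changing Z changes its slope, which is why many viewpoints are needed before the x-u space is densely covered.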
Conversely, an image observed from a new arbitrary viewpoint position can be generated (FIG. 7) from the data of the x-u space (FIG. 6). As shown in FIG. 7, an image observed from a new viewpoint position E(X, Z) indicated by an eye mark can be generated by reading out data on a line given by equation (1) from the x-u space.
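The readout step can be sketched as a lookup along the line of equation (1). This is a simplified illustration under stated assumptions: the ray space is held in a dictionary keyed by a quantized x index and a u value, the new view uses the same u sampling as the stored data, and nearest-sample lookup stands in for whatever interpolation a real system would use. The names `render_view` and `x_step` are hypothetical.

```python
def render_view(ray_space, eye_x, eye_z, u_values, x_step):
    """Read one scan line for a viewpoint E(eye_x, eye_z) out of the x-u space.

    ray_space maps (x_index, u) -> intensity, with x_index = round(x / x_step).
    Entries with no captured ray come back as None.
    """
    row = []
    for u in u_values:
        x = eye_x - eye_z * u            # equation (1) solved for x
        key = (round(x / x_step), u)     # nearest stored sample in x
        row.append(ray_space.get(key))
    return row

# Toy ray space for illustration: the stored intensity is simply x.
x_step = 0.5
u_values = [-0.5, 0.0, 0.5]
ray_space = {(ix, u): ix * x_step for ix in range(-10, 11) for u in u_values}

view = render_view(ray_space, eye_x=1.0, eye_z=2.0,
                   u_values=u_values, x_step=x_step)
```

Returning `None` for missing samples makes gaps in the captured data visible, which corresponds to regions of the x-u space that no camera filled.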
In the mixed reality space that takes a photo image into a virtual space, real and virtual spaces are mixed. For this reason, image processes which are easy to implement in a real or virtual space alone may become hard to implement.
Image processes using photo image data are poorly suited to adding shades and generating shadows under virtual illumination. Shades and shadows change in accordance with the three-dimensional shape of an object, but photo image data does not contain any information pertaining to the geometric shape of the object, so it is hard to reconstruct them. That is, a technique for rendering a virtual object on the basis of space data including geometric shape information, rendering shades to be added to that object, or rendering a shadow formed by the object is known to those skilled in an image processing field based on geometric shape information (e.g., computer graphics (to be abbreviated as CG hereinafter)), but is unknown in an image processing field using a photo image such as a ray space or the like.
One difficulty in generation of a mixed reality space involves changing a real illumination condition and mixing a virtual image with a real space in real time in correspondence with the change in illumination condition.
Conventionally, the brightness of a real space is measured by a batch method, and the detected illumination condition is reflected in the mixed reality space.