1. Field of the Invention
The present invention relates to an information processing technique useful in an information processing apparatus configured to process image information at an arbitrary viewpoint in a real (physical) space or a virtual space.
2. Description of the Related Art
It has long been known that controlling a stimulus given to a human sense, for example the sense of sight, can cause a person to perceive a state differing from the inherent state.
With the techniques represented by computer graphics (hereinafter abbreviated to “CG”), etc., a virtual object drawn on a two-dimensional plane is displayed as if it exists three-dimensionally, in consideration of human spatial perception characteristics. In the case of CG, perception (e.g., perception representing a three-dimensional object) differing from that corresponding to the inherent state (e.g., a two-dimensionally expressed object) is generated by controlling three-dimensional cues (depth cues), such as shadows, overlaps and texture gradients of virtual objects.
In addition to CG, stereoscopic display techniques using random dot stereograms, binocular stereopsis, etc. are also known as similar techniques.
One example of a system utilizing those techniques is a mixed reality system (hereinafter referred to as an “MR system”). The term “MR system” refers to a system that generates, by using the above-mentioned techniques, a virtual space image corresponding to, e.g., the position and the direction of a viewpoint obtained from a real space image, and that mixes (combines) the two images such that a mixed image of the real space image and the virtual space image is displayed to a user.
The MR system enables the user to perceive a virtual object as if it actually exists in a real space. Because the virtual space image and the real space image are mixed with each other, the MR system is more advantageous than the known virtual reality system (VR system) in that the user can observe a virtual object with a sense of its actual size.
In a system that displays a mixed image of the real space image and the virtual space image, as in the MR system, a difference occurs in depth perception even when the virtual space image is generated so as to view an object from a viewpoint in the same position and direction as those of the viewpoint for the real space image.
Stated another way, even if the virtual space image is generated based on a virtual object which is prepared such that the viewpoints of the real space and the virtual space have the same position and direction, a difference occurs in depth perception when the generated virtual space image is mixed with the real space image. There are two major reasons why such a phenomenon is caused.
(1) Difference in Depth Cues
The first cause is that different depth cues are displayed for objects having the same depth.
In the MR system, generally, because the virtual space image is generated with rendering of the virtual object based on the viewpoint for the real space image, high matching can be maintained in shape between the virtual object and the real object. However, matching between the virtual object and the real object cannot be maintained in saturation and definition. As a result, a difference occurs in the depth perception. That point will be described in more detail below.
(a) Difference in Depth Perception Caused by Mismatching in Saturation between Real Space Image and Virtual Space Image
Usually, a real space image taken by a video camera and a virtual space image generated by CG differ in saturation because the image generating devices differ from each other. Human depth perception tends to cause a person to perceive an object with higher saturation as being positioned nearer, and an object with lower saturation as being positioned farther. In general, the real space image taken by a video camera, for example, has lower saturation than the virtual space image generated by CG. Therefore, when the virtual object and the real object are positioned at the same distance, the virtual object is perceived as being positioned nearer.
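The saturation mismatch described above can be illustrated with a minimal sketch. The function name `mean_saturation` and the sample pixel values are hypothetical and chosen only for illustration; saturation is taken here as the S component of the HSV color model, which is one common definition and is not stated in the description itself.

```python
import colorsys

def mean_saturation(pixels):
    """Return the mean HSV saturation of (r, g, b) pixels with channels in [0, 1]."""
    saturations = [colorsys.rgb_to_hsv(r, g, b)[1] for r, g, b in pixels]
    return sum(saturations) / len(saturations)

# Hypothetical samples: a washed-out camera patch and a vivid CG patch.
real_patch = [(0.6, 0.5, 0.5), (0.5, 0.5, 0.4)]
cg_patch = [(1.0, 0.1, 0.1), (0.1, 0.1, 1.0)]

# A higher value for the CG patch corresponds to the case in which the
# virtual object tends to be perceived as nearer than the real object.
saturation_gap = mean_saturation(cg_patch) - mean_saturation(real_patch)
```

A positive `saturation_gap` corresponds to the situation described above, in which the virtual space image is more saturated than the real space image.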
(b) Difference in Depth Perception Caused by Mismatching in Definition between Real Space Image and Virtual Space Image
Generally, human depth perception tends to cause a person to perceive an object with sharper edges as being positioned nearer, and an object with a larger blur as being positioned farther. Comparing the virtual space image and the real space image, the virtual space image has sharper edges. Therefore, when the virtual object and the real object are positioned at the same distance, the virtual object is perceived as being positioned nearer.
While the above description is made in connection with the MR system as an example, such a difference in depth perception similarly occurs between an ordinary virtual space image generating system (hereinafter referred to as a “CG system”) and a real space image generating system, e.g., a video camera.
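As a rough illustration of the difference in definition, the sketch below compares the steepest intensity step along one scan line of a sharp edge and of a blurred edge. The function name `edge_sharpness`, the sample rows, and the choice of the maximum adjacent difference as the measure are all assumptions made for this example, not a measure taken from the description.

```python
def edge_sharpness(row):
    """Return the largest absolute difference between adjacent intensity samples."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))

# Hypothetical scan lines across the same edge, intensities in [0, 1].
cg_row = [0.0, 0.0, 1.0, 1.0]        # abrupt step, as in a rendered image
camera_row = [0.0, 0.25, 0.75, 1.0]  # the same step spread over more samples

sharper_in_cg = edge_sharpness(cg_row) > edge_sharpness(camera_row)
```

Under this measure the rendered edge scores higher than the camera edge, which matches the tendency for the virtual object to be perceived as nearer.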
(2) Lack of Depth Cues
The second cause is a lack of depth cues. The following description is made in connection with, as a concrete example, a system that causes a user to experience a three-dimensional image in the MR system or the CG system by using an HMD (Head Mounted Display) or stereoscopic spectacles. In such a system, because the depth cues required for a person to perceive depth are lacking, a difference occurs in depth perception between the real space image and the virtual space image. Herein, the term “depth cues” refers to convergence information, focusing (focus adjustment) information, etc.
(a) Difference in Depth Perception Caused by Lack of Convergence Information
One of the depth cues used when a person perceives an object is convergence information of the eyeballs. In general, when a person looks at a near object, the outer eye muscles act so that the eyeballs are directed inwards and the target is positioned at the respective centers of the retinas of both eyeballs. The amounts of action of the outer eye muscles at that time serve as a depth cue.
Meanwhile, it is difficult for an image pickup apparatus, e.g., a video camera, to change the position and the direction of the apparatus based on information regarding visual actions of a photographer. Therefore, image information including the convergence information cannot be generated. In other words, the image information generated by the image pickup apparatus lacks the above-mentioned depth cue, i.e., the convergence information, and hence a difference occurs in depth perception between a sensed image and a taken image.
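The convergence cue described above can be made concrete with simple geometry: for a target straight ahead, the angle between the two lines of sight grows as the target comes closer. The sketch below assumes a symmetric viewing geometry; the function name `convergence_angle_deg` and the 0.065 m interpupillary distance are illustrative assumptions, not values from the description.

```python
import math

def convergence_angle_deg(interpupillary_m, distance_m):
    """Angle in degrees between the two lines of sight for a target straight ahead."""
    return math.degrees(2.0 * math.atan(interpupillary_m / (2.0 * distance_m)))

IPD_M = 0.065  # assumed interpupillary distance in meters

near_angle = convergence_angle_deg(IPD_M, 0.5)  # target at 0.5 m
far_angle = convergence_angle_deg(IPD_M, 5.0)   # target at 5 m
```

The near target yields the larger angle. A camera pair with fixed orientation cannot reproduce this change, which is the missing cue discussed above.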
(b) Difference in Depth Perception Caused by Lack of Focusing Information
Another of the depth cues used when a person perceives an object is focusing information. In general, when a person observes an object, the ciliary bodies act to control the thickness of each eye lens depending on the distance from the eyes to the object for the purpose of focusing. The amounts of action of the ciliary bodies at that time serve as a depth cue.
However, when a person observes an object through a display surface of a display apparatus, the focus is always held on the display surface. In other words, the object displayed on the display surface lacks the above-mentioned depth cue, i.e., the focusing information, and hence a difference occurs in depth perception between a sensed image and a taken image.
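The missing focusing cue can be expressed as a simple quantity. Focus demand is commonly stated in diopters, the reciprocal of the viewing distance in meters. The sketch below compares the focus demand of the fixed display surface with the demand the depicted object would impose at its intended distance. The function names and the example distances are hypothetical.

```python
def focus_demand_diopters(distance_m):
    """Focus demand in diopters for an object at the given distance in meters."""
    return 1.0 / distance_m

def focus_mismatch_diopters(object_distance_m, display_distance_m):
    """Difference between the focus demand of the depicted object and of the display."""
    return abs(focus_demand_diopters(object_distance_m)
               - focus_demand_diopters(display_distance_m))

# Hypothetical case: the object is drawn as if 2 m away on a display 0.5 m away.
mismatch = focus_mismatch_diopters(2.0, 0.5)
```

The mismatch is zero only when the depicted object lies at the distance of the display surface, since the eye's focus stays on that surface in every other case.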
One conceivable method for eliminating the above-described difference in depth perception is to establish matching between the virtual space image and the real space image in consideration of the depth cues.
Such a method, however, requires steps of analyzing the real space image, obtaining an illumination condition in the real space, human focusing information, etc., and correcting the shape, the saturation and the definition.
Further, the method requires a mechanism for enabling a convergence angle of the display apparatus to be changed, and for changing the convergence angle of the display apparatus based on a value obtained by measuring the state of the human eyeballs. Alternatively, the method requires a device for measuring the amounts of action of the human ciliary bodies, and a mechanism for adjusting a lens based on the measured information. Otherwise, the method requires a mechanism for enabling the position of the display apparatus to be changed, and for controlling the position of the display apparatus based on the amounts of action of the human ciliary bodies.
However, those constructions inevitably enlarge the system size and increase the cost. Another problem is that, because of a longer processing time, those constructions cannot be applied to a system which requires real time processing. Still another problem is as follows. A difference in depth perception when an object is viewed from a particular position and direction is eliminated by establishing the matching between the real space image and the virtual space image. However, the matching in configuration (shape and size) is lost when the object is viewed from a different position and direction.
As will be understood from the above discussion, there is a demand for a simple method capable of eliminating a difference in depth perception which occurs between image information obtained from an arbitrary viewpoint in a real space and image information obtained from the corresponding arbitrary viewpoint in a virtual space, e.g., a difference in depth perception caused by a difference in and/or a lack of depth cues.