1. Field of the Invention
The present invention relates to a technique for presenting a mixed reality space obtained by compositing a physical space and virtual space.
2. Description of the Related Art
Mixed reality (MR) presentation systems are conventionally available. Such a system superposes a video image based on three-dimensional (3D)-modeled computer graphics (CG) on a video image of the physical world and displays the result, so that an object rendered by CG (a virtual object) appears to exist in the physical world.
This system comprises a physical video capturing unit which captures a video image of the physical world, a CG video generation unit which generates a CG video image viewed from the position and orientation of the physical video capturing unit, and a video display unit which composites and displays these video images. Furthermore, this system comprises a viewpoint position and orientation detection unit (for example, a position and orientation sensor) which detects the position and orientation of the viewpoint of the physical video capturing unit, so as to correctly display the positional relationship between the CG video image and the video image of the physical world even when the position and orientation of the viewpoint of the physical video capturing unit have changed.
The physical video capturing unit, which captures a video image of the physical space, comprises, e.g., a video camera. The physical video capturing unit captures a video image of the physical world in its own line of sight direction, and stores the captured video image in a memory.
The CG video generation unit lays out a 3D-modeled CG on a virtual space having the same scale as the physical world, and renders a virtual scene as that observed from the position and orientation of the viewpoint detected by the viewpoint position and orientation detection unit.
Upon compositing the CG video image generated in this way and the video image of the physical world captured by the physical video capturing unit, a video image can be consequently displayed so that an observer can observe a CG image laid out on the physical world independently of the position and orientation of the viewpoint. Loading of CG data, a change of a CG layout, animation, and the like can be implemented using the same method as a conventional CG display system.
In order to designate the CG layout position, an additional position and orientation sensor is used, and a CG image can be rendered at a position and orientation indicated by the measurement values of the position and orientation sensor. With this arrangement, the user holds the position and orientation sensor by the hand, and can observe a CG image displayed at the position and orientation designated by the position and orientation sensor, as is conventionally done.
As a video display device which composites and displays a video image of the physical world and a CG video image, for example, an HMD (Head Mounted Display) is used. By using the HMD in place of a normal monitor, and mounting the video camera so that it faces the line of sight direction of the HMD, a video image in the direction in which the observer faces can be displayed on the HMD, and a CG video image as seen when the observer faces that direction can be rendered, thus enhancing the observer's sense of immersion.
The video display device in an MR presentation system displays an image (MR image) obtained by compositing a video image of the physical world and a CG video image. Note that the video display device may be an HMD of the so-called optical see-through type, which allows the observer to see the scene in front of him or her directly. In this case, instead of capturing a video image, the aforementioned physical video capturing unit optically passes the scenery in front of the HMD intact to the display device. With an HMD of this type, the observer can directly see the scenery in front of him or her without any digital processing, and a CG image can be further displayed on that screen.
As the viewpoint position and orientation detection unit, for example, a magnetic position and orientation sensor or the like is used. By attaching such position and orientation sensor to the video camera as the physical video capturing unit (or the HMD to which the video camera is attached), the position and orientation of the viewpoint can be detected. The magnetic position and orientation sensor is a device which detects a relative position and orientation between a magnetism generation device (transmitter) and magnetic sensor (receiver), and detects a 3D position (X, Y, Z) and orientation (Roll, Pitch, Yaw) of the receiver in real time.
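The way such a six-degree-of-freedom measurement can be turned into a viewpoint transform may be sketched as follows. This is only an illustrative sketch and not part of any disclosed system: the function names are hypothetical, angles are assumed to be in radians, the rotation is assumed to be composed as Rz(yaw) * Ry(pitch) * Rx(roll), and the receiver frame is assumed to coincide with the camera frame. An actual sensor may use a different angle convention and normally requires a calibration offset between receiver and camera.

```python
import math

def rotation_from_rpy(roll, pitch, yaw):
    """Return the 3x3 rotation R = Rz(yaw) * Ry(pitch) * Rx(roll).

    The composition order is an assumption made for this sketch.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def world_to_camera(point, position, roll, pitch, yaw):
    """Express a world-space point in the camera frame: R^T * (point - position)."""
    r = rotation_from_rpy(roll, pitch, yaw)
    d = [point[i] - position[i] for i in range(3)]
    # The inverse of a rotation matrix is its transpose, hence r[k][i].
    return [sum(r[k][i] * d[k] for k in range(3)) for i in range(3)]

# Hypothetical receiver reading: 1.5 units up, turned 90 degrees about Z.
pose_position = [0.0, 0.0, 1.5]
pose_angles = (0.0, 0.0, math.pi / 2)
cg_vertex_world = [0.0, 1.0, 1.5]
cg_vertex_camera = world_to_camera(cg_vertex_world, pose_position, *pose_angles)
```

Transforming every CG vertex this way before projection is what keeps the CG video image registered to the physical world as the viewpoint moves.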
With the above arrangement, the observer can observe an MR image obtained by compositing a video image of the physical world and a CG video image via the video display device such as the HMD or the like. When the observer looks around, the physical video capturing unit (video camera) equipped on the HMD captures a video image of the physical world, and the viewpoint position and orientation detection unit (position and orientation sensor) equipped on the HMD detects the position and line of sight direction of the video camera. Then, the CG video generation unit renders a CG video image viewed from the video camera, and composites that image on the video image of the physical world, thereby displaying a composite image.
In the MR presentation system, a CG image can be superposed on a physical object. For example, in a game disclosed in patent reference 1 indicated below, by superposing and displaying a 3D CG image such as a sword, weapon, or the like on an interactive operation input device held by the user, the user can freely manipulate a virtual object (the sword or weapon in this case).
In a conventional, general MR presentation method, a CG video image is merely composited on a video image of the physical world by superposition, and the depth ordering between a physically existing object and CG image is often not considered. For this reason, even when the observer puts his or her hand in front of a CG image, he or she cannot see his or her hand, and the CG image behind the hand is displayed as if that CG image were located in front of the hand.
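The superposition described above can be illustrated with a minimal sketch. The frame representation (nested lists of pixel labels, with None meaning that no CG was rendered at that pixel) and the function name are assumptions made purely for illustration.

```python
def superpose(camera_frame, cg_frame):
    """Overlay CG pixels on the camera frame.

    None in cg_frame marks a pixel where no CG was rendered. No depth is
    consulted, so a CG pixel always replaces the camera pixel beneath it.
    """
    return [
        [cam if cg is None else cg for cam, cg in zip(cam_row, cg_row)]
        for cam_row, cg_row in zip(camera_frame, cg_frame)
    ]

# One scanline: the hand spans pixels 1-2, the virtual object spans pixels 2-3.
camera = [["bg", "hand", "hand", "bg"]]
cg = [[None, None, "cg", "cg"]]
composite = superpose(camera, cg)
```

In this sketch the hand pixel at index 2 is replaced by the CG pixel even if the hand is physically nearer to the observer, which is the behavior described above.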
FIG. 2A is a view showing an observer who wears an HMD on the head, and a virtual object observed by this observer. In FIG. 2A, an observer 200 wears an HMD 201 on the head, and observes a virtual object 202 while bringing his or her hand 203 into his or her own field of view.
FIG. 2B is a view showing an example of an image displayed on the HMD 201 when the observer 200 observes the virtual object 202 while bringing his or her hand 203 into his or her own field of view. In FIG. 2B, reference numeral 204 denotes an image displayed on the HMD 201. This image 204 includes an image of the hand 203, which is occluded behind the virtual object 202. In FIG. 2B, the occluded part of the hand 203 is indicated by the dotted line.
According to the depth ordering between the virtual object 202 and hand 203, the hand 203 should be rendered in front of the virtual object 202. However, since a CG video image is composited on a video image of the physical space by superposition, the virtual object 202 is rendered on the region where the hand 203 should originally be rendered. A physical video image of the wrist part, which is not occluded behind the virtual object 202, can be seen, but the finger tip part occluded behind the virtual object 202 cannot be observed.
If depth information of a physical object is measured in real time, the depth ordering between virtual and physical objects can be correctly displayed. However, an apparatus required to measure the depth information of a physical object in real time is bulky and expensive. Furthermore, the contour of overlapping between virtual and physical objects is often not correctly seen due to an insufficient resolution of the depth information.
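If per-pixel depth of the physical scene were available, the compositing step could compare it with the CG depth at each pixel. The following is a sketch under that assumption only; the depth values, the frame representation, and the function name are hypothetical and not taken from any disclosed apparatus.

```python
def composite_with_depth(camera_frame, camera_depth, cg_frame, cg_depth):
    """Per-pixel depth test: show the CG pixel only where it is nearer
    to the viewpoint than the measured physical surface."""
    out = []
    for cam_row, cam_d_row, cg_row, cg_d_row in zip(
        camera_frame, camera_depth, cg_frame, cg_depth
    ):
        row = []
        for cam, cam_d, cg, cg_d in zip(cam_row, cam_d_row, cg_row, cg_d_row):
            if cg is not None and cg_d < cam_d:
                row.append(cg)   # virtual object is in front
            else:
                row.append(cam)  # physical object is in front, or no CG here
        out.append(row)
    return out

INF = float("inf")
camera = [["bg", "hand", "hand", "bg"]]
camera_depth = [[5.0, 0.4, 0.4, 5.0]]  # hand at 0.4, background at 5.0
cg = [[None, None, "cg", "cg"]]
cg_depth = [[INF, INF, 1.0, 1.0]]      # virtual object at 1.0
composite = composite_with_depth(camera, camera_depth, cg, cg_depth)
```

Here the hand pixel at index 2 survives because its measured depth (0.4) is smaller than the CG depth (1.0). The quality of the contour depends directly on the resolution of the depth measurement, as noted above.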
When the color of a physical object is expected to be a specific color, the following method is often used. That is, a mask image is generated by detecting the specific color on an image, and a CG video image is masked using that mask image, so that no CG video image is rendered on a place where the physical object is to be displayed. For example, when overlapping of a hand poses a problem, a mask image can be generated by detecting a flesh color area in a video image of the physical world. However, in this case, even when a physical object exists behind a virtual object, the physical object may be unwantedly observed in front of the virtual object or all physical objects with the same color may be unwantedly observed in front of the virtual object.
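The color-based masking and its drawback can be sketched as follows. The RGB thresholds in is_flesh are illustrative assumptions only, and the function names are hypothetical; a practical system would use a tuned color model.

```python
def is_flesh(rgb):
    """Crude flesh-color test; the thresholds are illustrative assumptions."""
    r, g, b = rgb
    return r > 150 and g > 80 and b > 50 and r > g > b

def masked_composite(camera_frame, cg_frame):
    """Suppress CG wherever the camera pixel matches the key color."""
    return [
        [
            cam if cg is None or is_flesh(cam) else cg
            for cam, cg in zip(cam_row, cg_row)
        ]
        for cam_row, cg_row in zip(camera_frame, cg_frame)
    ]

HAND = (200, 140, 110)        # hand in front of the virtual object
SKIN_WALL = (205, 150, 120)   # flesh-colored object behind the virtual object
DARK = (30, 30, 30)
CG_BLUE = (0, 0, 255)

camera = [[HAND, SKIN_WALL, DARK]]
cg = [[CG_BLUE, CG_BLUE, CG_BLUE]]
composite = masked_composite(camera, cg)
```

Because the mask carries no depth information, the flesh-colored object behind the virtual object is shown in front of it just as the hand is, which is the drawback described above.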
As one method of solving this problem of overlapping display between virtual and physical objects, the following method is available. That is, a position and orientation sensor is attached to a physical object (e.g., the observer's own hand), and a virtual object that simulates the shape of the physical object is laid out in correspondence with the position and orientation measured by that position and orientation sensor, so that the physical object and virtual object overlap each other. The depth ordering of the respective virtual objects is correctly displayed, since both objects are CG images. Since the shapes of the physical and virtual objects and their positional relationship do not perfectly match, the physical and virtual objects do not perfectly overlap each other when displayed (they are seen to deviate from each other), but the virtual object is displayed at basically the position of the physical object and with a correct depth ordering.
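Because both the simulating virtual object and the other virtual object are CG, their ordering can be resolved by an ordinary depth buffer. The following one-dimensional sketch assumes hypothetical object extents and depths, with the pose of the hand proxy taken from the sensor attached to the hand; it is an illustration, not a disclosed implementation.

```python
def render_scanline(width, objects):
    """Render one scanline with a depth buffer.

    objects: list of (label, x_start, x_end, depth); x_end is exclusive.
    At each pixel the fragment nearest to the viewpoint wins.
    """
    color = [None] * width
    zbuf = [float("inf")] * width
    for label, x0, x1, depth in objects:
        for x in range(max(0, x0), min(width, x1)):
            if depth < zbuf[x]:
                zbuf[x] = depth
                color[x] = label
    return color

virtual_object = ("object", 2, 6, 1.0)
hand_proxy = ("hand_proxy", 4, 8, 0.4)  # placed from the hand sensor reading
scanline = render_scanline(10, [virtual_object, hand_proxy])
```

In the overlapping pixels (4 and 5) the hand proxy is drawn in front of the virtual object because its depth is smaller, regardless of the order in which the objects are submitted.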
Using such an arrangement, when the observer puts his or her hand 203 in front of the virtual object 202, as shown in FIG. 2A, the image displayed on the HMD 201 is an image in which a virtual object 206 that simulates the hand 203 is laid out at the position of the hand 203, as shown in FIG. 2C. That virtual object 206 is located in front of the virtual object 202. The position and orientation of the virtual object 206 change based on the measurement values of the position and orientation sensor attached to the hand of the observer 200. FIG. 2C is a view showing an example of the image in which the virtual object 206 that simulates the hand 203 is laid out at the position of the hand 203.
[Patent Reference 1] Japanese Patent Laid-Open No. 2000-353248
The conventional system, in which a virtual object is overlaid on a physical object, suffers from the following problem. That is, when a virtual object that simulates a physical object (the virtual object 206 in FIG. 2C) and another virtual object (the virtual object 202 in FIG. 2C) do not overlap each other when viewed from the observer, the deviations between the positions and shapes of the physical object and the virtual object that simulates it stand out. As a result, their appearance may seem unnatural to the observer.
FIG. 3 is a view showing an example of an image displayed on the HMD when the observer moves the hand 203 to the right from the state of the image shown in FIG. 2C. Due to the measurement errors of the position and orientation of the hand 203, the hand 203 of the observer and the virtual object 206 that simulates the hand 203 are displayed on the HMD to deviate from each other, as shown in FIG. 3. Reference numeral 207 denotes an image displayed on the HMD. In this image 207, the hand 203 of the observer and the virtual object 202 are displayed, and the virtual object 206 that simulates the hand 203 is displayed to deviate from the hand 203.
Due to the difference between the shapes of the physical hand 203 and the virtual object 206 of the hand, the physical hand 203 may be seen to protrude from the virtual object 206 of the hand, which gives the observer an odd impression.