1. Field of the Invention
The present invention relates to a technique of providing mixed reality.
2. Description of the Related Art
There is an MR (Mixed Reality) presentation apparatus which forms the image of a virtual object by three-dimensional modeling as computer graphics (CG), and superimposes the virtual object image on the image of physical space as if the virtual object were present in the physical space (non-patent reference 1).
This apparatus includes the following units.
    A physical image sensing unit (e.g., video camera) which senses the image of the physical space
    A CG image generation unit which generates a CG image viewed from the physical space image sensing position
    An image display unit (e.g., HMD (Head Mounted Display) or monitor) which composites the physical space image with the CG image and displays the composite image
The apparatus also includes a line-of-sight position and orientation detection unit (e.g., position and orientation sensor) which detects the line-of-sight position and direction of the physical image sensing unit, so that the positional relationship between the CG image and the physical space image is displayed accurately even when the line-of-sight position and orientation of the physical image sensing unit have changed.
The CG image generation unit places the virtual object formed by three-dimensional modeling in virtual space having the same scale as the physical space, and renders the virtual space observed from the line-of-sight position and direction detected by the line-of-sight position and orientation detection unit. The thus generated CG image is composited with the physical space image sensed by the physical image sensing unit. It is consequently possible to display an image as if the virtual object existed in the physical space independently of the line-of-sight position and direction.
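The compositing step described above can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: images are assumed to be nested lists of RGB tuples, the CG image is assumed to hold `None` wherever no virtual object was rendered, and the name `composite_overlay` is hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
# Assumption: None marks pixels where no virtual object was rendered.
CGImage = List[List[Optional[Pixel]]]


def composite_overlay(physical: Image, cg: CGImage) -> Image:
    """Overlay the CG image on the physical space image, pixel by pixel.

    Wherever a CG pixel exists it replaces the physical pixel; no depth
    information is consulted.
    """
    return [
        [cg_px if cg_px is not None else phys_px
         for phys_px, cg_px in zip(phys_row, cg_row)]
        for phys_row, cg_row in zip(physical, cg)
    ]


# 2x2 physical space image and a CG image covering only the top-right pixel.
physical = [[(10, 10, 10), (20, 20, 20)],
            [(30, 30, 30), (40, 40, 40)]]
cg = [[None, (255, 0, 0)],
      [None, None]]
mr_image = composite_overlay(physical, cg)
```

In this sketch a rendered CG pixel always wins, whatever lies at that location in the physical space.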
Changing the type or layout of the virtual object or its animation can freely be done by the same method as general CG. The position of the virtual object may be designated using an additional position and orientation sensor so that the virtual object is arranged at a position and orientation corresponding to the measured value of the position and orientation sensor.
The conventional arrangement also allows the user to hold the position and orientation sensor in hand and observe the virtual object arranged at a position and orientation indicated by the measured value of the position and orientation sensor.
The physical image sensing unit that senses the physical space image is, for example, a video camera which senses an image in its line-of-sight direction and captures the image in a memory.
As an image display device which composites the physical space image with the CG image and displays the composite image, for example, an HMD is used. When the HMD is used in place of a normal monitor, and the video camera is attached to the HMD while being directed in its line-of-sight direction, an image in the observer's looking direction can be displayed on the HMD. Since a CG image corresponding to the observer's looking direction can be rendered, the observer can experience a world closer to reality.
The image display unit of the mixed reality presentation apparatus displays, on the image display device, an image (MR image) obtained by compositing the physical space image with the CG image.
As the line-of-sight position and orientation detection unit, a magnetic position and orientation sensor or the like is used. The position and orientation sensor is attached to the video camera (or the HMD with the video camera), thereby detecting the position and orientation of the video camera. The magnetic position and orientation sensor detects the relative position and orientation between a magnetic field generator (transmitter) and a magnetic sensor (receiver). It detects the three-dimensional position (X, Y, Z) and orientation (Roll, Pitch, Yaw) of the sensor in real time.
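As an illustration of how such a six-degree-of-freedom measurement might be used for rendering, the sketch below converts (X, Y, Z, Roll, Pitch, Yaw) into a 4x4 pose matrix and inverts it to obtain a view matrix. The rotation convention is an assumption, since the text does not specify one: angles are taken in radians and composed as R = Rz(yaw) · Ry(pitch) · Rx(roll). The function names `pose_matrix` and `view_matrix` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def pose_matrix(x: float, y: float, z: float,
                roll: float, pitch: float, yaw: float) -> Matrix:
    """Build a 4x4 rigid transform from a 6-DOF sensor reading.

    Assumed convention: R = Rz(yaw) * Ry(pitch) * Rx(roll), angles in radians.
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, x],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, y],
        [-sp,     cp * sr,                cp * cr,                z],
        [0.0,     0.0,                    0.0,                    1.0],
    ]


def view_matrix(pose: Matrix) -> Matrix:
    """Invert a rigid transform: rotation R^T, translation -R^T * t."""
    rt = [[pose[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t = [pose[i][3] for i in range(3)]
    inv_t = [-sum(rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [rt[0] + [inv_t[0]],
            rt[1] + [inv_t[1]],
            rt[2] + [inv_t[2]],
            [0.0, 0.0, 0.0, 1.0]]


# Camera at (1, 2, 3) in the transmitter's frame, with no rotation.
camera_pose = pose_matrix(1.0, 2.0, 3.0, 0.0, 0.0, 0.0)
camera_view = view_matrix(camera_pose)
```

A real sensor may use a different axis order or units, so the convention would need to be checked against the device's documentation.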
The above-described arrangement enables the observer to observe the composite image of the physical space image and the CG image via the image display unit such as the HMD. If the observer looks around, the video camera attached to the HMD senses the physical space image, and the position and orientation sensor attached to the HMD detects the line-of-sight position and direction of the video camera. Accordingly, the CG image generation unit generates (renders) a CG image viewed from that line-of-sight position and orientation, and the CG image is composited with the physical space image and displayed.
The mixed reality presentation apparatus can superimpose a virtual object on a physical object. In, for example, a game disclosed in patent reference 1, a virtual object of a sword or weapon is superimposed on an interactive operation input device held by a user, thereby allowing him/her to freely manipulate the virtual object (in this case, the sword or weapon). In non-patent reference 2, a virtual object generated by CAD is superimposed on a mock-up 1310 of a camera as shown in FIG. 5, thereby implementing a virtual scale model that can actually be taken in hand.
The conventional mixed reality presentation method only superimposes and composites a CG image on the physical space image. The depth relationship between the physical object and the virtual object is not necessarily taken into consideration. For this reason, when the observer puts his/her own hand in front of a virtual object, the hand is hidden, and the virtual object that should be behind the hand is displayed in front of it.
FIG. 2A is a view showing an observer who wears an HMD on the head, and a virtual object observed by the observer. Referring to FIG. 2A, an observer 200 wears an HMD 201 on his/her head and observes a virtual object 202 while putting his/her hand 203 in the field of vision.
FIG. 2B is a view showing an example of an image displayed on the HMD 201 when the observer 200 observes the virtual object 202 while putting the hand 203 in the field of vision. As shown in FIG. 2B, an image 204 is displayed on the HMD 201. The image 204 includes the hand 203. The virtual object 202 hides the hand 203. In FIG. 2B, the hidden hand 203 is indicated by a dotted line.
The hand 203 should be rendered in front of the virtual object 202 in consideration of the depth relationship between the virtual object 202 and the hand 203. However, since the CG image is superimposed on the physical space image, the virtual object 202 is rendered in the region where the hand 203 should be rendered.
The depth relationship between the virtual object and the physical object can correctly be displayed by measuring the depth information of the physical object in real time. However, a device to be used to measure the depth information of a physical object in real time is bulky and expensive. In addition, if the resolution of depth information is insufficient, the outline of overlap between the virtual object and the physical object may be inaccurate.
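Assuming per-pixel depth of the physical space were available, depth-aware compositing could be sketched as below. This is an illustrative sketch only: the depth maps, their units (distance from the camera, smaller meaning nearer), and the name `composite_with_depth` are assumptions, not part of the described apparatus.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
CGImage = List[List[Optional[Pixel]]]  # None where no virtual object was rendered
DepthMap = List[List[float]]           # assumed: distance from camera, smaller = nearer


def composite_with_depth(physical: Image, physical_depth: DepthMap,
                         cg: CGImage, cg_depth: DepthMap) -> Image:
    """Show a CG pixel only where the virtual object is nearer than the
    physical object at the same pixel; otherwise keep the physical pixel."""
    out: Image = []
    for y, phys_row in enumerate(physical):
        row: List[Pixel] = []
        for x, phys_px in enumerate(phys_row):
            cg_px = cg[y][x]
            if cg_px is not None and cg_depth[y][x] < physical_depth[y][x]:
                row.append(cg_px)
            else:
                row.append(phys_px)
        out.append(row)
    return out


# One row, two pixels: a hand at 0.4 on the left, a wall at 2.0 on the right.
physical = [[(90, 60, 50), (90, 60, 50)]]
physical_depth = [[0.4, 2.0]]
# The virtual object covers both pixels at depth 1.0.
cg = [[(0, 0, 255), (0, 0, 255)]]
cg_depth = [[1.0, 1.0]]
result = composite_with_depth(physical, physical_depth, cg, cg_depth)
```

The left pixel keeps the hand because it is nearer than the virtual object, while the right pixel shows the virtual object in front of the wall. The accuracy of the boundary between the two depends on the resolution of the measured depth map.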
If a physical object is expected to have a specific color, a mask image can be generated by detecting regions of that color in the physical space image. The CG image is masked with the mask image so that it is not rendered in the region where the physical object should be displayed. For example, if overlap of a hand poses a problem, a mask image can be generated by determining a flesh color region in the physical space image (FIG. 9 of non-patent reference 3). In this case, however, a physical object of that color that should be placed behind a virtual object is displayed on the near side. In addition, all physical objects of the same color are displayed in front of a virtual object.
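The color-mask approach can be sketched as follows. The RGB thresholds in `is_flesh_color` are illustrative assumptions rather than values taken from non-patent reference 3, and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
CGImage = List[List[Optional[Pixel]]]  # None where no virtual object was rendered
Mask = List[List[bool]]                # True where the CG image must not be drawn


def is_flesh_color(px: Pixel) -> bool:
    """Rough RGB test for flesh color (thresholds are illustrative only)."""
    r, g, b = px
    return (r > 95 and g > 40 and b > 20
            and max(px) - min(px) > 15
            and abs(r - g) > 15 and r > g and r > b)


def flesh_mask(physical: Image) -> Mask:
    """Mark every pixel of the physical space image that looks flesh colored."""
    return [[is_flesh_color(px) for px in row] for row in physical]


def composite_with_mask(physical: Image, cg: CGImage, mask: Mask) -> Image:
    """Draw the CG image except where the mask protects the physical pixel."""
    return [
        [phys_px if (masked or cg_px is None) else cg_px
         for phys_px, cg_px, masked in zip(phys_row, cg_row, mask_row)]
        for phys_row, cg_row, mask_row in zip(physical, cg, mask)
    ]


# One row, two pixels: a flesh-colored pixel and a blue pixel.
physical = [[(200, 150, 120), (10, 10, 200)]]
cg = [[(0, 255, 0), (0, 255, 0)]]
mask = flesh_mask(physical)
result = composite_with_mask(physical, cg, mask)
```

Because the mask depends only on color, the flesh-colored pixel is kept in front of the virtual object regardless of its actual depth, which is the limitation noted above.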
The problem of overlap of a virtual object and a physical object can be solved by the following method. A position and orientation sensor is attached to a physical object (e.g., observer's hand). A virtual object that simulates the shape of the physical object is arranged in accordance with a position and orientation measured by the position and orientation sensor and superimposed on the physical object. Both the objects are CG images and are therefore displayed in a correct depth relationship.
When the hand 203 of the observer is arranged in front of the virtual object 202, as shown in FIG. 2A, using the above-described arrangement, a virtual object 206 that simulates the hand 203 is arranged at the position of the hand 203 in the image displayed on the HMD 201, as shown in FIG. 2C. The virtual object 206 is located in front of the virtual object 202. The position and orientation of the virtual object 206 change based on the measured value of the position and orientation sensor attached to the hand of the observer 200. FIG. 2C is a view showing an example of the image in which the virtual object 206 that simulates the hand 203 is arranged at the position of the hand 203.
    [Non-patent reference 1] Hiroyuki Yamamoto, “Mixed Reality: A New World Seen at the Boarder between Real and Virtual Worlds”, information processing, vol. 43, no. 3, pp. 213-216, 2002.
    [Non-patent reference 2] D. Kotake, K. Satoh, S. Uchiyama, and H. Yamamoto, “A hybrid and linear registration method utilizing inclination constraint”, Proc. 4th IEEE/ACM Int'l Symp. on Mixed and Augmented Reality (ISMAR 2005), pp. 140-149, October 2005.
    [Non-patent reference 3] Oshima, Yamamoto, and Tamura, “A Mixed Reality System with Visual and Tangible Interface Capability—Application to Evaluating Automobile Interior Design”, Transactions of the Virtual Reality Society of Japan, vol. 9, no. 1, pp. 79-88, 2004.
    [Patent reference 1] Japanese Patent Laid-Open No. 2000-353248
A physical object and a virtual object that simulates it do not completely match in shape or in positional relationship. Hence, as shown in FIG. 3, a hand 180 as a physical object and a virtual object 310 are not displayed in a completely superimposed state (they appear shifted relative to each other).
Assume that a virtual object 702 that expresses the interior of a physical object 701 is superimposed on the physical object 701 and presented to an observer in stereoscopic vision, as shown in FIG. 7, and that fusion by the observer occurs with focus on the virtual object 702. In this case, fusion may also occur on the surface of the external physical object 701.
Ordinarily, when the interior of an object is visible in the physical space, the surface of the physical object 701 on the near side should be invisible or should be perceived as a semitransparent object. Here, however, the fusible virtual object 702 exists behind the physical object 701, which is perceived as a completely opaque object. For this reason, the observer's binocular function attempts to simultaneously fuse the cubic edge of the virtual object 702 on the far side and that of the physical object 701 on the near side. This phenomenon causes unnatural binocular rivalry and gives the observer a sense of incongruity.
This will be explained using a detailed example.
FIG. 8 is a view showing an example of an image that superimposes a virtual object of an internal structure on the mock-up of the camera in FIG. 5 described in non-patent reference 2.
An image 801 is obtained by superimposing a virtual object of the internal structure of a camera on the mock-up of the camera shown in FIG. 5. An image 802 is an enlarged view of the region in a frame 899. In the image 802, an edge 804 near the shutter of the mock-up, which is a physical object, is located close to an edge 803 of a gray component on the far side. When these objects, which have different depths, are presented in stereoscopic vision, the observer may have the above-described sense of incongruity.