A mixed reality (MR) presentation apparatus is conventionally available. This apparatus superimposes an image of the physical world and an image of three-dimensionally modeled CG (computer graphics) data, allowing the user to view the scene as if an object rendered from the CG data (a virtual object) existed in the physical world.
Such an apparatus comprises physical image sensing means (e.g., a video camera), CG image generation means for generating a CG image as if it were seen from the position where the physical image is sensed, and image display means (e.g., an HMD (Head Mounted Display) or a monitor) that can composite and display both images.
This apparatus also comprises visual axis position/orientation detection means for detecting the visual axis position and direction of the physical image sensing means (video camera), so as to correctly display the positional relationship between the CG and physical images even when the position and orientation of the visual axis of the physical image sensing means change.
The CG image generation means places three-dimensionally modeled CG data in a virtual space which has the same scale as the physical space, and renders the CG as an object observed from the visual axis position and direction detected by the visual axis position/orientation detection means. When the CG image generated in this way is superimposed on the physical image, the image generated based on the CG data appears correctly set on the physical space no matter from which visual axis position and direction the physical image sensing means observes. The type, layout, animation, and the like of the CG can be changed freely, in the same manner as with general CG data. Another position/orientation sensor may be provided to designate the position of the CG data, so that the CG data are rendered at the location designated by the value of that sensor. Conventionally, with this arrangement, the user holds the position/orientation sensor in a hand and observes the CG at the position and orientation of the sensor.
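The superimposition step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, assuming the CG generation means has already rendered, from the detected visual axis position and direction, a color image together with a per-pixel coverage mask; the function name and array layout are illustrative and not part of any conventional apparatus.

```python
import numpy as np

def composite(physical_img, cg_img, cg_mask):
    """Overlay the rendered CG image onto the physical image.

    physical_img, cg_img : H x W x 3 uint8 arrays
    cg_mask              : H x W boolean array, True where a CG
                           fragment was rendered for that pixel
    """
    out = physical_img.copy()
    out[cg_mask] = cg_img[cg_mask]   # CG pixels replace physical pixels
    return out
```

Because the CG image is simply copied wherever its mask is set, this naive compositing exhibits exactly the occlusion problem discussed later: CG always appears in front of the physical scene.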
The physical image sensing means, e.g., a video camera, senses an image of the physical space in the visual axis direction of the camera and captures it into a memory.
As an image display device that composites an image of the physical space and a CG image, for example, an HMD (Head Mounted Display) is used. When the HMD is used in place of a normal monitor, and the video camera is mounted in the visual axis direction of the HMD, an image in the direction in which the observer faces can be displayed on the HMD. In addition, since CG data in that facing direction can also be rendered, the observer can experience a heightened feeling of immersion.
Note that the image display device may be an HMD of a so-called optical see-through type, which does not comprise any video camera and lets the user see the scene in front of the HMD directly. In this case, instead of video image sensing, the scene in front of the HMD is presented optically on the display device. With an HMD of this type, the scene in front of the observer can be seen directly without any digital processing, and a CG image can be superimposed on that scene.
The image display means in the MR presentation apparatus displays, on the aforementioned image display device, an image obtained by superimposing a physical image and a CG image.
As the position/orientation detection means, a magnetic position/orientation sensor or the like is used. When such a sensor is attached to the video camera (or to the HMD to which the video camera is attached), it detects the position and orientation of the visual axis of the video camera. The magnetic position/orientation sensor detects the relative position and orientation between a magnetism source (transmitter) and a magnetic sensor (receiver). For example, FASTRAK available from Polhemus Inc. (USA) or the like is known. This device detects the three-dimensional (3D) position (X, Y, Z) and orientation (Roll, Pitch, Yaw) of the sensor in real time within a specific area.
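A single reading from such a sensor can be represented as a plain record of six values, three for position and three for orientation. The sketch below is illustrative only; the field names and units are assumptions and do not reflect the actual FASTRAK interface.

```python
from dataclasses import dataclass

@dataclass
class PoseSample:
    """One 6-DoF reading from a position/orientation sensor.

    Illustrative record only; a real driver defines its own format.
    """
    x: float      # position (e.g., in centimeters)
    y: float
    z: float
    roll: float   # orientation (e.g., in degrees)
    pitch: float
    yaw: float
```

The CG image generation means consumes such a sample each frame to set the virtual camera's view transform before rendering.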
With the above arrangement, the observer can observe a world formed by superimposing physical and CG images via the HMD. When the observer looks around, the physical image sensing device (video camera) attached to the HMD senses a physical image, and the visual axis position/orientation detection means (position/orientation sensor) equipped on the HMD detects the position and orientation of the visual axis of the video camera. Based on these data, the CG image generation means generates (renders) a CG image viewed from that visual axis position and orientation, and superimposes it on the physical image.
A conventional, general MR presentation method merely superimposes a CG image on a physical image, and does not consider the depth ordering between an object which really exists and a CG object. For this reason, even when the observer puts a hand in front of the CG object, the hand cannot be observed; the CG object, which should be located behind the hand, still appears in front of it. FIGS. 2A to 2D are views for explaining this state.
FIG. 2A shows an example of a state in which an observer 200 who wears an HMD 201 stretches forth a hand 203 toward a CG object 202.
An image which is presented by the HMD 201 to the observer 200 in the state shown in FIG. 2A is an MR image 204 shown in FIG. 2B. In FIG. 2B, a fingertip portion 205 which is supposed to be observed is indicated by a broken line for descriptive convenience. The fingertip portion 205 should be seen in front of the CG object 202 according to the depth ordering between the CG object 202 and the hand 203. However, in the prior art, since the image of the CG object 202 is merely superimposed on the physical image, it is undesirably displayed in front of the fingertip image.
In order to solve this problem, conventionally, an overlapping area of physical and CG objects is detected, and the CG object in the detected area is masked to allow the user to see the physical object (e.g., see Japanese Patent Laid-Open No. 2003-296759).
This technique comprises physical object detection means and CG masking means. For example, in order to correctly display the overlapping state of the hand and the CG object shown in FIG. 2B, only the display area of the hand in the physical image needs to be detected. Hence, the physical object detection means can detect the area where the hand is sensed by checking whether the color of each pixel of the physical image approximates a flesh color. FIG. 2C shows an image used to mask the hand area detected from the physical image.
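The flesh-color check described above can be sketched as a simple per-pixel threshold. The Python/NumPy sketch below uses an RGB bounding box whose threshold values are purely illustrative; a practical system would tune them, often in a different color space such as YCbCr or HSV.

```python
import numpy as np

def flesh_color_mask(img_rgb, lo=(95, 40, 20), hi=(255, 220, 170)):
    """Return a boolean H x W mask of pixels whose RGB values fall
    inside a crude flesh-color box.

    lo, hi: per-channel (R, G, B) lower/upper bounds -- illustrative
    values only, not from any cited reference.
    """
    mask = np.ones(img_rgb.shape[:2], dtype=bool)
    for c in range(3):
        ch = img_rgb[..., c]
        mask &= (ch >= lo[c]) & (ch <= hi[c])  # AND across channels
    return mask
```

The resulting mask plays the role of the image shown in FIG. 2C: it marks the pixels where the physical object (the hand) was detected.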
The CG masking means can prevent the CG object from being rendered on the corresponding portion by marking, in a stencil buffer, the image region where the physical object detection means detects the hand, or by setting that region of a depth buffer (Z-buffer) to its minimum (nearest) value. As a result, an image like a masked MR image 207 shown in FIG. 2D can be obtained.
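The depth-buffer variant of this masking can be emulated as follows: the Z-buffer is preset to the nearest possible depth wherever the hand was detected, so every CG fragment there fails the depth test and the physical image shows through. This Python/NumPy sketch is illustrative; the function name and conventions (smaller depth = nearer) are assumptions, not from any cited reference.

```python
import numpy as np

def composite_with_hand_mask(physical, cg, cg_depth, hand_mask):
    """Composite CG over the physical image, occluded by the hand mask.

    physical, cg : H x W x 3 uint8 images
    cg_depth     : H x W depths of the rendered CG fragments
    hand_mask    : H x W boolean mask of detected hand pixels
    """
    # Z-buffer starts at "infinitely far", except the hand region,
    # which is preset to the nearest possible depth (0).
    zbuf = np.full(cg_depth.shape, np.inf)
    zbuf[hand_mask] = 0.0
    out = physical.copy()
    visible = cg_depth < zbuf        # standard depth test
    out[visible] = cg[visible]
    return out
```

Wherever the mask is set, no CG fragment can pass the depth test, so the hand pixels of the physical image remain visible, as in the masked MR image 207 of FIG. 2D.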
The exemplified method has the following disadvantage: a CG object is never rendered on an area where the flesh color appears in the physical image, irrespective of its depth. This method is sufficiently effective, and is conventionally used, in situations where the CG object is always located behind the hand. However, when the overlapping state of CG and physical objects must always be expressed correctly, a detection method using a color or the like as in the above example does not suffice as the physical object detection means; the depth information of the physical object viewed from the viewpoint of the observer must be correctly detected.
An MR presentation apparatus which uses, as the physical object detection means, a device for detecting depth information of the physical space is also conventionally used (e.g., see Japanese Patent Laid-Open No. 11-331874). As the CG masking means in this case, means for setting the values obtained by this device in the Z-buffer of the CG object is used.
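With a measured depth map of the physical space in the Z-buffer, the masking reduces to an ordinary per-pixel depth comparison: a CG fragment is drawn only where it lies nearer than the physical surface at that pixel. The following Python/NumPy sketch is illustrative only; function and parameter names are assumptions.

```python
import numpy as np

def composite_with_depth_map(physical, cg, cg_depth, real_depth):
    """Per-pixel occlusion using a measured depth map of the scene.

    physical, cg : H x W x 3 uint8 images
    cg_depth     : H x W depths of the rendered CG fragments
    real_depth   : H x W measured depths of the physical surfaces
    """
    out = physical.copy()
    visible = cg_depth < real_depth  # CG passes only where it is nearer
    out[visible] = cg[visible]
    return out
```

Unlike the flesh-color approach, this handles arbitrary front/behind relationships per pixel, at the cost of requiring a depth-sensing device.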
In the conventional system, the CG masking process is executed for all CG objects in the MR space. However, depending on the system, it is not preferable to apply hand masking to all CG objects.
For example, in some cases a visual effect that temporarily flickers a CG object or emphasizes its color is applied so as to draw the observer's attention to a given CG object displayed in the MR space. In such a case, if a physical object occludes that CG object, the observer's attention cannot be drawn to it.
Also, when a GUI such as an information display panel which must always be visible is to be displayed by CG, it must always be displayed without occlusion, irrespective of its positional relationship with a physical object. However, if a physical object occludes the GUI, its display disappears.
Hence, since the prior arts apply masking to all CG objects based on their occlusion relationship with a physical object, they cannot flexibly switch the presence/absence of the masking process for each CG object upon display, according to the type and display purpose of each individual CG object.