Field of the Invention
The present invention relates to an image synthesis method and an image synthesis apparatus, and more particularly, to a technique suitable for use in synthesizing a real image and computer graphics in consideration of a subject region in the real image.
Description of the Related Art
Mixed reality (MR) technology, which seamlessly merges a virtual space created by a computer with real space, is receiving attention. It is expected to be applied to various fields, such as assembly support, in which an operation procedure and a wiring condition are superimposed and displayed at the time of assembly, and surgery support, in which an image of the internal condition of a patient is superimposed on the body surface of the patient and displayed.
Geometric consistency between a virtual object and real space is important in allowing a user to feel that the virtual object is present in the real space. There are two types of geometric consistency in mixed reality: consistency for making the coordinate system of real space and the coordinate system of virtual space conform to each other, and consistency for correctly representing the positional relationship between a real object and a virtual object in the depth direction. A problem associated with the latter consistency is also referred to as an occlusion problem. In particular, the occlusion problem is crucial for a video see-through MR system that superimposes a virtual object on an image captured by a camera. The present invention deals with the latter consistency, i.e., the occlusion problem.
Japanese Patent Laid-Open Nos. 2005-107967 and 2005-228140 address the occlusion problem by drawing a hand, as a subject, in front of a virtual object at the time of synthesis. It has been statistically shown that a hand is often located in front of a virtual object. By always drawing the hand image region in front of the virtual object, a hand located in front of the virtual object is prevented from being hidden, so that the synthesized image does not look unnatural to a viewer.
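The "hand always in front" synthesis rule can be illustrated with a minimal sketch. It assumes that a hand mask and a mask of the rendered virtual object are already available for each frame. The function name, the argument names, and the representation of images as nested lists are hypothetical and are not taken from the cited documents.

```python
def composite_hand_in_front(camera, cg, cg_mask, hand_mask):
    """Compose one frame from equally sized 2-D lists (hypothetical layout).

    camera    : pixels of the captured real image
    cg        : pixels of the rendered virtual object
    cg_mask   : True where the virtual object was rendered
    hand_mask : True where the pixel was classified as hand
    """
    out = []
    for cam_row, cg_row, cgm_row, hand_row in zip(camera, cg, cg_mask, hand_mask):
        out_row = []
        for cam_px, cg_px, is_cg, is_hand in zip(cam_row, cg_row, cgm_row, hand_row):
            # The virtual object is drawn only where it exists and no hand
            # was detected; everywhere else the camera pixel is kept, so the
            # hand region always appears in front.
            out_row.append(cg_px if is_cg and not is_hand else cam_px)
        out.append(out_row)
    return out
```

Under this rule no depth information is needed, which is why the result is correct only when the hand really is in front of the virtual object.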
In Japanese Patent Laid-Open No. 2005-107967 (hereinafter referred to as Patent Document 1), the difference between an image captured as a background and an image obtained by capturing the background and a subject at the same time is extracted as color information on the subject, and a region having the extracted color information is set as a subject region.
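The two-image approach of Patent Document 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the sum-of-absolute-differences metric, the threshold value, the coarse color quantization, and all function names are hypothetical choices, not details disclosed in Patent Document 1.

```python
def quantize(pixel, step=32):
    """Map an (R, G, B) pixel to a coarse color bin (assumed bin size)."""
    return tuple(channel // step for channel in pixel)


def extract_subject_colors(background, with_subject, threshold=30):
    """Collect color bins of pixels that differ between the two captures."""
    colors = set()
    for bg_row, fg_row in zip(background, with_subject):
        for bg_px, fg_px in zip(bg_row, fg_row):
            # Sum of absolute channel differences (an assumed metric).
            diff = sum(abs(a - b) for a, b in zip(bg_px, fg_px))
            if diff > threshold:
                colors.add(quantize(fg_px))
    return colors


def subject_region(image, colors):
    """Mark every pixel whose color bin was registered as a subject color."""
    return [[quantize(px) in colors for px in row] for row in image]
```

The registered color set is computed once from the two captured images and then applied to each new frame, which is why both images must be captured before the system can be used.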
In Japanese Patent Laid-Open No. 2005-228140 (hereinafter referred to as Patent Document 2), only a single image obtained by capturing a subject and a background at the same time is used to extract color information. This image is displayed on a screen, and a user interface allowing a user to separate the subject and the background is provided. With the user interface, the user can set a subject region as intended.
In Kenichi Hayashi, Hirokazu Kato, and Shougo Nishida, “Depth Determination of Real Objects using Contour Based Stereo Matching”, Transactions of The Virtual Reality Society of Japan, Vol. 10, No. 3, pp. 371-380, 2005 (hereinafter referred to as Non-Patent Document 1), the depth of a hand from a camera and the depth of a virtual object from the camera are compared pixel by pixel, and a foreground region is drawn. As a result, the occlusion problem is solved more accurately. Non-Patent Document 1 determines a subject region as follows:
(1) A space image to be the background of a subject is stored in the form of a three-dimensional texture;
(2) A background image is rendered in accordance with the current position and orientation of a camera;
(3) The rendered background image and a current image are compared to calculate a differential region; and
(4) The differential region is determined as a subject region.
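Steps (3) and (4) above, together with the per-pixel depth comparison, can be sketched as follows. The sketch assumes that the rendered background image, the depth of the subject, and the depth of the virtual object are supplied from elsewhere; Non-Patent Document 1 obtains the subject depth by contour-based stereo matching, which is not reproduced here. The difference metric, the threshold, and all names are hypothetical.

```python
def differential_region(rendered_background, current, threshold=30):
    """Steps (3)-(4): mark pixels that differ from the rendered background.

    Both images are 2-D lists of (R, G, B) tuples. The metric (sum of
    absolute channel differences) and the threshold are assumptions.
    """
    return [
        [sum(abs(a - b) for a, b in zip(bg_px, cur_px)) > threshold
         for bg_px, cur_px in zip(bg_row, cur_row)]
        for bg_row, cur_row in zip(rendered_background, current)
    ]


def composite_by_depth(current, cg, subject_mask, subject_depth, cg_depth):
    """Draw, per pixel, whichever of subject and virtual object is nearer.

    cg_depth holds None where no virtual object was rendered. Real pixels
    outside the subject region are assumed to lie behind the virtual object.
    """
    out = []
    for y, row in enumerate(current):
        out_row = []
        for x, cam_px in enumerate(row):
            if cg_depth[y][x] is None:
                # No virtual object at this pixel: keep the camera image.
                out_row.append(cam_px)
            elif subject_mask[y][x] and subject_depth[y][x] < cg_depth[y][x]:
                # The subject is nearer to the camera than the virtual object.
                out_row.append(cam_px)
            else:
                out_row.append(cg[y][x])
        out.append(out_row)
    return out
```

Unlike a rule that always draws the subject in front, this comparison also handles the case where the subject is behind the virtual object, at the cost of requiring a stored background and depth data.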
In Patent Document 1, it takes time to capture the two images used to determine the hand region: a background image and an image including both the background and the hand as a subject. In Patent Document 2, by contrast, a subject region can be determined using a single image including a background and a subject. However, it takes time for an operator to manually register the color information used to separate the subject region from the background region. In addition, registering the color information requires knowledge and experience, so not everybody can do it easily.
In order to stably extract a subject region in real time from an image captured by a camera having a movable viewpoint, it is necessary to capture a plurality of images of a subject and a background from different viewpoints. The increase in the number of images increases the time taken to determine a subject region.
In Non-Patent Document 1, it is necessary to store, as a three-dimensional image, a background on which a subject will probably be displayed. It is therefore necessary to capture in advance a background scene including no subject as an image used as texture data. As the area of the background increases, the time taken for the preliminary preparation for this image capturing also increases. As in the case disclosed in Patent Document 2, in the case disclosed in Non-Patent Document 1, knowledge and experience are required for the acquisition of the texture data, and not everybody can do it easily.
Consider a case where a system employing one of the above-described methods, which require a long time to determine a subject region, is demonstrated at an exhibition, that is, in a situation where an exhibitor wants as many people as possible to try the system in a short time. To keep each trial short, the exhibitor may skip adjusting the system for each person. However, if the system is not adequately adjusted for each person, noise occurs when the hand is extracted, and the hand is not displayed in front of a virtual object. This prevents the person from experiencing a sense of immersion. A method with which subject extraction information can be calculated in a short time has therefore been desired.
The adjustment of color information disclosed in Patent Document 2 and the acquisition of texture data disclosed in Non-Patent Document 1 require a long preparation time. It is difficult to adopt such methods in a situation where time is limited.
The present invention provides an information processing apparatus capable of easily and rapidly determining a subject region in a captured image using a single image.