Methods for providing an eye-contact function in a video conference system are broadly divided into physical methods and image composition methods. Of these, the image composition method estimates a depth value from a two-dimensionally captured image, projects the image into a three-dimensional space, and then projects it back onto the two-dimensional space in which it is displayed. The technique underlying the image composition method is therefore the estimation of a depth value from a two-dimensional image. According to the related art, the depth value is estimated either by performing stereo matching with two cameras or by using two image cameras together with one depth camera.
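The projection from the two-dimensional image into a three-dimensional space and back can be sketched as follows. This is only an illustrative sketch that assumes an ideal pinhole camera; the function names, the intrinsic values (`fx`, `fy`, `cx`, `cy`), and the rotation and translation of the virtual view point are hypothetical and are not taken from the related art.

```python
import math

def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with its depth value to a 3D point (camera coordinates)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def rotate_y(point, angle_rad):
    """Rotate a 3D point about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def project(point, fx, fy, cx, cy):
    """Project a 3D point back onto the 2D image plane."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def reproject_pixel(u, v, depth, intrinsics, angle_rad=0.0,
                    translation=(0.0, 0.0, 0.0)):
    """2D -> 3D -> 2D: move a pixel to where a virtual view point would see it."""
    fx, fy, cx, cy = intrinsics
    p = back_project(u, v, depth, fx, fy, cx, cy)
    p = rotate_y(p, angle_rad)                        # assumed view-point rotation
    p = tuple(a + b for a, b in zip(p, translation))  # assumed view-point shift
    return project(p, fx, fy, cx, cy)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = (500.0, 500.0, 320.0, 240.0)
same = reproject_pixel(400, 300, 2.0, K)
shifted = reproject_pixel(400, 300, 2.0, K, translation=(0.1, 0.0, 0.0))
```

With no rotation or translation the pixel maps back to itself, whereas shifting the view point moves the pixel by an amount that depends on its depth, which is why the accuracy of the estimated depth value determines the quality of the composed image.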
As illustrated in FIG. 1, the stereo matching method uses color images input from a stereo camera composed of two cameras. A specific reference point (pixel) of one image is warped to the other image to find the point (pixel) with the highest similarity, and the warping equation is obtained using the intrinsic parameter values of the two cameras. As illustrated in FIG. 2, the method using a depth camera uses the depth value obtained from the depth camera as a reference value for the stereo matching, thereby improving the precision of the estimated depth.
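The similarity search at the core of stereo matching can be sketched as follows. This is a simplified illustration, not the method of the related art: it assumes the two images are already rectified, so that the warp reduces to a horizontal shift (disparity) along one scanline, and it uses the sum of absolute differences (SAD) as the similarity measure. The function names, window size, focal length, and baseline are hypothetical.

```python
def match_pixel(left_row, right_row, x, half_window=1, max_disparity=4):
    """Find the disparity d whose window around right_row[x - d] is most
    similar (lowest SAD) to the window around left_row[x]."""
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("reference window falls outside the scanline")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        xr = x - d
        if xr - half_window < 0:
            break  # candidate window would leave the right image
        cost = sum(abs(left_row[x + k] - right_row[xr + k])
                   for k in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Rectified-stereo relation: depth = focal length * baseline / disparity."""
    return focal_px * baseline_m / disparity if disparity > 0 else float("inf")

# Toy scanlines: the bright feature at x=4 in the left row appears at x=2
# in the right row.
left = [10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right = [10, 50, 90, 50, 10, 10, 10, 10, 10, 10]
d, cost = match_pixel(left, right, x=4)
depth = disparity_to_depth(d, focal_px=500.0, baseline_m=0.1)
```

In the depth-camera variant described above, the measured depth could, for instance, be used to narrow the disparity range searched in `match_pixel`; that use is an assumption made for illustration only.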
In the related-art method using stereo matching, the input images of the stereo camera contain occlusion regions that depend on the view point, and a desired result cannot be obtained for these regions. For example, as illustrated in FIG. 1, when a camera 1 photographs a subject, areas A and B are input, and when a camera 2 photographs the same subject, areas B and C are input. In this case, the areas A and C, excluding the area B, are occlusion regions, each of which is photographed by only one of the two cameras: the area A is not photographed by the camera 2, and the area C is not photographed by the camera 1.
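The FIG. 1 example can be restated with sets, as a minimal sketch in which the area labels are taken from the description above and the function name is hypothetical. Only areas seen by both cameras can be matched; areas seen by exactly one camera are occlusion regions.

```python
def split_regions(view_1, view_2):
    """Separate areas visible to both cameras from areas visible to only one."""
    matchable = view_1 & view_2   # seen by both cameras
    occluded = view_1 ^ view_2    # seen by exactly one camera
    return matchable, occluded

camera_1 = {"A", "B"}  # areas input when the camera 1 photographs the subject
camera_2 = {"B", "C"}  # areas input when the camera 2 photographs the subject
matchable, occluded = split_regions(camera_1, camera_2)
```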