The human visual system recognizes the relative positions of objects in visible space by using a large number of depth cues. Such cues may be classified into two categories, namely, physiological cues and psychological cues. The physiological depth cues include accommodation, convergence (or vergence), binocular disparity, and motion parallax, while the psychological depth cues include linear perspective, shading, atmospheric perspective, occlusion by another object, texture gradient, and color.
Among the physiological depth cues, accommodation refers to changing the focal length of the crystalline lens when a viewer's eyes are to be focused on a particular area of a three-dimensional (3D) scene. The thickness of the crystalline lens changes with the tension of the muscles of the ciliary body. In the human visual system, accommodation is normally used in conjunction with convergence. Convergence, or vergence, refers to a phenomenon whereby, when a viewer (or an observer) gazes at a point at a finite distance, the two eyes rotate inward so that their lines of sight cross at the fixation point. Binocular disparity, based on the fact that the left and right eyes are separated by about 65 millimeters from each other and therefore receive different images, refers to the difference between the images projected onto the left and right retinas when the viewer is viewing a 3D scene. Binocular disparity is a decisive depth cue used by the visual system in depth sensing, or stereopsis. Motion parallax refers to a difference in the relative displacement of respective points in the 3D scene (namely, a closer point appears to move more than a distant point) when there is relative motion between the observer and the 3D scene.
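The relationship between viewing distance, convergence, and binocular disparity described above may be illustrated by a minimal sketch. It assumes a simplified geometry in which the fixation point lies straight ahead on the midline between the eyes, and it uses the approximately 65 millimeter eye separation mentioned above. The function names are hypothetical and are introduced only for illustration.

```python
import math

# About 65 mm, as stated in the text; actual values vary between viewers.
EYE_SEPARATION_M = 0.065


def vergence_angle_deg(distance_m, eye_separation_m=EYE_SEPARATION_M):
    """Angle between the two lines of sight when both eyes fixate a point
    located straight ahead at the given distance (simplified geometry)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    half_angle = math.atan(eye_separation_m / (2.0 * distance_m))
    return math.degrees(2.0 * half_angle)


def angular_disparity_deg(fixation_m, point_m, eye_separation_m=EYE_SEPARATION_M):
    """Difference in vergence angle between another midline point and the
    fixation point. Positive when the point is nearer than fixation,
    negative when it is farther, zero when both are at the same distance."""
    return (vergence_angle_deg(point_m, eye_separation_m)
            - vergence_angle_deg(fixation_m, eye_separation_m))


if __name__ == "__main__":
    for d in (0.5, 1.0, 2.0, 10.0):
        print(f"fixation at {d:5.1f} m -> vergence {vergence_angle_deg(d):.3f} deg")
    print("disparity of a 1 m point while fixating 2 m:",
          round(angular_disparity_deg(2.0, 1.0), 3), "deg")
```

Under these assumptions the vergence angle shrinks as the fixation point recedes, which is why convergence becomes a weaker depth cue at large distances.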
Research into 3D television has been actively conducted in order to display a 3D image by utilizing this visual perceptual mechanism of human beings. Various 3D image display schemes have been proposed, and the most prominent one among them, in terms of technical feasibility and 3D effect display capability at the time of filing the present application, may be the stereoscopic scheme. In a stereoscopic 3D display system, different images are captured by using two image sensors separated by about 65 millimeters from each other, like human eyes, and a display device allows the two images to be separately provided to the user's left and right eyes, thereby simulating the binocular disparity to allow for depth perception or stereoscopic vision (or stereopsis). The two image sensors are aligned in a horizontal direction and are configured to have the same optical characteristics, focal length, and zoom magnification.
However, the stereoscopic 3D image is different in some aspects from an image a human can actually perceive.
One difference is the discrepancy between focusing and convergence. In more detail, when a human actually gazes at a certain object, both eyes converge on a fixation point on the gazed object and focus on the fixation point. In comparison, when the viewer views a stereoscopic image, a different situation occurs. A camera capturing an image focuses on a particular object, and accordingly, the focus of a pair of stereoscopic images is adjusted on the basis of a virtual stereoscopic window plane on which the object is positioned. When displayed on the display device, the pair of stereoscopic images is focused on a physical image display plane (referred to as a ‘stereoscopic screen’, hereinafter). Accordingly, the convergence stimulus naturally changes with depth, while the focusing stimulus tends to remain fixed to the stereoscopic screen.
Thus, the viewer's eyes constantly focus on the stereoscopic screen, while the fixation point is in front of or behind the stereoscopic screen depending on the position of the gazed object, causing a situation in which the eyes converge on a depth plane different from the stereoscopic screen. Also, although the pair of stereoscopic images is produced to be focused on the basis of points on the stereoscopic screen, some viewers' eyes attempt to focus on an object, as a fixation point, in front of or behind the stereoscopic screen, at which the focus is not precisely adjusted.
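The size of the focusing-convergence discrepancy may be illustrated by a minimal sketch. It assumes, as a convention not stated in the present description, that both the focusing demand and the convergence demand are expressed as the reciprocal of the respective distance in meters (diopters), so that the two can be subtracted directly. The function name is hypothetical.

```python
def focus_convergence_mismatch(screen_distance_m, fixation_distance_m):
    """Signed mismatch, in reciprocal meters, between where the eyes converge
    (the fixation point) and where they must focus (the stereoscopic screen).

    Positive: the fixation point is in front of the screen.
    Negative: the fixation point is behind the screen.
    Zero:     the fixation point lies on the screen.
    """
    if screen_distance_m <= 0 or fixation_distance_m <= 0:
        raise ValueError("distances must be positive")
    focus_demand = 1.0 / screen_distance_m          # fixed by the display
    convergence_demand = 1.0 / fixation_distance_m  # varies with scene depth
    return convergence_demand - focus_demand


if __name__ == "__main__":
    screen = 2.0  # viewer sits 2 m from the stereoscopic screen (illustrative)
    for fixation in (1.0, 2.0, 4.0):
        print(f"fixation at {fixation} m -> mismatch "
              f"{focus_convergence_mismatch(screen, fixation):+.2f} 1/m")
```

Under this convention, an object displayed at half the screen distance produces twice the mismatch magnitude of an object displayed at twice the screen distance, which is consistent with the observation that fixation points in front of the screen are the more demanding case.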
Humans may tolerate a slight focusing-convergence discrepancy, but if the discrepancy becomes excessive, the image is not focused or the stereo images are not properly synthesized. In more detail, as shown in FIG. 1, when the fixation point is positioned on the stereoscopic screen and coincides with the points (PL, PR) corresponding to a left image and a right image, the horizontal parallax is 0 and the cues do not collide. As shown in FIG. 2, when the fixation point is positioned behind the stereoscopic screen, the horizontal parallax has a positive value; in this case, although the cues slightly collide, the stereoscopic images can be synthesized to have a stereoscopic depth effect by the binocular disparity without causing great tension. Meanwhile, as shown in FIG. 3, when the fixation point is positioned in front of the stereoscopic screen, the horizontal parallax has a negative value and the lines of sight of the two eyes cross in front of the screen, causing considerable tension in the eyes. The human eyes can tolerate a negative parallax value to an extent and merge the pair of stereoscopic images, but it is known that when the negative parallax value exceeds a certain value, the image collapses or is seen as two images, making the viewer feel uncomfortable.
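The three parallax cases of FIGS. 1 to 3 may be illustrated by a minimal sketch. It assumes a simplified geometry that is not taken from the drawings: the eyes are separated by about 65 millimeters, the stereoscopic screen is a plane parallel to the line joining the eyes, and the fixation point lies on the midline. By similar triangles, the horizontal parallax on the screen is then e(Z - D)/Z, where e is the eye separation, D the screen distance, and Z the distance to the fixation point. The function names and the tolerance value are hypothetical.

```python
def screen_parallax_m(point_distance_m, screen_distance_m, eye_separation_m=0.065):
    """Horizontal parallax on the screen (right-image point minus left-image
    point, in meters) for a midline point at the given distance."""
    if point_distance_m <= 0 or screen_distance_m <= 0:
        raise ValueError("distances must be positive")
    return (eye_separation_m * (point_distance_m - screen_distance_m)
            / point_distance_m)


def classify_parallax(parallax_m, tolerance_m=1e-9):
    """Name the case: 'zero' (on the screen), 'positive' (behind the screen),
    or 'negative' (in front of the screen)."""
    if abs(parallax_m) <= tolerance_m:
        return "zero"
    return "positive" if parallax_m > 0 else "negative"


if __name__ == "__main__":
    screen = 2.0  # illustrative viewing distance in meters
    for z in (2.0, 4.0, 1.0):
        p = screen_parallax_m(z, screen)
        print(f"point at {z} m: parallax {p * 1000:+.1f} mm ({classify_parallax(p)})")
```

In this model the positive parallax never exceeds the eye separation, however distant the point, whereas the negative parallax grows without bound as the point approaches the viewer, which agrees with the greater eye tension described for the negative case.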
A more serious problem among those caused by the negative parallax is that a collision between cues may occur when an object having a negative parallax value is partially covered in the vicinity of the left or right corner of the pair of stereoscopic images.
FIG. 4 shows a situation in which a first camera having a lens 10 and an image sensor 12 captures a left image projected to a first stereoscopic window 14, and a second camera having a lens 20 and an image sensor 22 captures a right image projected to a second stereoscopic window 24. It is assumed that first to third objects 30, 32 and 34 are included on the first and second stereoscopic windows 14 and 24. FIG. 5 shows an example of left and right images 14a and 24a displayed on the stereoscopic plane. The left and right images may be synthesized to have a stereoscopic depth effect by binocular parallax to display the first and second objects 30 and 32, which have a zero parallax and a positive parallax, respectively. The two images may also be synthesized by the stereoscopic cues to provide the observer (or viewer) with a stereoscopic image of the third object 34, which has a negative parallax. However, as the third object 34 is cut by the left corner of the right image, another depth cue called ‘occlusion by an object’ takes effect, and accordingly, the viewer may perceive the object as if it were positioned behind the stereoscopic plane, namely, the display device.
Such a collision of cues is called an ‘edge violation’, which causes viewer discomfort and confusion and may significantly degrade the quality of the 3D image. The collision of cues caused by the partial occlusion of the object may result in part from a blind spot generated because the second camera fails to cover the left portion of the viewing angle of the first camera. The same problem also arises when the object is covered (occluded) by the corner of the right image.
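The condition under which an edge violation may arise can be illustrated by a minimal sketch. It is a simple heuristic under stated assumptions and is not a method disclosed in the present description: each object is assumed to be summarized by its horizontal extent in one image, in pixels, and by a signed parallax, and an object is flagged when it has a negative parallax and its extent reaches the left or right border of the image. The class, field, and function names, and the sample values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ObjectInView:
    name: str
    x_min: int          # leftmost pixel column covered by the object
    x_max: int          # rightmost pixel column covered by the object
    parallax_px: float  # negative: in front of the screen; positive: behind


def touches_side_border(obj, image_width):
    """True when the object reaches the left or right border of the image,
    which is taken here as a sign that the object is cut by the border."""
    return obj.x_min <= 0 or obj.x_max >= image_width - 1


def find_edge_violations(objects, image_width):
    """Names of objects with negative parallax that are cut by a side border."""
    return [o.name for o in objects
            if o.parallax_px < 0 and touches_side_border(o, image_width)]


if __name__ == "__main__":
    # Illustrative values loosely mirroring objects 30, 32 and 34 of FIG. 5.
    scene = [
        ObjectInView("object_30", 800, 1000, 0.0),    # zero parallax
        ObjectInView("object_32", 1200, 1500, 12.0),  # positive parallax
        ObjectInView("object_34", 0, 150, -20.0),     # negative, cut at the left
    ]
    print(find_edge_violations(scene, image_width=1920))
```

An object with a positive parallax that is cut by the border is not flagged, because in that case the occlusion cue and the binocular cue both place the object behind the screen and do not collide.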