The visual system makes use of a large variety of correlated cues to reconstruct the three-dimensional (3D) world from two-dimensional (2D) retinal images. Conventional stereoscopic displays present conflicting cues to the visual system, compromising image quality, generating eye fatigue, and inhibiting the widespread commercial acceptance of stereo displays. In order to understand these conflicts, it is helpful to first review some of the visual processes involved with stereoscopic viewing.
When a person looks directly at an object, the person's eyes fixate upon it, so that its image is directed to the fovea of each eye. The fovea is the area of the retina with the highest spatial resolution (approximately 120 cones per visual degree). The oculomotor process of vergence controls the extent to which the visual axes of the two eyes are parallel (to fixate on a very distant object) or rotated toward each other (i.e., converged) to fixate on closer objects. For example, as shown in FIG. 1A, if eyes 32a and 32b are fixated upon a house 30 that is in the distant background of a scene 40, its image falls on fovea 34, which is approximately at the center of retina 36 of each eye, as indicated by solid lines 38. The viewer's fixation can be shifted to a tree that is in the foreground of the scene, causing the eyes to converge (i.e., rotate more toward each other), as shown in FIG. 1B, so that the tree's image now falls on the fovea of each eye, as indicated by dotted lines 42. The visual system receives feedback from the muscles used to rotate the eyes, which provides a cue to the viewing distance of the fixated object: when a nearer object is viewed with both eyes, the eyes are converged toward each other more than when a more distant object in the scene is viewed. The vergence system also interacts with the process of stereopsis.
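The relationship between fixation distance and vergence angle described above can be illustrated with a minimal sketch. It assumes a fixated point lying straight ahead on the midline between the eyes and an interpupillary distance of 63 mm; both the geometry and the 63 mm figure are illustrative assumptions, not values taken from this description, and the function name is hypothetical.

```python
import math

def vergence_angle_deg(distance_m, ipd_m=0.063):
    """Angle between the two visual axes, in degrees, when both eyes
    fixate a point straight ahead at distance_m.

    ipd_m is an assumed interpupillary distance (63 mm), used only
    for illustration.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    # Each eye rotates inward by atan((ipd / 2) / distance).
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))

# A near object (e.g., a foreground tree) demands more convergence
# than a distant one (e.g., a background house).
near = vergence_angle_deg(0.5)
far = vergence_angle_deg(50.0)
parallel = vergence_angle_deg(math.inf)  # visual axes parallel: 0.0
```

Under these assumptions the vergence angle falls toward zero as the fixation distance grows, which is the sense in which the eye-muscle feedback can serve as a distance cue.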
Because a small distance separates the two eyes, they have slightly different viewpoints, and therefore form different retinal images of a scene. In stereopsis, the visual system compares the images from the left and right eye, and makes powerful inferences about the viewing distance of objects based on binocular disparities in the retinal images in each eye.
The image of a fixated object falls on the same portion of the retina of each eye (the center of the fovea). Other objects at approximately the same viewing distance as the fixated object will also fall on corresponding points of the retina (e.g., images of the other objects might fall about 1 mm to the left of the fovea in each eye). The horopter is an imaginary curved surface that describes the region of space in a scene being viewed from which objects produce images that fall on corresponding retinal points. Objects behind the horopter will create retinal images shifted toward the left side of the retina in the right eye and shifted toward the right side of the retina in the left eye (i.e., the images in both eyes are disposed more toward the nose). Objects in front of the horopter will create retinal images shifted toward the right side of the retina in the right eye and shifted toward the left side of the retina in the left eye (i.e., the images in both eyes are disposed more toward the viewer's ears). (FIG. 9 illustrates an example of the horopter.)
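As a rough illustration of how binocular disparity can signal whether an object lies in front of or behind the fixation distance, the following sketch computes the disparity of a point relative to fixation as a difference of vergence angles. It is restricted to points on the midline, assumes a 63 mm interpupillary distance, and adopts the sign convention that positive disparity means "nearer than fixation"; all names and values are hypothetical and chosen only for this example.

```python
import math

IPD_M = 0.063  # assumed interpupillary distance (illustrative only)

def vergence_rad(distance_m):
    """Vergence angle (radians) needed to fixate a midline point."""
    return 2.0 * math.atan(IPD_M / (2.0 * distance_m))

def relative_disparity_rad(object_m, fixation_m):
    """Disparity of a midline object relative to the fixated distance.

    Positive: the object is nearer than fixation.
    Negative: the object is farther than fixation.
    """
    return vergence_rad(object_m) - vergence_rad(fixation_m)

def depth_relative_to_fixation(object_m, fixation_m, tol_rad=1e-6):
    """Classify a midline object by the sign of its relative disparity."""
    disparity = relative_disparity_rad(object_m, fixation_m)
    if abs(disparity) <= tol_rad:
        return "at fixation distance"
    return "in front" if disparity > 0 else "behind"
```

For instance, with fixation at 2 m, `depth_relative_to_fixation(0.5, 2.0)` classifies the object as in front, and `depth_relative_to_fixation(10.0, 2.0)` classifies it as behind.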
It should be emphasized that when a person views a scene stereoscopically and shifts fixation between objects that are disposed at different viewing distances, the vergence angle of the eyes necessarily changes. This change in vergence angle occurs even when artificially generated stereoscopic images are viewed using a 3D display apparatus.
Another oculomotor process, accommodation, governs the shifting of focus in the eye. Like a camera, the eye has a limited depth of focus. In order to form clear images on the retina of objects at different viewing distances, the eye must be able to adjust its focus. The eye possesses a two-part optical system for focusing on objects at different distances from the viewer. The cornea provides the majority of refraction (approximately 70%), but its refractive power is fixed. The crystalline lens is disposed behind the cornea, and its shape can be altered to increase or decrease its refractive power.
When the eye is in an unaccommodated state, the crystalline lens is flattened by passive tension from the zonular fibers, which are attached radially from the edge of the lens to the ciliary body on the wall of the globe. When the annular ciliary muscle is contracted, the tension in the zonular fibers is reduced, causing the curvature of the lens surfaces to increase, thereby increasing the optical power of the lens.
When a fixated object is close to the observer, the ciliary muscles of the eye contract, making the crystalline lens more convex, increasing its refractive power, and bringing the image of the object into focus on the retina. When a fixated object is far from the observer, the ciliary muscles of the eye relax, flattening the lens, and decreasing its refractive power, keeping the image in focus on the retina.
Dioptric blur provides negative feedback used by the accommodation control system when trying to bring an object at a given distance into focus. If a person fixates on an object at a new viewing distance, the object will initially appear blurred because the current state of accommodation is inaccurate. If the system begins to shift accommodation in one direction, and the object becomes blurrier, the system responds by shifting accommodation in the opposite direction. The brain receives feedback about the state of activity of the ciliary muscles, providing data about the viewing distance of the fixated object.
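The trial-and-reverse behavior described above can be sketched as a simple search loop. This is a toy model only, not a physiological one: blur is represented as the absolute focus error in diopters, the step size is arbitrary, and the function names are hypothetical. The loop sees only the magnitude of the blur, never its sign, which is why it must try a direction and reverse if the blur grows.

```python
def blur(accommodation_d, target_d):
    """Toy blur measure: magnitude of the focus error, in diopters."""
    return abs(accommodation_d - target_d)

def accommodate(start_d, target_d, step_d=0.25, max_steps=100):
    """Shift accommodation step by step, reversing when blur gets worse.

    Returns the accommodation state (diopters) at which a step in
    either direction no longer reduces the blur.
    """
    state = start_d
    current_blur = blur(state, target_d)
    direction = 1.0   # initial guess: increase refractive power
    failures = 0      # consecutive steps that did not reduce blur
    for _ in range(max_steps):
        trial = state + direction * step_d
        trial_blur = blur(trial, target_d)
        if trial_blur < current_blur:
            # Blur decreased: keep moving in this direction.
            state, current_blur = trial, trial_blur
            failures = 0
        else:
            # Blur did not decrease: reverse direction.
            direction = -direction
            failures += 1
            if failures == 2:
                break  # neither direction helps; focus is settled
    return state
```

Starting from a relaxed state, `accommodate(0.0, 2.0)` settles at 2.0 diopters, which corresponds to an object at 0.5 m; starting from 3.0 diopters with a target of 1.0, the first step goes the wrong way, the blur increases, and the loop reverses.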
In natural vision, the amount of accommodation required to focus an object changes proportionally with the amount of vergence required to fixate the object in the center of each eye. Given this strong correlation, it is not surprising that accommodation and vergence mechanisms are synkinetically linked (an involuntary movement in one is triggered when the other is moved, and vice versa). This linkage can be observed in infants between 3 and 6 months of age, suggesting a biological predisposition for the synkinesis. When the eye accommodates to a given viewing distance, the vergence system is automatically driven to converge to the same viewing distance. Conversely, when the eye converges to a given viewing distance, the accommodation system is automatically driven to accommodate to the same viewing distance. These cross couplings between accommodation and vergence are referred to as accommodation-driven vergence and convergence-driven accommodation, respectively.
Under natural viewing conditions, the visual system steers the eyes and gauges the distance of objects in the environment using many correlated processes, such as the linked processes of accommodation and vergence. However, as shown in FIG. 2, conventional stereo displays force viewers to decouple this linkage by requiring the viewer to maintain accommodation on a fixed plane (to keep the 2D display surface in focus) while dynamically varying the vergence angle to view virtual objects at different stereoscopic distances. In FIG. 2, a conventional stereo display 44 provides a left eye 2D image 48a that corresponds to the view of a scene 44 with a tree 46 (in the foreground) and house 30 (in the background). Similarly, the stereo display provides a right eye 2D image 48b of tree 46 and house 30, from the slightly different angle of view of the right eye compared to the view of the scene through the left eye. The viewer's left eye 32a sees only 2D image 48a, while right eye 32b sees only 2D image 48b, which produces the sensation of viewing objects at different distances due to the binocular disparities between the retinal images in the right and left eyes. As a viewer shifts his or her gaze from tree 46 to house 30, he or she must change the vergence angle of the eyes. However, because 2D images 48a and 48b are planar and at a fixed distance from the viewer's eyes, the tree and the house are both in focus whenever the viewer's eyes focus on the 2D images. Thus, the vergence afforded by the different 2D images has no corresponding accommodation affordance for the different viewing distances of tree 46 and house 30 in the actual scene: both the tree and the house are in focus at the same distance (the distance from the viewer's eyes to images 48a and 48b), instead of at different accommodation distances.
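The mismatch described above can be expressed numerically by comparing, in diopters (reciprocal meters), the distance at which the eyes must focus with the distance to which they must converge. The sketch below is illustrative only: the function name is hypothetical, and the 2 m display distance and the 0.5 m and infinite object distances are assumed values, not dimensions taken from FIG. 2.

```python
import math

def vergence_accommodation_conflict_d(display_m, virtual_m):
    """Mismatch, in diopters, between where the eyes converge and focus.

    display_m: fixed distance of the 2D display surface (focus demand).
    virtual_m: stereoscopic distance of the fixated virtual object
               (vergence demand); may be math.inf.
    Positive: the object appears nearer than the display surface.
    Negative: the object appears farther than the display surface.
    """
    accommodation_demand = 1.0 / display_m  # focus stays on the display
    vergence_demand = 1.0 / virtual_m       # eyes converge on the object
    return vergence_demand - accommodation_demand

# Hypothetical stereo display whose images appear 2 m from the viewer.
tree = vergence_accommodation_conflict_d(2.0, 0.5)        # 1.5
house = vergence_accommodation_conflict_d(2.0, math.inf)  # -0.5
matched = vergence_accommodation_conflict_d(2.0, 2.0)     # 0.0
```

In natural viewing the two demands are equal for every fixated object, so the mismatch is zero; with a fixed display surface it is zero only for virtual objects rendered at the display distance.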
This decoupling of vergence and accommodation is thought to be a major factor in the eyestrain associated with viewing stereo head-mounted displays (HMDs) and might lead to visual system pathologies after continued long-term exposure.
Accordingly, it would be desirable to display 3D images in which accommodation and vergence remain appropriately coupled so that a viewer is not subjected to eyestrain and can view such images for relatively long periods of time. Such 3D images can be viewed in a natural manner so that the brain can readily interpret the viewing distances of different objects within a scene in response to the normal visual perception of vergence and accommodation, as well as the stereoscopic effect (unless the image is viewed as a monocular image with only one eye). A viewer's eye should not be forced to interpret object viewing distance in a 3D image based only on a stereoscopic effect that produces the different vergence demands, but does not afford corresponding accommodation levels when viewing objects in the image that are intended to appear at different relative viewing distances.
In addition, the appropriate coupling between vergence and accommodation should extend over a maximum range of focal distances for viewing objects, from the near field (e.g., less than 7 cm) to the far field (at infinity). Also, unless a foreground object is intended to be partially transparent, it would be desirable for the foreground object, which is in front of a background object, to fully obscure the background object where the two objects overlap in an image. Some conventional imaging systems can provide 3D images in which vergence and accommodation remain coupled, but these prior art systems fail to provide more than a very limited range of accommodation cues (i.e., a limited range of focal distance) for objects in a 3D image of a scene, or operate in a manner that causes background objects to be visible through an overlapping foreground object that should be opaque, or both. Clearly, a better approach for displaying true 3D images would be desirable.