Many devices have been created to deliver optical information to the human eye. Visual information can take the form of high definition video, computer generated content, two and three dimensional content, text, etc. The visual component of a virtual reality system delivers synthetic content directly to the eye, whereas augmented reality systems blend generated content with real world views.
In nature, every illuminated particle reflects or emits rays of light in every direction and in a multitude of wavelengths. The rays that reach us from afar are nearly parallel, and those that arrive from a nearby point are more divergent. The arriving beams that pass through our pupils are focused, or made more convergent, as they pass through the cornea, the aqueous humor, the crystalline lens and the vitreous humor, and finally arrive at the retina.
For normal vision, an image will be formed on a portion of the retina that is dependent on the entrance angle of the beam with respect to the optical axis of the eye, or direction of gaze. Those images that form in the central 2 degrees of vision fall on the fovea, an area of the retina with an exceptionally high density of photoreceptor cells. It is here that most of the high resolution visual information is converted from light into electrical nerve impulses by the photoreceptors, and transmitted to the visual cortex via the optic nerve bundle. Photoreceptors further away from the fovea detect off axis images and contribute to the sense of peripheral vision. In total, there are approximately 125 million rod cell and cone cell photoreceptors. Rod cells detect low levels of light, but no color, and cone cells detect color, but at higher levels of light intensity. Three types of cone cells, sensitive to red, green and blue light, are predominantly found in the high density central area of the retina, thereby providing high resolution color vision.
Because central vision contains so much more information, the eye will rapidly “scan” or saccade when fixating on an important target, say a face or moving object, and jump to another at a rate of up to 1000 Hz. The eye can also “jitter” or microsaccade to provide continuous sensitization to the retina. The eye can rotate up/down and left/right about a central point at a speed of up to 900 degrees per second. Although the eye can rotate in excess of 50 degrees in various directions, depending upon age, individuals rarely exhibit eye motions exceeding plus or minus 10 degrees from a straight ahead gaze.
An eye, with a fixed forward gaze, can detect light impinging on the cornea from an angle of nearly 110 degrees towards the temple, and about 59 degrees towards the nose. The field of vision also extends to approximately 56 degrees above and 70 degrees below the direction of gaze.
Monocular vision can provide moderate depth cues through motion parallax, kinetic rotations, shadows, familiar size, occlusion, perspective, and accommodation or focus, to name a few.
Binocular vision allows for a wider field of view, improves acuity due to detail averaging between two images, and provides visual cues for a much stronger sense of 3D depth perception. The primary binocular cues are stereopsis and the vergence-accommodation reflex. Stereopsis gives a sense of depth by processing the slightly different left and right images that fall on the retinas. Although both eyes may converge on the same point of a 3D object, if that object is closer than 10 meters, then its shape, volume and shadows, having points nearer and further than the point of convergence, will project to slightly different horizontal positions on each retina. This slight horizontal differential displacement, or “binocular disparity”, is due to the horizontal parallax induced by eye separation, and is sensed by dedicated cells called “binocular cells”, which are horizontally arranged near the center of vision. While vertical displacements due to shapes and shadows alone are also perceived, they are less impactful. All disparity information is then sent to the visual cortex, where the two images are fused as one, and some measure of depth is realized. A relative, rather than an absolute, depth may be sensed in this way.
For objects closer than 2 meters, precise depth information is extracted via the vergence-accommodation reflex. For a close object, an approximate estimation of distance is perceived by stereopsis, thereby triggering three involuntary, simultaneous events: the eyes converge to a point of fixation on the object; the ciliary muscles contract, which thickens the crystalline lens and increases its focusing power; and the pupils constrict, which improves the depth of focus. An increase in focusing power brings the more divergent rays of a close object to a sharp focus on the retina. The amount of effort needed to achieve a good focus is observed by the proprioceptive sensors of the ciliary process, and is relayed to the visual cortex, which derives a precise interpretation of focal distance. Similarly, kinesthetic information from the extraocular muscles that moved the eyes into a specific angle of convergence, coupled with a gaze angle, allows the visual cortex to extract a very precise distance via triangulation.
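The triangulation described above can be illustrated with a minimal geometric sketch. It assumes a fixation point straight ahead of the observer and an interpupillary distance of 63 mm; both are illustrative assumptions, not values taken from this description, and the function names are hypothetical. The accommodation demand is expressed in diopters, the reciprocal of the distance in meters.

```python
import math

# Assumed interpupillary distance in meters (illustrative value only).
ASSUMED_IPD_M = 0.063


def vergence_angle_deg(distance_m, ipd_m=ASSUMED_IPD_M):
    """Angle between the two lines of sight when both eyes fixate a
    point straight ahead at distance_m (symmetric convergence)."""
    return math.degrees(2.0 * math.atan((ipd_m / 2.0) / distance_m))


def distance_from_vergence_m(angle_deg, ipd_m=ASSUMED_IPD_M):
    """Invert the relation above: recover the fixation distance from
    the vergence angle by triangulation."""
    return (ipd_m / 2.0) / math.tan(math.radians(angle_deg) / 2.0)


if __name__ == "__main__":
    for d in (0.25, 0.5, 1.0, 2.0, 10.0):
        angle = vergence_angle_deg(d)
        # Accommodation demand in diopters is 1 / distance in meters.
        print(f"{d:5.2f} m: vergence {angle:5.2f} deg, "
              f"accommodation {1.0 / d:4.2f} D")
```

Under these assumptions the vergence angle changes by many degrees inside 2 meters and by well under a degree beyond 10 meters, which is consistent with the reflex being most informative for near objects.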
Finally, the vestibulo-ocular reflex is an interaction between the vestibular system that provides balance, spatial orientation and acceleration information, and the extraocular muscles that move the eyes about three axes of rotation. A movement of the head in one direction causes a reflexive counter move of the eyes in the opposite direction, thereby maintaining a stable image at the center of the visual field. This allows for fixed targeting of a stationary object during body motion or stable targeting of a moving object.
A typical movie projector produces a focused image on a curved or flat screen at a distance. A curved screen helps to improve the sense of immersion with a modest increase in peripheral vision. In both cases, the distant screen provides reflected parallel light beams that can easily be focused by the human eye, but lends little parallax or binocular information.
Viewing a distant screen with “3D” glasses can provide a sense of depth. These devices utilize various techniques to deliver a slightly different view angle to each eye. Most are limited by frame rate, by brightness, and by an inability to produce the truly divergent ray field that a near object would produce. And of course, they are all subject to the flat field, limited resolution, limited dynamic range and limited angular extent of the distant screen. An improvement in field of view occurs when moving a screen closer while using 3D glasses. Although the screen is closer, the depth of focus remains constant and relaxed distant focus is lost. The field of view is also a small subset of the visual potential.
Additional information content can be added by a “heads up” display whereby information is projected on the surface of a visor or screen. Using a combination of scanners and optical elements, a virtual image can be produced at any apparent depth, but is usually limited by a narrow angle of view. Such information may overlay the true visual field. The overlay of a computer generated, or other video source on a true direct view of a scene falls in the realm of augmented reality.
Current Virtual Reality, Augmented Reality, and Mixed Reality systems attempt to provide a multitude of visual cues, including motion stabilized imaging, binocular vision, and a few discrete focal planes to give a better sense of realism. Most provide a modest field of view, and are limited in delivering continuous, truly divergent fields that are ubiquitous in the real world. These head mounted systems often have a bulky form factor, and are hard wired to a power source, a data processing unit, or a personal computer. More advanced models move image processing, wireless communications, and battery power onto the headset. A number of devices also incorporate motion sensors, outward looking cameras, external sensors to track one's movements, and inward looking cameras to track eye position. Recent mobile VR/AR/MR offerings have raised social concerns about privacy and obtrusiveness.
Prior art teaches many methods for determining the position of the pupil relative to the head. A commonly used form of gaze sensor consists of a remote or head mounted source of infrared light that is projected towards the eye, and a remote or head mounted camera that can observe the pupil position or the resulting reflection patterns from the cornea.
AR systems also suffer from limited control of the lighting environment. The real scene is directly passed through to the observer via a transparent screen and synthetic images are then overlaid on that scene. It is generally an additive process yielding translucent images. A problem occurs when attempting to overlay a dark simulated object onto a bright real background. When a beachgoer gazes through an AR headset and looks to the bright horizon, it is not possible to observe the overlay of a black container ship. Nor is it possible to accurately control shadows.
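The additive limitation described above can be shown with a small numerical sketch. It models each color channel as a linear intensity between 0 and 1 and assumes the display can only add light to what passes through the transparent screen; the function name and the sample colors are hypothetical and chosen purely for illustration.

```python
def additive_blend(background, overlay):
    """Combine a real background with a synthetic overlay, channel by
    channel, for a display that can only add light (never remove it).
    Values are linear intensities clipped to the range 0..1."""
    return tuple(min(1.0, b + o) for b, o in zip(background, overlay))


# Hypothetical sample colors as (red, green, blue) intensities.
bright_horizon = (0.9, 0.95, 1.0)
dim_room = (0.05, 0.05, 0.05)
black_ship = (0.0, 0.0, 0.0)
red_buoy = (0.8, 0.0, 0.0)

if __name__ == "__main__":
    # A black overlay adds no light, so the bright background is unchanged.
    print("black ship on horizon:", additive_blend(bright_horizon, black_ship))
    # A bright overlay on a dim background is clearly visible.
    print("red buoy in dim room: ", additive_blend(dim_room, red_buoy))
    # A bright overlay on a bright background is washed out toward white.
    print("red buoy on horizon:  ", additive_blend(bright_horizon, red_buoy))
```

In this model a black overlay contributes zero light in every channel, so the result equals the background. Rendering a dark object or a shadow would require some means of attenuating the real scene, which a purely additive overlay does not provide.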
In general, current devices are hampered by their inability to sufficiently synchronize precise head motions with stabilized imagery, producing a disturbing visual lag. This can be attributed, in part, to sensor deadband issues, software computation delays, digital content protection, and LCD switching speeds. What is observed does not agree with what motion, if any, is sensed by the vestibular system. Further, stereopsis cannot be fully achieved unless a true 3D image is presented to the eyes. In addition, virtual systems that do not synchronize binocular vision with natural depth of field cues create a vergence-accommodation conflict. All of these sensory conflicts can negatively affect the human vestibular and ocular systems resulting in disorientation and what is termed “virtual reality sickness”.
Finally, systems that lack eye tracking capabilities are incapable of dynamic data allocation that can efficiently address the greater needs of central vision. Thus, systems of this type uniformly distribute their data bandwidth over the entire visual field requiring a greater computational load for a given resolution.
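The cost of uniform allocation can be estimated with a rough sketch. It assumes that the sample density a viewer needs falls off with angular distance from the gaze point as e2 / (e2 + eccentricity), with e2 = 2.3 degrees, and that 60 samples per degree suffice at the gaze point. The falloff model, both constants, the 50 degree half field and the function names are all illustrative assumptions and are not taken from this description. The gaze point is assumed to sit at the center of the field.

```python
def relative_acuity(eccentricity_deg, e2_deg=2.3):
    """Assumed falloff of required sample density with angular distance
    from the gaze point (1.0 at the gaze point, halved at e2_deg)."""
    return e2_deg / (e2_deg + abs(eccentricity_deg))


def samples_along_meridian(half_field_deg, peak_samples_per_deg,
                           foveated, step_deg=0.1):
    """Total samples along one line through the gaze point, from
    -half_field_deg to +half_field_deg, using midpoint integration."""
    steps = int(round(2 * half_field_deg / step_deg))
    total = 0.0
    for i in range(steps):
        eccentricity = -half_field_deg + (i + 0.5) * step_deg
        weight = relative_acuity(eccentricity) if foveated else 1.0
        total += peak_samples_per_deg * weight * step_deg
    return total


if __name__ == "__main__":
    uniform = samples_along_meridian(50, 60, foveated=False)
    foveated = samples_along_meridian(50, 60, foveated=True)
    print(f"uniform allocation:  {uniform:8.0f} samples")
    print(f"foveated allocation: {foveated:8.0f} samples")
    print(f"foveated / uniform:  {foveated / uniform:.2f}")
```

Under these assumptions a gaze-contingent allocation needs only a small fraction of the samples of a uniform one along each line through the gaze point, and the saving compounds over a two dimensional field. Realizing it depends on knowing where the eye is looking, which is the capability that systems without eye tracking lack.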