1. Field of the Invention
This invention relates generally to the field of 3-D graphics and, more particularly, to a system and method for rendering and displaying 3-D graphical objects.
2. Description of the Related Art
The human eye is subject to many of the same optical phenomena as inanimate optical systems. In particular, for any given state of the crystalline lens, there exists a unique distance df at which objects appear maximally sharp (i.e. minimally blurred) and an interval of distances around df where objects have sufficient clarity. More precisely, the blurriness of objects as a function of distance from the lens varies smoothly and has a minimum at distance df. The interval of distances over which objects are sufficiently clear is commonly referred to as the depth of field. The depth of field typically increases with increasing focus distance df.
Muscles (the ciliary body) connected to the crystalline lens may exert pressure on the crystalline lens. The induced deformation of the lens changes the focus distance df. The extent of lens deformation in response to muscular pressure depends on the elasticity of the lens. The elasticity generally decreases with age. (By age 45, many people will have lost most of the elasticity of their lenses; hence bifocals.) Thus, the range of focus distances df which the human eye can achieve varies with age.
The human visual system has two directionally-controllable eyes located at the front of the head as suggested by FIG. 1. The direction of gaze of an eye may be characterized by a ray that emanates from the center of the corresponding macula (the most sensitive portion of the retina) and passes through the center of the corresponding lens. Because each eye gathers a different view on the external world, the brain is able to create a three-dimensional model of the world.
There are brain control systems which control the orientation angles and the focus distances df of each eye. These control systems may be responsive to various sources of information including clarity and positional fusion of the images perceived by the right and left eyes. In FIG. 1A, the ocular rays intersect at point P. Thus, the image of point P will fall on the center of the perceived visual field of each eye, and the two views on the neighborhood of point P will be fused by the visual cortex into an integrated 3D entity. In contrast, because point Q lies inside the two ocular rays, the right eye perceives the point Q as being to the left of center and the left eye perceives the point Q as being to the right of center. Thus, the brain perceives two images of point Q. Similarly, because point R lies outside the two ocular rays, the right eye perceives the point R as being to the right of center and the left eye perceives the point R as being to the left of center. So the brain perceives two images of point R also.
Let dt1 be the distance of the right eye to the intersection point P, and dt2 be the distance of the left eye to the intersection point P as illustrated by FIG. 1B. If the brain control systems set the focus distance df1 of the right eye equal to distance dt1 and the focus distance df2 of the left eye equal to distance dt2, the fused image in the center of the field of view will appear maximally clear, and objects closer than and farther than the intersection point will appear increasingly blurry.
The brain control systems are programmed to strongly favor an assignment of focus distances that correspond respectively to the distances to the intersection point. For most people, it is somewhat difficult even intentionally to achieve focus distances df1 and df2 that are significantly larger than or smaller than the distances to the intersection point. However, this is exactly the trick that is required for proper perception of stereo video as suggested by FIG. 2A. To create the perception of a three-dimensional object at point P in front of a display screen SCR, the viewer must direct his/her eyes so that the ocular rays intersect at point P. The right ocular ray passes through P and hits the screen at position X1, and the left ocular ray passes through P and hits the screen at position X2. The screen pixels in the neighborhood of position X1 give the right eye's view on the 3D object, and the screen pixels in the neighborhood of position X2 give the left eye's view on the 3D object. The clearest perception of the 3D object is obtained if the viewer can focus her eyes beyond the intersection point P to the screen positions X1 and X2. In other words, the right eye should achieve a focus distance df1 equal to the distance of the right eye to screen contact position X1, and the left eye should achieve a focus distance df2 equal to the distance of the left eye to the screen contact position X2. Many viewers find it difficult (or impossible) to override the brain's tendency to focus at the intersection point. Focusing at the intersection point P implies that the pixelated images in the neighborhoods of X1 and X2 will appear blurry, and thus, the 3D object generated at point P will appear blurry.
FIG. 2B illustrates the complementary situation where an object is to be perceived at point P behind the screen. Again the viewer directs her gaze so the ocular rays intersect at point P. In this case, the clearest perception of the object is obtained if the viewer can achieve focus distances smaller than the distances to the intersection point P, i.e. at screen positions X1 and X2 respectively. Again, if the viewer cannot overcome the tendency to focus (i.e. optically focus) at the intersection point P, the object will appear blurred.
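The geometry of FIGS. 2A and 2B can be illustrated numerically. The following is a minimal sketch, assuming a top-down 2-D coordinate system that is not part of the original description: the screen lies along the line z = 0, the viewer's eyes are at positive z, and all lengths are in arbitrary units. The function names and the numeric eye and object positions are hypothetical.

```python
import math

def screen_contact(eye, p):
    """Return the x position where the ray from `eye` through `p` meets the screen.

    eye, p: (x, z) tuples. Assumed convention: screen on the line z = 0,
    viewer at z > 0, points behind the screen at z < 0.
    """
    ex, ez = eye
    px, pz = p
    if ez == pz:
        raise ValueError("ocular ray is parallel to the screen")
    t = ez / (ez - pz)  # ray parameter at which z reaches 0
    return ex + t * (px - ex)

def distance(a, b):
    """Euclidean distance between two (x, z) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical viewer: eyes 0.06 apart, 0.6 from the screen.
right_eye = (0.03, 0.6)
left_eye = (-0.03, 0.6)

# Point P in front of the screen (FIG. 2A) and behind it (FIG. 2B).
p_front = (0.0, 0.2)
p_behind = (0.0, -0.2)

for p in (p_front, p_behind):
    x1 = screen_contact(right_eye, p)   # right-eye screen position X1
    x2 = screen_contact(left_eye, p)    # left-eye screen position X2
    dt1 = distance(right_eye, p)        # distance to the intersection point
    df1 = distance(right_eye, (x1, 0))  # focus distance needed for a sharp image
    print(f"P={p}: X1={x1:.4f}, X2={x2:.4f}, dt1={dt1:.4f}, required df1={df1:.4f}")
```

Under these assumptions the required focus distance exceeds the distance to P when P is in front of the screen, and is smaller than the distance to P when P is behind the screen, which is the mismatch described above.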
When the viewer looks at some object which resides at the plane of the screen, the ocular rays intersect at some point on the screen, and the brain can do what it is accustomed to doing, i.e. setting the optical focus distances so they correspond to the intersection point. Thus, objects at (or near) the plane of the screen should appear sharp.
In the real world, the brain's tendency to focus at the intersection point is beneficial and implies the following. As the viewer moves his/her eyes and the ocular intersection point approaches a physical object, the object becomes increasingly fused and increasingly clear at the same time. Thus, the brain is trained to interpret increasing clarity as a clue that the eyes are moving appropriately so as to lock onto an object, and decreasing clarity as a clue that the eyes are moving away from locking onto an object.
When the viewer is observing artificially generated objects in response to stereo video, the tendency to focus at the intersection point is disadvantageous. For example, if the user attempts to lock his eyes onto a virtual object in front of screen SCR, the object may become increasingly blurry as the ocular intersection point approaches the spatial position of the virtual object (assuming the eyes are initially directed at some point on the screen). This increasing blur may actually discourage the brain control system from converging the eyes towards the virtual object to the extent where image fusion can occur. Thus, the eyes may stop short of the place where the viewer could begin to see a unified object.
Thus, there exists a need for a graphics system and method capable of generating stereo video which allows users to more easily perceive virtual objects (or portions of objects) in front of and behind the screen surface.
A graphics system may, in some embodiments, comprise a rendering engine, a sample buffer and a filtering engine. The rendering engine may receive a stream of graphics primitives, render the primitives in terms of samples, and store the samples into the sample buffer. The filtering engine may read the samples from the sample buffer, generate video output pixels from the samples, and transmit the video output pixels to a display device. The display device presents the video output to a viewer on a two-dimensional screen surface.
In one set of embodiments, the rendering engine and the filtering engine may be configured to generate a stereo video signal whose frames alternate between frames intended for the right eye and frames intended for the left eye of the viewer. The viewer may wear special glasses (e.g. shutter glasses) synchronized with the stereo video signal so the right frames are gated to the right eye and the left frames are gated to the left eye during corresponding time intervals. The graphics primitives may represent a collection of objects in a world coordinate system. The rendering engine may alternately generate frames of samples from the perspectives of a first virtual camera and a second virtual camera. In one embodiment, the position and orientation of the virtual cameras are responsive to the viewer's head and/or eye motions. In another set of embodiments, the rendering engine and filtering engine may be configured to generate a stereoscopic effect with two separate video signals targeted for two display devices respectively. Each of the display devices may be dedicated to a corresponding one of the viewer's eyes. The first video signal may be generated from the perspective of the first virtual camera and the second video signal may be generated from the perspective of the second virtual camera.
The rendering engine may send primitives through a computational pipeline (or partition the primitives among a number of parallel pipelines) to render the primitives in terms of samples. At some stage in the pipeline, a blur value may be assigned to each sample based on a function of the sample's z depth. The blur value determines how much blurring the sample is to experience in the filtration from samples to pixels applied by the filtering engine. A small blur value implies the sample gets filtered with a highly reconstructive filter, i.e. a filter whose spatial cutoff frequency is close to the anti-aliasing cutoff frequency corresponding to one cycle per two video output pixels. A large blur value implies the sample gets filtered with a filter whose spatial cutoff frequency is significantly less than the anti-aliasing cutoff frequency. In general, the spatial cutoff frequency of the filter used to operate on a sample decreases with increasing blur value.
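The relationship between blur value and filter cutoff can be illustrated with a small sketch. The description only requires that the cutoff frequency decrease as the blur value increases; the particular mapping below, the constant `k`, the Gaussian filter shape, and the function names are all assumptions made for illustration.

```python
import math

# Anti-aliasing cutoff: one cycle per two video output pixels.
NYQUIST = 0.5  # cycles per output pixel

def cutoff_frequency(blur, k=4.0):
    """Map a non-negative blur value to a spatial cutoff frequency.

    Hypothetical mapping: blur 0 yields the anti-aliasing cutoff, and the
    cutoff falls monotonically as blur grows. `k` is an assumed tuning constant.
    """
    if blur < 0.0:
        raise ValueError("blur value must be non-negative")
    return NYQUIST / (1.0 + k * blur)

def gaussian_weights(blur, radius=3, k=4.0):
    """Normalized 1-D Gaussian filter taps whose width grows as the cutoff falls.

    The inverse relation between sigma and cutoff is an assumption; it gives
    sigma = 0.5 pixel at blur 0.
    """
    sigma = 0.25 / cutoff_frequency(blur, k)
    taps = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]

for b in (0.0, 0.5, 1.0):
    center_tap = gaussian_weights(b)[3]
    print(f"blur={b}: cutoff={cutoff_frequency(b):.3f}, center tap={center_tap:.3f}")
```

A wider filter spreads each sample's contribution over more output pixels, which is one way a large blur value could translate into a visibly blurred result.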
The blur function may be configured with a single valley to create a depth-of-field effect. For example, the blur function
$$B(z) = \frac{(z - C)^2}{(z - C)^2 + 1}$$
has a minimum at depth C. Thus, samples in the neighborhood of depth C will translate into pixels with minimal blur (i.e. high clarity), and samples far removed from depth C will translate into pixels with a large amount of blur. More generally, the amount of applied blur a sample experiences will depend on its depth displacement (z − C). Thus, virtual objects (or portions of virtual objects) will be blurred in the displayed video output dependent on their positions with respect to the depth C. It is noted that a wide variety of functional forms are contemplated for the blur function. The example above is given for the sake of discussion and is not intended to be limiting.
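The example blur function, B(z) = (z − C)^2 / ((z − C)^2 + 1), can be evaluated directly. The sketch below is illustrative only; the function name `blur_value` and the sample depths are hypothetical.

```python
def blur_value(z, c):
    """Example blur function B(z) = (z - C)^2 / ((z - C)^2 + 1).

    Returns 0 at z == c (minimum blur) and approaches 1 far from c.
    """
    d2 = (z - c) ** 2
    return d2 / (d2 + 1.0)

concentration_depth = 5.0  # hypothetical value of C
for z in (5.0, 6.0, 8.0, 25.0):
    print(f"z={z}: B={blur_value(z, concentration_depth):.4f}")
```

The function depends only on the depth displacement (z − C), so it is symmetric about C and stays within the interval [0, 1).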
The rendering engine may be configured to receive sensor measurements which indicate (a) the distance of the intersection point of the viewer's ocular rays with respect to the viewer's eyes, and (b) the distance of the viewer's eyes relative to the screen. The first distance is referred to herein as the eye-relative concentration depth. The second distance may be referred to herein as the screen-relative eye depth. The eye-relative concentration depth may be derived from measurements obtained by a pair of eye trackers fixed relative to the user's head. For example, the eye trackers may be packaged as a single unit with the shutter glasses. The screen-relative eye depth may be measured by a head tracker. The eye-relative concentration depth and the screen-relative eye depth may be used to compute a screen-relative concentration depth.
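One way the two measurements might be combined is sketched below. The description does not specify the computation, so this sketch makes several assumptions: both depths are measured along the screen normal, the viewer converges symmetrically, the eye trackers report the angle between the two ocular rays, and a positive screen-relative depth means "in front of the screen". The function names are hypothetical.

```python
import math

def eye_relative_concentration_depth(interocular_distance, vergence_angle):
    """Depth of the ocular intersection point from the eyes.

    Assumes symmetric convergence; `vergence_angle` is the angle between
    the two ocular rays in radians.
    """
    if not 0.0 < vergence_angle < math.pi:
        raise ValueError("vergence angle must lie in (0, pi)")
    return (interocular_distance / 2.0) / math.tan(vergence_angle / 2.0)

def screen_relative_concentration_depth(eye_depth, concentration_depth):
    """Combine the screen-relative eye depth with the eye-relative concentration depth.

    Assumed sign convention: positive is in front of the screen (viewer side),
    zero is at the screen surface, negative is behind the screen.
    """
    return eye_depth - concentration_depth

# Hypothetical readings: eyes 0.06 apart, 0.6 from the screen.
angle = 2.0 * math.atan(0.03 / 0.4)  # rays converging 0.4 from the eyes
depth_from_eyes = eye_relative_concentration_depth(0.06, angle)
print(screen_relative_concentration_depth(0.6, depth_from_eyes))
```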
The rendering engine may be configured to dynamically update the blur function in response to motion of the viewer's head and/or eyes. For example, in one embodiment, the rendering engine may track the instantaneous screen-relative concentration depth C(t) for the ocular intersection point based on the sensor measurements, and may dynamically adjust the blur function so its minimum is maintained at (or near) the instantaneous concentration depth C(t). Thus, virtual objects (or portions of virtual objects) that happen to reside in the depth neighborhood of concentration depth C(t) may appear relatively clear. More generally, virtual objects (or portions of virtual objects) may be blurred based on the extent of their depth displacement from C(t). The acquisition of sensor measurements and the computation of concentration depth C(t) may be performed at a sufficiently high rate so that the viewer does not perceive time-discontinuities in the depth-dependent blur. Furthermore, the concentration depth values C(tk) computed in response to sensor measurements at times tk may be smoothed (or interpolated) before being applied to the blur function update.
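The smoothing of the values C(tk) is not specified further. As one possible illustration, the sketch below applies simple exponential smoothing; the choice of this technique, the parameter `alpha`, and the function name are assumptions rather than the described method.

```python
def smooth_concentration_depths(samples, alpha=0.3):
    """Exponentially smooth a sequence of concentration depths C(t_k).

    alpha in (0, 1]: larger values track new measurements more quickly,
    smaller values suppress sensor jitter more strongly.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = []
    state = None
    for c in samples:
        # The first measurement initializes the filter state.
        state = c if state is None else alpha * c + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed

# Hypothetical noisy measurements as the viewer converges toward depth 0.2.
measurements = [0.0, 0.05, 0.22, 0.18, 0.21, 0.20]
for c_raw, c_smooth in zip(measurements, smooth_concentration_depths(measurements)):
    print(f"raw={c_raw:.2f} smoothed={c_smooth:.3f}")
```

Each smoothed value could then serve as the depth C at which the blur function's minimum is placed for the next frame. Stronger smoothing adds lag, so `alpha` would have to be chosen with the latency considerations of the system in mind.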
Thus, a viewer who is attempting to redirect his/her gaze from a first virtual object (say at the screen surface) to a second virtual object in front of the screen will notice the first virtual object getting more blurry and the second virtual object getting less blurry (more defined) as he/she converges the ocular rays closer to the second virtual object. This is the type of feedback that the viewer's brain is accustomed to receiving when viewing real objects in the real world. Thus, the probability of successful convergence on (and perception of) the second virtual object is increased.
In various embodiments described herein, the graphics system is said to dynamically update a blur function based on an "instantaneous" viewer concentration depth. The term "instantaneous" is used with the understanding there may be time delays between (a) the time sensor measurements are acquired, (b) the time a viewer concentration depth based on those sensor measurements is available to update the blur function, (c) the time when the blur function update has been completed, (d) the time when the updated blur function has been applied to a frame of rendered samples, (e) the time when the frame of rendered samples has been translated into output pixels by the action of the filtering engine, and (f) the time when the output pixels are presented to the viewer through one or more display devices. As used herein, the term "instantaneous" implies that these time delays are small enough so that the viewer is given the illusion of instantaneous tracking, i.e. the viewer does not perceive any significant adverse visual effects due to the time-delays.
In another set of embodiments, a graphics system may be configured with a rendering engine, a sample buffer and a filtering engine. The rendering engine may be configured to generate depth values and sample color vectors for a plurality of sample positions in a two-dimensional field, and assign chromatic distortion values to the sample positions based on data including the depth values and a concentration depth of a viewer. The sample color vectors may include a first color component (e.g. red) and second color component (e.g. green) for each sample position. The sample buffer may be configured to store the chromatic distortion values and the sample color vectors for the sample positions. The filtering engine may be configured to:
read the chromatic distortion values and the sample color vectors for the sample positions from the sample buffer;
compute a first pixel color for an output pixel by filtering the first color components of the sample color vectors in a first neighborhood of a first position in the two-dimensional field; and
compute a second pixel color for the output pixel by filtering the second color components of the sample color vectors in a second neighborhood of a second position in the two-dimensional field.
The distance of separation between the first position and the second position may be controlled by the chromatic distortion value.
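The filtering steps above can be sketched in one dimension. This is a minimal illustration under several assumptions not stated in the description: the two filter positions are offset symmetrically about the pixel center, the separation is taken from the chromatic distortion value of the sample nearest the pixel center, and a tent filter is used. The function names and sample data are hypothetical.

```python
def filter_component(positions, values, center, radius=1.0):
    """Tent-filter `values` at sample `positions` within `radius` of `center`."""
    num = 0.0
    den = 0.0
    for x, v in zip(positions, values):
        w = max(0.0, 1.0 - abs(x - center) / radius)
        num += w * v
        den += w
    return num / den if den > 0.0 else 0.0

def pixel_color(positions, red, green, distortion, pixel_center, radius=1.0):
    """Compute (red, green) for one output pixel with chromatic separation.

    The separation d between the first (red) and second (green) filter
    positions is read from the sample nearest the pixel center (assumption).
    """
    nearest = min(range(len(positions)),
                  key=lambda i: abs(positions[i] - pixel_center))
    d = distortion[nearest]
    first_position = pixel_center - d / 2.0   # where red is filtered
    second_position = pixel_center + d / 2.0  # where green is filtered
    r = filter_component(positions, red, first_position, radius)
    g = filter_component(positions, green, second_position, radius)
    return r, g

# Hypothetical samples: identical red and green ramps along one scan line.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
print(pixel_color(positions, ramp, ramp, [0.0] * 5, 2.0))  # no separation
print(pixel_color(positions, ramp, ramp, [1.0] * 5, 2.0))  # separation of 1.0
```

With zero distortion both components are filtered at the same position and agree; with nonzero distortion the red and green results diverge, producing a color fringe whose width follows the chromatic distortion value.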