A continuing pursuit in the computer-generated graphics and multimedia industries is to make user interfaces more compelling, realistic, and interactive. Today, many application programs for desktop personal computers include a mix of two-dimensional (2D) graphics and audio. With enhancements in video compression techniques, full motion digital video is also becoming a more common feature in some user interfaces. While traditionally limited to expensive graphics workstations and special purpose hardware, real time three-dimensional (3D) graphics applications are now being introduced for personal computer platforms. Producing realistic, interactive output of any of these media types, and especially of combinations of these media types, requires a great deal of computational resources, namely, computational cycles of a microprocessor or rendering device. As such, there is demand for software and hardware tools that manage the use of these resources and improve the quality of the output of media types.
A significant limitation of creating more compelling and realistic graphical content as well as content that includes a mixture of different media types is the computational overhead of creating high quality output. For example, to create a sequence of real time 3D animation, a graphics rendering system has to convert complex 3D models into color intensity values for the hundreds of thousands (or even a million) of pixels that form each image in the animation sequence. To create the effect of realistic motion, a new image must be generated in only fractions of a second.
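As a rough illustration of the scale involved, the following sketch computes the number of pixel values a renderer must produce per second. The 640 by 480 resolution and the rate of 30 frames per second are assumed values chosen only for the example; they do not come from the text above.

```python
def pixels_per_second(width, height, frames_per_second):
    """Number of pixel values that must be produced each second."""
    return width * height * frames_per_second


# Assumed display parameters, for illustration only.
WIDTH, HEIGHT, FPS = 640, 480, 30

pixels_per_frame = WIDTH * HEIGHT          # 307,200 pixels in each image
time_per_frame = 1.0 / FPS                 # about 0.033 seconds per image
throughput = pixels_per_second(WIDTH, HEIGHT, FPS)

print(pixels_per_frame, throughput)
```

Under these assumptions, each image contains 307,200 pixels and the renderer must produce 9,216,000 pixel values per second.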
This demand for computational resources applies not only to rendering graphics, but also to rendering other types of multimedia content. To create truly realistic content, many media types such as 2D and 3D graphics are typically integrated into the output. In addition, in multitasking operating systems common in today's PCs, such as the Windows® 95 Operating System from Microsoft, there are typically many other application programs vying for limited computational resources. Moreover, the amount of resources available and the amount of resources requested by different programs can vary widely during run-time.
The demand for computational resources results not only from generating the output, but also includes the overhead in transmitting the content. For example, it requires a large amount of data (e.g. pixel intensity values) to represent a picture digitally. The same applies for a large audio file or a combination of media types. Data compression techniques are often used to reduce the bandwidth required to transfer media content and the memory required to store it. Unfortunately, these compression techniques further add to the computational overhead of retrieving and generating multimedia content.
In view of the limitations of the computational resources in most media rendering systems, the quality of the output typically has to be sacrificed to produce more complex media content such as real time 3D animation. Conventional 3D graphics rendering systems are typically not flexible enough to allocate rendering resources to features of a graphics scene that are likely to have more impact on the overall quality of the output. As such, artifacts in the output images are likely to be more noticeable to the user.
In the field of 3D graphics, some studies have focused on how to selectively reduce the level of detail of objects in a scene in an attempt to reduce computational costs while optimizing quality of the output. See T. A. Funkhouser and C. H. Sequin, Adaptive Display Algorithm for Interactive Frame Rates During Visualization of Complex Virtual Environments. In Proceedings of SIGGRAPH 93, pages 247-254. SIGGRAPH, August 1993.
One drawback of this earlier work is that it fails to allocate computational resources based on the user's focus of attention or based on the perceived quality of the output. Nor does the work employ models of a viewer's attention or consider the relationship between attention and perceived quality. The focus of the user's attention is significant because the user is more likely to perceive an artifact in a feature of the output that he or she is focusing on. Reducing the complexity of a model in a 3D graphics scene will reduce the computational resources needed to render the object. However, reducing the complexity of a model without regard to whether the user is focusing on the object may shift rendering resources away from the part of the scene that is most critical, namely, the part on which the user is focusing his or her attention.
The perceived quality is a separate concept from attention and specifically pertains to the user's perception of quality as opposed to some raw measure of quality or some estimate of the accuracy of a rendering approximation. For example, a rough estimate of the quality of a rendering of a scene is the level of geometric detail of the objects in the scene. The relationship between the geometric level of detail and the perceived quality is typically not a direct, linear relationship. In practice, this means that making small reductions in the geometric level of detail may not result in a corresponding reduction in the perceived quality, but will reduce the computational overhead of rendering a scene.
To our knowledge, no one has applied the concept of visual search and attention from the field of cognitive psychology to determine how to allocate rendering resources to parts of a graphics scene in the field of computer-generated graphics. In addition, we know of no work in computer generated graphics that has used a measure of perceived quality to control the allocation of rendering resources to parts of a graphics scene.
Other relevant work is described in co-pending patent application Ser. No. 08/671,412 by Nathan P. Myhrvold, James T. Kajiya, Jerome E. Lengyel, and Russell Schick, entitled Method and System for Generating Images Using Gsprites, filed on Jun. 27, 1996, which is hereby incorporated by reference. This patent application discloses a layered rendering architecture where objects are rendered independently to separate image layers called sprites or "gsprites." This rendering architecture can render objects at different spatial resolutions and can also render objects at different update rates. Rather than re-render an object for each frame, the rendering system re-uses image layers from previous frames by warping them to approximate the position of objects in the current frame. This patent application describes how rendering resources can be allocated by re-rendering sprites in a priority order based on a raw measure of the accuracy of the image warp. One specific measure of quality called "geometric error" measures how the screen coordinates of selected points of an object in a current frame differ in position from the warped points that result from warping a previously rendered sprite to the current frame. To allocate rendering resources, the rendering system queues objects for rendering in a priority order based on the geometric error of the image warp. The image layers with the largest geometric error have higher priority for the limited rendering resources, and therefore are re-rendered before other image layers for which the image warp provides a better approximation.
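The priority scheme described above can be sketched as follows. This is a minimal illustration and not the implementation of the referenced application: the function names, the use of the maximum point-to-point distance as the error measure, and the fixed re-rendering budget are all assumptions made for the example.

```python
import heapq
import math


def geometric_error(true_points, warped_points):
    """Assumed error measure: the largest screen-space distance between a
    selected point in the current frame and its warped approximation."""
    return max(math.dist(p, q) for p, q in zip(true_points, warped_points))


def rerender_order(sprites, budget):
    """Return up to `budget` sprite names, largest geometric error first.

    `sprites` maps a name to a (true_points, warped_points) pair.
    """
    # heapq is a min-heap, so negate the error to pop the largest first.
    heap = [(-geometric_error(t, w), name) for name, (t, w) in sprites.items()]
    heapq.heapify(heap)
    order = []
    while heap and len(order) < budget:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Hypothetical scene with three image layers.
sprites = {
    "ship": ([(0, 0), (10, 0)], [(0, 0), (13, 4)]),   # error 5.0
    "tree": ([(5, 5)], [(5, 6)]),                     # error 1.0
    "cloud": ([(1, 1), (2, 2)], [(1, 1), (2, 2)]),    # error 0.0
}
print(rerender_order(sprites, budget=2))
```

With a budget of two re-renders, the "ship" and "tree" layers are selected, and the "cloud" layer continues to be approximated by warping its previously rendered image.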
Raw measures of image quality such as the geometric error provide an effective way to dynamically allocate rendering resources to parts of a graphics scene. However, this technique does not measure the perceived loss in fidelity. Specifically, the geometric error is often not directly related to the perceived decline in quality. In addition, the geometric error does not take into account the probability that the user is focusing his or her attention on the part of the scene that has been degraded to reduce rendering overhead.
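The limitation noted above can be illustrated with a small, purely hypothetical sketch. Here each image layer carries a raw geometric error and an assumed probability that the user is attending to it; the product of the two is used only as an illustrative weighting and is not taken from the referenced application or from any method described in this document. The layer names and numeric values are likewise invented for the example.

```python
def raw_ranking(layers):
    """Order layers by raw geometric error alone, largest first."""
    return sorted(layers, key=lambda layer: layer[1], reverse=True)


def attention_weighted_ranking(layers):
    """Order layers by geometric error scaled by the assumed probability
    that the user is attending to the layer (illustrative weighting only)."""
    return sorted(layers, key=lambda layer: layer[1] * layer[2], reverse=True)


# (name, geometric error, assumed probability of attention)
layers = [
    ("background", 8.0, 0.05),
    ("character", 3.0, 0.90),
]

print([name for name, _, _ in raw_ranking(layers)])
print([name for name, _, _ in attention_weighted_ranking(layers)])
```

Ranked by raw geometric error, the background layer is re-rendered first even though the user is unlikely to be looking at it. Once the assumed attention probability is considered, the character layer takes priority.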