Virtual reality is a computer-generated simulation of a three-dimensional environment in which a user's perspective can change dynamically while the user interacts with and observes the environment. Although a user can interact with a virtual environment in many ways, typical interaction consists of viewing and listening to it. A user typically perceives the virtual environment as if looking through a camera viewfinder or at an image received from a video camera and displayed on the computer screen. In a standard graphical user interface, the image is displayed in a window on the computer screen, although stereoscopic goggles and other viewing devices are also used.
Virtual environments are not real; they are data constructs stored in a computer that are viewed using a rendering engine, a computer process for rendering or drawing an image of the virtual environment data constructs on the computer screen. Various file formats and data structures have been developed for virtual environments, but VRML (Virtual Reality Modeling Language) has become a commonly recognized standard in the field, particularly for virtual worlds which are accessed over the World Wide Web.
VRML is a computer language for creating VRML scenes, which are virtual worlds composed of three-dimensional objects. The primary data object in VRML is a node, and there are several different types of nodes. A node can define a particular three-dimensional object, as a Sphere node or a Cube node does. A node can also define characteristics of nodes that appear later in the scene graph (the data file of a VRML world); a Material node, for example, defines the surface material properties of subsequent shape nodes. Additional VRML nodes used by the rendering engine include PointLight (an omni-directional light source) and LOD (alternative level-of-detail representations of a single object, selected according to its perceived distance).
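The way a Material node affects the shape nodes that follow it can be illustrated with a minimal Python sketch. This is not VRML and not any browser's implementation; the `Material` and `Shape` classes, the `resolve_materials` function, and the grey default color are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Material:
    """Hypothetical stand-in for a Material node."""
    diffuse_color: tuple


@dataclass
class Shape:
    """Hypothetical stand-in for a shape node such as Sphere or Cube."""
    kind: str
    size: float


def resolve_materials(scene, default=Material((0.8, 0.8, 0.8))):
    """Walk the nodes in order and pair each shape with the material in effect.

    A Material node changes the current material for every shape listed
    after it, until another Material node replaces it.
    """
    current = default  # assumed default, used until a Material node appears
    resolved = []
    for node in scene:
        if isinstance(node, Material):
            current = node
        else:
            resolved.append((node, current))
    return resolved


# A sequential listing of nodes, in the spirit of a scene graph.
scene = [
    Shape("Sphere", 1.0),        # drawn with the default material
    Material((1.0, 0.0, 0.0)),   # red applies to what follows
    Shape("Cube", 2.0),
    Material((0.0, 0.0, 1.0)),   # blue replaces red
    Shape("Sphere", 0.5),
]
resolved = resolve_materials(scene)
for shape, material in resolved:
    print(shape.kind, material.diffuse_color)
```

The order of the listing matters in this sketch: moving a Material node changes which shapes it colors, which is the property the paragraph above describes.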
Objects in a VRML world can have several different properties, including the ability to generate sound (i.e., act as a source of audio signals). In particular, a Sound node defines sound generation properties in VRML, such as the location of the sound source, and the direction, intensity, and effective angle of the generated sound. These generated sounds are typically heard by the user through stereo speakers positioned alongside the computer video screen, although headphones and sophisticated multichannel sound systems are also available.
VRML scene graphs are sequential listings of VRML nodes that, when properly rendered, generate virtual worlds. Properly rendering a VRML world requires a special web browser or application such as Silicon Graphics WebSpace or CosmoPlayer. These applications interpret the VRML scene graph and render a complex three-dimensional world, providing the user with a virtual camera looking into the virtual world.
The virtual cameras of the rendering engines are not limited to a single lens of fixed focal length. The VRML scene author may select different virtual camera focal lengths or, alternatively, a virtual camera with an adjustable focal length may be used. Just as a photographer or videographer can "zoom in" to obtain a magnified but narrower image by increasing the focal length of the lens (e.g., by using a telephoto lens), a user's view through the virtual camera can be "zoomed in." Similarly, a user's view of the virtual world can be widened if the virtual camera "zooms out," much as a photographer's or videographer's view changes when a wide-angle lens with a relatively short focal length is mounted on the camera.
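The relationship between focal length and field of view can be made concrete with the standard pinhole-camera formula, field of view = 2·atan(w / 2f), where w is the width of the image plane and f is the focal length. The sketch below is illustrative only; the function name and the 36 mm image-plane width are assumptions, not values taken from any rendering engine.

```python
import math


def field_of_view_deg(focal_length_mm, image_width_mm=36.0):
    """Horizontal field of view of an ideal pinhole camera, in degrees.

    image_width_mm is an assumed image-plane width (36 mm, as in a
    full-frame still camera); a virtual camera may use any value.
    """
    return math.degrees(2.0 * math.atan(image_width_mm / (2.0 * focal_length_mm)))


# Longer focal lengths "zoom in" (narrower view); shorter ones "zoom out".
for f in (18.0, 50.0, 200.0):
    print(f"{f:6.1f} mm -> {field_of_view_deg(f):5.1f} degrees")
```

With these assumptions, an 18 mm focal length gives a 90 degree horizontal field of view, and the angle shrinks steadily as the focal length grows.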
These rendering engines do not render only a three-dimensional visual environment; they also render a three-dimensional aural environment. The single-channel (i.e., monophonic) sounds generated by the sound emitter nodes are rendered stereophonically to simulate three-dimensional positioning of the audio source. Simulating three-dimensional audio positioning is a complex process that typically involves convolution with Head-Related Transfer Functions and the application of appropriate interaural amplitude and delay values. Although the exact techniques depend on the specific rendering engine, the effect sought is the same: to make the sounds heard by the user appear to come from the appropriate location in the three-dimensional environment being rendered.
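The interaural amplitude and delay values mentioned above can be approximated, very roughly, without any Head-Related Transfer Function. The sketch below swaps in two textbook simplifications: a constant-power pan law for the amplitude difference and the Woodworth spherical-head formula for the delay. It is a stand-in for illustration, not the method of any particular rendering engine; the function name, head radius, and speed of sound are assumed values.

```python
import math

HEAD_RADIUS_M = 0.0875        # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0    # assumed speed of sound in air


def interaural_cues(azimuth_rad):
    """Return (left_gain, right_gain, delay_s) for a source azimuth.

    Azimuth is 0 straight ahead and positive to the listener's right;
    it is clamped to the frontal range [-pi/2, +pi/2].
    delay_s is how much later the sound reaches the left ear
    (negative when the source is on the left).
    """
    az = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    # Woodworth approximation of the interaural time difference.
    delay_s = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (az + math.sin(az))
    # Constant-power pan: the squared gains always sum to 1.
    pan = (az + math.pi / 2) / 2
    return math.cos(pan), math.sin(pan), delay_s


left, right, delay = interaural_cues(math.radians(45))
print(f"left={left:.3f} right={right:.3f} delay={delay * 1e6:.0f} microseconds")
```

A source straight ahead yields equal gains and zero delay; a source to the right is louder and earlier in the right ear. Real engines refine both cues per frequency through HRTF convolution.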
To provide a rendering engine with an audio source location that is consistent with a user's visual perception of that location (i.e., to coordinate the user's acoustic perspective with the user's visual perspective), it is often not enough merely to provide the coordinate location of the audio source in the virtual environment. Because the focal length and field of view of the virtual camera are variable while the user's focal length and field of view are relatively fixed, the user's field of view and the virtual camera's field of view will often be unequal. When this occurs, the visually perceived location of an audio source may be distorted by the difference between the respective fields of view, especially at the boundaries of the field of view.
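The size of this distortion can be estimated with simple perspective geometry. The sketch below assumes an ideal pinhole virtual camera, a flat screen, and a user centered in front of it; the "user field of view" is taken to be the angle the displayed image subtends at the user's eye. The function name and parameters are hypothetical and serve only to illustrate the mismatch described above.

```python
import math


def perceived_azimuth_deg(source_azimuth_deg, camera_fov_deg, user_fov_deg):
    """Angle at which the user sees an object drawn by the virtual camera.

    source_azimuth_deg: direction of the object relative to the camera axis
        in the virtual world (must be within +/-90 degrees, i.e. in front).
    camera_fov_deg: horizontal field of view of the virtual camera.
    user_fov_deg: angle subtended by the displayed image at the user's eye.
    """
    half_cam = math.radians(camera_fov_deg) / 2
    half_user = math.radians(user_fov_deg) / 2
    # Normalized horizontal screen position: 0 at center, +/-1 at the edges.
    screen_x = math.tan(math.radians(source_azimuth_deg)) / math.tan(half_cam)
    # Angle from the user's eye to that position on the screen.
    return math.degrees(math.atan(screen_x * math.tan(half_user)))


# A source 45 degrees off-axis, at the edge of a 90-degree camera view,
# appears only 15 degrees off-axis on a screen that fills 30 degrees.
print(perceived_azimuth_deg(45.0, 90.0, 30.0))
```

Under these assumptions, a source that is 45 degrees off-axis in the virtual world is seen 15 degrees off-axis by the user, so audio positioned from the raw world coordinates would disagree with the image by 30 degrees. When the two fields of view are equal, the function returns the input angle and no correction is needed.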
Moreover, because objects outside of the field of view can be audibly perceived as having a location relative to the viewed objects, any difference between the user's and the virtual camera's fields of view can distort the apparent location of these objects, even though the user does not see them at that instant. In addition, because human perception is primarily visual, human beings frequently tune out sounds in which they are not presently interested. This phenomenon is readily observed at a cocktail party or other large gathering, where a person is able to conduct a conversation with another person they are looking at even though the background noise (e.g., other conversations) is as loud as, or louder than, the conversation itself.
Virtual reality is not experienced solely by sight, so hearing must be considered when creating a virtual environment. The interaction between seeing and hearing in human beings must be accounted for before sounds can become an integrated part of a virtual world. Proper positioning of audio sources in three-dimensional virtual environments therefore requires consideration of how the audio sources are perceived both visually and aurally. However, there is a wide variety of techniques for simulating individual sound sources in three-dimensional environments, many of which are specifically adapted to particular conditions, equipment, rendering engines, and modeling languages. Accordingly, it is desirable to provide a generic solution that addresses the perceived dislocation of audio sources and is readily adaptable for use in a variety of three-dimensional virtual reality environments.
It is therefore necessary to provide a technique for reconciling the aurally perceived location of an audio signal source with its visually perceived location by providing the virtual reality rendering system with coordinates that identify the appropriate location of an audio source in accordance with a user's visual perception of the virtual environment. In particular, it is necessary to correct distortions introduced by the disparity between the virtual camera field of view and a user's field of view. It is also desirable to account for differences in perception of off-screen and on-screen objects, as well as the diminishing acoustic relevance of objects in relation to their distance from the user's field of view.