Advances in computing technology have fostered a great expansion in computerized simulation of scenes ranging from rooms and buildings to entire worlds. These simulations create "virtual environments" in which users move at a desired pace and via a desired route rather than a course strictly prescribed by the simulation. The computer system tracks the locations of the objects in the environment and has detailed information about the appearance or other characteristics of each object. The computer then presents, or renders, the environment as it appears from the perspective of the user.
Both audio and video signal processing are important to the presentation of this virtual environment. Audio can convey a 360-degree perspective unavailable through the relatively narrow field of view in which the eyes can focus. In this manner, audio can enhance the spatial content of the virtual environment by reinforcing or complementing the video presentation. Of course, additional processing power is required to properly process the audio signals.
Various signal processing tasks simulate the interaction of the observer with the environment. The well-known technique of ray tracing is often used to provide the appropriate visual perspective of objects in the environment, and the propagation of sound may be modeled by "localization" techniques which mathematically filter "digitized audio" (a digital representation of analog audio using periodic samples). Audio localization is the filtering of an audio signal to reflect the spatial positioning of objects in the environment being simulated. The spatial information necessary for such audio and video rendering techniques may be tracked by any of a variety of known techniques used to track locations of objects in computer simulations.
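By way of illustration only, the filtering of digitized audio to reflect the position of a sound source may be sketched as follows. The sketch assumes a deliberately simple model (constant-power stereo panning combined with inverse-distance attenuation) rather than any particular localization technique; practical systems may use considerably more elaborate filters. The function name `localize` and its parameters are hypothetical and are not drawn from any specific system.

```python
import math

def localize(samples, azimuth_deg, distance, ref_distance=1.0):
    """Return (left, right) sample pairs for a mono source.

    Hypothetical helper, for illustration only.
    azimuth_deg: -90 (fully left) to +90 (fully right) relative to the listener.
    distance: source distance, in the same units as ref_distance.
    """
    # Clamp the azimuth to the range this simple model handles.
    azimuth = max(-90.0, min(90.0, azimuth_deg))
    # Map the azimuth onto a pan angle in [0, pi/2] (constant-power panning).
    pan = (azimuth + 90.0) / 180.0 * (math.pi / 2.0)
    left_gain = math.cos(pan)
    right_gain = math.sin(pan)
    # Inverse-distance attenuation; sources closer than ref_distance
    # are not amplified.
    attenuation = ref_distance / max(distance, ref_distance)
    return [(s * left_gain * attenuation, s * right_gain * attenuation)
            for s in samples]

# A source directly ahead is heard equally in both channels; a source far to
# the right and twice the reference distance away is heard at half amplitude
# in the right channel only.
centered = localize([1.0], azimuth_deg=0.0, distance=1.0)
far_right = localize([1.0], azimuth_deg=90.0, distance=2.0)
```

Note that every sample of the digitized stream must be touched for every localized source, which is one reason such processing consumes main processor resources.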
The image processing tasks associated with such simulations are well known to be computationally intensive. On top of image processing, the additional task of manipulating one or more high quality digitized audio streams may consume a significant portion of remaining processing resources. Since the available processing power is always limited, tasks are prioritized, and the audio presentation is often compromised by including less or lower quality audio in order to accommodate more dramatic effects such as video processing.
Furthermore, high quality digitized audio streams require large portions of memory and significant bandwidth if retrieved using a network. Audio thus also burdens either a user operating with limited memory resources or a user downloading information from a network. Such inconveniences reduce the overall appeal of supplementing a virtual environment with localized audio.
Audio information can, however, be represented in a more compact format which may alleviate some of the processing, memory, and network burdens resulting from audio rendering. The Musical Instrument Digital Interface (MIDI) format is one well known format for storing digital musical information in a compact fashion. MIDI has been used extensively in keyboards and other electronic devices such as personal computers to create and store entire songs as well as backgrounds and other portions of compositions. The relatively low storage space required by the efficient MIDI format allows users to build and maintain libraries of MIDI sounds, effects, and musical interludes.
MIDI provides a more compact form of storage for musical information than typical digitized audio by representing musical information with high level commands (e.g., a command to hold a certain note by a particular instrument for a specified duration). A MIDI file as small as several dozen kilobytes may contain several minutes of background music, whereas several megabytes of digitized audio may be required to represent the same duration of music.
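The difference in storage may be illustrated with a rough calculation. The figures below are assumptions chosen only for the example: three minutes of music, digitized audio at 22,050 eight-bit mono samples per second, and a MIDI stream averaging twenty notes per second, with each note requiring a note-on and a note-off event of roughly four bytes each (a three-byte message plus a one-byte timing value). Actual files vary with sample format and musical density.

```python
def pcm_bytes(seconds, sample_rate=22050, bytes_per_sample=1, channels=1):
    """Approximate size of uncompressed digitized audio (assumed format)."""
    return seconds * sample_rate * bytes_per_sample * channels

def midi_bytes(seconds, notes_per_second=20, bytes_per_event=4):
    """Approximate size of a MIDI stream (assumed note density).

    Each note is counted as two events: one note-on and one note-off.
    """
    return seconds * notes_per_second * 2 * bytes_per_event

duration = 180  # three minutes, in seconds
digitized = pcm_bytes(duration)   # 3,969,000 bytes, roughly 4 megabytes
compact = midi_bytes(duration)    # 28,800 bytes, roughly 29 kilobytes
print(digitized, compact, round(digitized / compact))
```

Under these assumptions the digitized audio is more than one hundred times larger than the MIDI representation; at higher sampling rates, sample widths, or channel counts the gap widens further.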
MIDI does, however, require a processing engine to recreate the represented sounds. In a computer system, a sound card or other MIDI engine typically uses synthesis or wave table techniques to provide the requested sound. Because the MIDI commands are passed directly to the sound card, the system never converts the commands to raw digital data which could be manipulated by the main processing resources of the system. The sound card may also mix the synthesized sound with digitized audio received from the system and play the result directly on the computer's speaker system.
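To make concrete what recreating a represented sound entails, the following sketch renders a single MIDI-style note command as raw samples. It is an illustration under stated assumptions, not a description of how any particular sound card operates: a plain sine wave stands in for the synthesis or wave table techniques an actual engine would use, the function names are hypothetical, and only the conventional mapping of note number 69 to 440 Hz in equal temperament is taken as given.

```python
import math

def note_to_frequency(note):
    """Equal-tempered frequency in Hz; note 69 is A at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def render_note(note, velocity, duration_s, sample_rate=22050):
    """Render one note as a list of samples in the range [-1.0, 1.0].

    Hypothetical stand-in for a MIDI engine: a bare sine wave whose
    amplitude is scaled by the note's velocity (0 to 127).
    """
    frequency = note_to_frequency(note)
    amplitude = velocity / 127.0
    count = int(duration_s * sample_rate)
    return [amplitude * math.sin(2.0 * math.pi * frequency * i / sample_rate)
            for i in range(count)]

# One three-parameter command (note, velocity, duration) expands into
# thousands of individual samples.
samples = render_note(note=60, velocity=100, duration_s=0.5)
```

When this expansion takes place inside the sound card, samples such as these are never presented to the main processor.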
Thus, when MIDI sounds are played, the main processor does not have access to the raw digital data available when digitized audio is played. This precludes digital filtering by the main processor and prevents MIDI compositions from being manipulated as a part of the presentation of a virtual environment. This inability to manipulate MIDI limits the use of a vast array of pre-existing sounds where audio localization is desired. Additionally, the need to localize sounds using cumbersome digitized audio, rather than a compact representation such as MIDI, exacerbates the processing, storage, and networking burdens which impede further incorporation of sound into virtual environments.