Virtual environments are ubiquitous in computing environments, finding use in video games (in which a virtual environment may represent a game world); maps (in which a virtual environment may represent terrain to be navigated); simulations (in which a virtual environment may simulate a real environment); digital storytelling (in which virtual characters may interact with each other in a virtual environment); and many other applications. Modern computer users are generally comfortable perceiving, and interacting with, virtual environments. However, users' experiences with virtual environments can be limited by the technology for presenting virtual environments. For example, conventional displays (e.g., 2D display screens) and audio systems (e.g., fixed speakers) may be unable to realize a virtual environment in ways that create a compelling, realistic, and immersive experience.
Virtual reality (“VR”), augmented reality (“AR”), mixed reality (“MR”), and related technologies (collectively, “XR”) share an ability to present, to a user of an XR system, sensory information corresponding to a virtual environment represented by data in a computer system. Such systems can offer a uniquely heightened sense of immersion and realism by combining virtual visual and audio cues with real sights and sounds. Accordingly, it can be desirable to present digital sounds to a user of an XR system in such a way that the sounds seem to be occurring—naturally, and consistently with the user's expectations of the sound—in the user's real environment. Generally speaking, users expect that virtual sounds will take on the acoustic properties of the real environment in which they are heard. For instance, a user of an XR system in a large concert hall will expect the virtual sounds of the XR system to have large, cavernous sonic qualities; conversely, a user in a small apartment will expect the sounds to be more dampened, close, and immediate. Additionally, users expect that virtual sounds will be presented without delays.
Ambisonic and non-ambisonic techniques, among others, may be used to generate spatial audio. For a large number of sound source objects, ambisonic or non-ambisonic multi-channel rendering may be an efficient way of rendering spatial audio because of its design and architecture. This may especially be the case when reflections are modeled. Ambisonic and non-ambisonic multi-channel spatial audio systems may render the audio signals through several steps. Example steps can include a per-source encode step, a fixed-overhead soundfield decode step, and/or a fixed speaker virtualization step. One or more hardware components may perform the steps.
In a first method for rendering the audio signals, each sound source can have its own pair of finite impulse response (FIR) filters. In such systems, a perceived position of a sound is changed by changing the filter coefficients of the FIR filters. In some embodiments, each sound may use a plurality (e.g., two pairs) of FIR filters, where each pair includes two filters, for a total of four FIR filters per sound. As sounds move around the virtual environment, the FIR filters can be crossfaded.
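The crossfading mentioned above can be illustrated with a minimal sketch for a single output channel. It assumes, although the text does not say so, that one FIR filter holds the coefficients for the previous perceived position and a second holds those for the new position, and that their outputs are blended linearly over one block. The function names `fir_filter` and `crossfade_fir` and the linear ramp are hypothetical choices made for illustration only.

```python
def fir_filter(signal, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * signal[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def crossfade_fir(signal, old_coeffs, new_coeffs):
    """Blend the outputs of two FIR filters over one block.

    Assumption: a linear ramp from the old filter (weight 1 at the first
    sample) to the new filter (weight 1 at the last sample).
    """
    old_out = fir_filter(signal, old_coeffs)
    new_out = fir_filter(signal, new_coeffs)
    n = len(signal)
    result = []
    for i in range(n):
        w = i / (n - 1) if n > 1 else 1.0  # fade position in [0, 1]
        result.append((1.0 - w) * old_out[i] + w * new_out[i])
    return result
```

Under this reading, running the same crossfade once per ear would account for four FIR filters per sound, which is consistent with the count given above but is not confirmed by it.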
In a second method for rendering the audio signals, virtual speaker panning may be implemented using a fixed number of virtual speakers. Each sound source may be panned across the fixed virtual speakers. In some embodiments, a plurality (e.g., two) of FIR filters may be used for each virtual speaker. Virtual speaker panning may be efficient for certain applications because the panning itself may use a negligible amount of computational resources.
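One way to picture virtual speaker panning is the sketch below, which mixes any number of sources into a fixed set of per-speaker buses. It assumes an evenly spaced horizontal ring of virtual speakers and a constant-power pan between the two speakers adjacent to each source; neither the layout nor the panning law comes from the text, and the names `ring_pan_gains` and `mix_to_speaker_buses` are hypothetical.

```python
import math


def ring_pan_gains(azimuth_deg, num_speakers):
    """Constant-power gains for the two ring speakers adjacent to the source.

    Assumption: speakers sit at multiples of 360 / num_speakers degrees.
    """
    spacing = 360.0 / num_speakers
    pos = (azimuth_deg % 360.0) / spacing
    lower = int(pos) % num_speakers
    upper = (lower + 1) % num_speakers
    frac = pos - int(pos)
    gains = [0.0] * num_speakers
    gains[lower] = math.cos(frac * math.pi / 2)
    gains[upper] += math.sin(frac * math.pi / 2)
    return gains


def mix_to_speaker_buses(sources, num_speakers):
    """Sum every source into one bus per virtual speaker.

    sources: list of (azimuth_deg, samples) tuples.
    """
    length = max(len(samples) for _, samples in sources)
    buses = [[0.0] * length for _ in range(num_speakers)]
    for azimuth_deg, samples in sources:
        gains = ring_pan_gains(azimuth_deg, num_speakers)
        for spk, gain in enumerate(gains):
            if gain == 0.0:
                continue  # this source does not feed this speaker
            for i, x in enumerate(samples):
                buses[spk][i] += gain * x
    return buses
```

The point of the sketch is that the number of buses, and therefore the number of FIR filters applied after the mix, depends only on `num_speakers` and not on how many sources are playing.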
In some embodiments, one method may be more efficient than the other, depending on the number of sounds playing concurrently. For example, 30 sounds may be playing concurrently. If four FIR filters are used for each sound source, then 120 FIR filters (30 sound sources×4 FIR filters per sound source=120 FIR filters) may be required for the first method. If the second method uses, for example, 16 virtual speakers with two FIR filters per virtual speaker, then only 32 FIR filters may be required for the second method (16 virtual speakers×2 FIR filters per virtual speaker=32 FIR filters).
As another example, only one sound may be playing. The first method may require only four FIR filters (1 sound source×4 FIR filters per sound source=4 FIR filters), while the second method may require 32 FIR filters (16 virtual speakers×2 FIR filters per virtual speaker=32 FIR filters).
As illustrated by the above examples, the first method may be beneficial for a small number of sounds, and the second method may be beneficial for a large number of sounds. Accordingly, an audio system and method that increase efficiency based on the number of sound sources playing at a given time may be desired.
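The filter counts in the two examples above can be reproduced with a short sketch. It counts FIR filters only and ignores every other cost, such as the panning itself. The defaults (four filters per source, 16 virtual speakers, two filters per speaker) are the example values used above, and the function names are hypothetical.

```python
def filters_first_method(num_sources, filters_per_source=4):
    """FIR filters needed when every sound source has its own filters."""
    return num_sources * filters_per_source


def filters_second_method(num_speakers=16, filters_per_speaker=2):
    """FIR filters needed for a fixed set of virtual speakers."""
    return num_speakers * filters_per_speaker


def cheaper_method(num_sources, filters_per_source=4,
                   num_speakers=16, filters_per_speaker=2):
    """Return 'first', 'second', or 'tie' by FIR filter count alone."""
    first = filters_first_method(num_sources, filters_per_source)
    second = filters_second_method(num_speakers, filters_per_speaker)
    if first < second:
        return "first"
    if second < first:
        return "second"
    return "tie"


if __name__ == "__main__":
    for n in (1, 8, 30):
        print(n, filters_first_method(n), filters_second_method(),
              cheaper_method(n))
```

With these example values the two methods need the same number of filters at eight concurrent sounds (8×4=32), so by this count alone the first method needs fewer filters below eight sounds and the second needs fewer above eight.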