The geometric and visual complexity of scenes used in video games and interactive virtual environments has increased considerably over the last few years. Recent advances in visual rendering and hardware technologies have made it possible to generate high-quality visuals at interactive rates on commodity graphics processing units (GPUs). This has motivated increased focus on other modalities, such as sound rendering, to improve realism and immersion in virtual environments. However, generating realistic sound effects in complex scenes at interactive rates remains a major challenge. The high aural complexity of these scenes is characterized by several factors. The first is a large number of sound sources: a scene can contain anywhere from a few hundred to thousands of sources, which may correspond to cars on the streets, crowds in a shopping mall or a stadium, or noise generated by machines on a factory floor. The second is a large number of objects: many of these scenes consist of hundreds of static and dynamic objects, and these objects may correspond to large architectural models or outdoor scenes spanning tens or hundreds of meters. The third is the range of acoustic effects that must be simulated, including early reflections, late reverberation, echoes, diffraction, scattering, and the like.
The high aural complexity results in computational challenges for sound propagation as well as for audio rendering. At a broad level, sound propagation methods can be classified into wave-based and geometric techniques. Wave-based methods, which numerically solve the acoustic wave equation, can accurately simulate all acoustic effects. However, these methods are limited to static scenes with few objects and are not yet practical for scenes with many sources. Geometric propagation techniques, based on ray theory, can be used to interactively compute early reflections (up to 5-10 orders) and diffraction in dynamic scenes with a few sources [Lentz et al. 2007; Pelzer and Vorländer 2010; Taylor et al. 2012; Schissler et al. 2014].
A key challenge is to simulate late reverberation (LR) at interactive rates in dynamic scenes. LR corresponds to the sound reaching the listener after a large number of reflections with decaying amplitude, and it forms the tail of the impulse response [Kuttruff 2007]. Perceptually, LR gives a sense of the environment's size and of its general sound absorption. Many real-world scenarios, including a concert hall, a forest, a city street, or a mountain range, have a distinctive reverberation [Valimaki et al. 2012]. However, this essential aural element is computationally expensive. With ray tracing, computing only 1-2 seconds of LR in a moderately sized room requires high-order reflections (e.g., more than 50 bounces).
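As a rough illustration of why so many bounces are needed, the sketch below estimates the reflection order reached within a given reverberation length. It relies on the standard statistical-acoustics mean free path of 4V/S for a diffuse field, which is an assumption introduced here rather than a method from the text above. The room dimensions, the speed of sound, and the function names are likewise hypothetical.

```python
# Assumed speed of sound in air at roughly 20 degrees C, in m/s.
SPEED_OF_SOUND = 343.0


def mean_free_path(volume_m3, surface_m2):
    """Average distance between reflections in a diffuse field: 4V/S."""
    return 4.0 * volume_m3 / surface_m2


def reflections_needed(ir_length_s, volume_m3, surface_m2, c=SPEED_OF_SOUND):
    """Approximate number of bounces a ray undergoes in ir_length_s seconds."""
    return c * ir_length_s / mean_free_path(volume_m3, surface_m2)


# Hypothetical shoebox room: 10 m x 8 m x 3 m.
width, depth, height = 10.0, 8.0, 3.0
volume = width * depth * height
surface = 2.0 * (width * depth + width * height + depth * height)

for seconds in (1.0, 2.0):
    order = reflections_needed(seconds, volume, surface)
    print(f"{seconds:.0f} s of reverberation: about {order:.0f} bounces")
```

Under these assumptions, even one second of reverberation in such a room corresponds to well over 50 reflections per ray, and the count grows linearly with the reverberation length.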
The complexity of sound propagation algorithms increases linearly with the number of sources, which limits current interactive sound propagation systems to only a handful of sources. Many techniques have been proposed in the literature to handle a large number of sources, including sound source clustering [Tsingos et al. 2004], multi-resolution methods [Wang et al. 2004], and a combination of hierarchical clustering and perceptual metrics [Moeck et al. 2007]. However, a major challenge is to combine them with sound propagation methods to generate realistic reverberation effects.
A third major challenge in generating realistic acoustic effects is real-time audio rendering for geometric sound propagation. A dense impulse response for a single source generated with high-order reflections can contain tens of thousands of propagation paths; hundreds of sources can result in millions of paths. Current audio rendering algorithms are unable to deal with such complexity at interactive rates.
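To make the scale concrete, the following minimal sketch shows one naive way a list of propagation paths could be accumulated into an impulse response, together with the path count for a multi-source scene. It is illustrative only and is not the rendering approach of any cited system. The path representation (a distance and a gain), the sample rate, the speed of sound, and the function names are all assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 44100     # samples per second, assumed


def paths_to_impulse_response(paths, length_s):
    """Accumulate (distance_m, gain) paths into a sampled impulse response.

    Each path adds its gain at the sample matching its propagation delay.
    Paths arriving after length_s are dropped.
    """
    ir = [0.0] * int(length_s * SAMPLE_RATE)
    for distance_m, gain in paths:
        index = int(round(distance_m / SPEED_OF_SOUND * SAMPLE_RATE))
        if index < len(ir):
            ir[index] += gain
    return ir


def total_paths(num_sources, paths_per_source):
    """Paths to process per update when every source is handled separately."""
    return num_sources * paths_per_source


# Hypothetical scene: 200 sources with 20,000 paths each.
print(total_paths(200, 20_000))
```

Because this loop runs once per path and once per source, the work per update grows with the product of the two, which is how hundreds of sources reach millions of paths.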
Accordingly, there exists a need for systems, methods, and computer-readable media for conducting interactive sound propagation and rendering for a plurality of sound sources in a virtual environment scene.