When sound is recorded in a room, a signal recorded by a sound capturing endpoint or sound capturing device such as a microphone typically includes two components. One component is normally referred to as direct energy and the other as reverberant energy.
The direct energy is transmitted to the sound capturing endpoint directly from one or more audio sources without being bounced or reverberated by walls. The audio source may be anything producing sound, such as a speaking person, an instrument being played by someone, a loudspeaker controlled by a playback device and the like.
The reverberant energy is also produced by the sound source. However, this reverberant component is captured after it has bounced off an object such as a wall at least once. As sound travels, its amplitude is attenuated. In addition, each time the sound bounces off an object such as a wall, some of its frequency bands are partially absorbed by the surface, which changes the spectrum of the reverberated sound. Because the spectrum and the arrival time of the reverberated sound at the sound capturing endpoint may differ considerably from those of the directly transmitted sound, it is beneficial to obtain the two components separately for later processing, for example, for reflecting the diffusivity of the sound source.
Existing methods to estimate the reverberant energy component from the audio source and generate spatial features for the audio source usually rely on prior knowledge or estimations of properties of the room, such as the reverberation time (RT60), which is the time required for reflections of a direct sound to decay by 60 dB, or the absorption coefficients of the walls. As a result, the existing methods are time consuming and impractical, since prior knowledge about the room acoustics is normally absent.
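As an illustration of the kind of room property such methods depend on, the following minimal sketch estimates RT60 from a room impulse response using Schroeder backward integration, a conventional technique that is not the solution sought here. The function names `synthetic_impulse_response` and `estimate_rt60`, the fit range of -5 dB to -25 dB, and the synthetic noise-based impulse response are assumptions made purely for illustration. The sketch presupposes that an impulse response of the room is already available, which is exactly the prior knowledge that is normally absent.

```python
import math
import random


def synthetic_impulse_response(rt60, sample_rate, duration, seed=0):
    """Hypothetical test signal: Gaussian noise whose amplitude envelope
    falls by 60 dB (a factor of 10**-3) every `rt60` seconds."""
    rng = random.Random(seed)
    n = int(duration * sample_rate)
    return [
        rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * i / (sample_rate * rt60))
        for i in range(n)
    ]


def estimate_rt60(impulse_response, sample_rate, fit_range_db=(-5.0, -25.0)):
    """Estimate RT60 in seconds by fitting a line to the energy decay curve."""
    # Schroeder backward integration: energy remaining from each sample onward.
    edc = []
    remaining = 0.0
    for sample in reversed(impulse_response):
        remaining += sample * sample
        edc.append(remaining)
    edc.reverse()

    # Collect the points of the decay curve (in dB) inside the fit range.
    upper_db, lower_db = fit_range_db
    times, levels = [], []
    for i, value in enumerate(edc):
        if value <= 0.0:
            break  # the curve is non-increasing, so nothing useful follows
        level = 10.0 * math.log10(value / edc[0])
        if lower_db <= level <= upper_db:
            times.append(i / sample_rate)
            levels.append(level)
    if len(times) < 2:
        raise ValueError("not enough decay range to fit a slope")

    # Least-squares slope of level versus time, in dB per second (negative).
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den

    # RT60 is the time the fitted line needs to fall by 60 dB.
    return -60.0 / slope


if __name__ == "__main__":
    ir = synthetic_impulse_response(rt60=0.5, sample_rate=16000, duration=2.0)
    print(f"estimated RT60: {estimate_rt60(ir, 16000):.3f} s")
```

Fitting only part of the decay and extrapolating to 60 dB is a common choice because the tail of a measured response is usually buried in noise. In a real room, the impulse response would first have to be measured with a dedicated excitation signal.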
In view of the foregoing, there is a need in the art for a solution for estimating the reverberant energy component from an active audio source with improved precision, repeatability and speed.