It often occurs that audio recordings comprise recordings of sound signals generated by a means for producing sound output, henceforth referred to simply as a ‘speaker’. For example, when recording the speech of a person, the sound signal generated by the speaker of a television or radio playing in the background may be recorded as well. In many cases, such an audio recording is not primarily made to record the sound signal generated by the speaker. Rather, the audio recording may be directed at another sound signal, e.g., that of the person speaking. As such, the sound signal generated by the speaker may be considered a “background” audio component of the audio recording, whereas another sound signal, e.g., that of the person speaking, may be considered a “foreground” audio component of the audio recording.
An audio recording may also more structurally include recordings of “background” sound signals generated by speakers. For example, in Social TV, users who are remote from each other may watch the same television program while communicating with each other via audio (e.g., via Voice-over-IP) or video (e.g., Skype, Lync, WebRTC, FaceTime), with the latter also including audio communication. This way, the users may jointly watch, discuss and comment on the television program, even while being remote from each other. However, as a result, each user will typically also hear the audio of the television of the other user playing in the background.
The background audio component may be of relatively poor quality in the audio recording. There may be a number of reasons for this, including but not limited to: the microphone typically being directed at the “foreground” sound source rather than the “background” sound source, i.e., the speaker generating the sound signal; the codec of the audio encoder being optimized for the foreground audio component (e.g., speech) rather than the background audio component (e.g., music); and there being an additional ‘digital-to-sound-to-digital’ conversion step, caused by the reproduction by the speaker and the subsequent recording by the microphone.
It is known to remove or attenuate such a background audio component in the audio recording, for example as described in PCT/EP2015/067548.
However, although recording the sound signal generated by the speaker may not be the primary intent of the audio recording, it may nevertheless be desirable to reproduce the audio content represented by the sound signal when playing out the audio recording. Namely, by removing the background audio component, the context of the foreground audio component may be inadvertently removed as well. To nevertheless improve the quality of the recording of the sound signal, one could opt to enhance the audio recording itself, e.g., by applying suitable audio processing. Such audio processing, however, rarely obtains sufficiently good results.