1. Field of the Invention
The invention relates to methods and systems for applying reverb to a multi-channel downmixed audio signal indicative of a larger number of individual audio channels. In some embodiments, this is done by upmixing the input signal and applying reverb to at least some of its individual channels in response to at least one spatial cue parameter (indicative of at least one spatial cue for the input signal) so as to apply a different reverb impulse response to each of the individual channels to which reverb is applied. Optionally, after application of reverb the individual channels are downmixed to generate an N-channel reverbed output signal. In some embodiments the input signal is a QMF (quadrature mirror filter) domain MPEG Surround (MPS) encoded signal, and the upmixing and reverb application are performed in the QMF domain in response to MPS spatial cue parameters including at least some of Channel Level Difference (CLD), Channel Prediction Coefficient (CPC), and Inter-channel Cross Correlation (ICC) parameters.
2. Background of the Invention
Throughout this disclosure including in the claims, the expression “reverberator” (or “reverberator system”) is used to denote a system configured to apply reverb to an audio signal (e.g., to all or some channels of a multi-channel audio signal).
Throughout this disclosure including in the claims, the expression “system” is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a reverberator may be referred to as a reverberator system (or reverberator), and a system including such a reverberator subsystem (e.g., a decoder system that generates X+Y output signals in response to Q+R inputs, in which the reverberator subsystem generates X of the outputs in response to Q of the inputs and the other outputs are generated in another subsystem of the decoder system) may also be referred to as a reverberator system (or reverberator).
Throughout this disclosure including in the claims, the expression “reproduction” of signals by speakers denotes causing the speakers to produce sound in response to the signals, including by performing any required amplification and/or other processing of the signals.
Throughout this disclosure including in the claims, the expression “linear combination” of values v1, v2, . . . , vn (e.g., n elements of a subset of a set of X individual audio channel signals occurring at a time, t, where n is less than or equal to X) denotes a value equal to a1v1+a2v2+ . . . +anvn, where a1, a2, . . . , an are coefficients. In general, there is no restriction on the values of the coefficients (e.g., each coefficient can be positive or negative or zero). The expression is used in a broad sense herein, for example to cover the case that one of the coefficients is equal to 1 and the others are equal to zero (e.g., the case that the linear combination a1v1+a2v2+ . . . +anvn is equal to v1 (or v2, . . . , or vn)).
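By way of illustration only, the definition above can be sketched in a few lines of Python. The function name `linear_combination` and the sample values are hypothetical and are not taken from any system described herein; the second call shows the degenerate case in which one coefficient is 1 and the others are zero.

```python
from typing import Sequence


def linear_combination(values: Sequence[float], coeffs: Sequence[float]) -> float:
    """Return a1*v1 + a2*v2 + ... + an*vn for equal-length sequences."""
    if len(values) != len(coeffs):
        raise ValueError("values and coeffs must have the same length")
    return sum(a * v for a, v in zip(coeffs, values))


# Hypothetical values of three channel signals at one time t (illustrative only).
channel_values = [0.5, -0.25, 0.125]

# General case: coefficients may be positive, negative, or zero.
mixed = linear_combination(channel_values, [1.0, 2.0, -4.0])

# Degenerate case: the combination is simply equal to v2.
selected = linear_combination(channel_values, [0.0, 1.0, 0.0])
```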
Throughout this disclosure including in the claims, the expression “spatial cue parameter” of a multichannel audio signal denotes any parameter indicative of at least one spatial cue for the audio signal, where each such “spatial cue” is indicative (e.g., descriptive) of the spatial image of the multichannel signal. Examples of spatial cues are level (or intensity) differences between (or ratios of) pairs of the channels of the audio signal, phase differences between such channel pairs, and measures of correlation between such channel pairs. Examples of spatial cue parameters are the Channel Level Difference (CLD) parameters and Channel Prediction Coefficient (CPC) parameters which are part of a conventional MPEG Surround (“MPS”) bitstream, and which are employed in MPEG Surround coding.
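As a rough illustration of two of the spatial cues named above, the following Python sketch computes a level difference (as a power ratio in dB) and a normalized correlation for one pair of channels. These are textbook-style measures chosen here as an assumption; they are not the quantized, band-wise CLD or ICC parameters as defined by the MPS standard, and the function names and the `eps` guard are hypothetical.

```python
import math
from typing import Sequence


def level_difference_db(ch_a: Sequence[float], ch_b: Sequence[float],
                        eps: float = 1e-12) -> float:
    """Level difference between two channels, as a power ratio in dB."""
    power_a = sum(x * x for x in ch_a)
    power_b = sum(x * x for x in ch_b)
    # eps (an assumption of this sketch) avoids log(0) and division by zero.
    return 10.0 * math.log10((power_a + eps) / (power_b + eps))


def normalized_correlation(ch_a: Sequence[float], ch_b: Sequence[float],
                           eps: float = 1e-12) -> float:
    """Zero-lag cross-correlation normalized by the channel powers."""
    cross = sum(a * b for a, b in zip(ch_a, ch_b))
    power_a = sum(x * x for x in ch_a)
    power_b = sum(x * x for x in ch_b)
    return cross / math.sqrt(power_a * power_b + eps)


# Hypothetical channel pair: the second channel is the first at half amplitude,
# so the pair is fully correlated and differs in level by about 6 dB.
first = [1.0, -1.0, 1.0, -1.0]
second = [0.5, -0.5, 0.5, -0.5]
cue_level = level_difference_db(first, second)
cue_corr = normalized_correlation(first, second)
```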
In accordance with the well-known MPEG Surround (“MPS”) standard, multiple channels of audio data can be encoded by being downmixed into a smaller number of channels (e.g., M channels, where M is typically equal to 2) and compressed, and such an M-channel downmixed audio signal can be decoded by being decompressed and processed (upmixed) to generate N decoded audio channels (e.g., M=2 and N=5).
A typical, conventional MPS decoder is operable to perform upmixing to generate N decoded audio channels (where N is greater than two) in response to a time-domain, 2-channel, downmixed audio input signal (and MPS spatial cue parameters including Channel Level Difference and Channel Prediction Coefficient parameters). A typical, conventional MPS decoder is operable in a binaural mode to generate a binaural signal in response to a time-domain, 2-channel, downmixed audio input signal and spatial cue parameters, and in at least one other mode to perform upmixing to generate 5.0 (where the notation “x.y” channels denotes “x” full frequency channels and “y” subwoofer channels), 5.1, 7.0, or 7.1 decoded audio channels in response to a time-domain, 2-channel, downmixed audio input signal and spatial cue parameters. The input signal undergoes time domain-to-frequency domain transformation into the QMF (quadrature mirror filter) domain, to generate two channels of QMF domain frequency components. These frequency components undergo decoding in the QMF domain and the resulting frequency components are typically then transformed back into the time domain to generate the audio output of the decoder.
FIG. 1 is a simplified block diagram of elements of a conventional MPS decoder configured to generate N decoded audio channels (where N is greater than two, and N is typically equal to 5 or 7) in response to a 2-channel downmixed audio signal (L′ and R′) and MPS spatial cue parameters (including Channel Level Difference parameters and Channel Prediction Coefficient parameters). The downmixed input signal (L′ and R′) is indicative of “X” individual audio channels, where X is greater than 2. The downmixed input signal is typically indicative of five individual channels (e.g., left-front, right-front, center, left-surround, and right-surround channels).
Each of the “left” input signal L′ and the “right” input signal R′ is a sequence of QMF domain frequency components generated by transforming a 2-channel, time-domain MPS encoded signal (not indicated in FIG. 1) in a time domain-to-QMF domain transform stage (not shown in FIG. 1).
The downmixed input signals L′ and R′ are decoded into N individual channel signals S1, S2, . . . , SN, in decoder 1 of FIG. 1, in response to the MPS spatial cue parameters which are asserted (with the input signals) to the FIG. 1 system. The N sequences of output QMF domain frequency components, S1, S2, . . . , SN are typically transformed back into the time domain by a QMF domain-to-time domain transform stage (not shown in FIG. 1), and can be asserted as output from the system without undergoing post-processing. Optionally, the signals S1, S2, . . . , SN undergo post-processing (in the QMF domain) in post-processor 5 to generate an N-channel audio output signal comprising channels OUT1, OUT2, . . . , OUTN. The N sequences of output QMF domain frequency components, OUT1, OUT2, . . . , OUTN, are typically transformed back into the time domain by a QMF domain-to-time domain transform stage (not shown in FIG. 1), and asserted as output from the system.
The conventional MPS decoder of FIG. 1 operating in a binaural mode generates 2-channel binaural audio output S1 and S2, and optionally also 2-channel binaural audio output OUT1 and OUT2, in response to a 2-channel downmixed audio signal (L′ and R′) and MPS spatial cue parameters (including Channel Level Difference parameters and Channel Prediction Coefficient parameters). When reproduced by a pair of headphones, the 2-channel audio output S1 and S2 is perceived at the listener's eardrums as sound from “X” loudspeakers (where X>2 and X is typically equal to 5 or 7) at any of a wide variety of positions (determined by the coefficients of decoder 1), including positions in front of and behind the listener. In the binaural mode, post-processor 5 can apply reverb to the 2-channel output (S1, S2) of decoder 1 (in this case, post-processor 5 implements an artificial reverberator). The FIG. 1 system could be implemented (in a manner to be described below) so that the 2-channel output of post-processor 5 (OUT1 and OUT2) is a binaural audio output to which reverb has been applied, and which when reproduced by headphones is perceived at the listener's eardrums as sound from “X” loudspeakers (where X>2 and X is typically equal to 5) at any of a wide variety of positions, including positions in front of and behind the listener.
Reproduction of signals S1 and S2 (or OUT1 and OUT2) generated during binaural mode operation of the FIG. 1 decoder can give the listener the experience of sound that comes from more than two (e.g., five) “surround” sources. At least some of these sources are virtual. More generally, it is conventional for virtual surround systems to use head-related transfer functions (HRTFs) to generate audio signals (sometimes referred to as virtual surround sound signals) that, when reproduced by a pair of physical speakers (e.g., loudspeakers positioned in front of a listener, or headphones) are perceived at the listener's eardrums as sound from more than two sources (e.g., speakers) at any of a wide variety of positions (typically including positions behind the listener).
As noted, the MPS decoder of FIG. 1 operating in the binaural mode could be implemented to apply reverb using an artificial reverberator implemented by post-processor 5. This reverberator could be configured to generate reverb in response to the two-channel output (S1, S2) of decoder 1 and to apply the reverb to the signals S1 and S2 to generate reverbed two-channel audio OUT1 and OUT2. The reverb would be applied as a post process stereo-to-stereo reverb to the 2-channel signal S1, S2 from decoder 1, such that the same reverb impulse response is applied to all discrete channels determined by one of the two downmixed audio channels of the binaural audio output of decoder 1 (e.g., to left-front and left-surround channels determined by downmixed channel S1), and the same reverb impulse response is applied to all discrete channels determined by the other one of the two downmixed audio channels of the binaural audio (e.g., to right-front and right-surround channels determined by downmixed channel S2).
One type of conventional reverberator has what is known as a Feedback Delay Network-based (FDN-based) structure. In operation, such a reverberator applies reverb to a signal by feeding back to the signal a delayed version of the signal. An advantage of this structure relative to other reverb structures is the ability to efficiently produce and apply multiple uncorrelated reverb signals to multiple input signals. This feature is exploited in the commercially available Dolby Mobile headphone virtualizer which includes a reverberator having FDN-based structure and is operable to apply reverb to each channel of a five-channel audio signal (having left-front, right-front, center, left-surround, and right-surround channels) and to filter each reverbed channel using a different filter pair of a set of five head related transfer function (“HRTF”) filter pairs. This virtualizer generates a unique reverb impulse response for each audio channel.
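The general FDN structure described above can be sketched as follows. This is a minimal illustration under stated assumptions and is not the reverberator of the Dolby Mobile headphone virtualizer: the delay lengths, the feedback gain, and the choice of a Householder feedback matrix are all hypothetical values selected for this example. Each input channel feeds its own delay line, and the delayed outputs are mixed through the feedback matrix and fed back, so that each channel receives a different reverb signal.

```python
from collections import deque
from typing import List, Sequence


def fdn_reverb(inputs: Sequence[Sequence[float]],
               delays: Sequence[int] = (149, 211, 263, 293),
               feedback_gain: float = 0.7) -> List[List[float]]:
    """Return one wet (reverb-only) signal per input channel.

    Assumed design: feedback matrix g * (I - (2/n) * ones), i.e. a scaled
    Householder matrix, which is stable for 0 <= g < 1.
    """
    n = len(delays)
    if len(inputs) != n:
        raise ValueError("need one input channel per delay line")
    # Each delay line starts out filled with silence.
    lines = [deque([0.0] * d, maxlen=d) for d in delays]
    num_samples = len(inputs[0])
    outputs = [[0.0] * num_samples for _ in range(n)]
    for t in range(num_samples):
        taps = [line[0] for line in lines]  # oldest sample of each line
        mean2 = 2.0 * sum(taps) / n         # shared Householder term
        for i in range(n):
            outputs[i][t] = taps[i]
            fed_back = feedback_gain * (taps[i] - mean2)
            # Appending to a full bounded deque discards the oldest sample.
            lines[i].append(inputs[i][t] + fed_back)
    return outputs


# Hypothetical test signal: a unit impulse on the first channel only.
impulse = [1.0] + [0.0] * 599
silence = [0.0] * 600
wet = fdn_reverb([impulse, silence, silence, silence])
```

In this sketch the impulse reappears on the first output after 149 samples and then spreads, with reduced amplitude, into the other delay lines, which is how the network produces a different response on each channel.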
The Dolby Mobile headphone virtualizer is also operable in response to a two-channel audio input signal, to generate a two-channel “reverbed” audio output (a two-channel virtual surround sound output to which reverb has been applied). When the reverbed audio output is reproduced by a pair of headphones, it is perceived at the listener's eardrums as HRTF-filtered, reverbed sound from five loudspeakers at left front, right front, center, left rear (surround), and right rear (surround) positions. The virtualizer upmixes a downmixed two-channel audio input (without using any spatial cue parameter received with the audio input) to generate five upmixed audio channels, applies reverb to the upmixed channels, and downmixes the five reverbed channel signals to generate the two-channel reverbed output of the virtualizer. The reverb for each upmixed channel is filtered in a different pair of HRTF filters.
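The upmix, reverb, and downmix sequence described above can be outlined in Python as follows. This is a simplified sketch and not the implementation of the Dolby Mobile headphone virtualizer: the fixed upmix and downmix coefficients, the per-channel feedback comb filter standing in for the reverberator, and the delay and gain values are all assumptions made for illustration, and HRTF filtering is omitted entirely.

```python
from typing import List, Sequence

Signal = List[float]


def comb_reverb(x: Sequence[float], delay: int, gain: float) -> Signal:
    """Feedback comb filter: y[t] = x[t] + gain * y[t - delay]."""
    y: Signal = []
    for t, sample in enumerate(x):
        fed_back = gain * y[t - delay] if t >= delay else 0.0
        y.append(sample + fed_back)
    return y


def upmix_2_to_5(left: Sequence[float], right: Sequence[float]) -> List[Signal]:
    """Hypothetical fixed upmix (no spatial cues) to L, R, C, Ls, Rs."""
    center = [0.5 * (l + r) for l, r in zip(left, right)]
    left_sur = [0.5 * (l - r) for l, r in zip(left, right)]
    right_sur = [0.5 * (r - l) for l, r in zip(left, right)]
    return [list(left), list(right), center, left_sur, right_sur]


def downmix_5_to_2(channels: Sequence[Signal]) -> List[Signal]:
    """Hypothetical fixed downmix of L, R, C, Ls, Rs to two channels."""
    l, r, c, ls, rs = channels
    out_l = [a + 0.5 * b + 0.5 * d for a, b, d in zip(l, c, ls)]
    out_r = [a + 0.5 * b + 0.5 * d for a, b, d in zip(r, c, rs)]
    return [out_l, out_r]


def reverbed_downmix(left: Sequence[float], right: Sequence[float],
                     delays: Sequence[int] = (113, 127, 131, 151, 163),
                     gain: float = 0.5) -> List[Signal]:
    """Upmix, apply a different reverb response per channel, then downmix."""
    upmixed = upmix_2_to_5(left, right)
    reverbed = [comb_reverb(ch, d, gain) for ch, d in zip(upmixed, delays)]
    return downmix_5_to_2(reverbed)


# Hypothetical input: a unit impulse on the left channel, silence on the right.
out_left, out_right = reverbed_downmix([1.0] + [0.0] * 199, [0.0] * 200)
```

Because each upmixed channel uses a different delay, each receives a different impulse response before the channels are recombined into the two-channel output.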
US Patent Application Publication No. 2008/0071549 A1, published on Mar. 20, 2008, describes another conventional system for applying a form of reverb to a downmixed audio input signal during decoding of the downmixed signal to generate individual channel signals. This reference describes a decoder which transforms time-domain downmixed audio input into the QMF domain, applies a form of reverb to the downmixed signal M(t,f) in the QMF domain, and adjusts the phase of the reverb to generate a reverb parameter for each upmix channel being determined from the downmixed signal (e.g., to generate reverb parameter Lreverb(t, f) for an upmix left channel, and reverb parameter Rreverb(t, f) for an upmix right channel, being determined from the downmixed signal M(t,f)). The downmixed signal is received with spatial cue parameters (e.g., an ICC parameter indicative of correlation between left and right components of the downmixed signal, and inter-channel phase difference parameters IPDL and IPDR). The spatial cue parameters are used to generate the reverb parameters (e.g., Lreverb(t, f) and Rreverb(t, f)). Reverb of lower magnitude is generated from the downmixed signal M(t,f) when the ICC cue indicates that there is more correlation between left and right channel components of the downmixed signal, and reverb of greater magnitude is generated from the downmixed signal when the ICC cue indicates that there is less correlation between the left and right channel components of the downmixed signal. Apparently, the phase of each reverb parameter is adjusted (in block 206 or 208) in response to the phase indicated by the relevant IPD cue.
However, the reverb is used only as a decorrelator in a parametric stereo decoder (mono-to-stereo synthesis) where the decorrelated signal (which is orthogonal to M(t,f)) is used to reconstruct the left-right cross correlation. The reference does not suggest individually determining (or generating) a different reverb signal from each of the discrete channels of an upmix determined from the downmixed audio M(t,f), for application to that channel, or from each of a set of linear combinations of values of individual upmix channels determined from the downmixed audio, for application to that linear combination.
The inventor has recognized that it would be desirable to individually determine (and generate) a different reverb signal for each of the discrete channels of an upmix determined from downmixed audio, from each of the discrete channels of the upmix, or to determine and generate a different reverb signal for (and from) each of a set of linear combinations of values of such discrete channels. The inventor has also recognized that with such individual determination of reverb signals for the individual upmix channels (or linear combinations of values of such channels), reverb having a different reverb impulse response can be applied to the upmix channels (or linear combinations).
Until the present invention, spatial cue parameters received with downmixed audio had not been used both to generate discrete upmix channels (or linear combinations of values thereof) from the downmixed audio (e.g., in the QMF domain when the downmixed audio is MPS encoded audio), and to generate reverb from each such upmix channel (or linear combination) individually for application to said upmix channel (or linear combination). Nor had reverbed upmix channels that had been generated in this way been recombined to generate reverbed, downmixed audio from input downmixed audio.