Over the last few decades, the production, distribution, and presentation of multichannel audio material have grown steadily. This growth has been driven largely by the film industry, in which 5.1 channel playback systems are almost ubiquitous, and more recently by the music industry, which has begun to produce 5.1 channel music.
Typically, such audio material is presented through a playback system that has the same number of channels as the material. For example, a 5.1 channel film soundtrack may be presented in a 5.1 channel cinema or through a 5.1 channel home theater audio system. However, there is an increasing desire to play multichannel material over systems or in environments whose number of presentation channels differs from the number of channels in the material. Examples include playing 5.1 channel material in a vehicle that has only two or four playback channels, or playing a movie soundtrack with more than 5.1 channels in a cinema equipped only with a 5.1 channel system. In such situations, some or all of the channels of the multichannel signal must be combined, or “downmixed,” for presentation.
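As a concrete illustration of downmixing, the sketch below folds 5.1 channel sample frames into two channels using the fixed coefficients commonly associated with the ITU-R BS.775 stereo downmix: center and surround channels attenuated by 3 dB and the LFE channel discarded. The channel order `(L, R, C, LFE, Ls, Rs)` and the function name `downmix_51_to_stereo` are assumptions for illustration only. A static matrix like this is the simplest kind of channel combination, and it is exactly the kind that can produce the artifacts described in this section.

```python
import math

# -3 dB gain applied to the center and surround channels in this convention.
ATT = 1 / math.sqrt(2)

def downmix_51_to_stereo(frames):
    """Fold 5.1 frames into stereo frames.

    frames: iterable of (L, R, C, LFE, Ls, Rs) sample tuples (assumed order).
    Returns a list of (Lo, Ro) tuples. No limiting is applied, so the
    output may exceed the input's full-scale range.
    """
    out = []
    for L, R, C, LFE, Ls, Rs in frames:
        lo = L + ATT * C + ATT * Ls  # LFE is dropped in this convention
        ro = R + ATT * C + ATT * Rs
        out.append((lo, ro))
    return out
```

For example, `downmix_51_to_stereo([(0.1, 0.2, 0.5, 0.9, 0.0, 0.3)])` sends the center sample to both outputs at about 0.354 and ignores the LFE sample entirely.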
Combining channels may produce audible artifacts. For example, some frequency components may cancel while others reinforce and become louder. Most commonly, these artifacts arise because two or more of the channels being combined contain similar or correlated signal components: where such components are in phase they add, and where they are out of phase they cancel.
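A minimal model of this cancellation and reinforcement, assuming the two channels being combined carry the same component and differ only by a time offset τ: the sum x(t) + x(t - τ) has magnitude response |1 + e^(-j2πfτ)| = 2|cos(πfτ)|, a comb filter. The hypothetical helper `comb_gain_db` evaluates that response at a single frequency; real program material is rarely this simple, but the example shows how a small inter-channel delay turns into frequency-dependent gain.

```python
import cmath
import math

def comb_gain_db(freq_hz, delay_s):
    """Level (dB) of x(t) + x(t - delay_s) relative to x(t) alone, at one frequency."""
    # Summing a component with a delayed copy multiplies it by 1 + e^(-j*2*pi*f*tau).
    h = 1 + cmath.exp(-2j * math.pi * freq_hz * delay_s)
    mag = abs(h)
    # Treat a numerically zero magnitude as complete cancellation.
    return 20 * math.log10(mag) if mag > 1e-12 else float("-inf")
```

With a 1 ms offset, components at 0 Hz, 1 kHz, 2 kHz, and so on reinforce by about 6 dB, components at 250 Hz rise by about 3 dB, and components at 500 Hz, 1.5 kHz, 2.5 kHz, and so on cancel completely.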
It is an object of this invention to minimize or suppress artifacts that occur as a result of combining channels. Other objects will be appreciated as this document is read and understood.
Combining channels may also be required for purposes other than reducing the number of channels. For example, an additional playback channel may need to be created as some combination of two or more of the original channels in the multichannel signal. This may be characterized as a type of “upmixing,” because the result has more channels than the original. Thus, whether in the context of “downmixing” or “upmixing,” combining channels to create a new channel may lead to audible artifacts.
Common techniques for minimizing channel-combining artifacts apply one or more of time, phase, and amplitude (or power) adjustments to the channels to be combined, to the resulting combined channel, or to both. Because audio signals are inherently dynamic, with characteristics that change over time, such adjustments are typically calculated and applied dynamically. Although such dynamic processing removes some combining artifacts, it may introduce others. To minimize these dynamic processing artifacts, the present invention employs Auditory Scene Analysis so that, in general, dynamic processing adjustments are held substantially constant within auditory scenes or events, and changes in the adjustments are permitted only at or near auditory scene or event boundaries.
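The sketch below illustrates the general idea under several stated assumptions; it is not the method of the present invention. It uses one possible amplitude adjustment, a broadband power-preserving gain that scales the mix so its power equals the summed power of the inputs, computed per block. The event boundaries are assumed to come from an Auditory Scene Analysis stage that is not shown, supplied here as one boolean per block. The names `power_correction_gain`, `event_held_gains`, and `max_gain` are hypothetical.

```python
import math

def block_power(block):
    """Mean-square power of one block of samples."""
    return sum(s * s for s in block) / len(block)

def power_correction_gain(channel_blocks, max_gain=4.0):
    """Gain that makes the summed mix's power equal the summed input powers.

    channel_blocks: one equal-length block of samples per input channel.
    """
    mix = [sum(samples) for samples in zip(*channel_blocks)]
    p_in = sum(block_power(b) for b in channel_blocks)
    p_mix = block_power(mix)
    if p_mix <= 1e-12:
        # Mix has (numerically) cancelled: use the clamp if there was input power.
        return max_gain if p_in > 1e-12 else 1.0
    # Clamp so near-cancellation cannot produce an unbounded boost.
    return min(math.sqrt(p_in / p_mix), max_gain)

def event_held_gains(desired, boundaries):
    """Adopt a new gain only at event boundaries; hold it within each event."""
    held, current = [], None
    for g, is_boundary in zip(desired, boundaries):
        if current is None or is_boundary:
            current = g  # take the gain computed for the event's first block
        held.append(current)
    return held
```

For instance, `event_held_gains([1.0, 1.2, 0.8, 2.0, 1.5], [True, False, False, True, False])` returns `[1.0, 1.0, 1.0, 2.0, 2.0]`: the block-to-block fluctuation inside each event is suppressed, and the gain changes only where a boundary is flagged. Taking the first block's gain keeps the sketch causal; averaging the desired gain over the whole event, or applying the correction per frequency band rather than broadband, are alternatives that trade latency and complexity for accuracy.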