Many audio reproduction systems can record, transmit, and play back synchronous multi-channel audio, sometimes referred to as “surround sound.” Although entertainment audio began with simple monophonic systems, two-channel (stereo) and higher channel-count (surround sound) formats soon followed in an effort to capture a convincing spatial image and sense of listener immersion. Surround sound is a technique for enhancing reproduction of an audio signal by using more than two audio channels. Content is delivered over multiple discrete audio channels and reproduced using an array of loudspeakers (or speakers). The additional audio channels, or “surround channels,” provide the listener with an immersive listening experience.
Surround sound systems typically have speakers positioned around the listener to give the listener a sense of sound localization and envelopment. Many surround sound systems with only a few channels (such as a 5.1 format) have speakers positioned at specific locations on a 360-degree arc around the listener. These speakers are also arranged so that all of them lie in the same plane as the listener's ears. Many higher channel-count surround sound systems (such as 7.1, 11.1, and so forth) also include height or elevation speakers positioned above the plane of the listener's ears to give the audio content a sense of height. These surround sound configurations often include a discrete low-frequency effects (LFE) channel that supplements the bass in the other main audio channels. Because this LFE channel requires only a portion of the bandwidth of the other audio channels, it is designated as the “.X” channel, where X is any non-negative integer (such as the “.1” in 5.1 or 7.1 surround sound).
Ideally, surround sound audio is mixed into discrete channels, and those channels are kept discrete all the way through playback to the listener. In practice, however, storage and transmission limitations require that the size of surround sound audio be reduced to minimize storage space and transmission bandwidth. Moreover, two-channel audio content is typically compatible with a wider variety of broadcasting and reproduction systems than audio content having more than two channels.
Matrixing was developed to address these needs. Matrixing involves “downmixing” an original signal having more than two discrete audio channels into a two-channel audio signal. The channels beyond the first two are downmixed according to a predetermined process to generate a two-channel downmix that contains information from all of the original audio channels. The additional audio channels can later be extracted and synthesized from the two-channel downmix using an “upmix” process, so that the original channel mix is recovered to some degree of approximation. Upmixing takes the two-channel audio signal as input and generates a larger number of channels for playback, and this playback is intended to be an acceptable approximation of the discrete audio channels of the original signal.
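The downmix/upmix round trip described above can be sketched per sample as follows. This is a minimal illustration, not the predetermined process of any particular system: the five-channel input layout, the 1/√2 (about -3 dB) fold-down coefficient `K`, and the helper names `downmix_5_to_2` and `passive_upmix_2_to_3` are all assumptions chosen for clarity.

```python
import math

# Assumed fold-down coefficient (about -3 dB); real systems use various values.
K = 1.0 / math.sqrt(2.0)


def downmix_5_to_2(left, right, center, left_surround, right_surround):
    """Hypothetical passive matrix: fold five channel samples into Lo/Ro."""
    lo = left + K * center + K * left_surround
    ro = right + K * center + K * right_surround
    return lo, ro


def passive_upmix_2_to_3(lo, ro):
    """Hypothetical passive upmix: estimate L, C, R from one Lo/Ro sample pair.

    The center estimate is the scaled sum of both inputs; the side outputs
    are the inputs passed through unchanged, so they still carry any
    center content (the recovery is only approximate).
    """
    center = K * (lo + ro)
    return lo, center, ro
```

For a center-only source, `downmix_5_to_2(0, 0, 1, 0, 0)` places about 0.707 in each of Lo and Ro, and the upmix recovers a center estimate of 1.0, but the left and right estimates also carry about 0.707 each, which is why such a recovery is only an approximation of the discrete original.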
Several upmixing techniques use constant-power panning. The term “panning” derives from motion pictures, specifically from the word “panorama,” which means a complete view of a given area in every direction. In audio, a sound can be panned within the stereo field so that it is perceived at a position in physical space, letting the listener hear every sound in a performance in its proper location and dimension. For musical recordings, a common practice is to place instruments where they would be physically located on a real stage: stage-left instruments are panned left and stage-right instruments are panned right. The goal is to replicate a real-life performance for the listener during playback.
Constant-power panning keeps the total signal power, summed across the output channels, constant as the input audio signal is distributed among them. Although constant-power panning is widespread, current downmixing and upmixing techniques struggle to preserve and recover the precise panning behavior and localization present in an original mix. In addition, some techniques are prone to artifacts, and all have limited ability to separate independent signals that overlap in time and frequency but originate from different spatial directions.
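A common way to realize constant-power panning between two channels is the sine/cosine pan law, sketched below. The choice of this particular pan law, the 0-to-1 `position` parameter, and the function name `constant_power_pan` are assumptions for illustration; the point is that the squared gains always sum to one, so the total power does not change as the source moves.

```python
import math


def constant_power_pan(sample, position):
    """Pan one mono sample into (left, right) with a sine/cosine pan law.

    position runs from 0.0 (hard left) to 1.0 (hard right). The gains are
    cos(theta) and sin(theta) with theta = position * pi / 2, so
    gain_left**2 + gain_right**2 == 1 at every position.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be in [0, 1]")
    theta = position * math.pi / 2.0
    return sample * math.cos(theta), sample * math.sin(theta)
```

At `position=0.5` both gains equal cos(π/4), about 0.707, which is the familiar -3 dB center point of this pan law; a plain linear crossfade (gains 0.5 and 0.5 at center) would instead produce an audible power dip at the center.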
For example, some popular upmixing techniques use voltage-controlled amplifiers (VCAs) to normalize both input channels to approximately the same level. These two signals are then combined in an ad-hoc manner to produce the output channels. Because of this ad-hoc approach, however, the final output struggles to achieve the desired panning behavior, suffers from crosstalk, and at best approximates discrete surround-sound audio.
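The following deliberately simplified sketch shows how normalizing the inputs and then combining them with a fixed rule can lose the intended panning. It is a hypothetical model, not a description of any specific commercial decoder: the block-RMS normalization stands in for VCA gain riding, and the sum/difference combination and the name `normalize_and_combine` are assumptions.

```python
import math


def rms(block):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def normalize_and_combine(left, right, eps=1e-12):
    """Hypothetical level-normalizing upmix of one block of L/R samples.

    Both inputs are scaled to a common RMS level, then combined with a
    fixed sum/difference rule. eps only guards against division by zero
    on a silent channel.
    Returns (left_out, center_out, right_out, surround_out) sample lists.
    """
    target = 0.5 * (rms(left) + rms(right))
    gain_l = target / (rms(left) + eps)
    gain_r = target / (rms(right) + eps)
    norm_l = [gain_l * x for x in left]
    norm_r = [gain_r * x for x in right]
    center = [0.5 * (a + b) for a, b in zip(norm_l, norm_r)]
    surround = [0.5 * (a - b) for a, b in zip(norm_l, norm_r)]
    return norm_l, center, norm_r, surround
```

With a single source fed 9:1 toward the left (for example `left = [0.9 * x for x in s]` and `right = [0.1 * x for x in s]`), normalization removes the level difference, so the source comes out equally in the left, center, and right outputs instead of staying left of center.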
Other upmixing techniques are precise at only a few panning locations and are imprecise away from them. For example, some upmixing techniques define a limited number of panning locations at which upmixing produces precise and predictable behavior. Dominance vector analysis is used to interpolate between a limited number of predefined sets of dematrixing coefficients defined at those precise panning locations. Any panning location that falls between these points uses interpolation to find its dematrixing coefficient values. Because of this interpolation, panning locations between the precise points can be reproduced imprecisely, which adversely affects audio quality.
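The loss of precision between table points can be illustrated with a small sketch. Here, as a simplifying assumption, each "coefficient set" is just the exact constant-power gain pair for one panning angle, stored only at three hypothetical anchor angles (`ANCHORS`); intermediate angles are found by linear interpolation, which is not necessarily the interpolation any real dominance-vector decoder uses.

```python
import math

# Hypothetical precise panning points (radians) and the exact
# constant-power coefficient pairs stored only at those points.
ANCHORS = [0.0, math.pi / 4.0, math.pi / 2.0]
TABLE = [(math.cos(a), math.sin(a)) for a in ANCHORS]


def interpolated_coeffs(theta):
    """Linearly interpolate stored coefficients between neighboring anchors."""
    if not ANCHORS[0] <= theta <= ANCHORS[-1]:
        raise ValueError("theta outside table range")
    for i in range(len(ANCHORS) - 1):
        a0, a1 = ANCHORS[i], ANCHORS[i + 1]
        if theta <= a1:
            t = (theta - a0) / (a1 - a0)
            (l0, r0), (l1, r1) = TABLE[i], TABLE[i + 1]
            return l0 + t * (l1 - l0), r0 + t * (r1 - r0)
    return TABLE[-1]  # defensive fallback; the range check makes this unreachable


def power_error_db(theta):
    """Deviation (in dB) of the interpolated pair from unit total power."""
    l, r = interpolated_coeffs(theta)
    return 10.0 * math.log10(l * l + r * r)
```

At the anchors the error is zero, but at θ = π/8, halfway between the first two anchors, the interpolated gains sum to only about 0.854 in power, a dip of roughly 0.7 dB that exact constant-power gains would not have.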