While surround multi-speaker systems are already popular in home and desktop settings, the number of multi-channel audio recordings available is still limited. Recent movie soundtracks and some musical recordings are available in multi-channel format, but most music recordings are still mixed into two channels, and playback of this material over a multi-channel system poses several questions. Sound engineers mix stereo recordings with a very particular setup in mind: a pair of loudspeakers placed symmetrically in front of the listener. Thus, listening to this kind of material over a multi-speaker system (e.g. 5.1 surround) raises the question of what signal or signals should be sent to the surround and center channels. Unfortunately, the answer depends strongly on individual preferences, and no clear objective criteria exist.
There are two main approaches for mixing multi-channel audio. One is the direct/ambient approach, in which the main (e.g. instrument) signals are panned among the front channels in a frontally oriented fashion, as is commonly done with stereo mixes, and “ambience” signals are sent to the rear (surround) channels. This mix creates the impression that the listener is in the audience, in front of the stage (best seat in the house). The second is the “in-the-band” approach, where the instrument and ambience signals are panned among all the loudspeakers, creating the impression that the listener is surrounded by the musicians. There is an ongoing debate about which approach is better.
Whether an in-the-band or a direct/ambient approach is adopted, there is a need for better signal processing techniques to manipulate a stereo recording to extract the signals of individual instruments as well as the ambience signals. This is a very difficult task since no information about how the stereo mix was done is available in most cases.
The existing two-to-N channel up-mix algorithms fall into two broad classes: ambience generation techniques, which attempt to extract and/or synthesize the ambience of the recording and deliver it to the surround channels (or simply enhance the natural ambience), and multichannel converters, which derive additional channels for playback in situations where there are more loudspeakers than program channels. In the latter case, the goal is to increase the listening area while preserving the original stereo image. Multichannel converters can generally be grouped into the following classes:
1) Linear matrix converters, where the new signals are derived by scaling and adding/subtracting the left and right signals. Mainly used to create a 2-to-3 channel up-mix, this method inevitably introduces unwanted artifacts and preservation of the stereo image is limited.
2) Matrix steering methods, which are essentially dynamic linear matrix converters. These methods are capable of detecting and extracting prominent sources in the mix, such as dialogue, even if they are not panned to the center. Gains are dynamically computed and used to scale the left and right channels according to a dominance criterion. Thus, a source (or sources) panned in the primary direction can be extracted. However, this technique is still limited to looking at a single primary direction, which in the case of music might not be unique.
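The two converter classes above can be illustrated with a minimal Python sketch. It is not taken from any particular product or from a specific published method. The function names, the center gain of 1/sqrt(2), the block length, and the sum/difference dominance measure are all illustrative assumptions.

```python
import math


def linear_matrix_upmix(left, right, center_gain=math.sqrt(0.5)):
    """Static 2-to-3 up-mix (assumed form): center is a fixed scaled sum of L and R.

    L and R are passed through unchanged, so any centered source is
    reproduced by all three loudspeakers at once.
    """
    center = [center_gain * (l + r) for l, r in zip(left, right)]
    return list(left), center, list(right)


def steering_gain(left, right, eps=1e-12):
    """Assumed dominance criterion: energy of (L+R) versus energy of (L-R).

    Returns a value in [0, 1]: close to 1 when the block is dominated by a
    center-panned source (L equal to R), 0 when the block is hard-panned
    or out of phase.
    """
    e_sum = sum((l + r) ** 2 for l, r in zip(left, right))
    e_diff = sum((l - r) ** 2 for l, r in zip(left, right))
    dominance = (e_sum - e_diff) / (e_sum + e_diff + eps)
    return max(0.0, dominance)


def steered_upmix(left, right, block=256):
    """Dynamic 2-to-3 up-mix: the matrix gain is recomputed for every block.

    The center estimate is removed from L and R, so a dominant center source
    is extracted rather than duplicated. Gains are not smoothed between
    blocks here; a practical system would need smoothing.
    """
    out_l, out_c, out_r = [], [], []
    for start in range(0, len(left), block):
        l_blk = left[start:start + block]
        r_blk = right[start:start + block]
        g = steering_gain(l_blk, r_blk)
        for a, b in zip(l_blk, r_blk):
            c = 0.5 * g * (a + b)
            out_l.append(a - c)
            out_c.append(c)
            out_r.append(b - c)
    return out_l, out_c, out_r
```

Because the sketch computes a single gain per block, it shares the limitation noted above: it can follow only one dominant direction at a time, so simultaneous sources panned to different positions are not separated.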
While the techniques described above have been of some use, better signal processing techniques for multichannel conversion are still needed, and manipulating existing stereo recordings for playback on a multi-speaker system remains an important problem.