While surround multi-speaker systems are already popular in home and desktop settings, the number of multi-channel audio recordings available is still limited. Recent movie soundtracks and some musical recordings are available in multi-channel format, but most music recordings are still mixed into two channels, and playback of this material over a multi-channel system poses several questions. Sound engineers mix stereo recordings with a very particular setup in mind, which consists of a pair of loudspeakers placed symmetrically in front of the listener. Thus, listening to this kind of material over a multi-speaker system (e.g. 5.1 surround) raises the question as to what signal or signals should be sent to the surround and center channels. Unfortunately, the answer to this question depends strongly on individual preferences and no clear objective criteria exist.
There are two main approaches for mixing multi-channel audio. One is the direct/ambient approach, in which the main (e.g. instrument) signals are panned among the front channels in a frontally oriented fashion as is commonly done with stereo mixes, and “ambience” signals are sent to the rear (surround) channels. This mix creates the impression that the listener is in the audience, in front of the stage (best seat in the house). The second approach is the “in-the-band” approach, where the instrument and ambience signals are panned among all the loudspeakers, creating the impression that the listener is surrounded by the musicians. There is an ongoing debate about which approach is the best.
Whether an in-the-band or a direct/ambient approach is adopted, there is a need for better signal processing techniques to manipulate a stereo recording so as to extract the ambience signals as well as the signals of the individual instruments. This is a very difficult task since no information about how the stereo mix was done is available in most cases.
The existing two-to-N channel up-mix algorithms can be divided into two broad classes: ambience generation techniques, which attempt to extract and/or synthesize the ambience of the recording and deliver it to the surround channels (or simply enhance the natural ambience), and multichannel converters, which derive additional channels for playback in situations when there are more loudspeakers than program channels. The ambience generation techniques generally rely on combinations of the following methods:
1) Applying artificial reverberation to the stereo signal. The resulting impression is essentially of listening to the original recording in a virtual listening room. This artificial ambience information does not match the conditions in which the original recording was produced.
2) Computing the difference of the original left and right signals. This provides a monaural signal whose content includes the desired ambience information and excludes any primary signal panned in the center of the original stereo image. However, the resulting ambience signal also contains unwanted leakage from any primary signals not panned to the center. This leakage can be partially reduced by use of logic steering techniques.
3) Deriving a stereo ambience signal from a mono signal (pseudostereophony). Two weakly correlated signals can be obtained by applying a pair of all-pass filters to a single audio signal.
4) Applying a small delay (typically 5 to 20 ms) on the rear-channel signals to alleviate unwanted localization artifacts caused by any leakage of primary signals into the rear channels. This is an effective method for better preserving the frontal stereo image of the original recording, but it cannot correct the ambience information itself.
5) Deriving room responses corresponding to virtual microphone positions so as to synthesize reverberation signals that match the acoustics of the original venue. However, the application of this method is in principle restricted to live recordings for which detailed additional historical information is available on the original recording conditions and techniques. Also, the method cannot reproduce other ambience components due to background noise in the original recording.
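As a rough illustration of how methods 2), 3) and 4) above can be combined, the following Python sketch derives two surround feeds from a stereo pair: it takes the left-minus-right difference as a mono ambience estimate, decorrelates it with a pair of Schroeder all-pass filters, and delays the result. The function names, the all-pass delay lengths (113 and 173 samples), the gains and the 10 ms default delay are illustrative assumptions, not values taken from any particular system, and logic steering is not modeled.

```python
def difference_ambience(left, right):
    """Method 2: mono ambience estimate L - R (center-panned content cancels)."""
    return [l - r for l, r in zip(left, right)]


def allpass(signal, delay, gain):
    """Schroeder all-pass filter: y[n] = -g*x[n] + x[n-D] + g*y[n-D].

    The magnitude response is flat; only the phase is altered, so two
    filters with different (delay, gain) yield weakly correlated outputs.
    """
    out = []
    for n, x in enumerate(signal):
        x_delayed = signal[n - delay] if n >= delay else 0.0
        y_delayed = out[n - delay] if n >= delay else 0.0
        out.append(-gain * x + x_delayed + gain * y_delayed)
    return out


def delay_samples(signal, n):
    """Method 4: delay a signal by n samples, keeping its original length."""
    if n <= 0:
        return list(signal)
    return ([0.0] * n + list(signal))[:len(signal)]


def upmix_surrounds(left, right, sample_rate=48000, delay_ms=10.0):
    """Return (left_surround, right_surround) derived from a stereo pair.

    All parameter values below are assumed for illustration only.
    """
    ambience = difference_ambience(left, right)
    # Method 3: pseudostereo from the mono ambience via two all-pass filters.
    ls = allpass(ambience, delay=113, gain=0.5)
    rs = allpass(ambience, delay=173, gain=-0.5)
    # Method 4: a small delay (within the 5 to 20 ms range) on the rear feeds.
    d = int(round(sample_rate * delay_ms / 1000.0))
    return delay_samples(ls, d), delay_samples(rs, d)
```

Note that a signal panned exactly to the center (identical in both channels) produces silent surround feeds in this sketch, whereas any signal panned off-center leaks into the difference signal, which is the limitation described under method 2).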
While the techniques described above have been of some use, there remains a need for better signal processing techniques for separating ambience for the surround channels. More generally, manipulating existing stereo recordings for playback on a multi-speaker system remains an important problem.