Dolby Surround multichannel audio for personal computer-based multimedia video games and CD ROMs has emerged as a new use for the Dolby MP (Motion Picture) matrix, a 4:2:4 amplitude-phase audio matrix. The Dolby MP matrix is well known in connection with Dolby Stereo movies and Dolby Surround video recordings (video tapes and laser discs), broadcast transmissions (radio and television), and audio media (cassettes and compact discs).
An encoder embodying the Dolby MP 4:2 encode matrix combines four channels of audio into an encoded two channel format, suitable for recording or transmitting the same as regular stereo programs, while a Dolby Surround decoder embodying a Dolby MP 2:4 decode matrix recovers four channels of audio from the two encoded channels.
Dolby Surround is a true surround sound system, not just a playback effect. It involves encoding sounds during production to create a pair of Dolby Surround encoded signals (a "soundtrack"), and then decoding the soundtrack on playback using a Dolby Surround decoder. Thus, producers can control the placement and movement of sounds in a way that creates a remarkably realistic experience, drawing the listener into the action.
FIG. 1 is an idealized functional block diagram of a conventional prior art Dolby MP Matrix encoder. The encoder accepts four separate input signals: left, center, right, and surround (L, C, R, S), and creates two final outputs, left-total and right-total (Lt and Rt). The C input is divided equally and summed with the L and R inputs with a 3 dB level reduction in order to maintain constant acoustic power. The L and R inputs, each summed with the level-reduced C input, are phase shifted in respective identical all-pass networks located between first and second summers in each path. The S input is also divided equally between Lt and Rt with a 3 dB level reduction, but it first undergoes three additional processing steps (which may occur in any order):
a. frequency bandlimiting from 100 Hz to 7 kHz; and
b. encoding with a modified form of Dolby B-type noise reduction.
The processed S input is then applied to a third all-pass network, the output of which is summed with the phase-shifted L/C path to produce the Lt output and subtracted from the phase-shifted R/C path to produce the Rt output. Thus, the surround input S is fed into the Lt and Rt outputs with opposite polarities. In addition, the surround signal S is shifted in phase by about 90 degrees with respect to the L, C and R inputs. It is of no significance whether the surround leads or lags the other inputs. In principle there need be only one phase-shift block, say -90 degrees, in the surround path, its output being summed with the other signal paths, one in-phase (say Lt) and the other out-of-phase (inverted) (say Rt). In practice, as shown in FIG. 1, a 90 degree phase shifter is unrealizable, so three all-pass networks are used: two identical ones in the paths between the center channel summers and the surround channel summers, and a third in the surround path. The networks are designed so that the very large phase shifts of the third one are 90 degrees more or less than those (also very large) of the first two.
The left-total (Lt) and right-total (Rt) encoded signals may be expressed as

Lt = L + 0.707 C + 0.707 jS'; and

Rt = R + 0.707 C - 0.707 jS',

where L is the left input signal, R is the right input signal, C is the center input signal and S' is the band-limited and noise reduction encoded surround input signal S. In the above equations and in other equations in this document, a term (such as 0.707 jS') containing "j" represents a signal phase-shifted 90 degrees with respect to other terms.
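The encode equations above can be checked numerically with a minimal sketch. It assumes each input is a single-frequency phasor represented as a Python complex number, so that multiplication by `1j` stands in for the 90 degree phase shift; the bandlimiting and noise reduction that produce S' are not modelled. The names `mp_encode` and `G` are illustrative and do not come from the source.

```python
import math

# Assumption: 0.707 in the equations is 1/sqrt(2), i.e. a 3 dB level reduction.
G = 1 / math.sqrt(2)

def mp_encode(L, C, R, S_prime):
    """Return (Lt, Rt) for single-frequency phasor inputs.

    Multiplying by 1j models the 90 degree phase shift of the surround
    path relative to the L, C and R inputs (hypothetical helper).
    """
    Lt = L + G * C + G * 1j * S_prime
    Rt = R + G * C - G * 1j * S_prime
    return Lt, Rt

# The gain G is about -3 dB, and a signal split equally into two paths
# at that gain keeps its total power unchanged.
gain_db = 20 * math.log10(G)
power_sum = G**2 + G**2

# A center-only input appears identically in Lt and Rt.
Lt_c, Rt_c = mp_encode(0, 1, 0, 0)

# A surround-only input appears in Lt and Rt with opposite polarities.
Lt_s, Rt_s = mp_encode(0, 0, 0, 1)

print(f"gain = {gain_db:.2f} dB, power sum = {power_sum:.3f}")
print("center only:  ", Lt_c, Rt_c)
print("surround only:", Lt_s, Rt_s)
```

In this simplified model the sign of the `1j` term could be flipped in both equations without changing the behavior described, which matches the statement that it does not matter whether the surround leads or lags the other inputs.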
Audio signals encoded by a Dolby MP matrix encoder may be decoded either by a Dolby Surround decoder (a passive surround decoder) or by a Dolby Pro Logic decoder (an active surround decoder). Passive decoders are limited in their ability to place sounds with precision for all listener positions due to inherent crosstalk limitations in the audio matrix. Dolby Pro Logic active decoders employ directional enhancement techniques which reduce such crosstalk components.
FIG. 2 is an idealized functional block diagram of a passive surround decoder suitable for decoding Dolby MP matrix encoded signals. The heart of the passive matrix decoding process is a simple L-R difference amplifier. Except for level and channel balance corrections, the Lt input signal passes unmodified and becomes the left output. The Rt input signal likewise becomes the right output. Lt and Rt also carry the center signal, so it will be heard as a "phantom" image between the left and right speakers, and sounds mixed anywhere across the stereo soundstage will be presented in their proper perspective. The center speaker is thus shown as optional since it is not needed to reproduce the center signal. The L-R stage in the decoder detects the surround signal by taking the difference of Lt and Rt, which is then passed through a 7 kHz low-pass filter, a delay line, and complementary modified Dolby B-type noise reduction. The surround signal will also be reproduced by the left and right speakers, but it will be heard out-of-phase, which will diffuse the image. To reproduce the decoded surround signal properly, it is ordinarily reproduced by one or more surround speakers located to the sides of and/or to the rear of the listener.
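The passive decoding just described, and the crosstalk it implies, can be illustrated with a minimal sketch. It assumes single-frequency phasors represented as Python complex numbers and models only the matrix arithmetic: the level and balance corrections, 7 kHz low-pass filter, delay line and noise reduction are omitted. The functions `mp_encode` and `passive_decode` are hypothetical helpers written for this illustration.

```python
import math

G = 1 / math.sqrt(2)  # assumed value of the 0.707 coefficient

def mp_encode(L, C, R, S_prime):
    """Encode phasor inputs per the Lt/Rt equations given earlier."""
    Lt = L + G * C + G * 1j * S_prime
    Rt = R + G * C - G * 1j * S_prime
    return Lt, Rt

def passive_decode(Lt, Rt):
    """Idealized passive decode: Lt and Rt pass through unmodified, and
    the surround output is the plain difference Lt - Rt (no level scaling,
    filtering, delay or noise reduction is applied in this sketch)."""
    left = Lt
    right = Rt
    surround = Lt - Rt
    return left, right, surround

# Center only: cancels in the difference stage, so no surround output,
# and equal left/right outputs form the "phantom" center image.
left_c, right_c, surr_c = passive_decode(*mp_encode(0, 1, 0, 0))

# Left only: the full left signal also leaks into the surround output,
# an example of the crosstalk inherent in the passive matrix.
left_l, right_l, surr_l = passive_decode(*mp_encode(1, 0, 0, 0))

# Surround only: recovered by the difference stage, but also present
# in the left and right outputs with opposite polarities.
left_s, right_s, surr_s = passive_decode(*mp_encode(0, 0, 0, 1))

print("center only   -> surround magnitude:", abs(surr_c))
print("left only     -> surround magnitude:", abs(surr_l))
print("surround only -> left, right:", left_s, right_s)
```

The left-only case shows why a passive decoder cannot place sounds precisely: a signal intended only for the left speaker is reproduced at full level by the surround output as well.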
Dolby Surround multichannel sound is also employed to encode the audio of many personal-computer-based multimedia video games and CD ROMs. When played on personal computers having Dolby Surround decoders and suitable loudspeakers, the computer user experiences the same sort of multichannel surround sound as he or she has known in Dolby Surround home theatre.
One important difference between the computer-based and home theatre experiences is that the former usually are interactive, requiring the real-time involvement of the user. Typically, a manual input (joystick, mouse, keyboard, etc.) initiated by the computer user causes a change in the displayed video and/or audio. In order to enhance the realism of the interactivity, it would be desirable for user actions not merely to create additional sound effects in real time, but also to give such sound effects variable spatial positions determined in real time.
Accordingly, there is a need to spatially encode one or more sounds in real time for mixing with a pre-recorded surround-sound soundtrack (the soundtrack of a computer game, a CD ROM or Internet audio, for example). Further, there is a need to accomplish such encoding as simply as possible, using as few computing resources as possible.