1. Field of the Invention
The present invention pertains to audio signal processing, and more specifically, to a method and apparatus for surround sound panning.
2. Description of the Related Art
Surround sound audio (wherein, for example, sound is generated for one or more listeners 105, 106 using multiple speakers i 100-104, each respectively positioned at angle φ(i) from listener 105 (positioned at a “sweet spot”), as illustrated in FIG. 1) is growing rapidly due to the proliferation of home theaters, digital television, surround sound music, and computer games. The roots of surround sound audio are in the motion picture industry. It has been employed by movie soundtracks to locate sounds, creating a captivating environment for the theater patron. Typical theaters have three speakers in the front which provide stereo along with a center channel for dialog, and two speakers in the rear for special effects and ambient sounds. In recent years this technology has made its way to the home, fueling a rapidly growing surround sound home theater market. Dolby ProLogic has been used to enhance television shows by creating a surround sound effect. Technologies such as DVD are bringing advanced multi-channel digital audio into the home, providing an audio experience rivaling or exceeding that found in movie theaters.
In addition to DVD, surround sound is being integrated into personal computers and many new consumer media delivery systems. Among these are High Definition Television and the new digital television standard. This new technology will replace the older Dolby ProLogic surround technology. Soon all TV shows, sporting events, and commercials will be broadcast in surround sound. In addition, surround sound is currently available on most videotapes and laserdiscs.
Another area in which surround sound is emerging is recorded music. Currently, Digital Theater Systems (DTS) markets a CD-based technology that provides high-quality six-channel audio for the home. In addition, industry standards committees are in the final stages of defining an audio-only DVD format. Initial music industry response to this technology has been extremely favorable.
Following is a list of current listening formats for surround sound:
5.1: Six-channel format popular in home theaters and movie theaters having left, center, and right speakers positioned in front of the listener, and left and right surround speakers behind the listener (see FIG. 2A).
7.1: Motion picture format having five full-range screen channels, two surround channels and one LFE channel. Also a consumer format with additional side or front channels (see FIG. 2B).
LCRS: Four-channel format having a single rear surround channel, often sent simultaneously to left and right surround speakers placed behind the listener (see FIG. 2C).
Following is a list of current encoding formats for surround sound:
Discrete Multichannel: A system wherein audio channels are separately recorded, stored and played back.
Dolby Digital (AC-3): A digital encoding format for up to 5.1-channel audio using lossy data compression. Used in motion picture theatres and consumer audio and video equipment. Standard for DTV (digital television); used on most DVDs and many laserdiscs.
DTS: Refers to digital encoding formats from Digital Theater Systems. Used in motion picture theaters for up to eight (usually 5.1) channels, for discrete 5.1-channel music on CDs, and optional for video soundtracks on DVDs and laserdiscs.
Sony Dynamic Digital Sound (SDDS): A 7.1-channel format used in motion picture theaters.
Dolby Surround: A format used to encode LCRS audio for two-channel media, used in some television broadcasts, analog optical motion picture soundtracks, and VHS tapes; decoded using Dolby ProLogic.
Meridian Lossless Packing (MLP): A lossless data compression technique planned for use on the upcoming DVD-audio format.
One of the important aspects of creating surround sound is panning. That is, when creating surround sound, a source sound signal is “panned” to each of the separate discrete channels so as to add spatial characteristics such as direction to the sound. Low-frequency effects are mixed to a separate so-called LFE channel. The LFE channel carries non-essential effects enhancement, such as the low-frequency component of an explosion.
When surround sound was initially introduced, all dialog was mapped to the center channel, stereo was mapped to left and right channels, and ambient sounds were mapped to the surround (rear) channels. Recently, however, all channels are used to locate certain sounds via panning, which is particularly useful for sound sources such as explosions or moving vehicles.
The concept of panning will now be introduced with reference to FIGS. 3, 4A, and 4B. First, FIG. 3 illustrates the head-related transfer function (HRTF) h(t,φ), consisting of left ear and right ear components hL(t,φ) (304) and hR(t,φ) (305). Specifically, a source sound c(t) originating from speaker 300, located at an arrival angle φ from listener 301, will cause the listener to hear signals l(t) (307) and r(t) (308) in the left and right ears, respectively, and in turn to perceive the sound as arriving from direction φ. The left and right listener ear signals l(t) and r(t) thus can be determined as:
l(t) = hL(t,φ) * c(t)    (Eq. 1)
r(t) = hR(t,φ) * c(t)    (Eq. 2)
(where * represents a convolution operator)
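As an illustration of Eq. 1 and Eq. 2, the following is a minimal sketch in Python that assumes the signals have been sampled, so that the convolution becomes a finite discrete sum. The function names and the impulse response values are hypothetical and are chosen only to make the arithmetic easy to follow; they are not measured head-related responses.

```python
def convolve(h, c):
    """Discrete linear convolution of two finite sample sequences."""
    out = [0.0] * max(len(h) + len(c) - 1, 0)
    for i, hv in enumerate(h):
        for j, cv in enumerate(c):
            out[i + j] += hv * cv
    return out


def ear_signals(h_left, h_right, c):
    """Return (l, r): the source c filtered by each ear's impulse response.

    h_left and h_right stand in for hL(t, phi) and hR(t, phi) at one fixed
    arrival angle phi (illustrative values, not measured data).
    """
    return convolve(h_left, c), convolve(h_right, c)


# Hypothetical two-tap impulse responses for a single arrival angle.
h_left = [1.0, 0.5]
h_right = [0.0, 0.8]
source = [1.0, 0.0, -1.0]

left, right = ear_signals(h_left, h_right, source)
print(left)   # [1.0, 0.5, -1.0, -0.5]
print(right)  # [0.0, 0.8, 0.0, -0.8]
```

In this made-up case the right ear response is delayed by one sample and attenuated relative to the left, which is the kind of interaural difference from which a listener infers direction.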
FIG. 4A and FIG. 4B introduce the concept of panning with respect to stereo signals. As shown in FIG. 4A, a signal s(t) is applied to left and right speakers 409 and 411, respectively, via amplifiers 405 and 406. The left and right speakers are positioned at arrival angle φl from the left ear of listener 416 and at arrival angle φr from the right ear, respectively. Amplifiers 405 and 406 respectively provide a gain determined by panning weights γl(α) (403) and γr(α) (404) (where α is between 0 and 1).
FIG. 4B illustrates how a panning law is applied to determine how weights are applied to different speakers. As shown in FIG. 4B, a panning parameter α (representing, for example, a “fade” value between the left and right channels) is input to the panning law 417 to produce respective panning weights γl(α) and γr(α), shown as array 418. An application of one example of a panning law is where:
γl(α) = α    (Eq. 3)
γr(α) = 1 − α    (Eq. 4)
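The linear panning law of Eq. 3 and Eq. 4 can be sketched as follows. This is only an illustrative Python rendering under the assumption of sampled signals; the function names are hypothetical.

```python
def stereo_pan_weights(alpha):
    """Linear panning law: returns (gamma_l, gamma_r) per Eq. 3 and Eq. 4."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return alpha, 1.0 - alpha


def pan_stereo(samples, alpha):
    """Scale one source signal into left and right speaker feeds."""
    gamma_l, gamma_r = stereo_pan_weights(alpha)
    left = [gamma_l * s for s in samples]
    right = [gamma_r * s for s in samples]
    return left, right


print(stereo_pan_weights(0.25))       # (0.25, 0.75)
print(pan_stereo([1.0, -2.0], 0.25))  # ([0.25, -0.5], [0.75, -1.5])
```

With this law, α = 1 sends the source entirely to the left speaker and α = 0 entirely to the right, and the two weights always sum to one.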
When such a panning law is applied to the arrangement shown in FIG. 4A, the stereo speaker-to-ear impulse response (for each ear) of a panned source 410, hp(t), can be described as:
hp(t) = γl h(t,φl) + γr h(t,φr)    (Eq. 5)
hp(t) = α h(t,φl) + (1 − α) h(t,φr)    (Eq. 6)
It turns out that the speaker-to-ear impulse response of an actual sound source at direction φa (where φa = α×φl + (1 − α)×φr) approximates the panned impulse response for closely spaced speakers, that is
hp(t) ≈ h(t,φa)    (Eq. 7)
and, as a result, panning between speakers has the perceptual effect of a single speaker positioned at φa.
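The apparent direction φa described above is a weighted average of the two speaker angles. A short worked sketch follows; the speaker angles of −30 and +30 degrees and the sign convention (negative to the listener's left) are assumptions made for the example and are not taken from the figures.

```python
def perceived_angle(alpha, phi_left, phi_right):
    """Apparent source direction: alpha * phi_l + (1 - alpha) * phi_r."""
    return alpha * phi_left + (1.0 - alpha) * phi_right


# Assumed speaker placement: 30 degrees to either side of straight ahead.
print(perceived_angle(0.75, -30.0, 30.0))  # -15.0
print(perceived_angle(0.5, -30.0, 30.0))   # 0.0
```

Here α = 0.75 places the phantom source halfway between straight ahead and the left speaker, subject to the closely-spaced-speaker approximation of Eq. 7.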
FIGS. 5A and 5B further illustrate how the above panning concepts are applied to surround sound systems. As shown in FIG. 5A, a source sound signal s(t) is applied to a set of speakers i=1 to N via respective amplifiers 501 . . . 503. Each amplifier i applies a gain determined by respective panning weights γi(η) so as to produce separate channel signals ci(t), where ci(t) is defined as:
ci(t) = γi(η) s(t)    (Eq. 8)
As shown in FIG. 5B, each respective panning weight γi(η) (512) is determined by panning law 511, which yields each panning weight as a function of panning parameters η and speaker location φi.
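Eq. 8 amounts to scaling one source signal by a separate gain for each of the N channels. The sketch below assumes sampled signals and takes the panning weights as already computed; the function name and the weight values are hypothetical placeholders and do not represent any particular panning law.

```python
def channel_signals(source, weights):
    """Apply Eq. 8: channel i carries weights[i] times every source sample."""
    return [[gamma * s for s in source] for gamma in weights]


# Hypothetical weights for N = 3 channels.
feeds = channel_signals([1.0, 2.0], [0.5, 0.0, 0.25])
print(feeds)  # [[0.5, 1.0], [0.0, 0.0], [0.25, 0.5]]
```

The panning law itself (511 in FIG. 5B) is whatever maps the panning parameters and speaker locations to the list of weights passed in here.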
FIG. 6 introduces how conventional surround sound panning techniques are applied for controlling the front/back and left/right panning variables of speakers 600-604. In the example provided herein, a conventional 5.1 surround sound format as described above is presented. Conventionally, the soundfield of the surround sound system is represented by a Cartesian grid 609 defined between speakers 600-604. The indicator 610 represents the position of a sound source as it is intended to be perceived by a listener centrally positioned within the grid 609 defined by the surround sound speakers as a result of the application of the sound source through the five speaker channels of the surround sound system. As will be described in more detail below, panning techniques are used to adjust the relative strength of the source sound signal as a function of the position of indicator 610.
FIGS. 7A-7D illustrate how panning concepts are conventionally applied to the conventional 5.1 surround sound format. As shown in FIG. 7A, panning weight γc(x,y) (703) is determined by panning law 702, which yields the panning weight as a function of x, y, and ηc. When x has a value of 0, this corresponds to the position of indicator 610 being on the left edge of grid 609, and x is 1 when the position of indicator 610 is on the right edge of grid 609. Similarly, when y has a value of 0, this corresponds to the position of indicator 610 being on the back edge of grid 609, and y is 1 when the position of indicator 610 is on the front edge of grid 609.
Next, FIGS. 7B and 7C illustrate graphs having λi(x) on the vertical axis and x on the horizontal axis. In FIG. 7B, line 710 represents the x-direction panning law function for rear left speaker 603. As shown, it has a linear slope of −1 and intersects the horizontal axis at x=1. Conversely, line 711, representing the x-direction panning law function of the rear right speaker 604, has a linear slope of +1 and intersects the horizontal axis at x=0.
In FIG. 7C, the line 712 representing the x-direction panning law function of the left front speaker 600 has a linear slope of −2 and intersects the horizontal axis at x=0.5, while line 714 representing the x-direction panning law function of the right front speaker 602 has a linear slope of +2 and intersects the horizontal axis at x=0.5. Furthermore, the line 713 representing the x-direction panning law function of center front speaker 601 has a linear slope of +2 from x=0 to x=0.5, intersecting the horizontal axis at x=0, and then a linear slope of −2 from x=0.5 to x=1.
FIG. 7D illustrates a graph having υi(y) on the vertical axis and y on the horizontal axis. As described earlier, y represents the front/back position of the indicator 610 in the grid 609. The line 715 representing the y-direction panning law function of all three front speakers 600, 601, 602 has a linear slope of −1 and intersects the horizontal axis at y=1. Conversely, line 716 representing the y-direction panning law function of the two rear speakers 603, 604 has a linear slope of +1 and intersects the horizontal axis at y=0.
Combining the equation and graphs of FIGS. 7A-7D, the following relationship is formed, where γci is the panning weight for speaker i and x and y represent the left/right and front/back position, respectively, of the indicator 610.
γci(x, y) = λi(x) υi(y)    (Eq. 9)
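The product form of Eq. 9 can be sketched in Python using the piecewise-linear functions as they are described for FIGS. 7B-7D. Several points are assumptions of this sketch: the speaker names are hypothetical labels, the front left and front right x-direction functions are taken to be zero beyond x=0.5 where the described lines reach the horizontal axis, and the y-direction functions follow the slopes exactly as stated for lines 715 and 716 (so the front speakers carry full weight at y=0).

```python
SPEAKERS = ("front_left", "center", "front_right", "rear_left", "rear_right")


def x_law(x):
    """x-direction functions per the descriptions of FIGS. 7B and 7C."""
    return {
        "front_left": max(0.0, 1.0 - 2.0 * x),   # slope -2, zero from x=0.5 (assumed clip)
        "center": 2.0 * x if x <= 0.5 else 2.0 - 2.0 * x,
        "front_right": max(0.0, 2.0 * x - 1.0),  # slope +2, zero up to x=0.5 (assumed clip)
        "rear_left": 1.0 - x,                    # slope -1, zero at x=1
        "rear_right": x,                         # slope +1, zero at x=0
    }


def y_law(y):
    """y-direction functions per the description of FIG. 7D."""
    front, rear = 1.0 - y, y                     # lines 715 and 716 as stated
    return {
        "front_left": front, "center": front, "front_right": front,
        "rear_left": rear, "rear_right": rear,
    }


def grid_weights(x, y):
    """Eq. 9: each speaker's weight is its x function times its y function."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie between 0 and 1")
    fx, fy = x_law(x), y_law(y)
    return {name: fx[name] * fy[name] for name in SPEAKERS}


w = grid_weights(0.25, 0.5)
print(w["front_left"], w["center"], w["front_right"])  # 0.25 0.25 0.0
print(w["rear_left"], w["rear_right"])                 # 0.375 0.125
```

Under these assumptions the five weights sum to one at every grid position, because the front x-direction functions sum to one, the rear x-direction functions sum to one, and the two y-direction functions sum to one.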
Although the conventional surround panning system and method described above is widely used, problems remain. For example, one such problem relates to divergence. Sound tends to accumulate in the center channel of a surround sound system. When excess energy is channeled to the center without controlling divergence, the surround sound quality is less than optimal. Conventionally, divergence is controlled by merely distributing a portion of the energy in the center channel among the front channels (i.e., the L, C and R channels in a 5.1 system). However, this is not effective in all situations.
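The conventional divergence control mentioned above can be pictured with the following sketch. It is only an assumed, simplified form: the `divergence` parameter, the equal split between left and right, and the choice to work on gains rather than on signal energy are all hypothetical and are not specified in the description.

```python
def spread_center(weights, divergence):
    """Move a fraction of the center gain equally to the left and right fronts.

    weights: dict with 'left', 'center' and 'right' gains (hypothetical keys).
    divergence: 0 keeps everything in the center, 1 moves all of it out.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must lie between 0 and 1")
    moved = divergence * weights["center"]
    out = dict(weights)
    out["center"] = weights["center"] - moved
    out["left"] = weights["left"] + moved / 2.0
    out["right"] = weights["right"] + moved / 2.0
    return out


print(spread_center({"left": 0.0, "center": 1.0, "right": 0.0}, 0.5))
# {'left': 0.25, 'center': 0.5, 'right': 0.25}
```

Because such a control only redistributes gain among the three front channels, it has no effect on the rear channels, which is consistent with the observation that it is not effective in all situations.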
Moreover, and on a related note, recent years have seen a revolution in the way audio is recorded, produced and mastered. Computers have radically changed the way in which people produce audio, as well as the nature of the audio processing systems upon which they depend. Digital technology has made it possible for small studios and even individuals to produce high-quality recordings without exorbitant investments in equipment. This has fueled a rapidly growing marketplace for audio-related hardware and software. Individuals and small studios now have within their reach high-quality, sophisticated equipment which was historically the sole domain of large studios. Traditionally, to be able to create professional quality recordings, one needed expensive large recording consoles as well as high-cost tape machines and other equipment. Through digital technology, the digital audio workstation (DAW) has emerged, combining recording, mixing, and mastering into one or several software packages running on a standard personal computer using one or more digital audio soundcards. The price of these DAWs can range from about $4000 to $30,000. These low-cost, high-quality recording solutions have created a rapidly growing market.
Currently, the availability of surround sound production tools lags behind that of other audio production technology. At present, most surround sound is recorded and mixed on expensive large consoles costing upwards of several hundred thousand dollars. The increasing amount of material recorded in surround sound has created a demand for lower cost digital audio workstations which have multi-channel (surround sound) output capability. Despite the existence of numerous high-quality computer-based sound cards capable of being used for surround sound production, surround sound processing software is not readily available.
A growing segment in the DAW market is plug-in effects processing technology. In traditional settings, studios are equipped with mixing consoles with which the recording engineer controls and manipulates sound. Additionally, the recording engineer will make use of so-called “outboard” equipment which is used to process or alter the recorded sound. Recording engineers will use cables to patch the desired piece of equipment into the appropriate place on the recording console. In the world of the DAWs, the same paradigm holds, with individual software components replacing the outboard equipment. In this way, one company can produce a piece of software which functions as the mixing console, while a third party can produce the software which replaces outboard equipment such as equalizers and reverberators. When software that functions as outboard equipment is “plugged in” to the processing chain, it is said to be a piece of “plug-in” technology. This is much the same situation as Microsoft producing MS-Word, with third parties producing macros and templates which are purchased separately, but function in the context of MS-Word.
Currently, one of the most widely used audio production platforms is Pro Tools from Digidesign of Palo Alto, Calif. This DAW system has gained widespread acceptance among audio production professionals and currently has a base of about 25,000 users.
An example of a conventional plug-in application for Pro Tools that implements conventional surround sound panning techniques is Dolby Surround Tools.
With reference to FIG. 6, Surround Tools displays an interface including the grid 609 and the indicator 610, which is typically moved about the grid 609 in the x-y directions using a joystick (not shown). Alternatively, slideable controls 606, 608 can be used to move the indicator 610 in the x and y directions, respectively.
The problems with conventional surround sound panning techniques and conventional means and interfaces for controlling surround sound panning will now be described.
Importantly, the conventional surround sound panning techniques do not accurately convey the psychoacoustics of surround sound. Accordingly, there remains a need in the art for a surround sound panning technique that more accurately conveys the psychoacoustics of surround sound.
There are other drawbacks to the traditional panning techniques described above. For example, conventional panning methods are not believed to be easily adjustable to different speaker configurations and do not adapt well to different speaker arrays.
Additionally, in a conventional interface for controlling surround sound panning such as Surround Tools, the amount of screen space available to the interface determines the precision with which the panning weights can be controlled. Accordingly, the amount of screen space needed to precisely control the sounds from the speakers can be exorbitant.
Accordingly, an object of the present invention is to provide a surround sound panning method and apparatus that overcomes the disadvantages of the prior art.
Another object of the present invention is to provide a surround sound panning method and apparatus that accurately conveys the psychoacoustics of surround sound.
Another object of the present invention is to provide a surround sound panning method and apparatus that can be implemented in a conventional DAW audio production environment.
Another object of the present invention is to provide a surround sound panning method and apparatus that has an interface that allows independent adjustment of sound position and spatial extent.
Another object of the present invention is to provide a surround sound panning method and apparatus that provides snap points that instantly move a joystick to speaker locations.
Another object of the present invention is to provide a surround sound panning method and apparatus that provides flexible panning modes that allow any channel to be selected or disabled (e.g., disable center channel for 4.0 mix).
Another object of the present invention is to provide a surround sound panning method and apparatus in which multiple tracks may be linked and panned with a single control.
The present invention achieves these objects and others by introducing a novel surround sound panning paradigm. Rather than controlling the x-y position within a linear grid, the invention characterizes the sound by specifying an azimuth and width, which parameters are used in a novel panning law to control each output channel. In a preferred implementation, the panning control is provided in a Plug-In application for a conventional DAW environment such as Pro Tools, which application includes an interface that provides precise control over the direction and spatial extent of audio.