Since the introduction of home electronics, efforts have been made to bring entertainment systems closer to live entertainment or commercial movie theaters. Among other improvements, the number of sound channels in a single audio signal was increased to produce more enveloping and convincing sound reproduction. This trend accelerated with the advent of digital signal transmission and storage, which dramatically increased the available standards and options.
A standard for digital audio known as AC-3, or Dolby Digital, is used in connection with digital television and audio transmissions, as well as with digital storage media. AC-3 codes a multiplicity of channels as a single entity. More specifically, the AC-3 standard provides for the delivery, from storage or broadcast, for example, of six channels of audio information. Such processing yields lower data rates and thus requires less transmission bandwidth or storage space than direct audio digitization methods such as PCM (pulse code modulation).
The standard reduces the amount of data needed to reproduce high quality sound by capitalizing on how the human ear processes sound. AC-3 is a lossy audio codec in the sense that some unimportant audio components are allocated fewer bits or simply discarded during the encoding process for the purpose of data compression. Such audio components may be weak audio signals located, in the frequency domain, close to a strong or dominant audio signal, since they are masked by the neighboring strong signal. As a result, the bandwidth required to transmit, or the media space required to store, the audio data is reduced significantly.
Five of the AC-3 audio channels carry wideband audio information, and an additional channel carries low frequency effects. The channels are paths within the signal that represent Left, Center, Right, Left-Surround, and Right-Surround data, as well as the limited-bandwidth low-frequency effect (LFE) channel. AC-3 conveys the channel arrangement in linear pulse code modulated (PCM) audio samples. AC-3 processes a signal of at least 18 bits over a frequency range from 20 Hz to 20 kHz. The LFE channel reproduces sound from 20 to 120 Hz.
The audio data is byte-packed into audio substream packets and is sampled at rates of 32, 44.1, or 48 kHz. The packets include a linear pulse code modulated (LPCM) block header carrying parameters (e.g. gain, number of channels, bit width of audio samples) used by an audio decoder. The block header 10 is shown in the packet 12 of FIG. 1A along with a block of audio data 14. The format of the audio data is dependent on the bit-width of the samples. FIG. 1B shows how the audio samples in the audio data block may be stored for 16-bit samples. In this example, the 16-bit samples taken at a given time instant are stored as left (LW) and right (RW), followed by samples for any other channels (XW). Allowances are made for up to 8 channels, or paths, within a given signal.
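The interleaved storage order described above can be sketched in code. This is an illustrative simplification, not the exact substream packet layout: the function name, the little-endian byte order, and the two-frame example are assumptions made for demonstration only.

```python
import struct

def pack_lpcm_block(frames, num_channels=2):
    """Byte-pack 16-bit LPCM samples, interleaving the channels for
    each time instant (left, right, then any other channels).
    Little-endian signed 16-bit packing is assumed for illustration."""
    data = bytearray()
    for frame in frames:                   # one frame = one time instant
        assert len(frame) == num_channels
        for sample in frame:
            data += struct.pack('<h', sample)  # signed 16-bit sample
    return bytes(data)

# Two stereo frames: (LW, RW) pairs stored back to back
packed = pack_lpcm_block([(1000, -1000), (2000, -2000)])
```

Each frame contributes `num_channels * 2` bytes, so decoders can locate any channel's sample by a fixed offset within the frame.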
The multichannel nature of the AC-3 standard allows a single signal to be independently processed by various post processing algorithms used to augment and facilitate playback. Such techniques include matrixing, center channel equalization, enhanced surround sound, bass management, as well as other channel transferring techniques. Generally, matrixing achieves system and signal compatibility by electrically mixing two or more sound channels to produce one or more new ones. Because new soundtracks must play transparently on older systems, matrixing ensures that no audible data is lost in dated cinemas and home systems. Conversely, matrixing enables new audio systems to reproduce older audio signals that were recorded outside of the AC-3 standard.
Since not everyone has the equipment needed to take advantage of multichannel AC-3 sound, an embodiment of matrixing known as downmixing ensures compatibility with older playback devices. Downmixing is employed when a consumer's sound system lacks the full complement of speakers available to the AC-3 format. For instance, a six channel signal must be downmixed for delivery to a stereo system having only two speakers. For proper audio reproduction in the two-speaker system, a decoder must matrix-mix the audio signal so that it conforms to the parameters of the dual-speaker device. Similarly, should the AC-3 signal be delivered to a mono television, the audio decoder downmixes the six channel signal to a mono signal compatible with the amplifier system of the television. A decoder in the playback device executes the downmixing algorithm and allows playback of AC-3 irrespective of system limitations.
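A six-to-two downmix of the kind described above can be sketched as a weighted sum per output speaker. The -3 dB (0.707) mix level used here is a typical default, not a value taken from this document; in practice the coefficients are signaled in the bitstream, and the LFE channel is commonly discarded in a stereo downmix.

```python
import math

def downmix_to_stereo(l, c, r, ls, rs, lfe=0.0):
    """Fold six channels into two.  Center and surrounds are mixed in
    at -3 dB (a common default); the LFE sample is ignored here, as
    many stereo downmixes discard it."""
    a = 1.0 / math.sqrt(2)        # ~0.707, i.e. -3 dB
    lo = l + a * c + a * ls       # left stereo output
    ro = r + a * c + a * rs       # right stereo output
    return lo, ro

lo, ro = downmix_to_stereo(l=0.2, c=0.4, r=0.2, ls=0.1, rs=0.1)
```

Because the same center gain feeds both outputs, dialogue stays anchored between the two remaining speakers.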
Conversely, where a two channel signal is delivered to a four or six speaker amplifier arrangement, Dolby Prologic techniques are employed to take advantage of the more capable setup. Namely, Prologic permits the extraction of four to six decoded channels from two encoded digital input signals. A Prologic decoder disseminates the channels to left, right and center speakers, as well as to two additional loudspeakers incorporated for surround sound purposes. A four-channel extraction algorithm is generically illustrated in FIG. 2. Based on two digital input streams, referred to as Left_input and Right_input, four fundamental output channels are extracted. The channels are indicated in the figure as Left, Right, Central and Surround.
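The core of a four-channel extraction like the one in FIG. 2 is a passive sum/difference matrix: the center channel is derived from what the two inputs share, the surround channel from how they differ. This sketch shows only that passive matrix; a full Pro Logic decoder adds steering logic, phase shifting, and surround delay, which are omitted here.

```python
def extract_four_channels(left_in, right_in):
    """Passive sum/difference extraction of four channels from two
    inputs.  In-phase material steers to center, out-of-phase
    material to surround (the 0.5 scaling is illustrative)."""
    center = 0.5 * (left_in + right_in)    # common (in-phase) content
    surround = 0.5 * (left_in - right_in)  # difference (out-of-phase) content
    return left_in, right_in, center, surround
```

Identical inputs thus produce a full center signal and silent surround, while inverted inputs do the opposite.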
Prologic employs analog or digital “steering” circuitry to enhance surround effects. The steering circuitry manipulates two-channel sources and allows encoded center-channel material to be routed to a center speaker. Encoded surround material is similarly routed to the surround speakers. The goal of front steering is to simulate three discrete channel sources, while surround steering normally simulates a broad sense of space around the viewer. A center channel equalizer is used to drive a loudspeaker that is centrally located with respect to the listener. Most of the time, the center channel carries the dialogue, and the center channel equalization block provides options to emphasize the speech signal or to generate smoothing effects.
Enhanced surround sound is a desirable post processing technique available in systems having ambient noise producing or surround loudspeakers. Such speakers are arranged behind and on either side of the listener. When decoding surround material, four channels (left/center/right/surround) are reproduced from the input signal. The surround channels enable rear localization, true 360° pans, convincing flyovers and other effects.
Bass management techniques are used to redirect low frequency signal components to speakers that are especially configured to play back bass tones. The low frequency range of the audible spectrum encompasses about 20 Hz to 120 Hz. Such techniques are necessary where damage to small speakers would otherwise result. In addition to ensuring that the low frequency content of a music program is sent to appropriate speakers, bass management allows the listener to accurately select a level of bass according to their own preferences.
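The redirection described above amounts to a crossover filter: content below the cutoff feeds the bass-capable speaker, the remainder feeds the small speaker. The sketch below uses a first-order one-pole filter for clarity; real bass managers use steeper (e.g. fourth-order) crossovers, and the 120 Hz cutoff and 48 kHz rate are just example values.

```python
import math

def bass_split(samples, fs=48000.0, cutoff=120.0):
    """Split one channel at the crossover frequency: the one-pole
    low-pass output goes to the subwoofer, the complementary residual
    (high-pass) goes to the small speaker."""
    rc = 1.0 / (2 * math.pi * cutoff)
    dt = 1.0 / fs
    alpha = dt / (rc + dt)        # smoothing factor for the one-pole filter
    low, high, y = [], [], 0.0
    for x in samples:
        y = y + alpha * (x - y)   # one-pole low-pass
        low.append(y)
        high.append(x - y)        # complement: high-passed residual
    return low, high
```

Because the two outputs are exact complements, summing them reconstructs the input, so no content is lost, only rerouted.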
Virtual Enhanced Surround (VES) and Digital Cinema Sound (DCS) are post processing methods used to further manage the surround sound component of an audio signal. Both techniques divide and sum aspects of the signal to create an illusion of three-dimensional immersion. Which method is used depends on the configuration of a consumer's speaker system. VES enhances playback when the ambient noise or surround sound portion of the signal is conveyed only in two front speakers. DCS is needed to digitally coordinate the ambient noise where rear surround speakers are used.
Finally, if a consumer prefers the privacy and freedom of movement afforded by headphones, appropriate processing techniques simulate the above effects in a headphone set, including realistic surround sound.
To achieve their respective effects, post processing circuitry must alter the audio input signal from its original format. For instance, a matrixing operation necessarily reformats an input signal by electronically mixing it with another. The process varies the number of channels in the signal, fundamentally altering the original signal. Likewise, a VES application purposely manipulates the audio signal to create the desired 3D audio image using only two front speakers. The VES processing includes digital filtering, mixing the input signal with another, and further introduces delays and attenuation. Such manipulations represent dramatic departures from the content and format of the original signal.
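The delay-and-attenuation stage of such processing can be sketched as follows. The function name, the delay length, and the gain are made-up illustrative values; a real VES chain also applies HRTF-style digital filters that are omitted here.

```python
def mix_delayed_surround(front, surround, delay=480, atten=0.5):
    """Delay the surround samples, attenuate them, and mix them into
    the front channel -- the delay/attenuation manipulation described
    above, reduced to its simplest form."""
    out = list(front)
    for i, s in enumerate(surround):
        j = i + delay             # delayed position, in samples
        if j < len(out):
            out[j] += atten * s   # attenuated surround contribution
    return out
```

Even this minimal version shows why the output departs from the original: the front channel now carries time-shifted, rescaled content it never contained.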
Distortions introduced by one post processing technique carry over into subsequent processes. Because such processes begin with an already altered signal, some exacerbate the distortions introduced by a preceding technique in the course of applying their own algorithms. Such distortions are sampled, magnified and reproduced at exaggerated levels such that they influence subsequent processing and become perceptible to the listener.
For instance, executing a summing VES algorithm prior to applying a bass management technique results in a “tinny,” hollow sound. Further, following a center channel equalizer application with an enhanced surround sound algorithm can introduce filter overflow. Such overflow precipitates the clipping of audio portions from the signal. The clipped signal may sound “choppy,” disjointed and be unrepresentative of the original signal. Time delays and attenuations associated with DCS or Prologic applications can introduce noise into a post processing effort. Such noise manifests in static, granularity and other sound degradation.
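The clipping effect described above can be illustrated in miniature: when a filter stage drives samples past the representable range, they are clamped and the waveform peaks are lost. The function and the unit range are illustrative assumptions, not details from this document.

```python
def hard_clip(samples, limit=1.0):
    """Clamp samples that overflow the representable range.  The
    discarded peaks are the clipped 'audio portions' that make the
    output sound choppy and unrepresentative of the original."""
    return [max(-limit, min(limit, s)) for s in samples]

clipped = hard_clip([0.5, 1.4, -2.0])   # peaks beyond |1.0| are lost
```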
Undesirable distorting effects are further compounded in playback systems that stack several post processing algorithms. In such systems, an input signal may be altered substantially before being processed by a final algorithm. The integrity of the resultant signal is compromised by clipping and noise complications. Therefore, there is a significant need for a method of coordinating multiple algorithms within a single post processing effort without sacrificing audio signal integrity.