Efficient memory management is essential to any commercial product that performs complex data processing. For instance, entertainment systems rely on memory allocation for audio processing applications. Sound system designers use such applications to bring entertainment systems closer to the experience of live performances or commercial movie theaters. An initial step toward more enveloping and convincing sound was to increase the number of sound channels encoded in a single low-rate bitstream. This trend accelerated the development of data compression techniques that reduce transmission bandwidth and storage space. Although encoders have been designed to permit simpler decoders, low bit-rate codecs still inevitably increase CPU and memory usage. Since memory usage bears directly on the cost of commercial products, careful memory management is crucial.
One such codec for digital audio, known as AC-3, is used in connection with digital television and audio transmissions, as well as with digital storage media. AC-3 encodes multiple channels as a single bitstream. More specifically, the AC-3 standard provides for the storage or broadcast of up to 5.1 channels of audio information: five full-bandwidth channels plus a low-frequency effects channel.
The standard reduces the number of data bits required to reproduce high-quality audio by capitalizing on how the human ear processes sound. A psychoacoustic model is used in the bit allocation process so that more important audio components receive more bits, while less perceptible components receive fewer bits or none at all. For example, unimportant frequency components may be those located close to strong audio signals in the frequency domain, whose contribution to human perception is masked by their louder neighbors. This psychoacoustic model plays a central role in audio data compression.
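To make the masking idea concrete, here is a toy bit allocator. It is a minimal sketch, not AC-3's actual parametric bit allocation: the masking offset, the per-band spreading slope, and the absolute threshold are hypothetical values chosen for illustration. Each band's masking threshold is the strongest spread-out contribution from its neighbors, and the band receives roughly one bit for every 6.02 dB by which it exceeds that threshold.

```python
import math

# Hypothetical toy parameters (not AC-3's parametric bit-allocation tables).
MASK_OFFSET_DB = 10.0      # a masker's curve starts this far below its level
SPREAD_DB_PER_BAND = 15.0  # masking falls off by this much per band of distance
ABS_THRESHOLD_DB = 0.0     # floor of audibility
DB_PER_BIT = 6.02          # each quantizer bit buys about 6.02 dB of SNR


def masking_threshold(levels_db):
    """Per-band masking threshold (dB) produced by the neighboring bands."""
    thresholds = []
    for i in range(len(levels_db)):
        t = ABS_THRESHOLD_DB
        for j, level in enumerate(levels_db):
            if j != i:  # in this toy model a band does not mask itself
                t = max(t, level - MASK_OFFSET_DB - SPREAD_DB_PER_BAND * abs(i - j))
        thresholds.append(t)
    return thresholds


def allocate_bits(levels_db):
    """Bits per band so quantization noise stays under the masking threshold."""
    mask = masking_threshold(levels_db)
    return [max(0, math.ceil((level - m) / DB_PER_BIT))
            for level, m in zip(levels_db, mask)]
```

With band levels of 80, 40, 20 and 70 dB, the two quiet bands fall under the masking curves of their loud neighbors and receive no bits, while the loud bands receive 11 and 8 bits.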
Five AC-3 audio channels carry wideband audio information, and an additional channel carries low-frequency effects. The channels are paths within the signal that represent Left, Center, Right, Left-Surround, and Right-Surround data, as well as the limited-bandwidth low-frequency effects (LFE) channel. The AC-3 encoder accepts, and the decoder reproduces, this channel arrangement as linear pulse code modulated (PCM) audio samples. AC-3 processes an 18- to 22-bit signal over a frequency range from 20 Hz to 20 kHz. The LFE channel reproduces sound from 20 to 120 Hz.
The audio data is sampled at 32, 44.1, or 48 kHz and byte-packed into audio substream packets. The packets include a linear pulse code modulated (LPCM) block header carrying parameters used by an audio decoder (e.g., gain, number of channels, bit width, bit rate, compression information, video coordination, and frequency identification). Selected header blocks include presentation time stamp (PTS) values that indicate when an audio frame is to be presented. The time stamp value is referenced to a system time clock that was running during the creation or recording of the audio and video data. A similar system time clock runs during playback, so the PTS can be used to synchronize the video and audio presentations.
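The synchronization described above can be sketched as a comparison between a frame's PTS and the current system time clock. The sketch assumes MPEG-style 33-bit time stamps counted at 90 kHz, with the clock expressed in the same units; the 10 ms tolerance window and the wait/play/drop policy are hypothetical choices, not taken from this document.

```python
PTS_CLOCK_HZ = 90_000                  # MPEG systems time stamps tick at 90 kHz
PTS_MODULUS = 1 << 33                  # PTS values are 33 bits wide and wrap around
SYNC_TOLERANCE = PTS_CLOCK_HZ // 100   # hypothetical 10 ms window (900 ticks)


def pts_difference(frame_pts, stc_now):
    """Signed distance in ticks from the clock to the frame's time stamp, wrap-safe."""
    diff = (frame_pts - stc_now) % PTS_MODULUS
    if diff >= PTS_MODULUS // 2:
        diff -= PTS_MODULUS  # interpret the upper half of the range as negative
    return diff


def sync_action(frame_pts, stc_now, tolerance=SYNC_TOLERANCE):
    """Decide what to do with an audio frame given the playback clock."""
    diff = pts_difference(frame_pts, stc_now)
    if diff > tolerance:
        return "wait"  # frame is early: keep it buffered
    if diff < -tolerance:
        return "drop"  # frame is late: skip it to regain sync
    return "play"
```

Handling the 33-bit wraparound explicitly keeps a frame stamped just after the counter rolls over from being mistaken for one that is hours late.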
Following the header block, the audio data packet contains any number of audio frames. The block header 10 is shown in the packet 12 of FIG. 1A along with a block of audio data 14. The format of the audio data depends on the bit width of the samples. FIG. 1B shows how the audio frames in the audio data block may be stored for 16-bit samples. In this example, the 16-bit samples taken at a given time instant are stored as left (LW) and right (RW), followed by samples for any other channels (XW). Allowances are made for up to eight channels, or paths within a given signal.
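The interleaved layout of FIG. 1B means that, in a frame of N channels, the sample for channel c at time instant t sits at index t × N + c. A minimal sketch of splitting such a block into per-channel lists follows; the big-endian default byte order is an assumption, so pass "<" for little-endian data.

```python
import struct


def deinterleave_16bit(data, num_channels, byteorder=">"):
    """Split interleaved 16-bit frames (L, R, then other channels) into per-channel lists."""
    if not 1 <= num_channels <= 8:
        raise ValueError("expected 1 to 8 channels")
    frame_bytes = 2 * num_channels
    if len(data) % frame_bytes:
        raise ValueError("data does not hold a whole number of frames")
    count = len(data) // 2
    samples = struct.unpack(f"{byteorder}{count}h", data)  # signed 16-bit values
    # Channel c of time instant t is at index t * num_channels + c.
    return [list(samples[c::num_channels]) for c in range(num_channels)]
```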
During decoding, audio samples must normally be decompressed, reconstructed, and enhanced in a manner consistent with the source program material and the capabilities of the sound reproduction system. In some applications, audio data packets may contain up to six channels of raw audio data. Depending on how many channels the sound reproduction system can reproduce, the system selectively uses the raw channels to produce a number of output channels, which may then be stored in an audio first-in, first-out (FIFO) memory. A host, or other suitable microprocessor, may read the header block before determining which frames to buffer immediately.
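An audio FIFO of this kind is commonly realized as a fixed-size ring buffer, so its memory footprint is fixed once at start-up. The class below is a minimal sketch; the name AudioFifo and the choice to reject (rather than overwrite) samples when the buffer is full are assumptions for illustration.

```python
class AudioFifo:
    """Fixed-capacity first-in, first-out buffer for audio samples (hypothetical helper)."""

    def __init__(self, capacity):
        self._buf = [0] * capacity  # preallocated storage that never grows
        self._capacity = capacity
        self._head = 0              # index of the oldest sample
        self._size = 0

    def __len__(self):
        return self._size

    def push(self, sample):
        """Append a sample; return False and drop it if the FIFO is full."""
        if self._size == self._capacity:
            return False
        self._buf[(self._head + self._size) % self._capacity] = sample
        self._size += 1
        return True

    def pop(self):
        """Remove and return the oldest sample."""
        if self._size == 0:
            raise IndexError("pop from empty AudioFifo")
        sample = self._buf[self._head]
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return sample
```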
In addition to providing a low bit rate, the multichannel nature of the AC-3 standard allows a single signal to be processed independently by various post-processing algorithms, which in turn enhance playback. Such techniques include matrixing, center channel equalization, enhanced surround sound, bass management, and other channel transfer techniques. Generally, matrixing achieves system and signal compatibility by electrically mixing two or more sound channels to produce one or more new ones. Because new soundtracks must play transparently on older systems, matrixing ensures that no audible content is lost in older cinemas and home systems. Conversely, matrixing enables new audio systems to reproduce older audio signals that were recorded outside the AC-3 standard.
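In its general form, matrixing computes each output channel as a weighted sum of the input channels, that is, a matrix-vector product applied to every sample frame. The helper below is a hypothetical sketch of that operation; the same routine mixes down when there are fewer coefficient rows than inputs and mixes up when there are more.

```python
def matrix_mix(frame, coeffs):
    """Mix one frame of input channels into output channels.

    frame  -- one sample per input channel
    coeffs -- one row per output channel, holding one gain per input channel
    """
    if any(len(row) != len(frame) for row in coeffs):
        raise ValueError("each coefficient row needs one gain per input channel")
    return [sum(g * x for g, x in zip(row, frame)) for row in coeffs]
```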
Since not everyone has the equipment needed to take advantage of AC-3 5.1-channel sound, a form of matrixing known as downmixing ensures compatibility with older playback devices. Downmixing is used when a consumer's sound system lacks the full complement of speakers supported by the AC-3 format. For instance, a six-channel signal must be downmixed for delivery to a stereo system having only two speakers. For proper reproduction on the two-speaker system, the decoder must matrix-mix the audio signal so that it conforms to the parameters of the two-speaker device. Similarly, if the AC-3 signal is delivered to a mono television, the audio decoder downmixes the six-channel signal to a mono signal compatible with the television's amplifier. The decoder of the playback device executes the downmixing algorithm, allowing AC-3 playback regardless of system limitations.
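A common way to fold 5.1 channels into two is to add the center channel and each surround channel into the corresponding front channel at a reduced level. The sketch below assumes -3 dB (1/√2) center and surround mix levels, discards the LFE channel, and scales the result so that full-scale inputs cannot clip; a real AC-3 decoder takes its mix levels from the bitstream and may normalize differently.

```python
import math

CLEV = 1 / math.sqrt(2)  # assumed center mix level (-3 dB)
SLEV = 1 / math.sqrt(2)  # assumed surround mix level (-3 dB)


def downmix_stereo(l, r, c, lfe, ls, rs, clev=CLEV, slev=SLEV):
    """Fold one 5.1 frame into two channels; the LFE sample is discarded here."""
    norm = 1 / (1 + clev + slev)  # keeps full-scale inputs from exceeding full scale
    lo = (l + clev * c + slev * ls) * norm
    ro = (r + clev * c + slev * rs) * norm
    return lo, ro


def downmix_mono(l, r, c, lfe, ls, rs):
    """Fold one 5.1 frame into a single channel for a mono receiver."""
    lo, ro = downmix_stereo(l, r, c, lfe, ls, rs)
    return (lo + ro) / 2
```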
Conversely, where a two-channel signal is delivered to a four- or six-speaker amplifier arrangement, Dolby Prologic techniques are used to take advantage of the more capable setup. Namely, Prologic permits the extraction of four to six decoded channels from two encoded digital input signals. A Prologic decoder buffers and distributes the channels to the left, right, and center speakers, as well as to two additional loudspeakers provided for surround sound. A four-channel extraction algorithm is generically illustrated in FIG. 2. From two digital input streams, referred to as Left_input and Right_input, four fundamental output channels are extracted. The channels are indicated in the figure as Left, Right, Central and Surround. Of note, the Prologic decoder generates the center channel by summing the left and right-hand stereo channels, thereby combining the portions common to both signals.
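The summing and differencing behind FIG. 2 can be illustrated with a passive matrix decoder: content common to both inputs goes to the center, and out-of-phase content goes to the surround. This is a simplified sketch under that assumption; an actual Prologic decoder adds adaptive steering and further surround processing, including the surround delay.

```python
def passive_matrix_decode(left_in, right_in):
    """Derive Left, Right, Center and Surround from one two-channel input frame."""
    center = 0.5 * (left_in + right_in)    # in-phase (common) content
    surround = 0.5 * (left_in - right_in)  # out-of-phase (difference) content
    return left_in, right_in, center, surround
```

Identical left and right inputs produce a center output and a silent surround, which matches the observation that sound common to both channels is heard from the center speaker.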
A time delay is applied to the surround channel to make it more distinguishable. The delay is on the order of 20 ms, which is still too short to be perceived as an echo. Ordinary stereo-encoded material can often be played back satisfactorily through a Prologic decoder, because portions of the sound that are identical in the left and right-hand channels are heard from the center channel. The surround channel reproduces sound to which various phase shifts were applied during recording, such as sound reflected from the walls of the recording location or reverberation added in the studio. The goal of Prologic is to simulate three discrete-channel sources, with surround steering normally simulating a broad sense of space around the viewer. A center channel equalizer is used to drive a loudspeaker that is centrally located with respect to the listener. Equalizing algorithms add emphasis and smoothing to the center channel audio, which is often a speech signal.
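The surround delay can be implemented as a circular buffer with one slot per sample of delay, which is also why such effects consume memory: 20 ms at a 48 kHz sample rate is 960 samples per delayed channel. The class below is a minimal sketch; the 48 kHz rate used in the example is an assumption (the document lists 32, 44.1, and 48 kHz).

```python
class DelayLine:
    """Delay a sample stream by a fixed number of samples using a circular buffer."""

    def __init__(self, delay_samples):
        self._buf = [0.0] * delay_samples  # memory cost: one slot per sample of delay
        self._pos = 0

    def process(self, sample):
        """Store one input sample and return the one received delay_samples ago."""
        if not self._buf:
            return sample  # zero delay: pass through
        out = self._buf[self._pos]
        self._buf[self._pos] = sample
        self._pos = (self._pos + 1) % len(self._buf)
        return out


def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert a delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000)


surround_delay = DelayLine(delay_in_samples(20, 48_000))  # 960-sample buffer
```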
Enhanced surround sound is a desirable post-processing technique in systems with ambient-noise-producing loudspeakers, which are arranged behind and on either side of the listener. When surround material is decoded, four channels (left/center/right/surround) are reproduced from the input signal. Enhanced surround functions divide the single surround channel into two separate surround channels. For instance, the single surround channel produced by the Prologic application is processed into left and right surround channels, so the enhanced surround function complements the preceding Prologic output. Labeling the channels as left and right surround is largely arbitrary, since the audio content of the two channels is the same. However, enhanced surround processing introduces a slight time delay between the channels. This time difference tricks the ear into perceiving two distinct sounds coming from different directions.
In this manner, enhanced surround sound acts as an all-pass filter in the frequency domain that introduces a time delay. The delay between the two channels creates a spatial effect. The ambient-noise-producing surround speakers, arranged behind and on either side of the listener, further assist in reproducing rear localization, true 360° pans, convincing flyovers, and other effects.
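Because a pure time delay leaves the magnitude at every frequency unchanged and only shifts phase, it is a simple example of an all-pass filter. The sketch below splits one surround channel into left and right outputs, with the right copy lagging the left by a fixed number of samples; the class name and the offset value are hypothetical.

```python
from collections import deque


class SurroundSplitter:
    """Split one surround channel into left/right copies offset by a small delay."""

    def __init__(self, offset_samples):
        # Holds the most recent inputs; the right output lags the left by offset_samples.
        self._history = deque([0.0] * offset_samples)

    def process(self, surround):
        """Return (left_surround, right_surround) for one input sample."""
        if not self._history:
            return surround, surround  # zero offset: both outputs identical
        self._history.append(surround)
        return surround, self._history.popleft()
```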
Bass management techniques redirect low-frequency signal components to speakers specially configured to play back bass tones. The low-frequency range of the audible spectrum spans about 20 Hz to 120 Hz. Such techniques are necessary where small speakers would otherwise be damaged. In addition to ensuring that the low-frequency content of a music program is sent to appropriate speakers, bass management allows listeners to select a bass level according to their own preferences.
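Bass management can be approximated by a crossover: a low-pass filter feeds the subwoofer and the complementary remainder feeds the main speakers. This is a minimal sketch using a first-order (one-pole) low-pass; the 120 Hz default crossover (taken from the upper edge of the range above), the 48 kHz sample rate, and the bass_gain control are assumptions, and practical crossovers usually use steeper filters.

```python
import math


class BassManager:
    """Route content below a crossover frequency to a subwoofer feed (hypothetical helper)."""

    def __init__(self, crossover_hz=120.0, sample_rate_hz=48_000.0, bass_gain=1.0):
        # Smoothing coefficient of a one-pole low-pass at the chosen cutoff.
        self._a = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate_hz)
        self._low = 0.0
        self.bass_gain = bass_gain  # listener-selected bass level

    def process(self, sample):
        """Return (main_speaker_sample, subwoofer_sample) for one input sample."""
        self._low += self._a * (sample - self._low)  # first-order low-pass
        high = sample - self._low                     # complementary high-pass
        return high, self.bass_gain * self._low
```

With bass_gain left at 1.0, the two outputs sum back to the input, so no content is lost by the split itself.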
Virtual Enhanced Surround (VES) and Digital Cinema Sound (DCS) are post-processing methods used to further manage the surround sound component of an audio signal. Both techniques store, divide, and sum aspects of the signal to create an illusion of three-dimensional immersion. Which method is used depends on the configuration of the consumer's speaker system. VES enhances playback when the ambient noise or surround portion of the signal is conveyed only through the two front speakers. DCS is used to digitally coordinate the ambient noise where rear surround speakers are present.
VES uses digital filters to create an augmented spatial effect with two speakers. Like enhanced surround, VES post-processing introduces time delay and attenuation. More specifically, the right and left surround channels are repeatedly stored, summed, and subtracted from each other to create new right and left surround channels. These new surround channels embody the spatial effect sought by the listener.
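The store, sum, and difference steps can be illustrated with a generic sum/difference widener. The document does not specify the VES or DCS filters, so the structure below is an assumption for illustration only: the difference of the two surround channels is stored in a delay line, attenuated, and added back with opposite signs. Because the added term cancels in Ls + Rs, the common content is left untouched while the difference content is delayed and scaled.

```python
from collections import deque


class SumDifferenceWidener:
    """Generic sum/difference surround widening (illustrative, not the actual VES or DCS algorithm)."""

    def __init__(self, delay_samples, gain):
        self._history = deque([0.0] * delay_samples)  # stored difference signal
        self._gain = gain  # attenuation applied to the delayed difference

    def process(self, ls, rs):
        """Return new (left_surround, right_surround) for one input frame."""
        diff = ls - rs
        if self._history:
            self._history.append(diff)
            diff = self._history.popleft()  # difference from delay_samples frames ago
        wide = self._gain * diff
        return ls + wide, rs - wide
```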
Like VES, DCS stores and otherwise manipulates the surround portion of the signal by summing and subtracting channels. The resulting surround channels create an illusion of spatial distortion. Unlike VES, however, DCS sends the newly created left and right surround channels to the rear-oriented speakers. As with the VES algorithm, the DCS applications are executed later in the processing sequence to avoid overflow and signal distortion.
Finally, privacy and space considerations commonly lead listeners to choose headphones. Headphones allow listeners to enjoy multichannel sound sources, such as movies, discreetly and with realistic surround sound. Here the audio signal is post-processed so that the surround experience is simulated as closely as possible on a conventional stereo headphone device. Ideally, the headphone circuitry is configured to reflect any matrixing, surround, or bass effects applied to the signal. As with the other post-processing algorithms above, a six-channel pulse code modulated signal is ultimately played back according to the listener's preferences.
As discussed in detail above, most post-processing circuitry must buffer portions of the audio signal to achieve its effect. Certain algorithms further introduce assorted delays into the processing of an audio signal. For instance, surround sound effects rely on time differences imposed on signals to create the desired three-dimensional listening experience. Similarly, VES and DCS applications apply delays and attenuation to audio signals.
Such delays conventionally require an audio system to temporarily store, or buffer, selected frames of the signal within multiple processors while circuitry processes other frames of the signal in parallel. After a predetermined period, the stored frames are processed and recombined with the other processed frames into an output signal. This preset buffering period corresponds to the delay required by the surround sound algorithm. Such memory allocation may severely tax available memory resources and compromise system performance.
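The memory cost of such buffering follows directly from the delay length: samples per channel equal the delay multiplied by the sample rate, and the total is that figure multiplied by the channel count and the bytes per sample. For example, a 20 ms delay at 48 kHz is 960 samples per channel, so two 32-bit surround channels need 960 × 2 × 4 = 7,680 bytes. The helpers below sketch this arithmetic; the 32-bit sample size and any on-chip budget passed in are hypothetical.

```python
def delay_buffer_bytes(delay_ms, sample_rate_hz, channels, bytes_per_sample):
    """Memory needed to hold delay_ms of audio for the given channel count."""
    samples_per_channel = round(delay_ms * sample_rate_hz / 1000)
    return samples_per_channel * channels * bytes_per_sample


def fits_in_memory(buffer_sizes, budget_bytes):
    """True if all requested buffers fit together within the memory budget."""
    return sum(buffer_sizes) <= budget_bytes
```

With a hypothetical 32 KiB on-chip budget, a 7,680-byte two-channel delay and a 23,040-byte six-channel delay fit together (30,720 bytes), but adding one more 7,680-byte buffer does not.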
More specifically, the memory required for post-processing operations can easily exceed the on-chip memory capacity of a Digital Signal Processor (DSP). Such requirements typically necessitate memory external to the program processors. External memory conventionally comprises additional processors connected via an internal bus. These processors and other external resources add design and hardware costs for both developers and consumers. Another disadvantage of external memory is its slow access compared with internal, or on-chip, memory. The remote placement of such processors can further introduce undesirable delays into processing applications. These delays may be attributable to the complex algorithms required to search memory maps, as well as to the longer circuit paths traveled by system signals. In CPU-critical applications, on-chip memory is therefore the first choice for hardware and software designers. Consequently, there is a significant need for more efficient management of memory resources within an audio playback environment.