The present invention relates to audio coding and decoding and relates more particularly to scalable coding of audio data into a plurality of layers of a standard data channel and scalable decoding of audio data from a standard data channel.
Due in part to the widespread commercial success of compact disc (CD) technologies over the last two decades, sixteen bit pulse code modulation (PCM) has become an industry standard for distribution and playback of recorded audio. Over much of this time period, the audio industry touted the compact disc as providing superior sound quality to vinyl records and cassette tapes, and many people believed that little audible benefit would be obtained by increasing the resolution of audio beyond that obtainable from sixteen bit PCM.
Over the last several years, this belief has been challenged for various reasons. The dynamic range of sixteen bit PCM is too limited for noise-free reproduction of all musical sounds. Subtle detail is lost when audio is quantized to sixteen bit PCM. Moreover, the belief may fail to consider the practice of reducing quantization resolution to provide additional headroom at the cost of a reduced signal-to-noise ratio and lower signal resolution. Due to such concerns, there currently is strong commercial demand for audio processes that provide improved signal resolution relative to sixteen bit PCM.
There currently is also strong commercial demand for multi-channel audio. Multi-channel audio provides multiple channels of audio which can improve spatialization of reproduced sound relative to traditional mono and stereo techniques. Common systems provide for separate left and right channels both in front of and behind a listening field, and may also provide for a center channel and subwoofer channel. Recent modifications have provided numerous audio channels surrounding a listening field for reproducing or synthesizing spatial separation of different types of audio data.
Perceptual coding is one variety of techniques for improving the perceived resolution of an audio signal relative to PCM signals of comparable bit rate. Perceptual coding can reduce the bit rate of an encoded signal while preserving the subjective quality of the audio recovered from the encoded signal by removing information that is deemed to be irrelevant to the preservation of that subjective quality. This can be done by splitting an audio signal into frequency subband signals and quantizing each subband signal at a quantizing resolution that introduces a level of quantization noise that is low enough to be masked by the decoded signal itself. Within the constraints of a given bit rate, an increase in perceived signal resolution relative to a first PCM signal of given resolution can be achieved by perceptually coding a second PCM signal of higher resolution to reduce the bit rate of the encoded signal to essentially that of the first PCM signal. The coded version of the second PCM signal may then be used in place of the first PCM signal and decoded at the time of playback.
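By way of illustration only, the relationship between a tolerable noise level and a quantizing resolution can be sketched with the textbook model in which a uniform quantizer of step size delta introduces noise of power delta squared divided by twelve. The function name and the per-subband noise powers below are hypothetical, and the sketch does not address how a masking threshold is estimated in practice.

```python
import math

def step_sizes_for_noise(noise_powers):
    """Return one uniform-quantizer step size per subband.

    Assumes the textbook model in which a uniform quantizer of step size
    delta adds noise of power delta**2 / 12, so delta = sqrt(12 * power).
    """
    return [math.sqrt(12.0 * p) for p in noise_powers]

# Hypothetical allowed noise powers for two subbands (arbitrary units).
allowed = [1.0 / 12.0, 1.0 / 3.0]
steps = step_sizes_for_noise(allowed)
```

A subband that tolerates more noise receives a larger step size and therefore needs fewer bits per sample.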
One example of perceptual coding is embodied in devices that conform to the public ATSC AC-3 bitstream specification as specified in the Advanced Television Systems Committee (ATSC) A/52 document (1994). This particular perceptual coding technique, as well as other perceptual coding techniques, is embodied in various versions of Dolby Digital® coders and decoders. These coders and decoders are commercially available from Dolby Laboratories, Inc. of San Francisco, California. Another example of a perceptual coding technique is embodied in devices that conform to the MPEG-1 audio coding standard ISO 11172-3 (1993).
One disadvantage of conventional perceptual coding techniques is that the bit rate of the perceptually coded signal for a given level of subjective quality may exceed the available data capacity of communication channels and storage media. For example, the perceptual coding of a twenty-four bit PCM audio signal may yield a perceptually coded signal that requires more data capacity than is provided by a sixteen bit wide data channel. Attempts to reduce the bit rate of the encoded signal to a lower level may degrade the subjective quality of audio that can be recovered from the encoded signal. Another disadvantage of conventional perceptual coding techniques is that they do not support the decoding of a single perceptually coded signal to recover an audio signal at more than one level of subjective quality.
Scalable coding is one technique that can provide a range of decoding quality. Scalable coding uses the data in one or more lower resolution codings together with augmentation data to supply a higher resolution coding of an audio signal. Lower resolution codings and the augmentation data may be supplied in a plurality of layers. There is also a strong need for scalable perceptual coding, and particularly for scalable perceptual coding that is backward compatible at the decoding stage with commercially available sixteen bit digital signal transport or storage means.
Scalable audio coding is disclosed that supports coding of audio data into a core layer of a data channel in response to a first desired noise spectrum. The first desired noise spectrum preferably is established according to psychoacoustic and data capacity criteria. Augmentation data may be coded into one or more augmentation layers of the data channel in response to additional desired noise spectra. Alternative criteria such as conventional uniform quantization may be utilized for coding augmentation data.
Systems and methods for decoding just a core layer of a data channel are disclosed. Systems and methods for decoding both a core layer and one or more augmentation layers of a data channel are also disclosed, and these provide improved audio quality relative to that obtained by decoding just the core layer.
Some embodiments of the present invention are applied to subband signals. As is understood in the art, subband signals may be generated in numerous ways including the application of digital filters such as the quadrature mirror filter, and by a wide variety of time-domain to frequency-domain transforms and wavelet transforms.
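As one hedged illustration of subband generation, the sketch below splits a signal into a low band and a high band using the two-tap Haar filter pair, which is the simplest pair satisfying the quadrature mirror condition. The function names are hypothetical, and a practical coder would generally use longer filters or a transform with many more bands.

```python
def haar_analysis(x):
    """Split an even-length signal into low and high bands, each decimated 2:1."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Rebuild the original signal from its two subband signals."""
    x = []
    for l, h in zip(low, high):
        x.extend([l + h, l - h])
    return x

low_band, high_band = haar_analysis([4, 2, 6, 6])
rebuilt = haar_synthesis(low_band, high_band)
```

Applying the synthesis step to the unmodified subband signals returns the input samples, so any loss in a coder built on such a split comes from quantization rather than from the filter bank.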
Data channels employed by the present invention preferably have a sixteen bit wide core layer and two four bit wide augmentation layers conforming to standard AES3, which is published by the Audio Engineering Society (AES). This standard is also known as standard ANSI S4.40 of the American National Standards Institute (ANSI). Such a data channel is referred to herein as a standard AES3 data channel.
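The layered channel described above can be pictured as a twenty-four bit word divided into three fields. The sketch below is illustrative only: it assumes that the core layer occupies the sixteen most significant bits and that the two augmentation layers occupy the two four bit fields below it, an ordering that is not stated in the text above. All function names are hypothetical.

```python
CORE_BITS = 16
AUG_BITS = 4

def pack_word(core, aug1, aug2):
    """Pack a 16-bit core field and two 4-bit augmentation fields into 24 bits."""
    if not 0 <= core < (1 << CORE_BITS):
        raise ValueError("core field must fit in 16 bits")
    if not (0 <= aug1 < (1 << AUG_BITS) and 0 <= aug2 < (1 << AUG_BITS)):
        raise ValueError("augmentation fields must fit in 4 bits")
    # Assumed ordering: core in the top 16 bits, then aug1, then aug2.
    return (core << (2 * AUG_BITS)) | (aug1 << AUG_BITS) | aug2

def unpack_word(word):
    """Split a 24-bit word back into (core, aug1, aug2)."""
    core = (word >> (2 * AUG_BITS)) & 0xFFFF
    aug1 = (word >> AUG_BITS) & 0xF
    aug2 = word & 0xF
    return core, aug1, aug2

def core_only(word):
    """What a sixteen bit device sees if it keeps only the top 16 bits."""
    return (word >> (2 * AUG_BITS)) & 0xFFFF

word = pack_word(0xABCD, 0x1, 0x2)
```

Under this assumed ordering, equipment that truncates each word to sixteen bits still receives the complete core layer.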
Scalable audio coding and decoding according to various aspects of the present invention can be implemented by discrete logic components, one or more ASICs, program-controlled processors, and other commercially available components. The manner in which these components are implemented is not important to the present invention. Preferred embodiments use program-controlled processors, such as those in the DSP563xx line of digital signal processors from Motorola. Programs for such implementations may include instructions conveyed by machine readable media, such as baseband or modulated communication paths and storage media. Communication paths preferably are in the spectrum from supersonic to ultraviolet frequencies. Essentially any magnetic or optical recording technology may be used as storage media, including magnetic tape, magnetic disk, and optical disc.
According to various aspects of the present invention, audio information coded according to the present invention can be conveyed by such machine readable media to routers, decoders, and other processors, and may be stored by such machine readable media for routing, decoding, or other processing at later times. In preferred embodiments, audio information is coded according to the present invention, and stored on machine readable media, such as compact disc. Such data preferably is formatted in accordance with various frame and/or other disclosed data structures. A decoder can then read the stored information at later times for decoding and playback. Such decoder need not include encoding functionality.
Scalable coding processes according to one aspect of the present invention utilize a data channel having a core layer and one or more augmentation layers. A plurality of subband signals are received. A respective first quantization resolution for each subband signal is determined in response to a first desired noise spectrum, and each subband signal is quantized according to the respective first quantization resolution to generate a first coded signal. A respective second quantization resolution is determined for each subband signal in response to a second desired noise spectrum, and each subband signal is quantized according to the respective second quantization resolution to generate a second coded signal. A residue signal is generated that indicates a residue between the first and second coded signals. The first coded signal is output in the core layer, and the residue signal is output in the augmentation layer.
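The two-pass quantization and residue generation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the quantizers are uniform, the step sizes are taken as given rather than derived from a desired noise spectrum, and the second resolution is assumed to be finer than the first by a whole number of bits so that the residue is an integer. All names are hypothetical.

```python
import math

def quantize(samples, step):
    """Uniform quantizer: index of the nearest multiple of step for each sample."""
    return [math.floor(x / step + 0.5) for x in samples]

def encode_scalable(subbands, core_steps, extra_bits):
    """Return (core_layer, augmentation_layer) for a list of subband signals.

    core_steps[i] is the first (coarser) step size for subband i, and
    extra_bits[i] is how many bits finer the second resolution is.
    """
    core_layer, aug_layer = [], []
    for samples, step, bits in zip(subbands, core_steps, extra_bits):
        ratio = 1 << bits
        first = quantize(samples, step)            # first coded signal
        second = quantize(samples, step / ratio)   # second coded signal
        # Residue: the second coding relative to the first, on the fine grid.
        residue = [s - f * ratio for f, s in zip(first, second)]
        core_layer.append(first)
        aug_layer.append(residue)
    return core_layer, aug_layer

subbands = [[0.30, -0.70], [0.05, 0.11]]
core, aug = encode_scalable(subbands, core_steps=[0.25, 0.125], extra_bits=[2, 2])
```

In this sketch the core layer alone describes each sample on the coarse grid, and the residue carries only the correction needed to reach the fine grid.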
According to another aspect of the present invention, a process of coding an audio signal uses a standard data channel that has a plurality of layers. A plurality of subband signals are received. A perceptual coding and a second coding of the subband signals are generated. A residue signal that indicates a residue of the second coding relative to the perceptual coding is generated. The perceptual coding is output in a first layer of the data channel, and the residue signal is output in a second layer of the data channel.
According to another aspect of the present invention, a processing system for a standard data channel includes a memory unit and a program-controlled processor. The memory unit stores a program of instructions for coding audio information according to the present invention. The program-controlled processor is coupled to the memory unit for receiving the program of instructions, and is further coupled to receive a plurality of subband signals for processing. Responsive to the program of instructions, the program-controlled processor processes the subband signals in accordance with the present invention. In one embodiment, this comprises outputting a first coded or perceptually coded signal in one layer of the data channel, and outputting a residue signal in another layer of the data channel, for example, in accordance with the scalable coding process disclosed above.
According to another aspect of the present invention, a method of processing data uses a multi-layer data channel having a first layer that carries a perceptual coding of an audio signal and having a second layer that carries augmentation data for increasing the resolution of the perceptual coding of the audio signal. According to the method, the perceptual coding of the audio signal and the augmentation data are received via the data channel. The perceptual coding is routed to a decoder or other processor for further processing. This may include decoding of the perceptual coding, without further consideration of the augmentation data, to yield a first decoded signal. Alternatively, the augmentation data can be routed to the decoder or other processor, and therein combined with the perceptual coding to generate a second coded signal, which is decoded to yield a second decoded signal having higher resolution than the first decoded signal.
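The two decoding paths described above can be illustrated with a short sketch. It assumes, purely for illustration, that the first layer holds uniform quantizer indices on a coarse grid and that the augmentation data holds integer corrections on a grid that is finer by a whole number of bits. The function names, step sizes, and sample values are hypothetical.

```python
def decode_core(core_layer, core_steps):
    """Decode the first layer alone, ignoring any augmentation data."""
    return [[q * step for q in band]
            for band, step in zip(core_layer, core_steps)]

def decode_full(core_layer, aug_layer, core_steps, extra_bits):
    """Combine the first layer with augmentation data, then decode."""
    decoded = []
    for first, residue, step, bits in zip(core_layer, aug_layer,
                                          core_steps, extra_bits):
        ratio = 1 << bits
        # Second coded signal: core indices moved to the fine grid plus residue.
        second = [f * ratio + r for f, r in zip(first, residue)]
        decoded.append([q * (step / ratio) for q in second])
    return decoded

# One subband whose original samples were 0.30 and -0.70 (hypothetical values).
core_layer = [[1, -3]]
aug_layer = [[1, 1]]
low_res = decode_core(core_layer, [0.25])
high_res = decode_full(core_layer, aug_layer, [0.25], [2])
```

Here the first decoded signal is 0.25 and -0.75, while the second decoded signal is 0.3125 and -0.6875, which lies closer to the original samples.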
According to another aspect of the present invention, a processing system for processing data on a multi-layer data channel is disclosed. The multi-layer data channel has a first layer that carries a perceptual coding of an audio signal and a second layer that carries augmentation data for increasing the resolution of the perceptual coding of the audio signal. The processing system includes signal routing circuitry, a memory unit, and a program-controlled processor. The signal routing circuitry receives the perceptual coding and augmentation data via the data channel, and routes the perceptual coding and optionally the augmentation data to the program-controlled processor. The memory unit stores a program of instructions for processing audio information according to the present invention. The program-controlled processor is coupled to the signal routing circuitry for receiving the perceptual coding, and is coupled to the memory unit for receiving the program of instructions. Responsive to the program of instructions, the program-controlled processor processes the perceptual coding and optionally the augmentation data according to the present invention. In one embodiment, this comprises routing and decoding of one or more layers of information as disclosed above.
According to another aspect of the present invention, a machine readable medium carries a program of instructions executable by a machine to perform a coding process according to the present invention. According to another aspect of the present invention, a machine readable medium carries a program of instructions executable by a machine to perform a method of routing and/or decoding data carried by a multi-layer data channel in accordance with the present invention. Examples of such coding, routing, and decoding are disclosed above and in the detailed description below. According to another aspect of the present invention, a machine readable medium carries coded audio information coded according to the present invention, such as any information processed in accordance with a disclosed process or method.
According to another aspect of the present invention, coding and decoding processes of the present invention may be implemented in a variety of manners. For example, a program of instructions executable by a machine, such as a programmable digital signal processor or computer processor, to perform such a process can be conveyed by a medium readable by the machine, and the machine can read the medium to obtain the program and responsive thereto perform such process. The machine may be dedicated to performing only a portion of such processes, for example, by only conveying corresponding program material via such medium.
The various features of the present invention and its preferred embodiments may be better understood by referring to the following discussion and the accompanying drawings in which like reference numerals refer to like elements in the several figures. The contents of the following discussion and the drawings are set forth as examples only and should not be understood to represent limitations upon the scope of the present invention.