1. Field of the Invention
This invention relates generally to audio visual signal processing, and more particularly to methods and apparatus for encoding audio signals.
2. Description of the Related Art
The process of recording a motion picture with sound involves the acquisition of both video images and sound, and for each type of content the acquisition involves the sampling of an otherwise continuous flow of information. For example, the video imagery is frequently sampled at a rate of twenty-four frames per second. Audio is typically recorded as an analog signal that is then sampled at some sampling rate and bit depth to convert the analog voltage signals into digital data. The quality of the analog-to-digital conversion depends on a number of factors, such as the number of possible voltage levels that are represented digitally. While it might be possible to simply record or otherwise store all of the audio samples, it is typically more efficient to perform some sort of audio encoding of the sampled audio signals prior to storage on some form of media, such as a disk or hard drive.
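By way of illustration only, the dependence of conversion quality on the number of digitally represented voltage levels may be sketched as follows. The function names (quantize, dequantize, max_error), the normalized input range of -1.0 to 1.0, and the 440 Hz test tone sampled at 8000 Hz are assumptions chosen for the example and are not drawn from any particular converter.

```python
import math

def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to one of 2**bits integer levels."""
    levels = 2 ** bits
    clipped = max(-1.0, min(1.0, sample))
    # Scale to [0, levels - 1] and round to the nearest level.
    return round((clipped + 1.0) / 2.0 * (levels - 1))

def dequantize(code, bits):
    """Map an integer level back to a value in [-1.0, 1.0]."""
    levels = 2 ** bits
    return code / (levels - 1) * 2.0 - 1.0

def max_error(bits, sample_rate=8000, freq=440.0, n=200):
    """Worst-case round-trip error over n samples of a test sine tone."""
    worst = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * freq * i / sample_rate)
        err = abs(x - dequantize(quantize(x, bits), bits))
        worst = max(worst, err)
    return worst
```

In this sketch, increasing the bits argument increases the number of available levels and reduces the worst-case difference between the original and reconstructed sample values.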
Many current audio encoders use various techniques for compressing the sampled audio signals before sending the compressed data to a playback or storage device. Examples of these compression techniques include prediction, quantization (both vector and scalar) and Huffman coding. Many audio visual recordings involve significant variations in video and audio content over the duration of the recording. One scene might involve a boisterous action sequence with loud audio content and little dialog and the next scene might involve an intimate conversation between characters with little or no music background, and so on.
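As a minimal, hypothetical sketch of the prediction technique named above, a first-order predictor may estimate each sample as the value of the preceding sample and retain only the difference. The function names and the sample values are illustrative assumptions; practical encoders typically use more elaborate predictors and follow prediction with quantization and entropy coding.

```python
def predict_residuals(samples):
    """First-order prediction: predict each sample as the previous one
    and keep only the prediction error (residual)."""
    residuals = []
    previous = 0
    for s in samples:
        residuals.append(s - previous)
        previous = s
    return residuals

def reconstruct(residuals):
    """Invert the prediction by accumulating the residuals."""
    samples = []
    previous = 0
    for r in residuals:
        previous += r
        samples.append(previous)
    return samples

# Illustrative, slowly varying sample values (not real audio data).
samples = [100, 102, 105, 107, 108, 108, 106]
residuals = predict_residuals(samples)
```

For slowly varying input such as the values above, the residuals after the first are small in magnitude, which is what makes them cheaper to represent in a subsequent coding stage.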
Current audio encoders encode audio signals without taking into account what may be valuable video information, such as scene changes or the presence of dialog-intensive scenes. As a result, current audio encoders typically determine mode (i.e., prediction on/off), bit-rate allocation and quantization parameters without video signal assistance or side-information. Audio encoder users thus have no means of utilizing video information to improve audio encoding where applicable.
Yamaha Corporation markets a front surround system (a sound bar) under the model numbers YAS-103 and YAS-93. These models use a feature called “clear voice,” which is intended to improve the quality of voice sounds when a user is viewing video content. When clear voice is enabled, the sound bar makes adjustments to analog audio signals just before they are sent to the speakers of the sound bar. This processing differs from audio encoding because it is performed on analog signals that have already undergone digital-to-analog conversion.
The present invention is directed to overcoming or reducing the effects of one or more of the foregoing disadvantages.