1. Field of the Invention
The field of this invention relates to signal processing that identifies the type of signal received in order to optimize the transmission and reception of that signal. More particularly, the field of this invention relates to audio signal processing through an encoder selected to optimize both the quality of the decoded signal and the use of bandwidth.
2. Related Art
The related art is replete with detectors and encoders that encode speech-related audio signals. Speech signals are processed and parameters are developed in the form of feature vectors, which may be transmitted in digital form and later combined in a decoder to reconstruct the speech.
Digital speech signals are carried over transmission media having limited available bandwidth. Accordingly, data transmission rates are minimized using various techniques geared to optimizing speech signals while maintaining high perceptual quality. These systems include all transmission modes, such as wireless, Voice over IP, direct wire, cable, ISDN, modems and the like.
However, such systems do not typically address the problem of non-speech signals such as music, because they are optimized for the human vocal tract. As a result, they do not process music and other non-speech signals very well.
The International Telecommunication Union has established a number of standards for speech processing. Among these is the G.729 standard, which processes speech at 8 Kbits/second. The G.729 standard provides good-quality transmission of speech while minimizing bandwidth. It presents a standard way of performing the integration and expansion of speech signals to optimize speech quality and ensure communication quality.
Recently, the G.729 standard has been expanded to include music processing capability (Annex E at 11.8 Kbits/second, G.729E). Furthermore, the standards now include DTX (discontinuous transmission) functionality (Annex G) for the 11.8 Kbits/second CS-ACELP algorithm of Annex E. The G.729G standard provides for music detection immediately following Voice Activity Detection (VAD); the music detection algorithm corrects the VAD decision in the presence of music signals.
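As an illustration only, the VAD correction described above can be pictured as a per-frame override of an "inactive" decision. The function name `corrected_vad`, the `music_score` input and the 0.6 threshold are hypothetical; this sketch is not the G.729 Annex G music detection algorithm.

```python
def corrected_vad(vad_active: bool, music_score: float, threshold: float = 0.6) -> bool:
    """Hypothetical final activity decision for one frame.

    vad_active  -- raw decision from the voice activity detector
    music_score -- output of an assumed music detector, in [0, 1]
    threshold   -- assumed score at or above which the frame is treated as music
    """
    if vad_active:
        # Activity already detected; the music detector never suppresses it.
        return True
    # The VAD reported inactivity, but a likely music frame is forced back to
    # "active" so that it is encoded rather than treated as silence.
    return music_score >= threshold


print(corrected_vad(False, 0.8))  # True: music overrides the VAD
print(corrected_vad(False, 0.2))  # False: genuine inactivity
```

The override only ever turns an inactive decision into an active one, which reflects the idea that the music detector corrects, rather than replaces, the VAD.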
Many systems or methods can currently distinguish between voice and music, but they do not dynamically adjust the encoding system or bit rate to increase the quality of the signal while achieving a better trade-off between maintaining high perceptual quality (which typically requires a high bit rate) and reducing the bandwidth required for communication.
What is required is a system, such as the present invention, that can dynamically switch the encoding standard, or any other standard or technique, as required to meet the high bit-rate requirement of high-content signals, so that a more acceptable reconstruction of the signal can take place while still allowing a low bit rate for speech signals. Such a system must provide flexibility in the selection of encoding techniques and in the degree of granularity applied.
The present invention provides a system in which the encoding bit rate or the associated transport mechanism can be changed dynamically, so that different types of signals are encoded at bit rates, or with encoding methods, optimized to reconstruct the input signal properly, whether speech or non-speech. It should be noted that non-speech signals can include modem signals and facsimile signals.
In the present invention, the application is driven by a change of parameters that can make the system a speech or music recognizer over an IP gateway, for example, depending on which signal is to be listened for. While the dynamic signal selection of the present invention is illustrated using voice over IP, it is equally applicable to other transmission systems, such as wireless, DSI, voice over cable and other transmission systems, and may be operated on a continuous, incremental or packetized/frame basis.
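The parameter-driven behaviour can be pictured as a detector whose target class is chosen by configuration rather than by code. In this minimal sketch, `DetectorConfig`, `detect`, the per-class score dictionary and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorConfig:
    """Hypothetical configuration selecting which signal class to listen for."""
    target: str       # e.g. "speech" or "music"
    threshold: float  # minimum class score that counts as a detection


def detect(scores: dict, cfg: DetectorConfig) -> bool:
    """Report whether the configured target class is present in one frame.

    `scores` maps class names to scores in [0, 1] produced by some upstream
    classifier (assumed); a missing class is treated as a score of 0.
    """
    return scores.get(cfg.target, 0.0) >= cfg.threshold


frame_scores = {"speech": 0.2, "music": 0.7}
music_gateway = DetectorConfig(target="music", threshold=0.6)
speech_gateway = DetectorConfig(target="speech", threshold=0.6)
print(detect(frame_scores, music_gateway))   # True
print(detect(frame_scores, speech_gateway))  # False
```

Turning the same gateway from a music recognizer into a speech recognizer then requires only a new `DetectorConfig`, in keeping with the idea that the application is driven by a change of parameters.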
The dynamic signal detector of the present invention includes three basic components: a recognizing module, which categorizes the type of input signal; an evaluation or classification module, which evaluates the quality of the signal based on the category; and a recommendation module, which, based on the quality of the signal, recommends changing the standard used to encode the received signals in order to improve quality.
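The three modules can be sketched as three small functions chained together. This is a minimal sketch under stated assumptions: the `music_score` input and its 0.5 threshold are hypothetical, and "quality" is reduced to whether the current bit rate meets a per-category requirement taken from the G.729 (8 Kbits/second) and G.729E (11.8 Kbits/second) figures cited above.

```python
from typing import Optional

# Assumed minimum rate (kbit/s) per category, from the G.729 / G.729E figures.
REQUIRED_KBPS = {"speech": 8.0, "music": 11.8}


def recognize(music_score: float) -> str:
    """Recognizing module: categorize the input (score and threshold assumed)."""
    return "music" if music_score >= 0.5 else "speech"


def evaluate(category: str, current_kbps: float) -> bool:
    """Evaluation module: can the current rate deliver adequate quality?"""
    return current_kbps >= REQUIRED_KBPS[category]


def recommend(category: str, adequate: bool) -> Optional[float]:
    """Recommendation module: propose a new rate only when quality would suffer."""
    return None if adequate else REQUIRED_KBPS[category]


def dynamic_signal_detector(music_score: float, current_kbps: float) -> Optional[float]:
    """Chain the three modules; None means 'keep the current encoder'."""
    category = recognize(music_score)
    return recommend(category, evaluate(category, current_kbps))


print(dynamic_signal_detector(0.9, 8.0))  # 11.8
print(dynamic_signal_detector(0.1, 8.0))  # None
```

Keeping the modules separate means the recognizer, the quality test and the recommendation policy can each be replaced independently.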
The dynamic signal detector receives the digitized input signal and uses an algorithm to extract feature-vector parameters for evaluation. These parameters are tested, and a determination is made whether a switch of encoding standard or a modification of the transport parameters is required to improve the reconstructed signal. Depending on the particular system, external signals may also be available for evaluation.
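The text does not specify which parameters are extracted. Purely as an assumed example, the sketch below computes a two-element feature vector per frame, log energy and zero-crossing rate, which are common low-cost audio features; the function name `extract_features` is hypothetical.

```python
import math


def extract_features(frame: list) -> list:
    """Return an illustrative feature vector [log_energy_db, zero_crossing_rate].

    The choice of these two features is an assumption for illustration only.
    """
    n = len(frame)
    if n < 2:
        raise ValueError("frame must contain at least two samples")
    energy = sum(x * x for x in frame) / n
    log_energy_db = 10.0 * math.log10(energy + 1e-12)  # small offset avoids log(0)
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zero_crossing_rate = crossings / (n - 1)
    return [log_energy_db, zero_crossing_rate]


features = extract_features([1.0, -1.0, 1.0, -1.0])
```

For the alternating frame above, every consecutive pair changes sign, so the zero-crossing rate is 1.0, and the log energy is close to 0 dB because the mean power is 1.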
A dynamic signal detector may be present at both ends of the communication channel. Each is located on the encoder side, where it detects the digitized signal in the first instance and evaluates the feature vectors to determine the character of the signal. The dynamic signal detector determines whether a quality signal can be generated by the current encoder and selects a decreased or increased bit rate, or another encoding format, as required.
For example, if the signal is music, a standard with a higher bit rate than that used for voice is applied. If the signal is voice, a lower-bandwidth standard will suffice. If the signal is a modem or facsimile signal, a modem or facsimile format is applied.
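The selection rule in the examples above can be sketched as a lookup from signal type to target format, together with the direction of the change. The G.729 and G.729E rates come from the text; the `modem-format` and `fax-format` labels, and the function `select_format`, are hypothetical placeholders.

```python
from typing import Tuple

# Hypothetical format table; modem and fax entries carry no assumed rate.
FORMATS = {
    "speech": ("G.729", 8.0),
    "music": ("G.729E", 11.8),
    "modem": ("modem-format", None),
    "fax": ("fax-format", None),
}
RATES = {name: rate for name, rate in FORMATS.values()}


def select_format(signal_type: str, current_format: str) -> Tuple[str, str]:
    """Return (target_format, action); action is keep, increase, decrease or switch."""
    target, target_rate = FORMATS[signal_type]
    if target == current_format:
        return target, "keep"
    current_rate = RATES.get(current_format)
    if target_rate is None or current_rate is None:
        # Moving to or from a non-speech format: a format change, not a rate step.
        return target, "switch"
    return target, "increase" if target_rate > current_rate else "decrease"


print(select_format("music", "G.729"))   # ('G.729E', 'increase')
print(select_format("speech", "G.729E"))  # ('G.729', 'decrease')
print(select_format("fax", "G.729"))     # ('fax-format', 'switch')
```

Reporting the direction separately from the target lets the caller distinguish a bit-rate step within a codec family from a wholesale change of format.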
This evaluation, recommendation and change can occur on a continuous basis, or on a frame-by-frame or packet-by-packet basis, depending on the nature of the signal. Statistical techniques for evaluating frames or packets and making the associated recommendations can also be applied over an arbitrary number of samples, or by whatever other means suits the application.
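One simple statistical technique of this kind is a majority vote over the most recent frame classifications, which keeps a single misclassified frame from triggering an encoder switch. The class `WindowedDecision` and the default window of five frames are hypothetical choices for illustration.

```python
from collections import Counter, deque


class WindowedDecision:
    """Majority vote of per-frame labels over the last `window` frames (assumed rule)."""

    def __init__(self, window: int = 5):
        if window < 1:
            raise ValueError("window must be >= 1")
        # Oldest labels drop out automatically once the deque is full.
        self.history = deque(maxlen=window)

    def update(self, frame_label: str) -> str:
        """Record one frame label and return the current majority label."""
        self.history.append(frame_label)
        label, _count = Counter(self.history).most_common(1)[0]
        return label


decider = WindowedDecision(window=3)
labels = ["speech", "speech", "music", "music", "music"]
print([decider.update(x) for x in labels])
# ['speech', 'speech', 'speech', 'music', 'music']
```

The decision follows the input only after music holds a majority of the window, so the window length directly trades reaction time against stability.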
Additional features of the invention are described in more detail in the specific embodiment below.