1. Technical Field
The present invention relates generally to voice activity detection in speech coding; and, more particularly, it relates to voice activity detection that accommodates substantially music-like signals in speech coding.
2. Related Art
Conventional speech signal coding systems have difficulty coding speech signals that contain a substantially music-like signal. Conventional speech signal coding schemes often must operate over data transmission media having limited available bandwidth. These conventional systems commonly seek to minimize data transmission rates using various techniques geared primarily toward maintaining a high perceptual quality of speech signals. Traditionally, speech coding schemes were not directed to ensuring a high perceptual quality for speech signals having a large portion of embedded music-like signals.
The reasons for this were many and varied across the communication systems and media in use. One common reason, within speech coding systems designed for wireless communication systems, was that air time was prohibitively expensive. A user of a wireless communication system was not realistically expected to wait “on hold” using his wireless device. Design constraints, such as the economic constraints dictated by expensive air time, were among those that led workers in the art of speech coding and speech processing not to devote significant energy to maintaining a high perceptual quality for speech signals having a substantially music-like signal contained therein. Conventional speech coding methods do not typically address the problem of ensuring a high perceptual quality for speech signals having a substantially music-like signal.
Another common reason, presently applicable within speech coding systems designed for wireline communication systems, is that the bandwidth available to such communication systems has been prohibitively limited. Moreover, as such communication systems continue to grow in size and complexity, they become more and more congested. Various techniques have been developed in the art of speech coding and speech processing to accommodate communication systems having limited bandwidth. The discontinued transmission method, known to those having skill in the art of speech coding and speech processing, is one such technique for maximizing data transmission over already limited communication media.
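As a rough illustration of the discontinued transmission idea, the following Python sketch maps per-frame voice activity flags to transmitted frame types. It is a simplified, hypothetical policy (the names `FrameType` and `dtx_schedule` are illustrative assumptions, not part of any standard): a full-rate frame is sent for active speech, a single silence descriptor (SID) frame is sent at the start of each inactive run, and nothing is sent thereafter. Actual DTX schemes, such as the one in annex G.729B, decide adaptively when to send SID updates based on changes in the background noise.

```python
from enum import Enum


class FrameType(Enum):
    SPEECH = "speech"  # full-rate coded frame
    SID = "sid"        # silence descriptor carrying background-noise parameters
    NONE = "none"      # nothing transmitted; decoder generates comfort noise


def dtx_schedule(vad_flags):
    """Map per-frame VAD flags to transmitted frame types (simplified, hypothetical DTX)."""
    schedule = []
    # Treat the stream as if it followed speech, so an initial inactive
    # run still starts with one SID frame describing the noise.
    previous_active = True
    for active in vad_flags:
        if active:
            schedule.append(FrameType.SPEECH)
        elif previous_active:
            # First inactive frame after speech: send one SID update.
            schedule.append(FrameType.SID)
        else:
            # Later inactive frames: transmit nothing.
            schedule.append(FrameType.NONE)
        previous_active = active
    return schedule


if __name__ == "__main__":
    frames = dtx_schedule([True, True, False, False, False, True])
    print([f.value for f in frames])
```

With the example input, only three of the six frames are coded at full rate, which is the bandwidth saving that DTX trades against the risk of misclassifying a frame as background noise.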
Also, within ITU-T Recommendation G.729, an annex G.729E high rate extension has recently been adopted by the industry to supplement the G.729 main body. The annex G.729E high rate extension provides higher perceptual quality for speech-like signals than does the G.729 main body, and it especially improves the quality of coded speech signals having a substantially music-like signal embedded therein. However, the traditional voice activity detection (VAD) embedded within the annex G.729B discontinued transmission (DTX: (VAD, SID, CNG)) scheme, which also performs silence description coding (SID) and comfort noise generation (CNG), often improperly classifies substantially music-like signals as background noise signals. In short, the annex G.729B discontinued transmission (DTX: (VAD, SID, CNG)) is simply inadequate to guarantee a high perceptual quality for substantially music-like signals, largely because its available data transmission rate (bit rate) is substantially lower than that of the annex G.729E high rate extension. The present implementation of the annex G.729E high rate extension accompanied by the annex G.729B discontinued transmission (DTX: (VAD, SID, CNG)) and its otherwise desirable voice activity detection simply fails to provide a high perceptual quality for substantially music-like signals.
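The bit-rate gap between DTX and the high rate extension can be made concrete with simple arithmetic. The sketch below assumes nominal ITU-T frame sizes that are not stated in this document: 10 ms frames, 80 bits per frame for the G.729 main body, 118 bits per frame for the annex G.729E high rate extension, and 15 bits per annex G.729B SID frame. Because SID frames are sent only intermittently, the SID figure is an upper bound on the rate during inactive periods.

```python
FRAME_MS = 10  # G.729 frame length in milliseconds (assumed nominal value)

# Nominal bits per 10 ms frame (assumed ITU-T values, not from this document).
BITS_PER_FRAME = {
    "G.729 main body": 80,
    "G.729E high rate extension": 118,
    "G.729B SID frame": 15,
}


def rate_kbps(bits_per_frame: int, frame_ms: int = FRAME_MS) -> float:
    """Convert bits per frame to kbit/s (bits per millisecond equals kbit/s)."""
    return bits_per_frame / frame_ms


if __name__ == "__main__":
    for name, bits in BITS_PER_FRAME.items():
        print(f"{name}: {rate_kbps(bits):.1f} kbit/s")
    ratio = rate_kbps(BITS_PER_FRAME["G.729B SID frame"]) / rate_kbps(
        BITS_PER_FRAME["G.729E high rate extension"]
    )
    print(f"SID rate is at most {ratio:.1%} of the G.729E rate")
```

Under these assumptions the rates come out to 8.0, 11.8 and 1.5 kbit/s, so a music-like frame wrongly routed to DTX is described with at most about one eighth of the bits the high rate extension would have spent on it.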
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
Various aspects of the present invention can be found in an extended speech coding system that accommodates substantially music-like signals within a speech signal while maintaining a high perceptual quality in the reproduced speech signal. The extended speech coding system contains internal circuitry that detects and classifies the speech signal according to numerous characteristics of the speech signal, to ensure a high perceptual quality in the reproduced speech signal. The invention selects an appropriate speech coding mode for each of a variety of speech signals so that the high perceptual quality is maintained.
In certain embodiments of the invention, for speech signals having a substantially music-like signal, the extended speech coding system overrides any voice activity detection (VAD) decision used to determine which of a plurality of source coding modes is to be employed; the override is performed by voice activity detection (VAD) correction/supervision circuitry. In one specific embodiment, the voice activity detection (VAD) correction/supervision circuitry cooperates with conventional voice activity detection (VAD) circuitry to decide whether to use a discontinued transmission (DTX) speech signal coding mode or a regular speech signal coding mode having a high rate extension speech signal coding mode.
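The mode decision can be sketched as follows. This is a minimal, hypothetical illustration, not the circuitry of the invention: `select_coding_mode`, `CodingMode`, and the boolean `music_like` input are assumed names, and how a frame is judged substantially music-like is left unspecified here, since that judgment would come from the detection and classification circuitry. The sketch shows only the override: a music-like frame is routed to the regular high rate mode even when the conventional VAD reports inactivity, so it is never coded as background noise under DTX.

```python
from enum import Enum


class CodingMode(Enum):
    DTX = "dtx"  # discontinued transmission: VAD, SID, CNG
    REGULAR_HIGH_RATE = "regular_high_rate"  # regular mode with high rate extension


def select_coding_mode(vad_active: bool, music_like: bool) -> CodingMode:
    """Hypothetical VAD correction/supervision step for one frame."""
    if music_like:
        # Override: a music-like frame must not be treated as background
        # noise, whatever the conventional VAD decided.
        return CodingMode.REGULAR_HIGH_RATE
    if vad_active:
        # Conventional VAD found active speech: code it normally.
        return CodingMode.REGULAR_HIGH_RATE
    # Neither speech nor music: allow discontinued transmission.
    return CodingMode.DTX


if __name__ == "__main__":
    # (conventional VAD decision, music-like flag) for four example frames
    frames = [(True, False), (False, False), (False, True), (True, True)]
    print([select_coding_mode(v, m).value for v, m in frames])
```

Keeping the override as a separate step after the conventional VAD mirrors the cooperation described above: the existing VAD is left unchanged, and the supervision logic only corrects its decision when music is present.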
In certain embodiments of the invention, speech signal coding circuitry ensures an improved perceptual quality of a coded speech signal even during discontinued transmission (DTX). This assurance of a high perceptual quality is especially desirable when a music-like signal is present in the un-coded speech signal.