This invention relates to a method and apparatus for providing background acoustic noise during a discontinued/reduced rate transmission mode of a voice transmission system. The invention is especially applicable to digital voice communications and more particularly to wireless voice communications systems, and bit-rate sensitive applications including digital simultaneous voice and data (DSVD) systems, voice over internet-protocol (VOIP) and digital speech interpolation (DSI) systems.
In wireless voice communication systems, it is desirable to reduce the level of transmitted power so as to reduce co-channel interference and to prolong battery life of portable units. In cellular systems, interference reduction enhances spectral efficiency and increases system capacity. One way to reduce the power level of transmitted information is to reduce the overall transmission bit rate. A typical telephone conversation comprises approximately 40 per cent active speech and about 60 per cent silence and non-speech sounds, including acoustic background noise. Consequently, it is known to discontinue transmission during periods when there is no speech.
Other wireless systems require a continuous mode of transmission for system synchronization and channel monitoring. It is inefficient to use the full speech-coding rate mode for the background acoustic noise because it contains less information than the speech. Consequently, when speech is absent, a lower rate coding mode is used to encode the background noise. In Code Division Multiple Access (CDMA) wireless communication systems, variable bit rate (VBR) coding is used to reduce the average bit rate and to increase system capacity. The very low bit rate used during speech gaps, however, is insufficient to avoid perceptible discontinuities between the background noise accompanying speech and the noise reproduced during speech gaps.
A disadvantage of simply discontinuing transmission, as done by early systems, is that the background noise stops along with the speech, and the resulting received signal sounds unnatural to the recipient.
This problem of discontinuities has been addressed by generating synthetic noise, known as “comfort noise”, at the receiver and substituting it for the received signal during the quiet periods. One such silence compression scheme using a combination of voice activity detection, discontinuous transmission, and synthetic noise insertion has been used by Global System for Mobile Communications (GSM) wireless voice communication systems. The GSM scheme employs a transmitter, which includes a voice activity detector (VAD) which discriminates between voice and non-voice signals, and a receiver, which includes a synthetic noise generator. When the user is speaking, the transmitter uses the full coding rate to encode the signal. During quiet periods, i.e. when no speech is detected, the transmitter is idle except for periodically updating background noise information characterizing the “real” background noise. When the receiver detects such quiet periods, it causes the synthetic noise generator to generate synthetic noise, i.e. comfort noise, and insert it into the received signal. During the quiet periods, the transmitter transmits to the receiver updated information about the background noise using what are known as Silence Insertion Descriptor (SID) frames and the receiver uses the parameters to update its synthetic noise generator.
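The transmitter behaviour described above can be illustrated by the following minimal sketch, which maps per-frame VAD decisions to frame types. The function name, the frame labels, and in particular the `sid_interval` update period are hypothetical; they are not taken from the GSM specification or from the present description.

```python
from typing import List


def schedule_frames(vad_flags: List[bool], sid_interval: int = 8) -> List[str]:
    """Map per-frame VAD decisions to transmitted frame types.

    sid_interval is a hypothetical SID update period in frames; it is an
    assumption made for illustration only.
    """
    frames = []
    quiet_count = 0  # quiet frames seen since the current quiet period began
    for active in vad_flags:
        if active:
            frames.append("SPEECH")  # full-rate coded speech
            quiet_count = 0
        else:
            # The first quiet frame and every sid_interval-th one after it
            # carry a SID update; the transmitter is otherwise idle.
            if quiet_count % sid_interval == 0:
                frames.append("SID")
            else:
                frames.append("NO_TX")
            quiet_count += 1
    return frames
```

In this sketch a receiver would run its synthetic noise generator for every `SID` or `NO_TX` frame and refresh the generator parameters only on `SID` frames.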
It is known to generate the synthetic noise by passing a spectrally-flat noise signal (white noise) through a synthesis filter in the receiver, the noise parameters transmitted in the SID frames then being coefficients for the synthesis filter. It has been found, however, that the human auditory system is capable of detecting relatively subtle differences, and a typical recipient can perceive, and be distracted by, differences between the real background noise and the synthetic noise. This problem was discussed in European patent application number EP 843,301 by K. Jarvinen et al., who recognized that a user can still perceive differences where the spectral content of the real background noise differs from that of the synthetic noise. In order to reduce the spectral quality differences, Jarvinen et al. disclosed passing the random noise excitation signal through a spectral control filter before applying it to the synthesis filter. While such spectral modification of the excitation signal might yield some improvement over conventional systems, it is not entirely satisfactory. Mobile telephones, in particular, may be used in a wide variety of locations and the typical user can still perceive the concomitant differences between the background noise accompanying speech and the synthetic noise inserted during non-speech intervals.
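By way of illustration, the conventional approach of passing white noise through a synthesis filter might be sketched as follows. The sketch assumes an all-pole filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and a Gaussian white-noise source; the function name, the `gain` parameter and the sign convention are assumptions, not details given in the text.

```python
import random
from typing import List, Sequence


def synthesize_comfort_noise(lp_coeffs: Sequence[float], gain: float,
                             num_samples: int, seed: int = 0) -> List[float]:
    """Filter white Gaussian noise through the all-pole filter 1/A(z).

    lp_coeffs holds a_1..a_p under the assumed convention
    A(z) = 1 + sum_k a_k z^-k, so y[n] = e[n] - sum_k a_k * y[n-k].
    """
    rng = random.Random(seed)
    order = len(lp_coeffs)
    history = [0.0] * order  # past outputs, most recent first
    out = []
    for _ in range(num_samples):
        excitation = gain * rng.gauss(0.0, 1.0)  # spectrally flat excitation
        y = excitation - sum(a * h for a, h in zip(lp_coeffs, history))
        out.append(y)
        if order:
            history = [y] + history[:-1]
    return out
```

With an empty coefficient list the output is simply the scaled white noise; the coefficients carried in the SID frames shape its spectrum.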
An object of the present invention is to provide a background noise coding method and apparatus capable of providing synthetic noise (“comfort” noise) which sounds more like the actual background noise.
To this end, in communications systems embodying the present invention, the background noise is classified into one or more of a plurality of noise classes and the receiver selects one or more of a corresponding plurality of different excitation signals for use in generating the synthetic noise.
According to one aspect of the present invention, in a digital communications system comprising a transmitter and a receiver, the transmitter interrupting or reducing transmission of a voice signal during intervals absent speech and the receiver inserting synthetic noise into the received voice signals during said intervals, there is provided a method comprising the steps of assigning acoustic background noise in the voice signal to one or more of a plurality of noise classes, selecting a corresponding one of a plurality of excitation vectors each corresponding to at least one of the classes, using at least part of the selected excitation vector to synthesize the synthetic noise, and outputting the synthetic noise during a said interval.
According to a second aspect of the present invention, there is provided a digital communications system comprising a transmitter and a receiver, the transmitter having means for interrupting or reducing transmission of a voice signal during intervals absent speech and the receiver having means for inserting synthetic noise into the received voice signals during said intervals, there being provided means for assigning acoustic background noise in the voice signal to one or more of a plurality of noise classes, selecting a corresponding one of a plurality of excitation vectors each corresponding to at least one of the classes, using at least part of the selected excitation vector to synthesize the synthetic noise, and outputting the synthetic noise during a said interval.
In embodiments of either aspect, the transmitter may perform the classification of the background noise and transmit to the receiver a corresponding noise index and the receiver may select the corresponding excitation vector(s) in dependence upon the noise index. The receiver may select from a plurality of previously-stored vectors, or use a generator to generate an excitation vector with the appropriate parameters.
The predefined noise classes may be defined by temporal and spectral features based upon a priori knowledge of expected input signals. Such features may include zero crossing rate, root-mean-square energy, critical band energies, and correlation coefficients. Preferably, however, noise classification uses line spectral frequencies (LSFs) of the signal, with a Gaussian fit to each LSF histogram.
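One possible reading of the LSF-based classification is sketched below: each class is modelled by an independent Gaussian per LSF dimension, fitted to training frames, and a frame is assigned to the class of highest log-likelihood. The independence assumption, the maximum-likelihood decision rule, and all function names are assumptions made for illustration; the text states only that a Gaussian is fitted to each LSF histogram.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One (mean, variance) pair per LSF dimension.
Model = List[Tuple[float, float]]


def fit_gaussians(training_lsfs: Sequence[Sequence[float]]) -> Model:
    """Fit a Gaussian to the values observed in each LSF dimension."""
    n = len(training_lsfs)
    dims = len(training_lsfs[0])
    params = []
    for d in range(dims):
        values = [frame[d] for frame in training_lsfs]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        params.append((mean, max(var, 1e-9)))  # floor avoids division by zero
    return params


def log_likelihood(lsf: Sequence[float], model: Model) -> float:
    """Sum of per-dimension Gaussian log-densities (dimensions assumed independent)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
               for x, (mean, var) in zip(lsf, model))


def classify(lsf: Sequence[float], models: Dict[str, Model]) -> str:
    """Return the name of the class whose model best explains the frame."""
    return max(models, key=lambda name: log_likelihood(lsf, models[name]))
```

In use, one model would be trained per predefined noise class, and the resulting class label (or its index) would be what the transmitter sends to the receiver.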
Preferably, the noise classification is done on a frame-by-frame basis using relatively short segments of the input voice signal, conveniently about 20 milliseconds.
In preferred embodiments of either aspect of the invention, linear prediction (LP) analysis of the input signal is performed every 20 milliseconds using an autocorrelation method and windows each of length 240 samples overlapping by 80 samples. The LP coefficients then are calculated using the Levinson-Durbin algorithm and bandwidth-expanded using a factor γ = 0.994. The LP coefficients then are converted into the LSF domain using known techniques.
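The LP analysis steps above might be sketched as follows. The window length of 240 samples, the 80-sample overlap (hence a hop of 160 samples) and the factor 0.994 come from the text; the LP order of 10, the Hamming window shape and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are assumptions, and the conversion to LSFs is omitted.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """Autocorrelation of the windowed frame for lags 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Return ([a_1..a_order], residual energy) for A(z) = 1 + sum_k a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame; keep coefficients so far
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def bandwidth_expand(coeffs: Sequence[float], gamma: float = 0.994) -> List[float]:
    """Scale a_k by gamma**k, which moves the filter poles toward the origin."""
    return [c * gamma ** (k + 1) for k, c in enumerate(coeffs)]


def lp_analysis(signal: Sequence[float], order: int = 10, window_len: int = 240,
                hop: int = 160, gamma: float = 0.994) -> List[List[float]]:
    """One bandwidth-expanded LP coefficient set per analysis window."""
    # Hamming window: an assumption, the text does not name the window shape.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (window_len - 1))
              for n in range(window_len)]
    results = []
    for start in range(0, len(signal) - window_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + window_len], window)]
        coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
        results.append(bandwidth_expand(coeffs, gamma))
    return results
```

A hop of 160 samples corresponds to 20 milliseconds only at a sampling rate of 8 kHz, which is a further assumption of this sketch.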
The classification unit may determine that the background noise comprises noise from a plurality of the noise classes and determine proportions for mixing a plurality of said excitation vectors for use in generating the synthetic noise. The relative proportions may be transmitted as coefficients and the receiver may multiply the coefficients by the respective vectors to form a mixture.
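The mixing step may be illustrated by a minimal sketch in which the receiver holds a codebook of stored excitation vectors keyed by noise class index and forms a weighted sum using the received coefficients. The function name and the dictionary-based representation of the codebook and coefficients are hypothetical.

```python
from typing import Dict, List, Sequence


def mix_excitations(codebook: Dict[int, Sequence[float]],
                    weights: Dict[int, float]) -> List[float]:
    """Weighted sum of the excitation vectors selected by class index.

    codebook maps a noise class index to its stored excitation vector;
    weights maps a class index to its received mixing coefficient.
    All vectors are assumed to have the same length.
    """
    length = len(next(iter(codebook.values())))
    mixed = [0.0] * length
    for class_index, weight in weights.items():
        vector = codebook[class_index]
        for n in range(length):
            mixed[n] += weight * vector[n]
    return mixed
```

For example, with weights of 0.75 and 0.25 for two classes, the mixed excitation is three parts of the first vector to one part of the second, and this mixture would then drive the synthesis filter.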
The transmitter may transmit one or more hangover frames at the transition between speech and no speech, such hangover frames including background noise, and the receiver then may comprise means for deriving the noise class index from the noise in that portion of the received signal corresponding to the hangover frames. The deriving means may comprise a noise classifier operative upon residual noise energy and synthesis filter coefficients to derive the noise class indices.