1. Technical Field
This invention relates to a method and system for controlling a weighting filter based on the spectral content of the input speech signal, among other possible factors.
2. Related Art
An analog portion of a communications network may detract from the desired audio characteristics of vocoded speech. In a public switched telephone network, a trunk between exchanges or a local loop from a local office to a fixed subscriber station may use analog representations of the speech signal. For example, a telephone station typically transmits an analog modulated signal with approximately 3.4 kHz bandwidth to the local office over the local loop. The local office may include a channel bank that converts the analog signal to a digital pulse-code-modulated signal (e.g., DS0). An encoder in a base station may subsequently encode the digital signal, which remains subject to the frequency response originally imparted by the analog local loop, the telephone, and the speaker.
The analog portion of the communications network may skew the frequency response of a voice message transmitted through the network. A skewed frequency response may negatively impact the digital speech coding process because the digital speech coding process may be optimized for a different frequency response than the skewed frequency response. As a result, the analog portion may degrade the intelligibility, consistency, realism, clarity or another performance aspect of the digital speech coding.
The change in the frequency response may be modeled as one or more modeling filters interposed in a path of the voice signal traversing an ideal analog communications network with an otherwise flat spectral response. A Modified Intermediate Reference System (MIRS) refers to a modeling filter or another model of the spectral response of a voice signal path in a communications network. If a voice signal that has a flat spectral response is input into an MIRS filter, the output signal has a sloped spectral response with an amplitude that generally increases with a corresponding increase in frequency.
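The sloped response described above can be illustrated with a minimal Python sketch. It does not reproduce the actual MIRS characteristic; it assumes a hypothetical first-order filter y[n] = x[n] - a*x[n-1] as a stand-in modeling filter, with an assumed coefficient a = 0.5 and an assumed 8 kHz sampling rate, chosen only because its gain rises with frequency across the telephone band.

```python
import math

def tilt_filter(x, a=0.5):
    """Hypothetical stand-in for a modeling filter: y[n] = x[n] - a * x[n-1].

    This is NOT the MIRS characteristic; it only shares the property that
    gain increases with frequency. The coefficient a = 0.5 is an assumption.
    """
    y = []
    prev = 0.0
    for sample in x:
        y.append(sample - a * prev)
        prev = sample
    return y

def magnitude_response(a, freq_hz, fs_hz=8000.0):
    """Magnitude of H(z) = 1 - a * z^-1 evaluated at freq_hz."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    return math.sqrt(1.0 + a * a - 2.0 * a * math.cos(w))

# Gain in dB at a few frequencies inside the telephone band (assumed values).
freqs_hz = [300, 1000, 2000, 3400]
gains_db = [20.0 * math.log10(magnitude_response(0.5, f)) for f in freqs_hz]

for f, g in zip(freqs_hz, gains_db):
    print(f"{f:5d} Hz: {g:6.2f} dB")
```

A flat-spectrum input passed through such a filter would leave with more energy at the high end of the band than at the low end, which is the kind of skew an encoder tuned for a flat response would not expect.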
In the prior art, an encoder may use weighting filters with identical responses for a pitch-preprocessing weighting filter, an adaptive-codebook weighting filter, and a fixed-codebook weighting filter. The adaptive-codebook weighting filter may be used for open-loop pitch estimation. If identical filters are used for pitch preprocessing and open-loop pitch estimation and if the input speech has a skewed spectral response (e.g., MIRS response), the encoded speech signal may be degraded in perceptual quality. For example, if the input speech signal to the pitch-preprocessing weighting filter has an MIRS spectral response, the output speech signal from the pitch-preprocessing weighting filter may not be as periodic as it otherwise might be with a different spectral response of the input speech signal. Accordingly, the output of the pitch-preprocessing weighting filter may not be sufficiently periodic to capture coding efficiencies or perceptual aspects associated with generally periodic speech. Thus, a need exists for a pitch-preprocessing weighting filter that addresses the spectral response of the input speech signal to enhance the periodicity of the weighted speech signal.
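The notion of a weighted speech signal being more or less periodic can be made concrete with a short Python sketch. The measure shown here, the peak normalized correlation over a range of candidate pitch lags, is an assumption for illustration and is not taken from the prior-art encoders discussed above; the lag range of 20 to 147 samples and the synthetic test signal are likewise hypothetical.

```python
import math

def normalized_correlation(x, lag):
    """Normalized correlation between x[n] and x[n - lag]; 1.0 means fully periodic at that lag."""
    idx = range(lag, len(x))
    num = sum(x[n] * x[n - lag] for n in idx)
    energy_now = sum(x[n] ** 2 for n in idx)
    energy_past = sum(x[n - lag] ** 2 for n in idx)
    denom = math.sqrt(energy_now * energy_past)
    return num / denom if denom > 0.0 else 0.0

def periodicity(x, min_lag=20, max_lag=147):
    """Return (best_lag, score) over an assumed open-loop pitch lag range."""
    best_lag = max(range(min_lag, max_lag + 1),
                   key=lambda lag: normalized_correlation(x, lag))
    return best_lag, normalized_correlation(x, best_lag)

# Synthetic "voiced" frame: fundamental plus one harmonic, period of 50 samples.
frame = [math.sin(2.0 * math.pi * n / 50.0) + 0.5 * math.sin(4.0 * math.pi * n / 50.0)
         for n in range(400)]

lag, score = periodicity(frame)
print(lag, round(score, 4))
```

Under this measure, a score near 1.0 indicates a strongly periodic frame. Comparing the score of a frame before and after a candidate weighting filter is one way to check whether the filter preserves the periodicity that pitch preprocessing relies on.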
If identical weighting filters are used for both open-loop pitch estimation and fixed-codebook search, the bandwidth of the encoded speech and the perceptual quality of the encoded speech may be degraded. For example, the weighting filters may filter out unwanted noise from the input speech signal, which may lead incidentally to a reduced bandwidth of the encoded speech signal. If the input speech signal has a desired noise component or another speech component that requires a wide bandwidth for accurate encoding, the weighting filters may attenuate the speech noise component of the encoded speech to such a degree that the encoded speech sounds artificial or synthetic when reproduced. Thus, a need exists for weighting filters of an encoder that filter out unwanted noise and yet maintain the appropriate bandwidth necessary for a perceptually accurate reproduction of the speech.