1. Field of the Invention
The present invention relates to audio signal processing. It has particular utility in relation to the separation of voiced speech and unvoiced speech in low bit-rate speech coders.
2. Related Art
Low bit-rate speech coders are becoming increasingly commercially important as they enable a more efficient utilisation of the portion of the radio spectrum available to mobile phones.
Speech can be classified into three parts: voiced speech, unvoiced speech and silence. Any one of these may be corrupted by the addition of background noise. On a timescale of milliseconds, voiced speech can be viewed as a succession of repeated waveforms. This fact is exploited in a class of speech coding methods known as Prototype Waveform Interpolation (PWI) methods. Essentially, these methods send the information describing a repeated pitch-period waveform only once, thereby reducing the number of bits required to encode the speech signal. Initial PWI speech coding methods encoded only voiced speech; the other portions of the speech signal were coded using other methods (e.g. Code Excited Linear Prediction methods). One example of such a hybrid coding technique is described in "Encoding Speech Using Prototype Waveforms", W. B. Kleijn, IEEE Transactions on Speech and Audio Processing, Vol. 1, pp. 386-399, October 1993.
Later PWI methods were generalised so as to enable unvoiced speech and noise to be encoded as well. An example of such a method is described in "A General Waveform-Interpolation Structure for Speech Coding", W. B. Kleijn and J. Haagen, Signal Processing Theories and Applications, M. Hoit, C. Cowan, P. Grant, W. Sandham (Eds.), pp. 1665-1668, 1994.
However, such coders have drawbacks in that the reconstituted speech sounds buzzy. The present inventors have established that the cause of this 'buzziness' is a poor separation of the voiced components of speech and the unvoiced/noisy components of speech.
According to a first aspect of the present invention there is provided a method of extracting one of a concordant component and a discordant component of a predetermined segment of an audio signal, said method comprising the steps of:
forming an initial evolution surface from a series of combined magnitude and phase spectra representing segments of said signal around said predetermined segment;
modifying said initial evolution surface to obtain a modified evolution surface representing said one of the concordant component or the discordant component of said signal; and
extracting said one of the concordant component or the discordant component of said predetermined segment from said modified evolution surface;
wherein said modifying step involves:
a plurality of component filtering steps and, prior to at least one of those filtering steps, the substitution of phase information derived from said initial evolution surface or an earlier one of the component steps for the phase information derived from the most recent component step.
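As an illustration of the forming and extracting steps of the first aspect, the following sketch forms an initial evolution surface by stacking the complex spectra (magnitude and phase combined) of the segments around a predetermined segment, and then recovers that segment from its row. It assumes the segments have already been cut from the signal with equal lengths (in a waveform interpolation coder they would typically be pitch-length characteristic waveforms). The names `initial_evolution_surface`, `extract_segment` and `half_width` are hypothetical, and the modifying step is deliberately left out here.

```python
import cmath
import math


def dft(x):
    """Naive DFT of one segment: a complex spectrum combining magnitude and phase."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part, since the input segments are real."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def initial_evolution_surface(segments, centre, half_width):
    """Stack the spectra of the segments around `centre`, one row per segment.

    Returns the surface and the row index holding the predetermined segment.
    Rows are clipped at the ends of the signal.
    """
    lo = max(0, centre - half_width)
    hi = min(len(segments), centre + half_width + 1)
    surface = [dft(seg) for seg in segments[lo:hi]]
    return surface, centre - lo


def extract_segment(surface, row):
    """Recover the time-domain segment held in one row of a (possibly modified) surface."""
    return idft(surface[row])
```

With no modification in between, `extract_segment(*initial_evolution_surface(segments, n, 2))` simply returns segment `n`; in the method described above, the surface would be modified before the extraction.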
Here, 'concordant' refers to signals whose phase changes slowly, in contrast to 'discordant' signals, whose phase changes more rapidly.
The present inventors have found that the rate of evolution of the phase information is useful in distinguishing between voiced speech (the concordant component of speech) and unvoiced speech/noise (the discordant component of speech).
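The phase-evolution criterion can be illustrated by measuring, for one harmonic column of an evolution surface, the mean absolute phase increment from one segment to the next. This is a sketch under stated assumptions: the surface is assumed to hold one complex coefficient per harmonic per segment, `mean_phase_change` is a hypothetical helper, and the two synthetic columns merely stand in for voiced speech (slow, steady phase drift) and noise (random phase).

```python
import cmath
import math
import random


def mean_phase_change(track):
    """Mean absolute phase increment (radians) between successive segments.

    `track` holds one harmonic's complex coefficient in consecutive segments,
    i.e. one column of an evolution surface. The increment is taken from
    b * conj(a), so it is already wrapped into (-pi, pi].
    """
    steps = [abs(cmath.phase(b * a.conjugate())) for a, b in zip(track, track[1:])]
    return sum(steps) / len(steps)


# Concordant example: phase advances by a small, steady 0.05 rad per segment.
concordant = [cmath.rect(1.0, 0.05 * n) for n in range(50)]

# Discordant example: independent random phases (fixed seed for repeatability).
rng = random.Random(0)
discordant = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(50)]

print(round(mean_phase_change(concordant), 3))  # 0.05
print(round(mean_phase_change(discordant), 2))
```

For independent uniformly distributed phases the expected absolute increment is pi/2, so the discordant column scores far higher than the concordant one.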
However, it is likely that the invention will find application in other areas of audio signal processing such as the enhancement of noise-corrupted speech or music signals.
Conventional low-pass and high-pass Finite Impulse Response (FIR) digital filtering techniques do not reduce the magnitude of discordant and concordant signals respectively to zero. Therefore, they are limited in how well they can extract one of the concordant or discordant components of an audio signal.
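The limitation can be seen numerically. The sketch below applies a single 5-tap moving-average filter (an assumed, deliberately simple low-pass FIR filter, not one taken from the text) along one harmonic track. A track with constant phase keeps its magnitude, while a track with random phase is attenuated (in expectation its RMS magnitude falls to about 1/sqrt(5)) but is not removed.

```python
import cmath
import math
import random


def fir_filter(track, taps):
    """Centred FIR convolution along a complex sequence, zero-padded at the ends."""
    half = len(taps) // 2
    out = []
    for n in range(len(track)):
        acc = 0j
        for k, h in enumerate(taps):
            m = n - k + half
            if 0 <= m < len(track):
                acc += h * track[m]
        out.append(acc)
    return out


taps = [0.2] * 5  # hypothetical 5-tap moving-average low-pass filter
rng = random.Random(1)
noise = [cmath.rect(1.0, rng.uniform(-math.pi, math.pi)) for _ in range(200)]
steady = [1.0 + 0j] * 200

# Ignore the first and last two outputs, which are affected by zero padding.
noise_out = [abs(v) for v in fir_filter(noise, taps)[2:-2]]
steady_out = [abs(v) for v in fir_filter(steady, taps)[2:-2]]
print(sum(steady_out) / len(steady_out))  # concordant magnitude is preserved
print(sum(noise_out) / len(noise_out))    # discordant magnitude is reduced, not zeroed
```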
A conventional FIR filter might be approximated by a series of shorter FIR filters. By decomposing a filtering process into a plurality of filtering stages and, in one or more of the intervals between those stages, substituting phase information from an earlier stage for phase information from the most recent stage, a filtering process results which repeatedly uses the earlier phase information. Filtering a signal tends to smooth its phase, and hence a filtered signal contains less information distinguishing its concordant and discordant parts. By reinstating the earlier phase information, the concordant or discordant component can be more thoroughly removed in the subsequent filtering stage(s). The result is an audio signal filtering process that is better able to extract a concordant or discordant component of an audio signal.
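The staged process can be sketched as follows, assuming the evolution surface is held as a list of rows (one per segment) of complex harmonic coefficients and assuming a short moving-average kernel as the low-pass filter. The function names and the kernel are hypothetical. Between stages the filtered magnitudes are kept and the phases of the initial surface are substituted back in, so the initial phase is reused at every stage; the final stage's output is returned as the concordant estimate.

```python
import cmath
import math
import random


def lowpass_along_time(surface, taps):
    """Filter every harmonic column of the surface along the segment (time) axis.

    Centred FIR convolution with zero padding beyond the first and last segments.
    """
    rows, cols = len(surface), len(surface[0])
    half = len(taps) // 2
    out = [[0j] * cols for _ in range(rows)]
    for n in range(rows):
        for k, h in enumerate(taps):
            m = n - k + half
            if 0 <= m < rows:
                for c in range(cols):
                    out[n][c] += h * surface[m][c]
    return out


def reinstate_phase(filtered, reference):
    """Keep the filtered magnitudes but substitute the phases of `reference`."""
    return [[cmath.rect(abs(f), cmath.phase(r)) for f, r in zip(frow, rrow)]
            for frow, rrow in zip(filtered, reference)]


def staged_lowpass(surface, taps, stages):
    """Apply `taps` `stages` times, reinstating the initial phase between stages."""
    current = surface
    for s in range(stages):
        current = lowpass_along_time(current, taps)
        if s < stages - 1:
            current = reinstate_phase(current, surface)
    return current
```

Run on a surface with one steady-phase column and one random-phase column, the steady column keeps unit magnitude away from the edges, while the random-phase column ends up noticeably smaller than after four plain passes of the same filter, because without reinstatement the phase is already smoothed after the first pass.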
As suggested above, repeated application of a low-pass filter will leave a modified evolution surface representing the concordant component of said predetermined segment. Preferably, each low-pass filtering step involves the application of an identical low-pass filter. This minimises the complexity of the processing method.
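Using the same kernel at every stage means only one set of coefficients has to be designed and stored. The sketch below, with a hypothetical 3-tap kernel, also shows why the phase reinstatement is what makes the staged process different: without it, k passes of the same filter are exactly equivalent to one longer filter whose taps are the k-fold self-convolution of the kernel, and whose magnitude response is the kernel's response raised to the power k.

```python
import cmath


def convolve(a, b):
    """Full linear convolution of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def cascade_taps(taps, stages):
    """Single FIR filter equivalent to `stages` passes of `taps` (no phase reinstatement)."""
    equivalent = [1.0]
    for _ in range(stages):
        equivalent = convolve(equivalent, taps)
    return equivalent


def magnitude_response(taps, omega):
    """|H(e^{j*omega})| of an FIR filter at normalised angular frequency omega."""
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))


taps = [0.25, 0.5, 0.25]  # hypothetical short low-pass kernel
print(cascade_taps(taps, 2))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```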
In preferred embodiments, the phase information derived from the initial evolution surface is used in all of said component steps. This maximises the effectiveness of the extraction method.
One way in which the discordant component can be calculated is to calculate the concordant component according to the first aspect of the present invention and subtract this from the original signal. Similarly, one way in which the concordant component can be calculated is to calculate the discordant component according to the first aspect of the present invention and subtract this from the original signal.
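The subtraction route can be sketched directly on the evolution surface. Any concordant estimate can be plugged in (for example the output of the staged process of the first aspect); here, purely to keep the example self-contained, a hypothetical 3-segment moving average stands in for it. The converse route, estimating the discordant component and subtracting it to obtain the concordant one, has the same form with the roles swapped.

```python
import cmath


def moving_average_rows(surface, width=3):
    """Hypothetical concordant estimate: centred moving average over segments.

    Rows near the ends average over whichever segments are available.
    """
    half = width // 2
    rows, cols = len(surface), len(surface[0])
    estimate = []
    for n in range(rows):
        lo, hi = max(0, n - half), min(rows, n + half + 1)
        estimate.append([sum(surface[m][c] for m in range(lo, hi)) / (hi - lo)
                         for c in range(cols)])
    return estimate


def discordant_by_subtraction(surface, concordant):
    """Discordant component = original surface minus its concordant component."""
    return [[s - c for s, c in zip(srow, crow)] for srow, crow in zip(surface, concordant)]


# Column 0 has constant phase; column 1 has a rapidly rotating phase.
surf = [[1.0 + 0j, cmath.rect(1.0, 2.0 * n)] for n in range(10)]
conc = moving_average_rows(surf)
disc = discordant_by_subtraction(surf, conc)
```

By construction the two components add back to the original surface, and a column whose phase does not change at all leaves no discordant residue.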
According to a second aspect of the present invention, there is provided an audio signal processor operable to extract one of a concordant component and a discordant component of a predetermined segment of an audio signal, said apparatus comprising:
means arranged in operation to form an initial evolution surface from a series of combined magnitude and phase spectra representing segments of said signal around said predetermined segment;
means arranged in operation to modify said initial evolution surface to obtain a modified evolution surface representing said one of the concordant component or the discordant component of said signal; and
means arranged in operation to extract said one of the concordant component or the discordant component of said predetermined segment from said modified evolution surface;
wherein said apparatus further comprises:
means arranged in operation to carry out a plurality of filtering steps and, prior to at least one of those filtering steps, to substitute phase information derived from said initial evolution surface or an earlier one of the component steps for the phase information derived from the most recent component step.
According to a third aspect of the present invention, there is provided a speech coding apparatus including:
a storage medium having recorded therein processor readable code processable to encode input speech data, said code including:
initial evolution surface generation code processable to generate initial evolution surface data comprising combined magnitude and phase data for segments of said input speech data;
separation code processable to derive separate phase data and magnitude data from said input speech data;
evolution surface modification code processable to generate a modified evolution surface representing one of a voiced component or an unvoiced/noise component of said input speech data; and
component extraction code processable to extract said one of the voiced component or the unvoiced/noise component from said input speech data;
wherein said evolution surface modification code comprises:
evolution surface filtering code processable to filter said initial evolution surface data a plurality of times;
evolution surface decomposition code processable to derive magnitude data and phase data subsequent to one or more of said filtering steps; and
earlier phase reinstatement code processable to replace the phase data obtained on processing said evolution surface decomposition code with an earlier version of the phase data.
According to another aspect of the present invention there is provided a method of waveform interpolation speech coding comprising:
forming an initial evolution surface from a series of combined characteristic waveforms or spectra representing respective segments of said speech;
wherein said formation involves aligning each of said characteristic waveforms or spectra with an earlier characteristic waveform or spectrum of said series; and
said earlier waveform or spectrum is separated from the characteristic waveform or spectrum to be aligned with it by a variable number of members of said series, said variable number varying in accordance with the pitch of said signal.
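A sketch of the alignment step, under assumptions the text does not fix: each characteristic waveform is taken to be already resampled to a common length, alignment is done by the circular time shift that maximises cross-correlation (a common choice in waveform interpolation, assumed here rather than taken from the text), and the per-waveform lag, i.e. how many members back the reference lies, is supplied by the caller. All function names are hypothetical.

```python
import math


def circular_shift(x, s):
    """Rotate a waveform right by s samples."""
    n = len(x)
    return [x[(t - s) % n] for t in range(n)]


def align_to(waveform, reference):
    """Circularly shift `waveform` to maximise its correlation with `reference`.

    Returns the aligned waveform and the shift used.
    """
    best_shift, best_corr = 0, float("-inf")
    for s in range(len(waveform)):
        corr = sum(a * b for a, b in zip(circular_shift(waveform, s), reference))
        if corr > best_corr:
            best_shift, best_corr = s, corr
    return circular_shift(waveform, best_shift), best_shift


def align_series(waveforms, lags):
    """Align waveform i with the already-aligned waveform i - lags[i].

    Each lags[i] >= 1 and may differ from one waveform to the next (e.g. with pitch).
    Waveforms with no earlier reference yet are kept as they are.
    """
    aligned = []
    for i, w in enumerate(waveforms):
        ref_index = i - lags[i]
        if ref_index < 0:
            aligned.append(list(w))
        else:
            aligned.append(align_to(w, aligned[ref_index])[0])
    return aligned


ref = [math.cos(2 * math.pi * t / 16) for t in range(16)]
shifted, s = align_to(circular_shift(ref, 5), ref)
print(s)  # 11
```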
It is found that the decoded version of unvoiced speech which has passed through a known waveform interpolation coder tends to have too high a periodic component. To reduce the undesirable periodic component in the output version of unvoiced speech, alignment is made with a characteristic waveform or spectrum that is far enough back in the series to have a relatively low number of overlapping samples.
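The text does not state how the variable number of members is derived from the pitch. One hypothetical rule consistent with the aim of a low overlap: if characteristic waveforms are one pitch period P long and are extracted every d samples, two windows whose centres are k members apart overlap by max(0, P - k*d) samples, so the smallest lag meeting an overlap budget B is ceil((P - B) / d), and longer pitch periods push the reference further back in the series. The extraction interval, the budget and the function names below are assumptions for illustration only.

```python
def overlap_samples(pitch_period, interval, k):
    """Samples shared by two pitch-length windows whose centres are k * interval apart."""
    return max(0, pitch_period - k * interval)


def alignment_lag(pitch_period, interval, max_overlap=0):
    """Smallest k >= 1 whose window overlap is at most `max_overlap` samples."""
    needed = pitch_period - max_overlap
    k = -(-needed // interval)  # integer ceiling of needed / interval
    return max(1, k)


# Hypothetical coder: a characteristic waveform is extracted every 20 samples.
for pitch in (30, 60, 80, 140):
    k = alignment_lag(pitch, 20)
    print(pitch, k, overlap_samples(pitch, 20, k))
```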