This invention relates to a method and apparatus for time-warping a digitized waveform to have an approximately fixed period, especially, though not exclusively, so as to improve the performance of a class of methods used for compressing digitized speech data for storage or for transmission over digital communication channels.
Speech waveforms consist of two primary signal types, described respectively as voiced and unvoiced. Voiced signals exhibit a relatively high degree of periodicity (i.e. they have a repetitive pattern), while unvoiced signals are not periodic. The high degree of periodicity of voiced speech implies that at a given time instant, t, the amplitude of the waveform is approximately equal to the amplitude at some earlier instant, (t − T), where T, termed the period of the signal, is a continuous function of time. The greater the degree of periodicity, the greater the similarity between the signal amplitudes at t and (t − T). Varying degrees of periodicity between the extremes of purely voiced and purely unvoiced data are also possible.
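As a rough illustration of this notion of periodicity, the sketch below scores how closely x(t) matches x(t − T) for each candidate lag T and returns the best lag as a period estimate, together with a score near 1 for strongly periodic (voiced-like) data and near 0 for noise-like (unvoiced-like) data. The function name, the lag range and the use of a normalized autocorrelation over one whole block (rather than a time-varying T) are assumptions made for illustration; the text does not prescribe a period estimator.

```python
import numpy as np

def estimate_period(x, min_lag, max_lag):
    """Return (lag, score): the lag in [min_lag, max_lag] that maximizes the
    normalized correlation between x(t) and x(t - lag). Requires min_lag >= 1."""
    x = np.asarray(x, dtype=float)
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        cur, past = x[lag:], x[:-lag]  # amplitudes at t and at t - lag
        denom = np.sqrt(np.dot(cur, cur) * np.dot(past, past))
        score = np.dot(cur, past) / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, float(best_score)

n = np.arange(2000)
voiced = np.sin(2 * np.pi * n / 40) + 0.5 * np.sin(4 * np.pi * n / 40)
unvoiced = np.random.default_rng(0).standard_normal(2000)
print(estimate_period(voiced, 20, 60))    # lag 40, score close to 1
print(estimate_period(unvoiced, 20, 60))  # score close to 0
```

In real speech T drifts, so such an estimate would be computed over short blocks and read as a local value of T.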
In systems for storing or transmitting speech, it is common to represent a speech signal in digitized form, i.e. as a sequence of numerical values, termed samples, which represent the amplitude of the signal at discrete points on a continuous time-scale, these points being termed sampling instants. It is well known that, provided the sampling instants are separated by a sufficiently small interval (less than half the period of the highest frequency component in the signal), the original signal at any instant on the continuous time-scale can be computed from the signal samples.
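The reconstruction referred to here is, in the ideal case, Whittaker-Shannon (sinc) interpolation. The minimal sketch below evaluates a sampled signal at an arbitrary instant between sampling instants; time is measured in units of the sampling interval, and the function name is hypothetical. Because the infinite sum is truncated to the available samples, the result is only approximate, with the error smallest far from the ends of the sample block.

```python
import numpy as np

def bandlimited_value(samples, t):
    """Whittaker-Shannon interpolation: value at continuous time t (in units
    of the sampling interval) of the band-limited signal behind `samples`."""
    n = np.arange(len(samples))
    return float(np.dot(samples, np.sinc(t - n)))  # np.sinc(u) = sin(pi u)/(pi u)

f = 0.025  # cycles per sample, far below the Nyquist limit of 0.5
samples = np.sin(2 * np.pi * f * np.arange(2000))
t = 1000.37  # an instant between two sampling instants
print(bandlimited_value(samples, t), np.sin(2 * np.pi * f * t))
```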
In some techniques used for compressing digitized speech data, it is usual to apply to a digitized speech signal a filter, which may be time varying, and whose effects include reducing fluctuations of the signal's spectral envelope with respect to time, and increasing the spectral flatness of the signal. For voiced speech, increasing the spectral flatness of the signal usually causes it to exhibit strong peaks once per period. These peaks are known as pitch pulses.
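The filter is not specified in the text. A common choice, assumed here purely for illustration, is the linear-prediction (LPC) inverse filter A(z) = 1 − Σ a_k z^(−k), with coefficients estimated by the autocorrelation method. The sketch applies it to one synthetic voiced-like frame (an impulse train driving a resonant all-pole filter) and shows that the residual has strong peaks once per period, i.e. pitch pulses. A time-varying version would re-estimate the coefficients frame by frame; that is omitted.

```python
import numpy as np

def lpc_coefficients(x, order):
    """Autocorrelation-method LPC: solve R a = r for predictor a_1..a_p."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[k:], x[:len(x) - k]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

def inverse_filter(x, a):
    """Prediction residual e[n] = x[n] - sum_k a_k x[n-k], zero initial state."""
    x = np.asarray(x, dtype=float)
    e = x.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * x[:-k]
    return e

# Synthetic voiced-like frame: impulses every P samples drive a resonant
# all-pole filter (poles at radius 0.9), giving a peaky spectral envelope.
N, P = 1600, 80
a_true = np.array([1.7196, -0.81])
excitation = np.zeros(N)
excitation[::P] = 1.0
x = np.zeros(N)
for n in range(N):
    x[n] = excitation[n] + sum(a_true[k - 1] * x[n - k] for k in (1, 2) if n >= k)

a_hat = lpc_coefficients(x, order=2)
residual = inverse_filter(x, a_hat)
print(a_hat)                                    # close to a_true
print(np.flatnonzero(np.abs(residual) > 0.5))  # pulse locations 0, 80, 160, ...
```

In practice the order would be higher (often around 10 at 8 kHz) and a windowed frame would be used; SciPy's Toeplitz solver could replace the dense solve.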
It is also usual in some methods of speech data compression to extract from a digitized speech signal, either after or without the filtering mentioned above, segments of data, each segment containing a finite number of signal samples, corresponding to an interval in time that is a fixed multiple of the signal period. In many systems, the set of signal samples contained in each segment is transformed into another set of data values, having properties that are more advantageous for encoding the signal. Such methods are described, for example, in “Waveform interpolation for speech coding and synthesis,” by W. B. Kleijn and J. Haagen, in Speech Coding and Synthesis, edited by W. B. Kleijn and K. K. Paliwal (Elsevier Science Publishers, 1995). In some methods of this kind, consecutive extracted segments begin with samples corresponding to points in time separated by one period. Such methods are described, for example, in “Waveform interpolation with pitch-spaced subbands,” by W. B. Kleijn, H. Yang and E. Deprettere, in Proc. 5th Int. Conf. on Spoken Language Processing, 1998.
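A minimal sketch of one-period, pitch-synchronous extraction, assuming the pitch-pulse locations are already known: here each segment hypothetically starts at a pulse location and ends just before the next, so consecutive segments begin one period apart. The pulse locations and the random data are made up for illustration. Because the period varies, the segment lengths vary with it.

```python
import numpy as np

def pitch_synchronous_segments(x, pulse_locations):
    """Cut x into consecutive one-period segments. Each segment starts at a
    pitch-pulse location and stops just before the next one, so consecutive
    segments begin one period apart."""
    pairs = zip(pulse_locations[:-1], pulse_locations[1:])
    return [np.asarray(x[a:b]) for a, b in pairs]

x = np.random.default_rng(1).standard_normal(200)
segments = pitch_synchronous_segments(x, [10, 50, 92, 136, 182])
print([len(s) for s in segments])  # [40, 42, 44, 46]
```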
For the purposes of encoding the set of data values arising from each extracted segment, either after or without applying the transformations mentioned above, it is desirable for every such set to have the same length (the same number of data values). However, since each extracted segment corresponds to an interval in time that is a fixed multiple of the signal period, and since the period varies with time, segments extracted directly from the signal would not all have the same length. Equal lengths can be achieved if the signal is first time-warped so that the period is constant. Time-warping involves creating an invertible mapping that allows any instant on the original continuous time-scale, denoted t, to be associated with a point on another continuous time-scale, denoted t′. Based on this mapping, it is desired to determine a set of signal values, termed warped signal samples, which are the amplitudes of the signal at time instants that correspond to points on the new time-scale separated by some constant interval.
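The text leaves the form of the mapping open. One simple invertible choice, assumed here for illustration, is a piecewise-linear map through strictly increasing knot pairs (t_k, t′_k): it is described by finitely many numbers and its inverse is obtained by swapping the roles of the two scales. The class name and the knot values below are hypothetical; `np.interp` clamps outside the knot range, so the map is only meaningful between the first and last knots.

```python
import numpy as np

class PiecewiseLinearWarp:
    """Invertible map between unwarped time t and warped time t', defined by
    strictly increasing knot pairs (t_k, t'_k) and linear in between.
    Only meaningful inside [t_0, t_last]: np.interp clamps outside it."""

    def __init__(self, t_knots, tw_knots):
        self.t = np.asarray(t_knots, dtype=float)
        self.tw = np.asarray(tw_knots, dtype=float)
        if np.any(np.diff(self.t) <= 0) or np.any(np.diff(self.tw) <= 0):
            raise ValueError("knots must be strictly increasing for invertibility")

    def to_warped(self, t):
        return np.interp(t, self.t, self.tw)

    def to_unwarped(self, tw):
        return np.interp(tw, self.tw, self.t)

# Unwarped pulses with periods 40, 42, 44 samples; warped period T' = 64.
warp = PiecewiseLinearWarp([0, 40, 82, 126], [0, 64, 128, 192])
warped_grid = np.arange(0.0, 192.0, 1.0)          # equally spaced points on t'
sampling_instants = warp.to_unwarped(warped_grid)  # where to sample the signal
print(warp.to_warped(82), warp.to_unwarped(96))    # 128.0 61.0
```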
Since, in general, the warped signal samples correspond to time instants that are different from the original sampling instants, computation of the warped signal samples involves interpolating between the samples of the original digitized signal. In principle, an objective of this process is to produce a new, resampled signal with the property that some fixed interval on the new time-scale, denoted T′, always corresponds to an interval of one period as measured on the original time-scale.
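One way to state this objective, writing the mapping as t′ = φ(t) (notation assumed here, not taken from the text), is the first line below. Choosing the slope of the mapping inversely proportional to the local period gives the closed form on the second line. Substituting it back shows that the objective then holds approximately whenever T changes little over one period, consistent with the observation that an exactly fixed warped period is hard to achieve.

```latex
\begin{aligned}
&\varphi(t) - \varphi\bigl(t - T(t)\bigr) = T' \quad \text{for all } t,\\[4pt]
&\varphi'(t) = \frac{T'}{T(t)}
  \;\Longrightarrow\;
  \varphi(t) = \varphi(t_0) + T' \int_{t_0}^{t} \frac{d\tau}{T(\tau)},\\[4pt]
&\varphi(t) - \varphi\bigl(t - T(t)\bigr)
  = T' \int_{t - T(t)}^{t} \frac{d\tau}{T(\tau)} \approx T'
  \quad \text{when } T(\tau) \approx T(t) \text{ for } \tau \in [\,t - T(t),\, t\,].
\end{aligned}
```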
The objective of producing a warped signal with a fixed period is difficult to achieve precisely. However, a warping that produces an approximately constant period can be satisfactory.
The present invention therefore seeks to provide a method and apparatus for generating a set of warped signal samples from a set of unwarped signal samples, preferably, such that the number of warped signal samples spanning an interval equal to the signal period is approximately constant.
Accordingly, in one aspect, the invention provides a method of generating a set of warped signal samples, the method comprising the steps of:
receiving a sequence of unwarped signal samples, wherein the unwarped signal samples represent the amplitudes of a continuous input signal measured at unwarped sampling instants, wherein the unwarped sampling instants are discrete points on a continuous unwarped time-scale, and wherein the sequence includes at least as many unwarped signal samples as exist in an interval of time equal to the expected maximum value of signal period, wherein signal period is a slowly varying function of time such that the amplitude of a signal at a first point in time is approximately equal to the amplitude at a second point displaced from the first point by an interval equal to the signal period at the first point;
storing the received sequence in a buffer;
determining unwarped pitch pulse locations within an interval spanned by a particular analysis frame, wherein pitch pulses are strong peaks occurring once per period in the input signal, unwarped pitch pulse locations are points on the unwarped time-scale at which pitch pulses occur, and an analysis frame is a predetermined segment of samples in the buffer;
determining an invertible mapping that associates all points within an interval on the continuous unwarped time-scale spanned by the analysis frame with corresponding points on a continuous warped time-scale, such that the mapping can be completely described by a finite number of parameters, and such that if pitch pulses occur within the interval, the mapping minimizes a measure of deviation between warped pitch pulse locations and a predetermined set of desired warped pitch pulse locations, wherein warped pitch pulse locations are points on the warped scale with which the mapping associates the unwarped pitch pulse locations;
determining warped sampling instants, wherein warped sampling instants are points on the original time-scale that are within the time interval spanned by the analysis frame, and which correspond to predefined points on the warped scale; and
interpolating between the unwarped signal samples to compute a set of warped signal samples, wherein the warped signal samples are the values of the continuous input signal at the warped sampling instants.
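The steps above can be sketched for a single buffer as follows. Every concrete choice here is an assumption made for illustration, not the claimed method itself: pitch pulses are picked as the strongest local maxima above half the peak amplitude with a minimum spacing; the mapping is piecewise linear and sends the k-th detected pulse exactly to k·N on the warped scale (zero deviation from the desired locations, so it trivially minimizes any deviation measure, though a low-order least-squares fit would be another admissible family); the predefined warped points are unit-spaced, giving N warped samples per warped period; and the interpolation is truncated sinc interpolation. Handling of the frame edges beyond the first and last detected pulse is omitted. The names `find_pitch_pulses` and `warp_frame` are hypothetical.

```python
import numpy as np

def find_pitch_pulses(x, min_spacing, rel_threshold=0.5):
    """Hypothetical pulse picker: local maxima above rel_threshold * max|x|,
    strongest first, discarding any within min_spacing of one already kept."""
    x = np.asarray(x, dtype=float)
    thr = rel_threshold * np.max(np.abs(x))
    cands = [n for n in range(1, len(x) - 1)
             if x[n] >= thr and x[n] >= x[n - 1] and x[n] > x[n + 1]]
    kept = []
    for n in sorted(cands, key=lambda i: -x[i]):
        if all(abs(n - k) >= min_spacing for k in kept):
            kept.append(n)
    return sorted(kept)

def warp_frame(x, samples_per_period, min_spacing):
    """Return (unwarped pulse locations, warped samples) for one buffer x."""
    x = np.asarray(x, dtype=float)
    pulses = find_pitch_pulses(x, min_spacing)
    if len(pulses) < 2:
        return pulses, np.empty(0)  # too few pulses to define the mapping here
    # Mapping parameters: k-th pulse -> k * N on the warped scale.
    t_knots = np.asarray(pulses, dtype=float)
    tw_knots = np.arange(len(pulses)) * float(samples_per_period)
    # Predefined warped points: unit spacing, i.e. N points per warped period.
    warped_points = np.arange(int(tw_knots[-1]), dtype=float)
    # Warped sampling instants on the original time-scale (inverse mapping).
    instants = np.interp(warped_points, tw_knots, t_knots)
    # Truncated sinc interpolation between the unwarped samples.
    n = np.arange(len(x))
    warped = np.sinc(instants[:, None] - n[None, :]) @ x
    return pulses, warped

# Gaussian "pitch pulses" with periods 40, 42, 44, 46 samples.
n = np.arange(220)
true_pulses = [20, 60, 102, 146, 192]
x = sum(np.exp(-0.5 * ((n - p) / 3.0) ** 2) for p in true_pulses)
pulses, warped = warp_frame(x, samples_per_period=32, min_spacing=20)
print(pulses, len(warped))  # [20, 60, 102, 146, 192] 128
```

Each unwarped period of 40 to 46 samples becomes exactly 32 warped samples, with the pulse at the first sample of each warped period.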
In one preferred embodiment, the mapping between the unwarped and warped time-scales is such that points on the unwarped scale separated by one period are associated with points on the warped scale that are separated approximately by a warped period, wherein the warped period is a predetermined fixed interval. Further, in a preferred embodiment, the desired warped pitch pulse locations are points on the warped time-scale that are separated by exactly one warped period. Preferably, some quantity of previously received unwarped signal samples are also stored in the buffer.
According to a second aspect, the invention provides an apparatus for generating a set of warped signal samples, the apparatus comprising:
an input terminal for receiving a sequence of unwarped signal samples, wherein the unwarped signal samples represent the amplitudes of a continuous input signal measured at unwarped sampling instants, wherein the unwarped sampling instants are discrete points on a continuous unwarped time-scale, and wherein the sequence includes at least as many unwarped signal samples as exist in an interval of time equal to the expected maximum value of signal period, wherein signal period is a slowly varying function of time such that the amplitude of a signal at a first point in time is approximately equal to the amplitude at a second point displaced from the first point by an interval equal to the signal period at the first point;
a buffer in which the received sequence is stored;
a first analyzer coupled to the buffer to analyze the data within the buffer and to determine unwarped pitch pulse locations within an interval spanned by an analysis frame, wherein pitch pulses are strong peaks occurring once per period in the input signal, unwarped pitch pulse locations are points on the unwarped time-scale at which pitch pulses occur, and an analysis frame is a predetermined segment of samples in the buffer;
a second analyzer to analyze the unwarped pitch pulse locations and to determine an invertible mapping that associates all points within an interval on the continuous unwarped time-scale spanned by the analysis frame with corresponding points on a continuous warped time-scale, such that the mapping can be completely described by a finite number of parameters, and such that if pitch pulses occur within the interval, the mapping minimizes a measure of deviation between warped pitch pulse locations and a predetermined set of desired warped pitch pulse locations, wherein warped pitch pulse locations are points on the warped scale with which the mapping associates the unwarped pitch pulse locations;
a third analyzer to analyze the parameters of the mapping and to determine warped sampling instants, wherein warped sampling instants are points on the original time-scale that are within the time interval spanned by the analysis frame, and which correspond to predetermined points on the warped time-scale; and
an interpolator for interpolating between the unwarped signal samples to compute a set of warped signal samples, wherein the warped signal samples are the values of the continuous input signal at the warped sampling instants.
Preferably, the desired warped pitch pulse locations are separated by a fixed interval on the warped scale. Preferably, also, the predetermined points on the warped scale, corresponding to the warped sampling instants on the original time-scale, are separated by a fixed interval on the warped scale equal to the separation between adjacent pitch pulse locations divided by an integer. The buffer preferably also stores some quantity of previously received unwarped signal samples.
Thus, in one preferred embodiment, the invention provides a method for generating a set of warped signal samples in which, during intervals of speech that are approximately periodic, consecutive contiguous subsets of warped signal samples, each subset spanning an interval of one period, contain a fixed number of samples, and in which the sample in each subset corresponding to a pitch pulse is synchronized to occur at an approximately fixed location relative to the other samples of the subset, that location being chosen such that the samples at the boundaries of the subset are of low amplitude in comparison with the amplitude of the pitch pulse.
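The text does not fix where in each subset the pitch pulse should sit. One choice, assumed here, is the middle of the N-sample subset: if each pulse decays well within half a period, the boundary samples are then small. The sketch below reshapes a warped signal (one pulse every N samples) into such subsets; the function name and the synthetic Gaussian pulses are hypothetical.

```python
import numpy as np

def centred_periods(warped, samples_per_period, first_pulse):
    """Reshape warped samples into consecutive N-sample subsets, aligned so
    that each pitch pulse falls at index N // 2 of its subset."""
    N = samples_per_period
    start = first_pulse - N // 2
    if start < 0:
        start += N  # drop the incomplete leading period
    count = (len(warped) - start) // N
    return np.asarray(warped[start:start + count * N]).reshape(count, N)

# Warped signal with one pitch pulse every N = 32 samples, first at index 8.
m = np.arange(128)
warped = sum(np.exp(-0.5 * ((m - p) / 2.0) ** 2) for p in (8, 40, 72, 104))
subsets = centred_periods(warped, 32, first_pulse=8)
print(subsets.shape, subsets.argmax(axis=1))  # (3, 32) [16 16 16]
```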