The present invention relates to speech coding and in particular to the forming of speech coding frames.
A delay is generally a period between one event and another event connected with it. In mobile communication systems, a delay occurs between the transmission of a signal and its reception, the delay resulting from the interaction of a number of different factors, for example, from speech coding, channel coding and the propagation delay of the signal. Long response times produce an unnatural feeling in conversation and, therefore, a delay caused by the system always makes communication more difficult. Thus, the aim is to minimise the delay in each part of the system.
One source of a delay is windowing used in signal processing. The purpose of windowing is to shape the signal into a form required in further processing. For example, noise reducers typically used in mobile communication systems mainly operate in the frequency domain and, therefore, a signal to be noise-reduced is usually transformed frame by frame from the time domain to the frequency domain using a Fast Fourier Transform (FFT). In order that the FFT functions in the desired way, samples divided into frames should be windowed prior to the FFT.
FIG. 1 illustrates the procedure by showing as an example the windowing of a frame F(n) into a trapezoidal form. In windowing, the set of samples contained in the frame F(n) is multiplied by a window function so that a window W(n) 19 resulting from this comprises a first slope 10 (hereinafter referred to as the front slope), containing more recent samples of the frame, a second slope 11 (hereinafter referred to as the rear slope), containing older samples of the frame, and a remaining window part 12 in between them. In the windowing of the example, the samples of the window part 12 that is located between the first and second slopes are multiplied by 1, i.e. their value remains unchanged. The samples of the front slope 10 are multiplied by a descending function where the coefficient of the oldest samples of the front slope 10 approaches one and the coefficient of the newest samples approaches zero. Correspondingly, the samples of the rear slope 11 are multiplied by an ascending function where the coefficient of the oldest samples of the rear slope 11 approaches zero and the coefficient of the newest samples approaches one.
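The trapezoidal windowing described above can be sketched as follows. This is a minimal illustration only: the function name, the linear slope shape and the example lengths are assumptions made for the sketch and are not taken from FIG. 1. The slope coefficients are chosen so that a front slope and the overlapping rear slope of the next frame sum to one.

```python
def trapezoidal_window(frame_len, slope_len):
    """Return window coefficients ordered from the oldest to the newest sample.

    Hypothetical helper: linear slopes are an assumption of this sketch.
    """
    if slope_len < 1 or 2 * slope_len > frame_len:
        raise ValueError("slopes must fit inside the frame")
    # Rear slope (oldest samples): ascending from near zero towards one.
    rear = [(i + 0.5) / slope_len for i in range(slope_len)]
    # Middle part: samples are multiplied by 1 and stay unchanged.
    flat = [1.0] * (frame_len - 2 * slope_len)
    # Front slope (newest samples): descending from near one towards zero.
    front = rear[::-1]
    return rear + flat + front


def apply_window(frame, window):
    """Multiply the frame by the window, sample by sample."""
    return [w * x for w, x in zip(window, frame)]


if __name__ == "__main__":
    window = trapezoidal_window(frame_len=10, slope_len=2)
    print(window)
    print(apply_window([1.0] * 10, window))
```

Because the front slope is the mirror image of the rear slope, the coefficient pairs that meet in the overlap always add up to one, which is the property the overlap-add method relies on.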
For the noise reduction of speech encoders, the noise reduction frame F(n) (reference 18) is typically formed of an input frame 16, formed of new samples, and of a set of the oldest samples 15 of the preceding input frame. Thus, samples 17 are used in forming two successive input frames. FIG. 1 also illustrates the overlap-add method often used in connection with windowing relating to FFTs. In the method, part of the noise-reduced samples of successive windowed noise reduction frames are summed with each other to improve adjustments between consecutive frames. In the example shown in FIG. 1, the noise-reduced samples of slopes 10 and 13 of successive frames F(n) and F(n+1) are summed so that the data of the front slope 10, calculated from the newer samples of the frame F(n), is summed sample by sample with the slope 13, calculated from the older samples of the frame F(n+1), so that the sum of the coefficients of overlapping slopes is 1. Due to the overlap-add method, the section represented by the front slope 10 cannot, however, be transmitted further from noise reduction before noise reduction is performed for the entire following frame F(n+1) and neither can noise reduction of the next frame F(n+1) be started before the entire next frame is received. Thus, the use of the overlap-add method in the processing of a signal causes an additional delay D1, which is equal to the length of slope 10.
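The origin of the delay D1 can be illustrated with a small overlap-add sketch. It is a simplified model under stated assumptions: noise reduction is replaced by a placeholder that returns the windowed frame unchanged, the slopes are linear, and all names and lengths are hypothetical. With this placeholder, the output reproduces the input delayed by exactly the slope length.

```python
def trapezoidal_window(frame_len, slope_len):
    """Linear trapezoidal window, oldest sample first (assumed shape)."""
    if slope_len < 1 or 2 * slope_len > frame_len:
        raise ValueError("slopes must fit inside the frame")
    rear = [(i + 0.5) / slope_len for i in range(slope_len)]
    return rear + [1.0] * (frame_len - 2 * slope_len) + rear[::-1]


def overlap_add(signal, frame_len, slope_len, process=lambda frame: frame):
    """Window, process and overlap-add a signal frame by frame.

    `process` stands in for noise reduction; the default does nothing.
    """
    hop = frame_len - slope_len            # new samples per input frame
    window = trapezoidal_window(frame_len, slope_len)
    history = [0.0] * slope_len            # samples shared with the previous frame
    stored_front = [0.0] * slope_len       # front slope waiting for the next frame
    out = []
    for start in range(0, len(signal) - hop + 1, hop):
        frame = history + signal[start:start + hop]
        processed = process([w * x for w, x in zip(window, frame)])
        # Sum the stored front slope with the rear slope of this frame.
        summed = [a + b for a, b in zip(stored_front, processed[:slope_len])]
        out.extend(summed + processed[slope_len:hop])
        # The new front slope cannot be output until the next frame arrives.
        stored_front = processed[hop:]
        history = frame[hop:]
    return out


if __name__ == "__main__":
    sig = [float(i) for i in range(1, 41)]
    out = overlap_add(sig, frame_len=10, slope_len=2)
    print(out[:12])
```

In this sketch each output sample lags the corresponding input sample by `slope_len` samples, which corresponds to the additional delay D1 described above.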
The simplified block diagram in FIG. 2 illustrates the phases of processing for a signal being formed of samples divided into frames, according to prior art. Block 21 represents the windowing of a frame, as presented above, and block 22 represents the performance of noise reduction algorithms for windowed frames, comprising at least an FFT performed on the windowed data and its inverse transformation. Block 23 represents the operations performed according to overlap-add windowing, wherein noise-reduced data is stored for the first slopes 10, 14 of the window to wait for the processing of the next frame, and wherein the stored data is summed with the data of the second slopes 13 of the next frame. Block 24 represents speech-coding related signal pre-processing, which typically comprises high-pass filtering and signal scaling for speech coding. From block 24, the data is transferred to a block 25 for speech coding.
Speech codecs (e.g. CELP, ACELP), used in current mobile phone systems, are based on linear prediction (CELP=Code Excited Linear Prediction). In linear prediction, a signal is encoded frame by frame. The data contained in the frames is windowed and on the basis of the windowed data, a set of auto-correlation coefficients is calculated, which are to be used to determine the coefficients of a linear prediction function to be used as coding parameters.
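The step from windowed data to linear prediction coefficients can be sketched with the widely used Levinson-Durbin recursion. The text above does not name the solution method, so the recursion, the function names and the example values are assumptions of this sketch and do not represent the procedure of any particular codec.

```python
def autocorrelation(x, max_lag):
    """Auto-correlation coefficients r[0..max_lag] of the windowed samples."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations from auto-correlation coefficients.

    Returns (a, err), where a[0] == 1 and the prediction error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break                          # degenerate input, e.g. silence
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                     # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k                 # remaining prediction error energy
    return a, err


if __name__ == "__main__":
    # Auto-correlation of an idealised first-order process (illustrative values).
    r = [0.9 ** k for k in range(3)]
    coeffs, err = levinson_durbin(r, order=2)
    print(coeffs, err)
```

In a codec the input to `autocorrelation` would be the windowed speech frame; the illustrative values in the usage example are used only because their expected solution is easy to verify by hand.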
Lookahead is a known procedure used in data transmission, wherein typically newer data that does not belong to the frame to be processed is utilised, e.g. in a procedure applied to a speech frame. In some speech coding algorithms, such as algorithms according to the IS-641 standard specified by the Electronic Industries Alliance/Telecommunications Industry Association (EIA/TIA), linear prediction (LP) parameters for speech coding are calculated from a window that contains, in addition to the frame to be analysed, samples that belong to the preceding and following frame. The samples that belong to the following frame are called lookahead samples. A corresponding arrangement has also been proposed for use, e.g. in connection with Adaptive Multi Rate (AMR) codecs.
FIG. 3 illustrates lookahead as used in linear prediction according to the IS-641 standard. Each 20-ms long speech frame 30 is windowed into an asymmetric window 31 that also contains samples belonging to the preceding and following frame. The part of window 31 formed of newer samples is called the lookahead part 32. An LP analysis is made once for each window. As can be seen in FIG. 3, windowing relating to lookahead causes an algorithmic delay D2 in the signal corresponding to the length of the lookahead part 32. Since the arrival of the signal for speech coding is already delayed by a period D1 as a result of noise reduction windowing, the delay D2 is summed with the previously described noise reduction additional delay D1.
According to the invention, there is provided a method for generating speech coding frames, the method comprising the steps of:
forming a series of partly overlapping first frames containing speech samples;
processing a first frame of the series of first frames by a first window function for producing a second, windowed, frame having a first slope;
performing noise reduction on the second frame for producing a third frame comprising noise reduced speech samples; and
forming a speech coding frame comprising noise-reduced samples of two successive third frames, at least partly summed with one another,
characterised in that the method further comprises the steps of:
forming the speech coding frame so that it has a lookahead part that is formed at least partly of noise reduced speech samples of the first slope, these noise reduced speech samples of the first slope being not summed with any other noise reduced speech samples of the speech coding frame to be formed.
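One possible reading of the steps above can be sketched as follows. This is an illustration under several assumptions and not the embodiment itself: noise reduction is a placeholder that leaves the windowed frame unchanged, the slopes are linear, the whole lookahead part is taken from the first slope, and every name and length is hypothetical. The point shown is that the still-windowed, un-summed first slope is handed on as the lookahead part instead of being held back until the next frame.

```python
def trapezoidal_window(frame_len, slope_len):
    """Linear trapezoidal window, oldest sample first (assumed shape)."""
    if slope_len < 1 or 2 * slope_len > frame_len:
        raise ValueError("slopes must fit inside the frame")
    rear = [(i + 0.5) / slope_len for i in range(slope_len)]
    return rear + [1.0] * (frame_len - 2 * slope_len) + rear[::-1]


def form_speech_coding_frames(signal, frame_len, slope_len,
                              noise_reduce=lambda frame: frame):
    """Return a list of (summed_part, lookahead_part) pairs.

    summed_part: noise-reduced samples, overlap-added where frames overlap.
    lookahead_part: noise-reduced samples of the first (front) slope,
    not summed with any other samples.
    """
    hop = frame_len - slope_len
    window = trapezoidal_window(frame_len, slope_len)
    history = [0.0] * slope_len
    stored_front = [0.0] * slope_len
    coding_frames = []
    for start in range(0, len(signal) - hop + 1, hop):
        first = history + signal[start:start + hop]        # first frame
        second = [w * x for w, x in zip(window, first)]    # second, windowed frame
        third = noise_reduce(second)                       # third frame
        summed = [a + b for a, b in zip(stored_front, third[:slope_len])]
        summed_part = summed + third[slope_len:hop]
        lookahead_part = third[hop:]                       # un-summed first slope
        coding_frames.append((summed_part, lookahead_part))
        stored_front = lookahead_part                      # summed in the next round
        history = first[hop:]
    return coding_frames


if __name__ == "__main__":
    sig = [float(i) for i in range(1, 41)]
    for summed_part, lookahead_part in form_speech_coding_frames(sig, 10, 2):
        print(summed_part, lookahead_part)
```

In this sketch the lookahead samples still carry the descending slope weights, so a speech coder using them would need an analysis window that takes this weighting into account; how that is arranged is outside the scope of the sketch.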
Advantageously, the above-described joint effect of algorithmic delays can be reduced by the invented method and an apparatus implementing the method.
Advantageously, by utilising windowing already performed in noise reduction in speech coding windowing, the algorithmic delays caused by processing phases are not summed with each other.
A speech encoder according to the invention is described in claim 10 and a mobile station according to the invention is described in claim 13. The embodiments of the invention are described in the dependent claims.