The uncompressed digital representation of a high-quality audio signal (e.g. of quality comparable to that offered by a CD recording) requires a large amount of data. Nowadays, encoders are commonly used to reduce the amount of data before storage on data storage devices or before digital transmission. A variety of audio signal encoders have been developed. They are presented in the scientific literature, e.g. in: K. Brandenburg, “Perceptual Coding of High Quality Digital Audio”, Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs, K. Brandenburg (editors), Kluwer Academic Publishers, 1998; M. Bosi, R. E. Goldberg, “Introduction to digital audio coding and standards”, Springer, 2003; and A. Spanias, V. Atti, T. Painter, “Audio signal processing and coding”, Wiley 2007.
Encoders that use a frequency-domain representation of the audio signal, obtained by means of sub-band filter sets or block transforms, have gained the greatest popularity. Decoders adapted to decode signals encoded with such compression techniques are commonly used in telecommunication systems and electro-acoustic consumer devices, such as portable music players, and usually take the form of application-specific integrated circuits. The principle of operation of such devices is also the basis for many international and commercial audio compression standards, e.g.: ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO/IEC 11172-3, “Information Technology—Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s, part 3: Audio”; ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO/IEC 14496-3, “Coding of Audio-Visual Objects: Audio”; Advanced Television Systems Committee, Document A/52:2010, “Digital Audio Compression Standard (AC-3, E-AC-3)”; and 3GPP TS 26.410, “General audio codec audio processing functions; Enhanced aacPlus general audio codec”.
Another, less popular group of audio signal encoders and decoders are sinusoidal encoders and decoders. Sinusoidal encoders and decoders also use a frequency-domain representation of the signal. In particular, the representation used in sinusoidal encoders and decoders is a weighted sum of sinusoidal components. More particularly, the instantaneous amplitudes and the instantaneous frequencies of the components, as well as the instantaneous phases related to the instantaneous frequencies, change continuously over time. Signal compression in such a representation is achieved by approximating the changes of the instantaneous frequencies and the instantaneous amplitudes of the audio components by means of simple interpolation functions, such as a polynomial of low degree. It is therefore possible to send information regarding the frequency and the amplitude of each component at intervals much longer than the sampling interval of the original signal. When the signal is reconstructed, the values of the instantaneous frequency and the instantaneous amplitude of each sinusoidal component are interpolated for each signal sample on the basis of the transferred data. The principle of operation of the sinusoidal encoder is described in the scientific literature, e.g. in: R. J. McAulay, T. F. Quatieri, “Speech analysis/synthesis based on a sinusoidal representation”, IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-34 (4), 1986; H. Purnhagen, “Very Low Bit Rate Parametric Audio Coding”, 2008; and F. Myburg, “Design of a Scalable Parametric Audio Coder”, 2004. Compression methods of this kind are also the basis for many international standards, such as ISO/IEC 14496-3/AMD1, “Coding of audiovisual objects—Part 3: Audio (MPEG-4 Audio Version 2) Harmonic and Individual Lines plus Noise”; and ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO/IEC 14496-3:2001/AMD2, “Sinusoidal Coding”. Compression methods of this kind are also disclosed in many patent documents.
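The reconstruction step described above can be illustrated with a minimal sketch. It assumes linear interpolation (a polynomial of degree one) of frequency and amplitude between transmitted control points, and obtains the instantaneous phase by accumulating the interpolated frequency. The function names, the `hop` parameter (the number of samples between control points) and the zero initial phase are assumptions made for illustration only; they are not taken from any of the cited documents.

```python
import math

def interpolate(points, hop, n):
    """Linearly interpolate control points spaced `hop` samples apart to n samples.

    Requires at least two control points.
    """
    out = []
    for i in range(n):
        k = min(i // hop, len(points) - 2)   # index of the left control point
        t = (i - k * hop) / hop              # fractional position in the segment
        out.append(points[k] + t * (points[k + 1] - points[k]))
    return out

def synthesize_component(freqs_hz, amps, hop, sample_rate):
    """Rebuild one sinusoidal component from sparsely transmitted parameters."""
    n = (len(freqs_hz) - 1) * hop
    freq = interpolate(freqs_hz, hop, n)
    amp = interpolate(amps, hop, n)
    phase = 0.0                              # assumed initial phase
    samples = []
    for f, a in zip(freq, amp):
        samples.append(a * math.sin(phase))
        # The instantaneous phase is the running sum of instantaneous frequency.
        phase += 2.0 * math.pi * f / sample_rate
    return samples

def synthesize_sum(components, hop, sample_rate):
    """Sum several components; each is a (freqs_hz, amps) pair of equal length."""
    tracks = [synthesize_component(f, a, hop, sample_rate) for f, a in components]
    return [sum(values) for values in zip(*tracks)]
```

In this sketch only one frequency value and one amplitude value per component are needed every `hop` samples, which is the source of the data reduction described above.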
The patent document U.S. Pat. No. 4,885,790 entitled “Processing of acoustic waveforms” concerns the encoding of speech signals based on a sinusoidal model. The publication describes a method and apparatus that split a speech signal into multiple time segments. For each time segment, the amplitudes, frequencies and phases of the sinusoidal components associated with each maximum of the speech signal amplitude spectrum are determined using a DFT (Discrete Fourier Transform) block. Next, a tracking algorithm merges the frequencies, amplitudes and phases of the components of the current segment with the frequencies, amplitudes and phases of the components of the previous segment, based on the smallest frequency difference. The result of the tracking algorithm is a set of sinusoidal trajectories describing the changes of the frequency, amplitude and phase of each sinusoidal component, represented with a sampling interval many times greater than the sampling interval of the original audio signal. The trajectories are then encoded by means of known techniques, e.g. PCM (Pulse Code Modulation) or ADPCM (Adaptive Differential Pulse Code Modulation), described in L. R. Rabiner, R. W. Schafer, “Digital Processing of Speech Signals”, Prentice Hall, 1978, and M. Bosi, R. E. Goldberg, “Introduction to Digital Audio Coding and Standards”, Springer, 2003.
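The merging of components by smallest frequency difference can be sketched as a greedy nearest-frequency match between two consecutive segments. This is an illustrative simplification and not the algorithm of the cited patent: the function name `match_peaks` and the `max_delta_hz` threshold, beyond which a component is treated as having ended or newly started, are assumptions.

```python
def match_peaks(prev_freqs, curr_freqs, max_delta_hz):
    """Pair components of two consecutive segments by smallest frequency difference.

    Returns a sorted list of (prev_index, curr_index) pairs. Components left
    unpaired correspond to trajectories that end or begin at this segment.
    """
    # Every admissible pairing, ordered by increasing frequency difference.
    candidates = sorted(
        (abs(pf - cf), i, j)
        for i, pf in enumerate(prev_freqs)
        for j, cf in enumerate(curr_freqs)
        if abs(pf - cf) <= max_delta_hz
    )
    used_prev, used_curr, pairs = set(), set(), []
    for _, i, j in candidates:
        # Accept the closest pairs first; each component is used at most once.
        if i not in used_prev and j not in used_curr:
            used_prev.add(i)
            used_curr.add(j)
            pairs.append((i, j))
    return sorted(pairs)
```

For example, with previous frequencies `[440.0, 1000.0]`, current frequencies `[1010.0, 445.0, 3000.0]` and a threshold of 50 Hz, the 440 Hz component continues at 445 Hz, the 1000 Hz component continues at 1010 Hz, and the 3000 Hz peak starts a new trajectory.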
The patent document WO 03/036619 A1 entitled “Frequency-differential encoding of sinusoidal model parameters” discloses an audio signal compression method in which the sinusoidal components of the sound are encoded in such a way that the decoder receives either a direct representation of the frequency, amplitude and phase of a component in the current time segment, or the corresponding differences between the frequency, amplitude and phase of the component in the current time segment and those of the most similar component from the previous time segment. The method includes an optimization algorithm that minimizes the total cost of transmitting the signal by selecting one of the two aforementioned ways of encoding.
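The choice between the two ways of encoding can be illustrated for a single quantized parameter. The sketch below assumes a fixed cost for the direct representation and a cost for the differential representation that grows with the magnitude of the difference; both cost figures, the constant `DIRECT_BITS` and the function names are hypothetical and are not taken from the cited document, whose optimization algorithm may operate jointly over many components.

```python
DIRECT_BITS = 10  # hypothetical fixed cost of a directly coded value

def differential_bits(delta_steps):
    """Hypothetical cost of a difference: 2 overhead bits plus magnitude bits."""
    return 2 + abs(delta_steps).bit_length()

def choose_encoding(curr_q, prev_q):
    """Pick the cheaper representation of one quantized parameter.

    curr_q is the quantized value in the current segment; prev_q is the value
    of the most similar component in the previous segment, or None if absent.
    Returns (mode, payload, bits).
    """
    if prev_q is None:
        return ("direct", curr_q, DIRECT_BITS)
    delta = curr_q - prev_q
    bits = differential_bits(delta)
    if bits < DIRECT_BITS:
        return ("diff", delta, bits)
    return ("direct", curr_q, DIRECT_BITS)
```

Under these assumed costs, a small change between segments is sent as a difference, while a large jump falls back to the direct representation.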
The patent document U.S. Pat. No. 7,640,156 B2 entitled “Low bit-rate audio encoding” concerns parametric audio signal encoding using three models of signal components. The document describes a method and device that decompose the original audio signal into components that can be approximated by a sum of pulses, by modulated sine waves with slowly varying characteristics, and by noise whose spectrum can be approximated by the characteristic of an autoregressive filter, with parameters determined by means of the known linear prediction technique (LPC).
The patent document U.S. Pat. No. 7,664,633 B2 entitled “Audio coding via creation of sinusoidal tracks and phase determination” discloses an enhanced audio signal encoding method using three models of signal components, approximated by a sum of pulses, modulated sine waves and noise. The document describes a sinusoidal trajectory encoding method in which the mutual dependence of phase and frequency is taken into account so that both are encoded jointly. In order to increase encoding efficiency, the phase values are subjected to second-order linear prediction and only the quantized prediction error is transmitted. Because the phases of the sinusoidal components in subsequent frames must be determined unambiguously, the sinusoidal trajectory tracking algorithm cannot track components exhibiting large frequency changes over time, which results in high trajectory fragmentation.
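Second-order linear prediction of the phase with transmission of the quantized prediction error can be sketched as follows. The sketch assumes an unwrapped phase trajectory, the fixed predictor 2·φ[n−1] − φ[n−2] (which is exact for a constant frequency), a uniform quantizer with step `step`, and two unquantized seed values; these choices and the function names are assumptions for illustration, not details confirmed by the cited document.

```python
def encode_phase(phases, step):
    """Return (seed, indices): two seed phases and quantized residual indices.

    The encoder predicts from its own reconstructed values (closed loop), so
    the quantization error does not accumulate along the trajectory.
    """
    recon = list(phases[:2])
    indices = []
    for p in phases[2:]:
        pred = 2.0 * recon[-1] - recon[-2]   # second-order prediction
        q = round((p - pred) / step)         # quantized prediction error
        indices.append(q)
        recon.append(pred + q * step)
    return list(phases[:2]), indices

def decode_phase(seed, indices, step):
    """Rebuild the phase trajectory from the seed and the residual indices."""
    recon = list(seed)
    for q in indices:
        recon.append(2.0 * recon[-1] - recon[-2] + q * step)
    return recon
```

For a component of constant frequency the phase grows linearly, the prediction is exact and every transmitted index is zero. A component whose frequency changes quickly produces large residuals, which is consistent with the tracking limitation noted above.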
The main limitation of all known audio signal encoding methods based on a sinusoidal or sinusoidal-plus-noise model is the low efficiency of the sinusoidal trajectory representation, which results from not taking into account the long-term stability and predictability of changes in the parameters of the sinusoidal components of speech and music sounds. The goal of the present invention is to solve this problem and to reduce severalfold the number of bits needed to represent the signal while maintaining good quality of the decoded signal.