The present invention, in some embodiments thereof, relates to speech parameterization and coding and, more particularly, but not exclusively, to techniques for speech compression, high-quality reconstruction and transformation in the parametric domain.
Various speech parameterization and coding techniques have been developed over the last decades, as described in the Springer Handbook of Speech Processing, edited by Jacob Benesty, M. Mohan Sondhi, and Yiteng Huang (London UK, Springer, 2008), which is incorporated herein by reference. The sinusoidal model (SM) of speech, described by R. McAulay and T. Quatieri in “Speech analysis/synthesis based on a sinusoidal representation” (IEEE Trans. Acoust., Speech, and Signal Process., vol. 34, no. 4, pp. 744-754, August 1986), which is incorporated herein by reference, is very popular for speech transformations in the parametric domain, which may include changes such as prosody modification, spectral warping, gender change and the like. Code-excited linear prediction (CELP) coding, described by B. Atal, V. Cuperman, and A. Gersho in Advances in Speech Coding (Kluwer, Norwell, Mass., 1990), which is incorporated herein by reference, is very common for speech compression and high-quality reconstruction.
When these two methods, SM and CELP, are applied together, such as described by G. Jeong in “Embedded bandwidth scalable wideband codec using hybrid matching pursuit harmonic/CELP scheme”, published in J. Intell. Manuf. (2012) 23:1315-1325, or as in the Harmonic Vector Excitation Coding (HVXC) method known in the art and described in ISO/IEC standard number 14496, which are incorporated herein by reference, the quality of signal reconstruction is compromised in exchange for lower bandwidth requirements during data transmission, as described by L. Leutelt and U. Heute in “Voice Conversion: Adaptation of Relative Local Speech Rate by MPEG-4 HVXC” presented at the EUSIPCO conference of 2002, vol. 3, pp. 113-116, which is incorporated herein by reference.