1. Field of the Invention
The present invention relates generally to a periodic signal transformation method, a sound transformation method and a signal analysis method, and more particularly to a periodic signal transformation method for transforming sound, a sound transformation method and a signal analysis method for analyzing sound.
2. Description of the Background Art
In the analysis and synthesis of speech sounds, when the intonation of speech sound is controlled, or when speech sounds are synthesized for editorial purposes so as to provide a natural-sounding intonation, the fundamental frequency of the speech sound should be converted while maintaining the tone of the original speech sound. When sounds in the natural world are sampled for use as a sound source for an electronic musical instrument, the fundamental frequency should likewise be converted while keeping the tone constant. In such conversion, the fundamental frequency should be settable with a resolution finer than that determined by the fundamental period. Meanwhile, if speech sounds are changed in order to conceal the individual features of an informant in broadcasting or the like for the purpose of protecting his/her privacy, in some cases the tone should be changed while the sound pitch is left unchanged, and in other cases both the tone and the sound pitch should be changed.
There is an increasing demand for reuse of existing speech sound resources, such as synthesizing the voices of different actors into a new voice without actually employing a new actor. As society ages, there will be more people who have difficulty hearing speech sound or music due to various forms of hearing impairment or perception impairment. There is therefore a strong demand for a method of changing the speed, frequency band, and pitch of speech sound so as to adapt it to their deteriorated hearing or perception abilities with no loss of the original information.
A first conventional technique for achieving such an object is disclosed, for example, in "Speech Analysis Synthesis System Using the Log Magnitude Approximation Filter" by Satoshi Imai, Tadashi Kitamura, Journal of the Institute of Electronic and Communication Engineers, 78/6, Vol. J61-A, No. 6, pp. 527-534. The document discloses a method of producing a spectral envelope. According to the method, a model representing a spectral envelope is assumed, and the parameters of the model are optimized under an appropriate evaluation function by an approximation that takes the peaks of the spectrum into consideration.
A second conventional technique is disclosed in "A Formant Extraction not Influenced by Pitch Frequency Variations" by Kazuo Nakata, Journal of Japanese Acoustic Sound Association, Vol. 50, No. 2 (1994), pp. 110-116. The technique incorporates the idea of periodic signals into a method of estimating the parameters of an autoregressive model.
As a third conventional technique, a method of processing speech sound referred to as PSOLA is known, in which waveforms are reduced or expanded and then overlapped with time shifts in the temporal domain.
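The time-shifted overlapping described above can be illustrated with a deliberately simplified sketch. It is not the PSOLA method of any cited work: it assumes a constant fundamental period that is already known in samples, whereas practical PSOLA processing relies on pitch-mark detection for each period. The function names `hann` and `overlap_add_repitch` are hypothetical and chosen only for this illustration.

```python
import math


def hann(n):
    """Periodic Hann window of length n (copies shifted by n/2 sum to 1)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def overlap_add_repitch(x, period, new_period):
    """Re-space two-period windowed segments of x at a new period.

    Assumption: x has a constant, known period (in samples), so the
    analysis marks are simply placed at multiples of that period.
    """
    if len(x) <= 2 * period:
        raise ValueError("signal must be longer than two periods")
    win = hann(2 * period)
    # Analysis marks: centres of the segments cut from the input.
    marks = list(range(period, len(x) - period, period))
    out = [0.0] * len(x)
    t = period  # first synthesis mark
    while t < len(x) - period:
        # Reuse the analysis segment whose centre is closest in time.
        m = min(marks, key=lambda a: abs(a - t))
        for i in range(2 * period):
            out[t - period + i] += win[i] * x[m - period + i]
        t += new_period  # time shift between successive segments
    return out


# Usage: a sine with a 20-sample period, re-spaced to a 16-sample period.
signal = [math.sin(2.0 * math.pi * n / 20) for n in range(200)]
shifted = overlap_add_repitch(signal, 20, 16)
```

When `new_period` equals `period`, the windows sum to one and the interior of the signal is reproduced unchanged. The further the new spacing departs from the original period, the more the overlapped segments distort one another, which is consistent with the limited range of modification attributed to this technique.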
Neither the first nor the second conventional technique can provide a correct estimate of a spectral envelope unless the number of parameters describing the model is appropriately determined, because these techniques are based on the assumption of a specified model. In addition, if the nature of the signal source differs from the assumed model, a component resulting from the periodicity is mixed into the estimated spectral envelope, and an even larger error may result.
Furthermore, the first and second conventional techniques require iterative operations for convergence in the process of optimization, and are therefore not suitable for applications with strict time limitations such as real-time processing.
In addition, according to the first and second conventional techniques, the periodicity of a signal cannot be specified with a higher precision than the temporal resolution determined by the sampling frequency, because, in terms of the control of the periodicity, the sound source and the spectral envelope are separated as a pulse train and a filter, respectively.
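The resolution limit can be made concrete with a short calculation. The sketch below assumes that the pulse train is placed on the sampling grid, so that its period is a whole number of samples. The 8000 Hz sampling frequency and the function name `nearest_realizable_f0` are assumptions made for this illustration and do not appear in the cited documents.

```python
def nearest_realizable_f0(fs_hz, target_f0_hz):
    """Return (period in samples, realized F0 in Hz) for an integer-period pulse train."""
    period = max(1, round(fs_hz / target_f0_hz))
    return period, fs_hz / period


fs = 8000.0  # assumed sampling frequency in Hz
for target in (200.0, 201.0, 202.0, 203.0, 204.0, 205.0):
    period, realized = nearest_realizable_f0(fs, target)
    print(f"target {target:6.1f} Hz -> period {period} samples, realized {realized:.3f} Hz")
```

Under these assumptions, a period of 40 samples gives exactly 200 Hz and the next shorter period of 39 samples gives roughly 205 Hz, so no fundamental frequency between those two values can be specified. The spacing between adjacent realizable frequencies is approximately the square of the fundamental frequency divided by the sampling frequency, and therefore becomes coarser as the pitch rises.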
According to the third conventional technique, if the periodicity of the sound source is changed by about 20% or more, the speech sound is deprived of its natural quality, and the sound therefore cannot be transformed in a flexible manner.