1. Technical Field
The present invention relates to the field of audio signal processing, and particularly to a technology for adding effects to an audio signal and outputting the resulting signal.
2. Background Art
Various technologies for generating a voice with desired characteristics have been proposed. For example, Japanese Unexamined Patent Publication (Kokai) No. 2002-202790 (paragraphs 0049 and 0050) discloses a technology for synthesizing a so-called husky voice. According to this technology, an SMS (Spectral Modeling Synthesis) analysis is performed frame by frame on an audio signal representing a specific voice, and a harmonic component and a non-harmonic component are extracted as frequency-domain data to generate voice segments (phonemes or phoneme chains). When a voice is actually synthesized, the voice segments corresponding to a desired vocal sound (for example, lyrics) are linked to one another, the harmonic component and the non-harmonic component are added together, and an inverse FFT is applied to the result of the addition for every frame, thereby generating the audio signal. According to this configuration, the characteristics of the non-harmonic component added to the harmonic component can be changed as appropriate, making it possible to generate an audio signal with desired characteristics such as a husky voice.
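The frame-wise synthesis step described above (addition of the two components followed by an inverse transform) may be sketched as follows. This is only an illustrative sketch and not the implementation of the cited publication: the function names `inverse_dft`, `synthesize_frame` and `synthesize` are hypothetical, a naive inverse DFT stands in for the inverse FFT, and the frames are simply concatenated without any windowing or overlap-add.

```python
import cmath
import math


def inverse_dft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def synthesize_frame(harmonic, non_harmonic):
    """Add the two frequency-domain components bin by bin, then invert."""
    if len(harmonic) != len(non_harmonic):
        raise ValueError("components must have the same number of bins")
    mixed = [h + r for h, r in zip(harmonic, non_harmonic)]
    return inverse_dft(mixed)


def synthesize(frames):
    """Concatenate the time-domain output of every (harmonic, non_harmonic) frame.

    Assumption: plain concatenation, with no windowing or overlap-add.
    """
    signal = []
    for harmonic, non_harmonic in frames:
        signal.extend(synthesize_frame(harmonic, non_harmonic))
    return signal
```

For instance, with an 8-bin harmonic spectrum `[0, 4, 0, 0, 0, 0, 0, 4]` and a non-harmonic spectrum `[8, 0, 0, 0, 0, 0, 0, 0]`, `synthesize_frame` yields one cycle of a cosine raised by a constant offset of 1. Changing only the second argument alters the character of the output while the harmonic part stays the same, which is the adjustment the paragraph above refers to.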
In an actual human voice, however, the period of the waveform may change irregularly from moment to moment. This tendency is particularly pronounced in distinctive voices such as a rough or harsh voice (a so-called croaky voice). According to the conventional technology described above, since the voice is synthesized by frequency-domain processing of each frame, the period of the synthesized audio signal is inevitably kept constant within each frame. As a result, the voice generated by this technology exhibits fewer changes in period than an actual human voice and therefore tends to sound mechanical and unnatural.
Although the case of synthesizing a voice by linking voice segments is described here as an example, a similar problem may also arise in a technology that changes the characteristics of a voice uttered by a user and outputs the resulting voice. In that technology as well, the audio signal supplied from a sound capturing apparatus such as a microphone is converted into frequency-domain data for every frame, and a time-domain audio signal is generated after the frequency characteristics are changed as appropriate for every frame, so that the period of the voice within one frame is kept constant. Thus, this technology, like that disclosed in Japanese Unexamined Patent Publication (Kokai) No. 2002-202790, is limited in its ability to generate a natural voice close to an actual human voice.
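The difference between a period that is fixed within a frame and a period that varies from cycle to cycle may be illustrated by the following sketch. All names (`harmonic_frame`, `variable_period_signal`, `cycle_lengths`) and the numeric values are assumptions introduced for illustration only. The period is measured crudely as the distance between successive upward zero crossings, which presupposes a dominant fundamental so that each cycle contains a single upward crossing.

```python
import math


def harmonic_frame(f0, amplitudes, frame_len, sample_rate):
    """One frame built from harmonics of a single f0 (period fixed in the frame).

    The half-sample offset keeps zero crossings between sample instants.
    """
    return [
        sum(
            a * math.sin(2 * math.pi * (h + 1) * f0 * (n + 0.5) / sample_rate)
            for h, a in enumerate(amplitudes)
        )
        for n in range(frame_len)
    ]


def variable_period_signal(lengths):
    """Successive sine cycles whose lengths (in samples) differ from cycle to cycle."""
    out = []
    for length in lengths:
        out.extend(math.sin(2 * math.pi * n / length) for n in range(length))
    return out


def cycle_lengths(signal):
    """Distances in samples between successive upward zero crossings."""
    marks = [n for n in range(1, len(signal)) if signal[n - 1] < 0 <= signal[n]]
    return [b - a for a, b in zip(marks, marks[1:])]
```

With `harmonic_frame(100, [1.0, 0.5], 400, 8000)` every measured cycle is 80 samples long, whereas `variable_period_signal([80, 83, 78, 81, 79])` yields measured cycles of 83, 78 and 81 samples. The second case corresponds to the irregular period of an actual voice, which frame-wise frequency-domain synthesis cannot reproduce within a single frame.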