1. Field of the Invention
The present invention relates to a speech signal modification and concatenation method used in forming a spoken message by using sound-recording and editing techniques, for efficiently performing addition or modification of spoken messages, so as to establish and economically maintain a system using spoken messages.
2. Description of the Related Art
In recent years, spoken messages have come to be used in services such as announcements in stations; highway radio announcements providing information about traffic jams and the like; and voice guidance systems for information searches. Such spoken messages are formed by recording, in advance, spoken sounds produced by a human, and then concatenating these sounds.
In forming spoken messages in this way, when a new message which differed from every already-formed message was required, and the “new” message had not yet been recorded, additional recording of the new spoken sounds was necessary.
In such a case, it was necessary for the same person who previously produced the already-recorded spoken sounds to produce additional spoken sounds in order to avoid an abrupt change of voice characteristics between the already-recorded voice and the newly-recorded voice, and to naturally concatenate the two voices.
However, even if the speaker were the same, the voice characteristics may be different from those at the time of the previous recording due to the passage of time since the previous recording, and the like. Therefore, if any comprehension difficulty due to concatenation of old and new spoken messages were expected, re-recording and re-forming of all relevant spoken messages was required.
In addition, if the previous speaker was unavailable, it was necessary for another speaker to produce the necessary spoken sounds instead, in which case re-recording of all relevant spoken messages was required.
Furthermore, it is also possible to form those spoken messages by using a speech-synthesis device. However, also in this case, similar problems may appear when two speech signals having different voice characteristics, due to, for example, having used different speech-synthesis devices, are concatenated.
The present invention was made in consideration of the above problems, and it is an object of the present invention to provide a speech signal modification and concatenation method for combining spoken messages having different voice characteristics, without causing a sense of incompatibility, and for making it possible to efficiently perform addition or modification of spoken messages.
Accordingly, the present invention provides: a speech signal modification and concatenation method for concatenating two speech signals having different voice characteristics, the method comprising the step of concatenating the speech signals by modifying a parameter indicating a character of speech signals in a manner such that the parameter is gradually changed from a value indicating a feature of one of the speech signals to a value indicating a feature of the other speech signal over a predetermined period.
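As a minimal sketch of such a gradual change, the following Python fragment moves a single parameter from the value of one speech signal to the value of the other over a fixed number of frames. The linear trajectory, the function name gradual_parameter, and the example values (a fundamental frequency of 120 Hz changing to 200 Hz) are assumptions made for illustration only; the method itself requires only that the change be gradual over a predetermined period.

```python
def gradual_parameter(start_value, end_value, num_frames):
    """Return num_frames values moving linearly from start_value to end_value.

    The linear shape is an illustrative assumption; any gradual trajectory
    over the predetermined period would fit the description.
    """
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    step = (end_value - start_value) / (num_frames - 1)
    return [start_value + step * i for i in range(num_frames)]


# Hypothetical example: fundamental frequency (Hz) of the first voice
# drifting toward that of the second voice over 5 frames.
f0_track = gradual_parameter(120.0, 200.0, 5)
```

Each value in f0_track would then be applied to the corresponding frame of the concatenation section.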
Even if voice characteristics of the speakers are significantly different, listeners do not sense substantial incompatibility if the amount of modification per unit of time is relatively small. According to the present invention, it is possible to concatenate voices by repeating a measure of modification, which does not produce a sense of incompatibility, a plurality of times. That is, in a concatenation section of a spoken message, which is concatenated according to the present invention, the voice characteristic thereof gradually changes over a period.
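The idea of repeating a small modification a plurality of times can be sketched as a simple count of how many frames are needed when each step is limited to a given size. The per-step limit max_step is a hypothetical tuning value, not a figure taken from this description, and the function name frames_needed is likewise an assumption.

```python
import math


def frames_needed(start_value, end_value, max_step):
    """Smallest number of frames such that no frame-to-frame change
    exceeds max_step (a hypothetical, listener-dependent limit)."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    # Number of steps, plus one for the starting frame.
    return math.ceil(abs(end_value - start_value) / max_step) + 1


# Hypothetical example: 120 Hz to 200 Hz with at most 2 Hz change per frame.
n_frames = frames_needed(120.0, 200.0, 2.0)
```

A larger difference between the two voices therefore lengthens the concatenation section rather than increasing the change per unit of time.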
As the above parameter, a spectrum of spoken sounds or a fundamental frequency of spoken sounds may be used, and the rate of changing the parameter can be arbitrarily set. For example, if the spectrum of speech signals is used as the parameter, it is possible to adopt a method comprising the steps of: for a phoneme which corresponds between the two speech signals, determining the correspondence of each pitch between the two signals; generating a spectrum, for every corresponding pitch, by combining, with respect to a boundary frequency, the portion of the spectrum of one speech signal above the boundary frequency and the portion of the spectrum of the other speech signal below the boundary frequency, and determining the generated spectrum as the spectrum at the relevant pitch; and, in the generation of spectra, changing the boundary frequency for each unit of time. Here, if the boundary frequency is changed such that it gradually increases from its value at the start of the change to its value at the end of the change, with a lower rate of change while the boundary frequency is relatively low (near the start of the change) and a higher rate of change while the boundary frequency is relatively high (near the end of the change), a more natural change of voice characteristics can be realized, and the change better matches the characteristics of human hearing.
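The spectrum-based steps above can be sketched as follows, assuming the pitch correspondence between the two signals has already been determined so that the spectra are aligned frame by frame. The geometric progression used for the boundary frequency is only one possible schedule that rises slowly at low frequencies and faster at high frequencies. The function names, the bin frequencies, and the choice of which signal supplies the portion below the boundary are all illustrative assumptions.

```python
def boundary_schedule(f_start, f_end, num_frames):
    """Boundary frequencies rising from f_start to f_end in equal ratio steps.

    Equal ratios give small steps in Hz at low frequencies and larger steps
    at high frequencies (an assumed schedule, not the only possible one).
    """
    if num_frames < 2 or f_start <= 0 or f_end <= f_start:
        raise ValueError("need num_frames >= 2 and 0 < f_start < f_end")
    ratio = (f_end / f_start) ** (1.0 / (num_frames - 1))
    return [f_start * ratio ** i for i in range(num_frames)]


def splice_spectrum(spec_from, spec_to, bin_freqs, boundary_hz):
    """Combine two aligned spectra at a boundary frequency.

    Assumption: bins below the boundary come from the signal being changed
    to (spec_to) and bins at or above it from the signal being changed from
    (spec_from), so a rising boundary hands over more of the spectrum.
    """
    return [t if f < boundary_hz else s
            for s, t, f in zip(spec_from, spec_to, bin_freqs)]


# Hypothetical data: five frequency bins and flat placeholder spectra.
bin_freqs = [0.0, 750.0, 1500.0, 3000.0, 6000.0]
spec_from = [1.0] * 5   # spectrum of the first voice at one pitch
spec_to = [2.0] * 5     # spectrum of the second voice at the same pitch

schedule = boundary_schedule(500.0, 4000.0, 4)
frames = [splice_spectrum(spec_from, spec_to, bin_freqs, b) for b in schedule]

# Number of bins taken from the second voice in each successive frame.
bins_from_target = [frame.count(2.0) for frame in frames]
```

In this sketch each successive frame takes one more bin from the second voice, so the voice characteristic shifts in small steps across the concatenation section.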
That is, according to the present invention, a feature quantity of spoken sounds can be changed gradually over time. As a result, even if two speech signals of different speakers are concatenated, it is possible to avoid an abrupt change of voice characteristics in the concatenation section, and it is thus possible to concatenate speech signals without causing a sense of incompatibility to listeners.