1. Field
The present invention relates to a speech signal processing system, a speech signal processing method and a speech signal processing program that include a speech signal conversion process, and more particularly to a speech signal processing system, a speech signal processing method and a speech signal processing program that use characteristics of an input speech such as its noise environment and volume.
2. Description of the Related Art
An example of a speech conversion system that performs speech signal conversion is described in Japanese Unexamined Patent Publication No. 2000-39900 (hereinafter “Patent Literature 1”). The speech conversion system described in Patent Literature 1 has a speech input unit 1, an input amplifier circuit, a variable amplifier circuit, and a speech synthesis unit as components. In the variable amplifier circuit, the system mixes an environmental sound that has been inputted from the speech input unit 1 and has passed through the input amplifier circuit with a speech outputted from the speech synthesis unit, and outputs the resulting converted synthesized speech.
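The mixing operation attributed to Patent Literature 1 can be illustrated in software as a weighted sum of two sample sequences. The following is only a minimal sketch under stated assumptions: the function name `mix_signals`, the parameter `env_gain`, and the zero-padding of the shorter sequence are hypothetical choices made for illustration and are not taken from Patent Literature 1, which describes amplifier circuits rather than code.

```python
def mix_signals(environment, synthesized, env_gain=0.5):
    """Mix an environmental sound with a synthesized speech, sample by sample.

    Both inputs are sequences of float samples assumed to share one sampling
    rate. `env_gain` stands in for the variable amplifier (an assumption).
    The shorter sequence is treated as zero-padded.
    """
    length = max(len(environment), len(synthesized))
    mixed = []
    for i in range(length):
        env = environment[i] if i < len(environment) else 0.0
        syn = synthesized[i] if i < len(synthesized) else 0.0
        mixed.append(env_gain * env + syn)
    return mixed
```

In this sketch the environmental sound is whatever is being captured at the current time point, which is the behavior that the following paragraphs identify as a limitation.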
Moreover, Japanese Unexamined Patent Publication No. 2007-156364 (hereinafter “Patent Literature 2”) describes a speech recognition apparatus that synthesizes a normalized noise model, obtained by normalizing a noise model synthesized from an acoustic characteristic amount of a digital signal in a noise section, with a clean speech model to generate a normalized noise-superimposed speech model, and uses a normalized noise model obtained by normalizing it, as an acoustic model, to obtain a speech recognition result.
However, a method that synthesizes a speech by always superimposing the environmental sound at the current time point, as described in Patent Literature 1, has a problem in that the environmental sound at the time point when a speech for speech recognition was inputted (in other words, the time point when a user intentionally inputted the speech, which may be any time point chosen by the user) cannot be superimposed. Similarly, there is a problem that characteristics of the speech inputted for the speech recognition cannot be added. For example, characteristics of the input speech such as its volume and the distortion of the signal due to a high or low volume (including blocking of the speech signal, mainly due to a failure in a communication path) cannot be added.
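To make the characteristics named above concrete, the following minimal sketch applies a volume, a stored noise, clipping distortion, and blocked sections to a sequence of samples. It is an illustration only and is not presented as the method of the invention or of either cited publication. The function name `apply_input_characteristics`, all parameter names, the modeling of volume distortion as hard clipping, and the modeling of blocking as zeroed sample ranges are assumptions.

```python
def apply_input_characteristics(samples, noise, gain=1.0, clip_level=1.0, blocked=()):
    """Apply characteristics of a recorded input speech to another signal.

    samples    -- float samples of the speech to convert
    noise      -- float samples of the noise stored at input time (looped)
    gain       -- volume factor (assumed representation of input volume)
    clip_level -- amplitude limit; models distortion from a high volume
    blocked    -- (start, end) index ranges where the signal was lost
    """
    out = []
    for i, sample in enumerate(samples):
        n = noise[i % len(noise)] if noise else 0.0
        value = gain * sample + n
        # Hard clipping to [-clip_level, clip_level].
        value = max(-clip_level, min(clip_level, value))
        out.append(value)
    # Zero the blocked sections, e.g. drops on a communication path.
    for start, end in blocked:
        for i in range(max(0, start), min(end, len(out))):
            out[i] = 0.0
    return out
```

The point of the sketch is that `noise`, `gain`, `clip_level`, and `blocked` would all be taken from the moment the user inputted the speech, rather than from the current time point.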
Moreover, the technique described in Patent Literature 2 does not consider at all the use of characteristics of a particular speech, such as its noise environment and volume, when speech conversion is performed. Nor is the speech recognition apparatus described in Patent Literature 2 configured to be applicable to such use. This is because the technique described in Patent Literature 2 is a technique for normalizing the noise model in order to improve the accuracy of the speech recognition result for a speech mixed with a noise.
Consequently, an object of the present invention is to provide a speech signal processing system, a speech signal processing method and a speech signal processing program that preferably use characteristics at the time point when the speech for the speech recognition was inputted, such as the environmental sound (for example, a noise), the volume of the input speech, and the blocking of the speech signal.