It is known from past research on speech signal recognition that even when a subject does not hear a speech signal "as is" but hears components of the signal degraded with noise by a predetermined method, words can still be recognized to a considerable extent. Such technology has been described, for example, in non-patent document 1, non-patent document 2, and non-patent document 3.
According to these documents, a speech signal is first divided into four frequency bands (0-600, 600-1500, 1500-2500, and 2500-4000 Hz); the amplitude envelope of each band is obtained by subjecting the band signal to half-wave rectification and low-pass filtering at 16 Hz; each envelope is then used to modulate band noise occupying the corresponding frequency band; and the resulting modulated noise signals are summed. The signal produced in this manner is called a Noise-Vocoded Speech Sound. An intelligibility of about 80% has been reported when Noise-Vocoded Speech Sound is presented to normal-hearing subjects.
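The synthesis procedure described above can be sketched as follows. This is a minimal illustration only, not the implementation used in the cited documents: the sampling rate (16 kHz), filter orders, and the slightly raised lower edge of the first band (50 Hz, chosen so a band-pass filter can be designed) are all assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def noise_vocode(speech, fs=16000, env_cutoff=16.0):
    """Sketch of noise-vocoded speech synthesis.

    Per band: band-pass the speech, extract the amplitude envelope by
    half-wave rectification and 16 Hz low-pass filtering, modulate
    band-limited noise with that envelope, then sum all bands.
    """
    # Band edges from the documents; the first band's lower edge is
    # raised from 0 to 50 Hz here only so a band-pass filter exists.
    bands = ((50, 600), (600, 1500), (1500, 2500), (2500, 4000))
    rng = np.random.default_rng(0)
    out = np.zeros(len(speech), dtype=float)
    for lo, hi in bands:
        # Divide the speech signal into this frequency band.
        sos_bp = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos_bp, speech)
        # Half-wave rectification followed by a 16 Hz low-pass filter
        # yields the amplitude envelope of the band.
        rectified = np.maximum(band, 0.0)
        sos_lp = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
        envelope = sosfilt(sos_lp, rectified)
        # Modulate band noise in the corresponding band with the envelope.
        band_noise = sosfilt(sos_bp, rng.standard_normal(len(speech)))
        out += envelope * band_noise
    # Summing the modulated band noises gives the Noise-Vocoded Speech Sound.
    return out
```

The spectral fine structure of the original speech is discarded; only the slow amplitude changes in each band survive, which is why intelligibility under this degradation is notable.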
[Non-patent document 1] Shannon, R. V., et al.: “Speech Recognition with Primarily Temporal Cues”, Science, Vol. 270, pp. 303-305 (1995)
[Non-patent document 2] Yoshihisa Obata, Hiroshi Riquimaroux: “Speech perception based on temporal amplitude change with spectrally degraded synthetic sound”, Materials of the Auditory Research Forum of The Acoustical Society of Japan, H-99-6 (1999)
[Non-patent document 3] Yoshihisa Obata, Hiroshi Riquimaroux: “Intelligibility of synthesized Japanese speech sound made of band noise—preliminary study for a speech recognition processor utilizing central auditory function—”, Materials of the Auditory Research Forum of The Acoustical Society of Japan, H-2000-3 (2000)