1. Field of the Invention
The present invention relates to a vibration wave detecting method and a vibration wave detector for detecting the characteristics of vibration waves, such as sound waves, to be propagated in a medium.
2. Description of the Prior Art
In a conventional speech recognition system, the vibrations of a microphone that receives speech are converted into electric signals and amplified by an amplifier, and the resulting analog signals are then converted into digital signals by an A/D converter to obtain digital speech signals. A fast Fourier transform is applied to the digital speech signals by software on a computer to extract the features of the speech. Such a speech recognition system is disclosed in IEEE Signal Processing Magazine, Vol. 13, No. 5, pp. 45-57 (1996).
To extract the features of speech signals efficiently, the acoustic spectra must be calculated within a time period over which the speech signal can be considered stationary. A speech signal is normally considered stationary over a period of 10 through 20 msec. Therefore, signal processing such as the fast Fourier transform is conducted, by software on the computer, on the digital speech signals contained in each successive period of 10 through 20 msec.
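The frame-by-frame analysis described above can be sketched as follows. This is a minimal illustration, not the implementation of any particular recognizer: the function names, the 8 kHz sampling rate, the 20 msec non-overlapping frames and the synthetic 1 kHz test tone are all assumptions, and a naive discrete Fourier transform stands in for the fast Fourier transform so that the sketch needs only the Python standard library.

```python
import cmath
import math


def frame_signal(samples, sample_rate_hz, frame_ms=20.0):
    """Split samples into consecutive, non-overlapping frames of frame_ms."""
    frame_len = int(sample_rate_hz * frame_ms / 1000.0)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (stand-in for an FFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


# Hypothetical input: 100 msec of a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE_HZ = 8000
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ)
          for t in range(800)]

# One spectrum per 20 msec frame, over which the signal is treated as stationary.
frames = frame_signal(signal, SAMPLE_RATE_HZ, frame_ms=20.0)
spectra = [magnitude_spectrum(f) for f in frames]

# Frequency of the strongest bin in the first frame.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * SAMPLE_RATE_HZ / len(frames[0])
```

Every frame requires its own transform, which is why the per-frame cost of the spectral analysis dominates the calculation load of this approach.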
In the conventional speech recognizing method described above, the speech signal over its entire frequency band at each instant is converted into an electric signal by a microphone. To analyze the spectra of the electric signals, the signals are digitized by A/D conversion. The digital speech signal data are then compared with predetermined speech wave data to extract the features of the speech.
The auditory mechanism and the psychophysical properties of sound are described in detail in "Neuro Science & Technology Series Speech Auditory and Neuro Circuit Network Model" (pp. 116-125), written by Seiichi Nakagawa, Kiyohiro Shikano and Youichi Toukura under the supervision of Shunichi Amari (Ohm Company Co., 1992). This literature shows that the pitch of a sound as heard by human beings corresponds linearly to the mel scale, instead of corresponding linearly to frequency as a physical value. The mel scale is a psychological attribute (psychological measure) representing the pitch of a sound: it numbers pitches directly so that frequency intervals heard as equal by human beings are equal on the scale. The pitch of a sound of 1000 Hz, 40 phon is defined as 1000 mel. An acoustic signal of 500 mel is heard as a sound of half that pitch, and an acoustic signal of 2000 mel is heard as a sound of twice that pitch. The mel scale can be approximated by the following equation (1), using the frequency f [Hz] as the physical value. The relationship between the sound pitch [mel] and the frequency [Hz] given by the approximate equation is shown in FIG. 1.
$$\mathrm{mel} = \frac{1000}{\log 2}\,\log\left(\frac{f}{1000} + 1\right) \qquad (1)$$
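As a worked illustration of equation (1), the following minimal sketch evaluates the approximation and its algebraic inverse. The function names hz_to_mel and mel_to_hz are hypothetical and do not come from the cited literature. Because the logarithm appears in both the factor 1000/log 2 and the main term, its base cancels, so the natural logarithm is used here.

```python
import math


def hz_to_mel(f_hz):
    """Equation (1): mel = (1000 / log 2) * log(f / 1000 + 1), f in Hz."""
    return (1000.0 / math.log(2.0)) * math.log(f_hz / 1000.0 + 1.0)


def mel_to_hz(mel):
    """Algebraic inverse of equation (1): f = 1000 * (2^(mel / 1000) - 1)."""
    return 1000.0 * (2.0 ** (mel / 1000.0) - 1.0)


# The 1000 Hz reference tone maps to 1000 mel under the approximation.
reference_mel = hz_to_mel(1000.0)

# Frequencies that the approximation assigns to half and twice that pitch.
half_pitch_hz = mel_to_hz(500.0)
double_pitch_hz = mel_to_hz(2000.0)
```

Under this approximation, 500 mel corresponds to roughly 414 Hz and 2000 mel to 3000 Hz, which shows how the scale compresses the higher frequencies.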
To extract the features of the speech more efficiently, the frequency axis of the acoustic spectra is often converted into the mel scale. Like the spectral analysis itself, this conversion of the acoustic spectra into the mel scale is normally carried out by software on the computer.
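One way such a software conversion might look is sketched below: the power in linearly spaced frequency bins is summed into bands of equal width on the mel scale. The pooling rule (rectangular, non-overlapping bands), the function names, the band count and the flat test spectrum are all assumptions chosen for illustration; the text does not specify how the conversion is performed.

```python
import math


def hz_to_mel(f_hz):
    """Mel approximation: mel = (1000 / log 2) * log(f / 1000 + 1), f in Hz."""
    return (1000.0 / math.log(2.0)) * math.log(f_hz / 1000.0 + 1.0)


def pool_to_mel_bands(power_spectrum, bin_hz, num_bands):
    """Sum linear-frequency power bins into num_bands equal-width mel bands."""
    top_mel = hz_to_mel((len(power_spectrum) - 1) * bin_hz)
    bands = [0.0] * num_bands
    for k, power in enumerate(power_spectrum):
        band = int(hz_to_mel(k * bin_hz) / top_mel * num_bands)
        # The highest bin lands exactly on the upper edge; keep it in range.
        bands[min(band, num_bands - 1)] += power
    return bands


# Hypothetical flat spectrum: 81 bins spaced 50 Hz apart (0 through 4000 Hz).
flat_spectrum = [1.0] * 81
mel_bands = pool_to_mel_bands(flat_spectrum, bin_hz=50.0, num_bands=8)
```

With a flat input, the higher mel bands collect more linear bins than the lower ones, because equal steps in mel cover wider frequency ranges at high frequencies.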
As another method of extracting the features of the speech more efficiently, the frequency axis of the acoustic spectra is often converted into a Bark scale. The Bark scale is a measure corresponding to the loudness of sound as perceived by human beings. For sounds of a certain level or larger, the Bark scale expresses the frequency band width (called the critical band width) within which sounds, even if they differ in frequency, are heard as the same by human beings. For example, when a large noise occurs within the critical band width of a signal sound, the human auditory system cannot distinguish the signal sound from the noise even though their frequencies differ; the Bark scale is the scale that expresses such frequency bands.
In the field of speech-signal processing, a representation of the critical band width that is easy to handle on a computer is demanded, and consequently the frequency axis of the acoustic spectra is expressed on a Bark scale in which one critical band is defined as one Bark. FIG. 2 shows the numerical relationship between the critical band width and the Bark scale. The critical band width and the Bark scale can be approximated by the following equations (2) and (3), using the frequency f [kHz] as a physical value.
Critical Band Width:
$$\mathrm{CB}\ [\mathrm{Hz}] = 25 + 75\left(1 + 1.4 f^{2}\right)^{0.69} \qquad (2)$$
Bark Scale:
$$B\ [\mathrm{Bark}] = 13\tan^{-1}(0.76 f) + 3.5\tan^{-1}\left(\frac{f}{7.5}\right) \qquad (3)$$
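Equations (2) and (3) can be evaluated with the following minimal sketch. The function names are hypothetical. Equation (3) is coded on the assumption that its two arctangent terms are added and that the argument of the second term is f/7.5 as given in the text; the widely cited Zwicker-Terhardt form of this approximation squares that argument, so the second term should be checked against the original reference before the values are relied upon.

```python
import math


def critical_bandwidth_hz(f_khz):
    """Equation (2): CB [Hz] = 25 + 75 * (1 + 1.4 * f^2)^0.69, f in kHz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def bark(f_khz):
    """Equation (3) as given in the text (assumed sum of two arctangents)."""
    return 13.0 * math.atan(0.76 * f_khz) + 3.5 * math.atan(f_khz / 7.5)


# Tabulate both quantities at a few example frequencies (values in kHz).
example_khz = [0.13, 0.5, 1.0, 2.0, 4.0, 6.4]
table = [(f, critical_bandwidth_hz(f), bark(f)) for f in example_khz]
```

Both quantities increase monotonically with frequency, and the critical band width tends to 100 Hz as the frequency approaches zero.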
It is known in the speech recognition field to use an engineering functional model of the peripheral auditory system, and the concept of the model is described in detail in the literature "Neuro Science & Technology Series Speech Auditory and Neuro Circuit Network Model" (pp. 162-171). In the engineering functional model, the frequency spectrum analysis is performed as preprocessing by band-pass filter groups. For example, in the preprocessing of the Seneff model, which is one of the representative engineering functional models, the frequency spectrum analysis is conducted by critical band width filter groups having forty independent channels in the frequency range of 130 through 6400 Hz. At that time, the frequency band of the acoustic spectra is converted into the Bark scale.
The conversion into the Bark scale is also normally conducted by software on the computer, as in the other analyses of the spectra.
In the conventional method of analyzing the spectra of an acoustic signal by applying a fast Fourier transform to the digital acoustic signal with software on the computer, the calculation amount becomes immense and the calculation load becomes heavy. The same is true when the spectra obtained by the fast Fourier transform are converted into the mel scale by software on the computer, and when the spectra of the acoustic signal are analyzed by critical band width filter groups and converted into the Bark scale by software on the computer.
The conventional methods pose no problem for speech whose acoustic spectra do not change as time passes, such as speech consisting only of vowel sounds. A language, however, is made up of consonant sounds and vowel sounds. When a consonant sound comes first and a vowel sound follows, as in Japanese, the vowel sound generally becomes stronger as time passes. English is made up of more complicated combinations of consonant sounds and vowel sounds. In these cases, it has conventionally been difficult to judge when the sound changes from a consonant sound to a vowel sound, because the speech is recorded instant by instant and, for the analysis of the speech, the acoustic spectra of the entire band are integrated over each fixed time interval.
Therefore, the recognition rate of the speech recognition was reduced. To solve this problem, a much larger number of speech patterns are stored in the computer in advance and the input speech is matched against these patterns, which increases the calculation load still further.