1. Field of the Invention
The present invention relates to a method and apparatus for discriminating and separating non-sounds and voiceless sounds of speech signals from each other so that the length of the non-sound can be modulated without degrading a signal corresponding to the voiceless sound when the speech signals, which have been recorded on a recording medium, are played back at varied speeds.
2. Description of the Related Art
In a conventional apparatus, when speech signals recorded on a recording medium are played back at a varied play-back speed, the tone of the speech sounds different from the original tone due to degradation in the reproduced speech signals resulting from the variation in play-back speed. For example, when the play-back is performed at a high speed, the frequency of the speech signal being played back is shifted away from that of the original speech signal. As a result, the speech is typically heard as a "peep-peep" sound. On the other hand, when the recorded speech signals are played back at a low play-back speed, the reproduced speech will typically have a "loosened tape sound".
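The frequency shift described above can be illustrated with a minimal sketch. It assumes naive play-back in which every recorded sample is kept and only the play-back rate changes, so that each frequency component is scaled by the speed factor. The function name and the example values are hypothetical and are not taken from any cited publication.

```python
def played_back_frequency(original_hz, speed_factor):
    """Frequency heard when a recording is simply run faster or slower.

    Assumption: naive play-back, i.e. all samples are kept and only the
    play-back rate changes, so every frequency is scaled by speed_factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return original_hz * speed_factor


# A 200 Hz voice component at double speed and at half speed.
print(played_back_frequency(200.0, 2.0))
print(played_back_frequency(200.0, 0.5))
```

Under this assumption, high-speed play-back raises the pitch and low-speed play-back lowers it, which corresponds to the two tonal degradations noted above.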
A conventional method for preventing such phenomena is described in Japanese Patent Laid-open Publication No. Heisei 4-168499 (Jun. 16, 1992), which discloses a method for partially playing back speech signals that are read into a memory buffer. In accordance with this method, when the play-back speed is doubled, speech signals read by the memory buffer are partially played back in such a manner that only one of every two successive time-slices of the speech signals is played back.
For example, when a vocal recording of "I go to school with Jane" is played back at double speed in accordance with the above-mentioned conventional method, components of the original speech corresponding to the shaded portions shown in FIG. 1 are eliminated, so that only the speech signal "I to with Jane" is reproduced. Since the conventional method plays back only a part of the speech signals at a higher play-back speed so as to maintain the original tone of the speech, part of the speech content is discarded. As a result, it is very difficult to understand the original meaning of the recorded speech using the conventional reproduction method and apparatus.
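The partial play-back described above can be sketched as follows. This is an illustration only, assuming fixed-length time-slices and a buffer held as a plain list of samples; the function name and the slice length are hypothetical, and the slice boundaries in FIG. 1 need not coincide with word boundaries.

```python
def drop_alternate_slices(samples, slice_len):
    """Keep the first of every two successive time-slices.

    Hypothetical helper: 'samples' stands in for the memory buffer and
    'slice_len' for the length of one time-slice, in samples.
    """
    if slice_len <= 0:
        raise ValueError("slice_len must be positive")
    kept = []
    # Step over the buffer two slices at a time and keep only the first.
    for start in range(0, len(samples), 2 * slice_len):
        kept.extend(samples[start:start + slice_len])
    return kept


buffer = list(range(12))            # 12 samples, i.e. four slices of 3
print(drop_alternate_slices(buffer, 3))
```

The output is half as long as the input, so the play-back time is halved without changing the pitch, but the discarded slices carry speech content that cannot be recovered.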
In an attempt to prevent both a loss of speech signals and a degradation in tone from occurring when recorded speech signals are played back at varying speeds, the present inventors have conceived a speed-variable speech signal reproduction apparatus and method as disclosed in Korean Patent Application No. 94-24514, which is entitled "Speed-Variable Audio Play-Back Apparatus".
In order to explain how the length of a speech signal is modulated by the above-mentioned speed-variable audio signal play-back apparatus, the basic form of a speech signal will first be described with reference to FIG. 2. As illustrated, a waveform of a speech signal consists of various sounds, namely, voiceless sounds, voice sounds and non-sounds, along with noise components. Voice sounds are sounds involving vibrations at the speaker's vocal organ, and include vowels, nasal sounds and flowing sounds.
Voiceless sounds, on the other hand, are noise-like sounds generated at the point of articulation formed by an articulation organ such as the speaker's tongue, teeth or lips. Generally, voiceless sounds, which are irregularly generated, are indicative of the characteristics of corresponding sounds. By contrast, voice sounds, which are regularly generated, are indicative of the lengths of corresponding sounds, along with the characteristics of corresponding speech signals.
For example, when a sound "ka" is analyzed, it is found that the sound consists of two sounds which are simultaneously generated, namely, a voiceless sound corresponding to "k", and a regular voice sound corresponding to "a". Where this sound "ka" is modulated in length, only the number of waveforms corresponding to the voice sound varies, and the voiceless sound is not varied. This will be described in more detail with reference to FIGS. 3A-3C.
As shown in FIG. 3A, the sound "ka" consists of a voiceless sound portion corresponding to "k" and one voice sound waveform corresponding to "a". As shown in FIG. 3B, on the other hand, the sound "ka-" consists of a voiceless sound portion corresponding to "k" and two voice sound waveforms corresponding to "a-". Similarly, as shown in FIG. 3C, the sound "ka--" consists of a voiceless sound portion corresponding to "k" and three voice sound waveforms corresponding to "a--".
As is apparent from FIGS. 3A-3C, each of the speech signals consists of a voiceless sound, whose waveform does not vary even when the length of a corresponding speech signal varies, and a voice sound, which has a plurality of the same waveforms, the number of which varies depending on the length of the sound. Accordingly, the speed-variable audio play-back apparatus as proposed by the inventors in the above-referenced Korean patent application operates to play back a speech signal at a varied speed while preventing any degradation in tone and loss of the speech signal by copying or eliminating a part of a plurality of the same waveforms, which correspond to a voice sound of the speech signal, without modulating a voiceless sound of the speech signal.
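The relationship shown in FIGS. 3A-3C can be sketched as follows. This is a simplified illustration, assuming that the voiceless portion and one period of the voice sound waveform have already been separated; the function name and the sample values are hypothetical and do not represent the apparatus of the referenced Korean application.

```python
def modulate_length(voiceless, voiced_period, n_periods):
    """Rebuild a sound from an unchanged voiceless portion followed by
    n_periods copies of a single voice sound waveform.

    Hypothetical helper: both inputs are plain lists of samples.
    """
    if n_periods < 0:
        raise ValueError("n_periods must not be negative")
    return list(voiceless) + list(voiced_period) * n_periods


k = [3, -2, 5]          # stand-in for the irregular voiceless portion "k"
a = [1, 0, -1, 0]       # stand-in for one period of the voice sound "a"

ka = modulate_length(k, a, 1)        # corresponds to FIG. 3A
ka_long = modulate_length(k, a, 3)   # corresponds to FIG. 3C

print(len(ka), len(ka_long))
print(ka_long[:len(k)] == k)         # the voiceless portion is unchanged
```

Only the count of repeated voice sound waveforms differs between the two results, so the length changes while neither the voiceless portion nor the shape of each voice sound waveform is altered.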
To reproduce speech signals at a varied play-back speed more effectively, however, it is desirable not only to vary the length of the voice sound of a speech signal, but also to vary the length of the non-sound of the speech signal. The difficulty is that voiceless sounds, like non-sounds, have a very irregular waveform characteristic. That is, non-sounds which include noise components have waveforms substantially similar to those of voiceless sounds.
Accordingly, it is very important to distinguish such voiceless sounds from non-sounds to achieve accurate reproduction of the speech signals at a varied play-back speed. However, it is difficult to distinguish voiceless sounds from non-sounds using conventional methods. For example, if the noise component of the non-sound is determined to be the same as a voiceless sound component, it is impossible to distinguish and thus modulate the non-sound.
On the other hand, when the noise component included in the non-sound has a voltage level higher than a predetermined level, it may be incorrectly recognized as a voiceless sound. Hence, the noise may be processed along with voiceless sounds. As a result, the noise is reproduced along with original sounds in a normal play-back mode or in a speed-varied play-back mode.
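The misrecognition described above can be illustrated with a naive level detector. This sketch shows the failure mode only and is not the discrimination method of the present invention; the function name, the threshold value and the sample frames are all hypothetical.

```python
def classify_frame(frame, level_threshold):
    """Label a frame using only its peak level.

    Hypothetical helper: a frame whose peak exceeds level_threshold is
    treated as a voiceless sound, otherwise as a non-sound.
    """
    peak = max(abs(sample) for sample in frame)
    return "voiceless" if peak > level_threshold else "non-sound"


THRESHOLD = 0.10                      # assumed predetermined level

quiet_pause = [0.01, -0.02, 0.01]     # non-sound with low-level noise
noisy_pause = [0.15, -0.12, 0.20]     # non-sound with high-level noise

print(classify_frame(quiet_pause, THRESHOLD))
print(classify_frame(noisy_pause, THRESHOLD))
```

Both frames are non-sounds, yet the second is labeled as a voiceless sound because its noise exceeds the assumed level, so it would be processed and reproduced along with the speech.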