The present invention relates to an apparatus for converting the speed of reproducing an acoustic signal. More particularly, the invention relates to an apparatus and method for processing an acoustic signal in real time, thereby to reproduce the signal at a lower speed than the signal has been generated.
Speech speed converters that convert speech speed in real time are used for various purposes. More specifically, a speech speed converter is used to help people learn foreign languages, to assist elderly persons with weakening hearing and aurally handicapped persons, or to enable people of different mother tongues to communicate with one another. The real-time speech speed converter reproduces any voiced part of an input acoustic signal at a lower speed than the voiced part has been produced (by means of time expansion) and any voiceless part of the input acoustic signal at a higher speed than the voiceless part (by means of time compression). Thus, the converter changes the acoustic signal to one that represents a more distinct and perceivable speech sound. One of the essential functions of the speech speed converter is to compensate the delay of the output signal, which has resulted from the time expansion of the voiced part, in the process of time-compressing the voiceless part of the acoustic signal. This makes it possible to minimize the time difference between the original speech sound and the reproduced speech sound.
A conventional real-time speech speed converter will be described, with reference to FIG. 1.
As shown in FIG. 1, the real-time speech speed converter comprises an input terminal In, an input section 1, a data storage section 2, a characteristic detecting section 3, and a calculation section 4. The input section 1 receives an acoustic signal s1 supplied to the input terminal In. The data storage section 2 stores the acoustic frame signal s1 in the form of an acoustic frame signal s2 that has a particular length. The characteristic detecting section 3 receives the acoustic frame signal s2 read from the data storage section 2 and detects the characteristic s3 of the acoustic frame signal s2. The characteristic s3 detected is supplied to the calculation section 4. The calculation section 4 receives a write-position signal s7 and a read-position signal s8, too. (The signals s7 and s8 will be described later.) The calculation section 4 calculates a speech-speed converting rate s4 from the characteristic s3.
As FIG. 1 shows, the real-time speech speed converter further comprises a speech-speed converting section 5, an output-data writing section 6, an output-data storage section 7, an output-data reading section 8, and an output section 9. The speech-speed converting section 5 receives an acoustic frame signal s5 read from the data storage section 2. The speech-speed converting section 5 processes the acoustic frame signal s5 in accordance with the speech-speed converting rate s4, thereby generating an acoustic frame signal s6 that has a specific length. The acoustic frame signal s6, thus generated by the section 5. The output-data storage section 7 stores the output signal of the speech-speed converting section 5 as an acoustic frame signal s6 converted in terms of speech speed, as is illustrated in FIG. 2. The output-data writing section 6 generates a write-position signal s7 that designates the position where the signal s6 should be written in the output-data storage section 7. In the output-data storage section 7, the acoustic frame signal s6 is written at the position designated by the write-position signal s7. The output-data reading section 8 generates a read-position signal s8 that designates the position from where an output acoustic frame signal s9 should be read from the output-data storage section 7. The acoustic frame signal s9 is read from the output-data storage section 7, at the position designated by the read-position signal s8. The acoustic frame signal s9, thus read, is output through the output section 9.
The output-data storage section 7 has a large storage capacity. The section 7 stores the delayed part of the acoustic frame signal s9 (i.e., the time-expanded, voiced part). The output-data storage section 7 is, for example, a semiconductor memory. In order to lower speech speed as much as desired, the real-time speech speed converter shown in FIG. 1 needs to have an output-data storage section, e.g., a semiconductor memory, which has a sufficient storage capacity. Without such an output-data storage section 7, the speech speed converter cannot allow for some delay of the output acoustic signal.
The input acoustic signal s1 may be a multi-channel signal. The sampling frequency may be comparatively high. In either case, the output-data storage section 7 must be an expensive one that can serve to lower the speech speed as much as desired. This would increase the manufacturing cost of the real-time speech speed converter.
For example, the input acoustic signal s1 may be a stereophonic 16-bit linear PCM signal that has sampling frequency of 44.1 kHz. In this case, the output-data storage section 7 needs to be a semiconductor memory of the storage capacity given by the following equation (1), in order to delay the output signal by 10 seconds.
16xc3x9744100xc3x972xc3x9710=1411200[bit]≈1.7M[byte]xe2x80x83xe2x80x83(1)
The present invention has been made in consideration of the foregoing. An object of the invention is to provide an apparatus for converting the speed of reproducing the input acoustic signal, which can efficiently delay the output signal without using an output-data storage section of a large storage capacity even if the input acoustic signal has a high sampling frequency.
To achieve the object, a reproducing-speed converting apparatus according to the invention is designed to process the reproducing speed of an input acoustic signal in real time, thereby converting the reproducing speed to a speed lower than the reproducing speed of the original sound. The reproducing-speed converting apparatus comprises: characteristic detecting means for detecting the characteristic of an acoustic frame signal contained in the input acoustic signal and having a predetermined length; calculation mans for calculating a speech-speed converting rate from the characteristic of the input acoustic signal, which has been detected by the characteristic detecting means; speech-speed converting means for performing speech speed conversion on the acoustic frame signal in accordance with the speech-speed converting rate calculated by the calculation means, thereby to generate an acoustic frame signal converted in speech speed; signal encoding means for encoding the acoustic frame signal generated by the speech-speed converting means and having the predetermined length, thereby to reduce the amount of data; coded data storage means for storing the coded data generated by the signal encoding means; and signal decoding means for decoding the coded data read from the coded data storage means, thereby to generate an output acoustic frame signal having a predetermined length.
In the reproducing-speed converting apparatus, the signal encoding means performs an appropriate encoding method, thus encoding the acoustic frame signal generated by the speech-speed converting means and thereby to reduce the amount of data. Hence, the coded data storage means for storing the coded data need not have a large storage capacity. In other words, the apparatus can function as a real-time speech speed converter that can lower speech speed as much as desired even if the coded data storage means has but a small storage capacity.
A reproducing speed converting method according to the invention is designed to process the reproducing speed of an input acoustic signal in real time, thereby converting the reproducing speed to a speed lower than the reproducing speed of the original sound. The method comprising the steps of: detecting the characteristic of an acoustic frame signal contained in the input acoustic signal and having a predetermined length; calculating a speech-speed converting rate from the characteristic of the input acoustic signal, which has been detected in the step of detecting the characteristic; performing speech speed conversion on the acoustic frame signal in accordance with the speech-speed converting rate calculated in the step of calculating the speech-speed converting rate, thereby to generate an acoustic frame signal converted in speech speed; encoding the acoustic frame signal generated by means of the speech-speed conversion, thereby to reduce the amount of data; storing the coded data generated in the step of encoding the acoustic frame signal, into a coded data storage section; and decoding the coded data read from the coded data storage section, thereby to generate an output acoustic frame signal having a predetermined length.
In the reproducing-speed converting method, the acoustic frame signal generated in the step of converting the speech speed is encoded in an appropriate method, hereby to reduce the amount of data. Hence, no coded data storage means of a large storage capacity needs to be used. In other words, the method can lower speech speed as much as desired even if the coded data storage means used has but a small storage capacity.
A reproducing-speed converting apparatus according to this invention is designed to process the reproducing speed of an input acoustic signal in real time, thereby converting the reproducing speed to a speed lower than the reproducing speed of the original sound. This apparatus comprises: characteristic detecting means for detecting the characteristic of an acoustic frame signal contained in the input acoustic signal and having a predetermined length; calculation means for calculating a speech-speed converting rate from the characteristic of the input acoustic signal, which has been detected by the characteristic detecting means; signal encoding means for encoding the acoustic frame signal having the predetermined length, thereby to reduce the amount of data; coded data storage means for storing the coded data generated by the signal encoding means; and signal decoding means for decoding the coded data read from the coded data storage means and for converting speech speed in accordance with the speech-speed converting rate calculated by the calculation mans, thereby to generate an output acoustic frame signal having a predetermined length.
In this reproducing-speed converting apparatus, the signal encoding means interpolates encoding parameters. The speech speed can therefore be converted in accordance with the speech-speed converting rate calculated by the calculation means, in the process of decoding the acoustic signal read from the coded data storage means. This apparatus can therefore function as a real-time speech speed converter that can lower speech speed as much as desired even if the coded data storage means has but a small storage capacity.
A producing-speed converting method according to this invention is designed to process the reproducing speed of an input acoustic signal in real time, thereby converting the reproducing speed to a speed lower than the reproducing speed of the original sound. The method comprises the steps of: detecting the characteristic of an acoustic frame signal contained in the input acoustic signal and having a predetermined length; calculating a speech-speed converting rate from the characteristic of the input acoustic signal, which has been detected in the step of detecting the characteristic; encoding the acoustic frame signal having the predetermined length, thereby to reduce the amount of data; storing the coded data generated in the step of encoding the acoustic frame signal, in an coded data storage section; decoding the coded data read from the coded data storage section and converting speech speed in accordance with the speech-speed converting rate calculated in the step of calculating the speech-speed converting rate, thereby to generate an output acoustic frame signal having a predetermined length.
In this reproducing-speed converting method, too, the signal encoding means interpolates encoding parameters are interpolated in the step of encoding the acoustic signal. The speech speed can therefore be converted in accordance with the speech-speed converting rate calculated in the step of calculating the rate, in the process of decoding the acoustic signal read from the coded data storage section. This apparatus can therefore function as a real-time speech speed converter method that can lower speech speed as much as desired even if the coded data storage means has but a small storage capacity.
The present invention makes it possible to delay the output signal without using an output-data storage section of a large storage capacity even if the input acoustic signal is a multi-channel signal or has a high sampling frequency.