1. Field of the Invention
The present invention relates to an information processing apparatus and method, and to a recording medium therefor. More particularly, the present invention relates to an information processing apparatus and method capable of improving the accuracy of an excitation source in the band spreading of a speech signal, obtaining a wide-band signal having no gaps, and reducing the amount of computation thereof, and to a recording medium therefor.
2. Description of the Related Art
Speech signal transmission technology has become widespread and is used in portable telephones, wired telephones, voice recorders, and the like. Conventionally, a narrow-band signal of 300 Hz to 3400 Hz is used for transmitting and receiving speech signals. However, because this frequency band is narrow, the sound quality is poor. To overcome this problem, a technique has been developed in which a narrow-band signal is used at the transmitting side or on the transmission line, and the receiving side performs a band-spreading process that converts the received narrow-band signal into a wide-band signal.
FIG. 1 is a block diagram showing the construction of a conventional band-spreading apparatus for converting a narrow-band speech signal into a wide-band speech signal.
An α band-widening section 1 converts a prediction coefficient αN, which represents the spectrum envelope of a narrow-band speech signal sndN, into a prediction coefficient αW representing a wide-band spectrum envelope, and outputs αW to a wide-band LPC (Linear Predictive Coding) combining section 4. Details of the method of determining the prediction coefficient αW from the prediction coefficient αN are disclosed in, for example, Japanese Unexamined Patent Application Publication No. 11-126098.
An adder 2 adds together an adaptive signal excPN (a signal containing pitch components) and a noise signal excNN corresponding to the narrow-band speech signal sndN, and outputs the sum to an exc band-widening section 3 as an excitation source excN for the narrow-band speech signal. When the speech is coded with a CELP (Code Excited Linear Prediction) coder, the adaptive signal excPN and the noise signal excNN correspond to the outputs of the adaptive codebook and the noise codebook, respectively.
The exc band-widening section 3 band-widens the excitation source excN for the input narrow-band speech signal into an excitation source excW for a wide-band speech signal, and outputs excW to the wide-band LPC combining section 4. Specifically, on the premise that the excitation source is nearly white noise, a zero value is inserted between adjacent samples so that aliasing (spectral images of the narrow band) occupies the upper band, thereby generating the excitation source excW for the wide-band speech signal. Details of this method of determining excW from excN are also disclosed in, for example, the above-mentioned Japanese Unexamined Patent Application Publication No. 11-126098.
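The zero-insertion step can be illustrated with a short Python sketch (standard library only). It assumes a factor-of-two rate change from 8 kHz to 16 kHz and uses a single 1 kHz cosine as a stand-in for one spectral component of excN; the function names `insert_zeros` and `dft_magnitude` are hypothetical and are not taken from the cited publication. After zero insertion, the component at f reappears as an image at 8000 - f Hz, which is the aliasing the paragraph relies on.

```python
import cmath
import math


def insert_zeros(exc_n):
    """Double the sampling rate by placing a zero after every input sample."""
    out = []
    for x in exc_n:
        out.extend([x, 0.0])
    return out


def dft_magnitude(x):
    """Naive O(N^2) DFT magnitude; adequate for short illustrative frames."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


fs_n = 8000                                   # narrow-band sampling rate (Hz)
exc_n = [math.cos(2 * math.pi * 1000 * t / fs_n) for t in range(32)]
exc_w = insert_zeros(exc_n)                   # now treated as sampled at 16 kHz

mag = dft_magnitude(exc_w)
bin_hz = 2 * fs_n / len(exc_w)                # 16000 / 64 = 250 Hz per bin
peaks = [k * bin_hz for k in range(len(mag) // 2 + 1) if mag[k] > 1.0]
print(peaks)                                  # [1000.0, 7000.0]
```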
The wide-band LPC combining section 4 performs filter synthesis on the excitation source excW input from the exc band-widening section 3, using the prediction coefficient αW input from the α band-widening section 1 as the filter coefficients, thereby converting excW into a first wide-band speech signal, which it outputs to a band suppression section 5.
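As a minimal sketch of this filter synthesis, assuming the common sign convention A(z) = 1 + Σ α_i z^-i (the document does not state one) and a zero initial filter state, the all-pole synthesis filter 1/A(z) can be written in Python as follows; `lpc_synthesis` is a hypothetical name.

```python
def lpc_synthesis(excitation, alpha):
    """All-pole synthesis 1/A(z), assuming A(z) = 1 + sum_i alpha[i-1] * z^-i.

    y[n] = x[n] - sum_{i=1}^{p} alpha[i-1] * y[n-i], with zero initial state.
    """
    p = len(alpha)
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for i in range(1, p + 1):
            if n - i >= 0:
                acc -= alpha[i - 1] * y[n - i]
        y.append(acc)
    return y


# Impulse response of a first-order filter with alpha = [-0.5] decays as 0.5^n.
print(lpc_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5]))   # [1.0, 0.5, 0.25, 0.125]
```

In the apparatus, `excitation` would be excW and `alpha` would be αW, refreshed once per frame.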
The band suppression section 5 suppresses, within the input first wide-band speech signal, only the frequency band contained in the narrow-band speech signal, generates a second wide-band speech signal, and outputs it to an adder 7. Because the first wide-band speech signal contains distortion, its narrow-band portion is replaced with the narrow-band speech signal input from an oversampling apparatus 6. As a result, distortion in the frequency band contained in the original narrow-band speech signal is reduced.
The oversampling apparatus 6 oversamples the input narrow-band speech signal sndN so that its sampling frequency matches that of the wide-band speech signal, and outputs the result to the adder 7.
The adder 7 adds together the second wide-band speech signal input from the band suppression section 5 and the signal input from the oversampling apparatus 6, thereby generating a final wide-band speech signal sndW, and outputting this signal.
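The roles of sections 5, 6 and 7 can be sketched on a single frame in Python. This is an illustration under stated assumptions, not the conventional apparatus itself: ideal brick-wall DFT masks with a 4 kHz edge stand in for the band suppression filter and the oversampling interpolation filter, the frame is treated as circular, and all function names and test signals are hypothetical.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def band_mask(spec, fs, keep):
    """Zero every DFT bin whose frequency (folded to 0..fs/2) fails keep(f_hz)."""
    n = len(spec)
    return [v if keep(min(k, n - k) * fs / n) else 0.0 for k, v in enumerate(spec)]


def combine(first_wide, snd_n, fs_w=16000, edge_hz=4000):
    """Assumes len(first_wide) == 2 * len(snd_n) (one frame at each rate)."""
    # Band suppression section 5: drop the band already carried by sndN.
    second_wide = idft(band_mask(dft(first_wide), fs_w, lambda f: f >= edge_hz))
    # Oversampling apparatus 6: zero insertion, ideal low-pass, gain of 2.
    up = []
    for x in snd_n:
        up.extend([2.0 * x, 0.0])
    oversampled = idft(band_mask(dft(up), fs_w, lambda f: f < edge_hz))
    # Adder 7: final wide-band signal sndW.
    return [a + b for a, b in zip(second_wide, oversampled)]


fs_n, fs_w = 8000, 16000
snd_n = [math.cos(2 * math.pi * 1000 * t / fs_n) for t in range(32)]
# Hypothetical first wide-band signal: a distorted 1 kHz component plus 6 kHz.
first_wide = [0.3 * math.cos(2 * math.pi * 1000 * t / fs_w)
              + 0.5 * math.cos(2 * math.pi * 6000 * t / fs_w) for t in range(64)]
snd_w = combine(first_wide, snd_n)   # 1 kHz restored to amplitude 1.0, 6 kHz kept
```

A practical system would more likely use FIR or IIR filters with a transition band rather than ideal masks.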
The prediction coefficient αN, the adaptive signal excPN, the noise signal excNN, and the narrow-band speech signal sndN are not all independent. The prediction coefficient αN can be determined by performing linear prediction analysis on the narrow-band speech signal sndN, and the adaptive signal excPN and the noise signal excNN can be determined by performing pitch analysis on it. The noise signal excNN is a long-term predictive residual, and the sum of the adaptive signal excPN and the noise signal excNN is the linear predictive residual. Conversely, the narrow-band speech signal sndN can be obtained by filter synthesis using the prediction coefficient αN and the sum of the adaptive signal excPN and the noise signal excNN. In addition, the prediction coefficient αN, the adaptive signal excPN, and the noise signal excNN may be determined from a preprocessed version of the narrow-band speech signal sndN, or from a quantized signal.
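The relationships above can be checked numerically with a minimal Python sketch. It assumes autocorrelation-method LPC solved by the Levinson-Durbin recursion, which is one standard way to perform linear prediction analysis but is not specified by the document; the synthetic test signal, the order of 10, and all function names are hypothetical. The residual plays the role of excPN + excNN, and filter synthesis with the same coefficients recovers sndN.

```python
import math
import random


def autocorr(x, p):
    """Autocorrelation r[0..p] of one frame (no window, for brevity)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]


def levinson_durbin(r, p):
    """Return alpha_1..alpha_p of A(z) = 1 + sum_i alpha_i z^-i."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for i in range(1, p + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:]


def lpc_residual(x, alpha):
    """Analysis filter A(z): e[n] = x[n] + sum_i alpha_i x[n-i]."""
    return [x[n] + sum(alpha[i - 1] * x[n - i] for i in range(1, len(alpha) + 1) if n >= i)
            for n in range(len(x))]


def lpc_synthesis(e, alpha):
    """Synthesis filter 1/A(z): y[n] = e[n] - sum_i alpha_i y[n-i]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(alpha[i - 1] * y[n - i] for i in range(1, len(alpha) + 1) if n >= i))
    return y


random.seed(0)
snd_n = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) + 0.1 * random.gauss(0.0, 1.0)
         for n in range(160)]
alpha_n = levinson_durbin(autocorr(snd_n, 10), 10)   # stands in for alphaN
residual = lpc_residual(snd_n, alpha_n)              # plays the role of excPN + excNN
rebuilt = lpc_synthesis(residual, alpha_n)
print(max(abs(a - b) for a, b in zip(snd_n, rebuilt)))   # floating-point rounding level
```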
Next, a description is given of the operation when a conventional band-spreading apparatus converts the input narrow-band speech signal sndN into a wide-band speech signal sndW.
The α band-widening section 1 converts the prediction coefficient αN of the input narrow-band speech signal into a prediction coefficient αW of the wide-band speech signal, and outputs αW to the wide-band LPC combining section 4.
The adder 2 adds together the input adaptive signal excPN and the noise signal excNN, and outputs an excitation source excN for the narrow-band speech signal to the exc band-widening section 3. The exc band-widening section 3 performs band-widening on the excitation source excN for the input narrow-band speech signal, and outputs it as an excitation source excW for the wide-band speech signal to the wide-band LPC combining section 4.
The wide-band LPC combining section 4 filters the excitation source excW for the wide-band speech signal using the input prediction coefficient αW, generates a first wide-band speech signal, and outputs it to the band suppression section 5. The band suppression section 5 suppresses, within the input first wide-band speech signal, the frequency band contained in the narrow-band speech signal, generates a second wide-band speech signal, and outputs it to the adder 7.
The oversampling apparatus 6 oversamples the input narrow-band speech signal sndN at the sampling frequency of the wide-band speech signal, and outputs it to the adder 7.
The adder 7 adds together the second wide-band speech signal input from the band suppression section 5 and the oversampled signal input from the oversampling apparatus 6, generates a final wide-band speech signal sndW, and outputs it.
Instead of strictly suppressing only the frequency band of the narrow-band speech signal, the band suppression section 5 may be, for example, a high-pass filter that suppresses only a low-frequency band; it may also multiply the signal by a gain factor or perform another filtering process.
However, because the above-described method band-widens an excitation source formed of the linear sum of an adaptive signal and a noise signal simply by inserting zero values, its accuracy is low.
Furthermore, suppose, for example, that the narrow-band signal is sampled at 8 kHz, the wide-band signal is sampled at 16 kHz, and the narrow-band excitation source occupies 300 Hz to 3400 Hz. The above-described method then yields a wide-band excitation source occupying 300 Hz to 3400 Hz and 4600 Hz to 7700 Hz, and the intermediate band of 3400 Hz to 4600 Hz between them is not generated (a gap occurs). Consequently, even if wide-band LPC combining is performed on this wide-band excitation source, the band of 3400 Hz to 4600 Hz remains missing, and the wide-band speech signal sounds unnatural.
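The band arithmetic in this example follows from the fact that doubling the rate by zero insertion mirrors the narrow-band spectrum about fs_N/2 = 4 kHz, mapping each frequency f to fs_N - f. A few lines of Python reproduce the numbers; the helper name is hypothetical.

```python
def zero_insertion_bands(band_hz, fs_narrow_hz):
    """Bands occupied after 2x zero insertion, and the gap left between them."""
    lo, hi = band_hz
    image = (fs_narrow_hz - hi, fs_narrow_hz - lo)   # f -> fs_N - f
    gap = (hi, image[0])                             # nothing is generated here
    return band_hz, image, gap


print(zero_insertion_bands((300, 3400), 8000))
# ((300, 3400), (4600, 7700), (3400, 4600))
```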
The present invention has been achieved in view of such circumstances. The present invention aims to improve the accuracy of an excitation source in band spreading of a speech signal and to obtain a wide-band signal having no gaps.
To achieve the above-mentioned object, according to a first aspect of the present invention, there is provided an information processing apparatus comprising first generation means for generating a second adaptive signal from a first adaptive signal of a narrow-band signal; second generation means for generating a second noise signal from a first noise signal of the narrow-band signal; and third generation means for generating an excitation source for a wide-band signal by combining the second adaptive signal generated by the first generation means and the second noise signal generated by the second generation means.
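The three generation means of this first aspect can be sketched in Python as follows. Using linear interpolation for the adaptive signal, zero insertion for the noise signal, and a plain weighted sum as the "combining" operation are illustrative assumptions, not the method fixed by the claims; the function names and the gains `g_p` and `g_n` are hypothetical.

```python
def interpolate_x2(x):
    """Double the rate by linear interpolation; the frame edge is treated as zero."""
    out = []
    for n, v in enumerate(x):
        nxt = x[n + 1] if n + 1 < len(x) else 0.0
        out.extend([v, 0.5 * (v + nxt)])
    return out


def zero_insert_x2(x):
    """Double the rate by zero insertion (as in the conventional method)."""
    out = []
    for v in x:
        out.extend([v, 0.0])
    return out


def wideband_excitation(exc_p_n, exc_n_n, g_p=1.0, g_n=1.0):
    exc_p_w = interpolate_x2(exc_p_n)    # first generation means: adaptive signal
    exc_n_w = zero_insert_x2(exc_n_n)    # second generation means: noise signal
    # Third generation means: weighted sum (weights g_p, g_n are hypothetical).
    return [g_p * a + g_n * b for a, b in zip(exc_p_w, exc_n_w)]


print(wideband_excitation([1.0, 0.0, 0.0, 2.0], [0.1, 0.2, -0.1, 0.0]))
# [1.1, 0.5, 0.2, 0.0, -0.1, 1.0, 2.0, 1.0]
```

Treating the two components separately lets the pitch-bearing adaptive signal be interpolated smoothly while the noise-like component is widened by a different rule.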
The first adaptive signal and the second adaptive signal may contain pitch components.
The first generation means may generate the second adaptive signal by performing band-widening on the first adaptive signal.
The first generation means may generate the second adaptive signal by interpolating the first adaptive signal.
The first generation means may generate the second adaptive signal by interpolating the first adaptive signal and by suppressing one or plural sample data before and after the sample data of the first adaptive signal which reaches a peak value.
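One plausible reading of this variant, sketched in Python under stated assumptions: interpolation smears a pitch pulse into its neighbouring samples, and zeroing those neighbours keeps the pulse sharp at the higher rate. The sketch treats only the single largest-magnitude sample of the frame as the peak and zeroes the interpolated samples on each side; the name `suppress_around_peak` and the `width` parameter are hypothetical.

```python
def interpolate_x2(x):
    """Double the rate by linear interpolation; the frame edge is treated as zero."""
    out = []
    for n, v in enumerate(x):
        nxt = x[n + 1] if n + 1 < len(x) else 0.0
        out.extend([v, 0.5 * (v + nxt)])
    return out


def suppress_around_peak(exc_p_n, width=1):
    """Interpolate excPN by 2, then zero `width` samples on each side of its peak.

    Only the single largest-magnitude sample of the frame is treated as the peak;
    `width` is a hypothetical tuning parameter.
    """
    wide = interpolate_x2(exc_p_n)
    peak = max(range(len(exc_p_n)), key=lambda n: abs(exc_p_n[n]))
    centre = 2 * peak                     # peak position at the doubled rate
    for d in range(1, width + 1):
        for idx in (centre - d, centre + d):
            if 0 <= idx < len(wide):
                wide[idx] = 0.0
    return wide


print(suppress_around_peak([0.0, 0.2, 1.0, 0.2, 0.0]))
# [0.0, 0.1, 0.2, 0.0, 1.0, 0.0, 0.2, 0.1, 0.0, 0.0]
```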
The first generation means may generate the second adaptive signal by interpolating the first adaptive signal and by suppressing sample data of the first adaptive signal having a value equal to or greater than a predetermined value or by suppressing sample data whose absolute value is equal to or greater than a predetermined value.
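A minimal Python sketch of the threshold variant follows. The document does not say how "suppressing" is carried out, so limiting each offending sample to the threshold is an assumption, as are the name `suppress_large_samples` and the `use_abs` switch that selects between the signed-value and absolute-value tests.

```python
import math


def interpolate_x2(x):
    """Double the rate by linear interpolation; the frame edge is treated as zero."""
    out = []
    for n, v in enumerate(x):
        nxt = x[n + 1] if n + 1 < len(x) else 0.0
        out.extend([v, 0.5 * (v + nxt)])
    return out


def suppress_large_samples(exc_p_n, threshold, use_abs=True):
    """Interpolate excPN by 2, then limit samples at or above `threshold`.

    use_abs=True compares |v|; use_abs=False compares the signed value.
    Limiting to the threshold is one assumed reading of "suppressing".
    """
    out = []
    for v in interpolate_x2(exc_p_n):
        level = abs(v) if use_abs else v
        if level >= threshold:
            v = math.copysign(threshold, v) if use_abs else threshold
        out.append(v)
    return out


print(suppress_large_samples([0.0, -2.0, 0.5, 3.0], 1.0))
# [0.0, -1.0, -1.0, -0.75, 0.5, 1.0, 1.0, 1.0]
```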
The second generation means may generate the second noise signal by performing band-widening on the first noise signal.
The second generation means may generate the second noise signal by adding to the first noise signal a noise signal having components which are not contained in the first noise signal.
The second generation means may generate the second noise signal by adding, to a noise signal formed by band-widening the first noise signal, a noise signal having components of a frequency band not contained therein.
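This variant can be sketched in Python: the first noise signal is band-widened by zero insertion, and white noise confined to the 3400 Hz to 4600 Hz gap is added so that the band left empty by zero insertion is filled. The gap limits follow the numerical example given earlier; confining the added noise by zeroing DFT bins, the gain of 0.1, the fixed seed, and all function names are assumptions for illustration.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def gap_noise(n, fs, lo_hz, hi_hz, gain, rng):
    """White Gaussian noise confined to lo_hz..hi_hz by zeroing DFT bins."""
    spec = dft([rng.gauss(0.0, 1.0) for _ in range(n)])
    for k in range(n):
        f = min(k, n - k) * fs / n        # bin frequency folded to 0..fs/2
        if not lo_hz <= f <= hi_hz:
            spec[k] = 0.0
    return [gain * v for v in idft(spec)]


def widen_noise(exc_n_n, fs_w=16000, gap=(3400.0, 4600.0), gain=0.1, seed=0):
    """Zero-insert excNN to 2x rate, then add noise that fills the gap band."""
    base = []
    for v in exc_n_n:
        base.extend([v, 0.0])
    fill = gap_noise(len(base), fs_w, gap[0], gap[1], gain, random.Random(seed))
    return [a + b for a, b in zip(base, fill)]


rng = random.Random(1)
exc_nn = [rng.gauss(0.0, 1.0) for _ in range(32)]
exc_nw = widen_noise(exc_nn)   # 64 samples at 16 kHz with 3.4-4.6 kHz filled
```

In practice the gain of the added noise would presumably be tied to the level of the band-widened noise rather than fixed.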
According to a second aspect of the present invention, there is provided an information processing method comprising a first generation step of generating a second adaptive signal from a first adaptive signal of a narrow-band signal; a second generation step of generating a second noise signal from a first noise signal of the narrow-band signal; and a third generation step of generating an excitation source for a wide-band signal by combining the second adaptive signal generated in the first generation step and the second noise signal generated in the second generation step.
According to a third aspect of the present invention, there is provided a recording medium on which is recorded a program comprising a first generation step of generating a second adaptive signal from a first adaptive signal of a narrow-band signal; a second generation step of generating a second noise signal from a first noise signal of the narrow-band signal; and a third generation step of generating an excitation source for a wide-band signal by combining the second adaptive signal generated in a process of the first generation step and the second noise signal generated in a process of the second generation step.
According to a fourth aspect of the present invention, there is provided an information processing apparatus comprising first generation means for generating a second noise signal from a first noise signal of a narrow-band signal; and second generation means for directly generating an excitation source for a wide-band signal from the second noise signal generated by the first generation means.
The first generation means may generate the second noise signal by adding to the first noise signal a noise signal having components which are not contained in the first noise signal.
The first generation means may generate the second noise signal by adding, to a noise signal formed by band-widening the first noise signal, a noise signal having components of a frequency band not contained therein.
According to a fifth aspect of the present invention, there is provided an information processing method comprising a first generation step of generating a second noise signal from a first noise signal of a narrow-band signal; and a second generation step of directly generating an excitation source for a wide-band signal from the second noise signal generated in a process of the first generation step.
According to a sixth aspect of the present invention, there is provided a recording medium on which is recorded a program comprising a first generation step of generating a second noise signal from a first noise signal of a narrow-band signal; and a second generation step of directly generating an excitation source for a wide-band signal from the second noise signal generated in a process of the first generation step.
According to a seventh aspect of the present invention, there is provided an information processing apparatus comprising first extraction means for extracting a short-term predictive residual signal on the basis of the analysis result of a narrow-band signal; second extraction means for extracting a first adaptive signal and a first noise signal by performing long-term prediction on the basis of the short-term predictive residual signal extracted by the first extraction means; first generation means for generating a second adaptive signal from the first adaptive signal extracted by the second extraction means; second generation means for generating a second noise signal from the first noise signal extracted by the second extraction means; and third generation means for generating an excitation source for a wide-band signal by combining the second adaptive signal generated by the first generation means and the second noise signal generated by the second generation means.
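The second extraction means can be sketched as an open-loop, single-tap long-term (pitch) predictor in Python. This is a simplified stand-in under stated assumptions: the integer lag range of 20 to 147 samples, the one-tap gain, and the function name are hypothetical, and a real CELP coder would normally search the adaptive codebook in closed loop (analysis by synthesis) with fractional lags. The predicted part is taken as the first adaptive signal and what remains as the first noise signal, so the two always sum to the short-term residual.

```python
def long_term_split(residual, start, t_min=20, t_max=147):
    """Open-loop one-tap long-term prediction on residual[start:].

    Returns (lag, gain, adaptive, noise) where adaptive[i] = gain * residual[n - lag]
    for n = start + i, and noise = residual[start:] - adaptive. Samples before
    `start` serve as the pitch history, so lags are limited to at most `start`.
    """
    frame = range(start, len(residual))
    lag, gain, score = 0, 0.0, -1.0
    for t in range(t_min, min(t_max, start) + 1):
        num = sum(residual[n] * residual[n - t] for n in frame)
        den = sum(residual[n - t] ** 2 for n in frame)
        # Keep the lag giving the largest reduction in residual energy, num^2/den;
        # on a tie the shortest lag wins because the comparison is strict.
        if den > 0.0 and num * num / den > score:
            lag, gain, score = t, num / den, num * num / den
    adaptive = [gain * residual[n - lag] if lag else 0.0 for n in frame]
    noise = [residual[n] - a for n, a in zip(frame, adaptive)]
    return lag, gain, adaptive, noise


# Hypothetical short-term residual: a pulse train with a 40-sample pitch period.
residual = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
lag, gain, exc_p, exc_n = long_term_split(residual, start=160)
print(lag, gain)   # 40 1.0
```

For a perfectly periodic residual the whole frame is predicted, so the first noise signal is zero; for real speech the noise part carries whatever the pitch predictor cannot explain.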
The first adaptive signal and the second adaptive signal may contain pitch components.
The first generation means may generate the second adaptive signal by performing band-widening on the first adaptive signal.
The first generation means may generate the second adaptive signal by interpolating the first adaptive signal.
The first generation means may generate the second adaptive signal by interpolating the first adaptive signal and by suppressing one or plural sample data before or after sample data of the first adaptive signal which reaches a peak value.
The first generation means may generate the second adaptive signal by interpolating the first adaptive signal and by suppressing sample data of the first adaptive signal having a value equal to or greater than a predetermined value or by suppressing sample data whose absolute value is equal to or greater than a predetermined value.
The second generation means may generate the second noise signal by performing band-widening on the first noise signal.
The second generation means may generate the second noise signal by adding to the first noise signal a noise signal having components which are not contained in the first noise signal.
The second generation means may generate the second noise signal by adding, to a noise signal formed by band-widening the first noise signal, a noise signal having components of a frequency band not contained therein.
According to an eighth aspect of the present invention, there is provided an information processing method comprising a first extraction step of extracting a short-term predictive residual signal on the basis of the analysis result of a narrow-band signal; a second extraction step of extracting a first adaptive signal and a first noise signal by performing long-term prediction on the basis of the short-term predictive residual signal extracted in a process of the first extraction step; a first generation step of generating a second adaptive signal from the first adaptive signal extracted in a process of the second extraction step; a second generation step of generating a second noise signal from the first noise signal extracted in a process of the second extraction step; and a third generation step of generating an excitation source for a wide-band signal by combining the second adaptive signal generated in a process of the first generation step and the second noise signal generated in a process of the second generation step.
According to a ninth aspect of the present invention, there is provided a recording medium on which is recorded a program comprising a first extraction step of extracting a short-term predictive residual signal on the basis of the analysis result of a narrow-band signal; a second extraction step of extracting a first adaptive signal and a first noise signal by performing long-term prediction on the basis of the short-term predictive residual signal extracted in a process of the first extraction step; a first generation step of generating a second adaptive signal from the first adaptive signal extracted in a process of the second extraction step; a second generation step of generating a second noise signal from the first noise signal extracted in a process of the second extraction step; and a third generation step of generating an excitation source for a wide-band signal by combining the second adaptive signal generated in a process of the first generation step and the second noise signal generated in a process of the second generation step.
According to a tenth aspect of the present invention, there is provided an information processing apparatus comprising first extraction means for extracting a short-term predictive residual signal on the basis of the analysis result of a narrow-band signal; second extraction means for extracting a first noise signal by performing long-term prediction on the basis of the short-term predictive residual signal extracted by the first extraction means; first generation means for generating a second noise signal from the first noise signal extracted by the second extraction means; and second generation means for directly generating an excitation source for a wide-band signal from the second noise signal generated by the first generation means.
The first generation means may generate the second noise signal by adding to the first noise signal a noise signal having components of a frequency band which is not contained in the first noise signal.
The first generation means may generate the second noise signal by adding, to a wide-band noise signal formed by band-widening the first noise signal, a noise signal having components of a frequency band not contained therein.
According to an eleventh aspect of the present invention, there is provided an information processing method comprising a first extraction step of extracting a short-term predictive residual signal on the basis of the analysis result of a narrow-band signal; a second extraction step of extracting a first noise signal by performing long-term prediction on the basis of the short-term predictive residual signal extracted in a process of the first extraction step; a first generation step of generating a second noise signal from the first noise signal extracted in a process of the second extraction step; and a second generation step of directly generating an excitation source for a wide-band signal on the basis of the second noise signal generated in a process of the first generation step.
According to a twelfth aspect of the present invention, there is provided a recording medium on which is recorded a program comprising a first extraction step of extracting a short-term predictive residual signal on the basis of the analysis result of a narrow-band signal; a second extraction step of extracting a first noise signal by performing long-term prediction on the basis of the short-term predictive residual signal extracted in a process of the first extraction step; a first generation step of generating a second noise signal from the first noise signal extracted in a process of the second extraction step; and a second generation step of directly generating an excitation source for a wide-band signal on the basis of the second noise signal generated in a process of the first generation step.
In the information processing apparatus, the information processing method, and the recording medium in accordance with the present invention, a second adaptive signal is generated from a first adaptive signal of a narrow-band signal, a second noise signal is generated from a first noise signal of the narrow-band signal, the generated second adaptive signal and the generated second noise signal are combined, and an excitation source for a wide-band signal is generated.
In the information processing apparatus, the information processing method, and the recording medium in accordance with the present invention, a second noise signal is generated from a first noise signal of a narrow-band signal, and an excitation source for a wide-band signal is generated directly from the generated second noise signal.
In the information processing apparatus, the information processing method, and the recording medium in accordance with the present invention, a short-term predictive residual signal is extracted from the analysis result of a narrow-band signal, long-term prediction is performed on the basis of the extracted short-term predictive residual signal, the first adaptive signal and the first noise signal are extracted, a second adaptive signal is generated from the extracted first adaptive signal, a second noise signal is generated from the extracted first noise signal, the generated second adaptive signal and the generated second noise signal are combined, and an excitation source for a wide-band signal is generated.
In the information processing apparatus, the information processing method, and the recording medium in accordance with the present invention, a short-term predictive residual signal is extracted from the analysis result of a narrow-band signal, long-term prediction is performed on the basis of the extracted short-term predictive residual signal, a first noise signal is extracted, a second noise signal is generated from the extracted first noise signal, and an excitation source for a wide-band signal is produced directly from the generated second noise signal.