1. Field of the Invention
The present invention relates to a speech synthesizing apparatus and method. More specifically, the present invention relates to a speech synthesizing apparatus and method wherein speech parameters such as spectrum parameters are interpolated in every frame period, and a filtering operation is then executed based upon the respective interpolated speech parameters to generate a synthesized voice or speech signal.
2. Description of the Prior Art
As speech parameters for speech synthesizing, various kinds of parameters such as an LSP (Line Spectrum Pair) parameter, a PARCOR (partial autocorrelation) parameter and the like have been proposed, as is well known. One example of the LSP parameter is indicated in the following table I.
TABLE I
______________________________________________________________________
frame    C1     C2     C3     C4      C5      . . .   C8      A    P
______________________________________________________________________
t        4206   5854   8221   11204   13261   . . .   28924   25   42
t + 1    5831   5904   7963   10926   14764   . . .   29251   31   41
t + 2    5935   6001   8012   10258   14556   . . .   29541   52   41
______________________________________________________________________
In the example indicated in the table I, the LSP parameter is composed of a primary parameter C1 through an 8-nary parameter C8 representing features of the voice or speech at every 12.8 msec (frame period), and a pitch parameter P and an amplitude parameter A as information of a sound source.
On the other hand, on page 142 and thereafter of "Nikkei Electronics 1981/2/2", a synthesizing filter which generates a synthesized voice or speech signal by using such LSP parameters C1-C8 is disclosed. Data which is evaluated from the pitch parameter P and the amplitude parameter A is inputted to the synthesizing filter, and the respective parameters C1-C8 are inputted to the same as coefficients. Such a synthesizing filter is normally constructed as a digital filter, and therefore, a filtering operation is executed in accordance with the coefficients in a digital manner.
Furthermore, as is well known, in order to improve the quality of the synthesized voice or speech, one frame period is divided into a plurality of subframes and, in every subframe period, the respective speech parameters are internally interpolated by interpolation pitches.
As a method for interpolating the speech parameters, two methods have been known in the past. A first method is a method wherein an interpolation pitch is evaluated or calculated in advance by dividing a difference between a target value and a present value of the parameter by the number of subframes, and the interpolation pitch is added to the present value at every interpolation timing, that is, in every subframe period. A second method is a method wherein the interpolation pitch is evaluated or calculated and added to the present value of the parameter in every subframe period, as disclosed in, for example, Japanese Patent Publication No. 53355/1983 published on Nov. 29, 1983.
In the first method, the interpolation pitch is evaluated or calculated in the following manner. With reference to FIG. 1, on the assumption that the value of the i-nary parameter Ci at the beginning time t of the frame period Tt is Ci.sub.t and the value of the i-nary parameter Ci at the end time t+1 of the frame period Tt is Ci.sub.t+1, in the frame period Tt, the interpolation is performed between the parameters Ci.sub.t and Ci.sub.t+1. In this case, if the number of times of interpolation, that is, the number of the subframes is N, the interpolation pitch is given by the following equation (1).
Interpolation pitch = (Ci.sub.t+1 - Ci.sub.t)/N     (1)
More specifically, if the number of subframes is "128", in the example of the table I, the interpolation pitch of the parameter C1 becomes "12.7" (=(5831-4206)/128), and the interpolation pitch of the parameter C2 which is adjacent to the parameter C1 becomes "0.4" (=(5904-5854)/128). Therefore, in the frame period Tt, the parameter C1 will be sequentially changed as "4206", "4218.7", "4231.4", "4244.1", "4256.8", ... and the parameter C2 will be sequentially changed as "5854", "5854.4", "5854.8", "5855.2", "5855.6", ...
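The arithmetic of the first method can be illustrated by the following minimal sketch. It is an illustration only and does not form part of the disclosed prior art; the function names interpolation_pitch and interpolate_first_method are hypothetical, and it is assumed that the pitch is kept with full (floating-point) precision.

```python
def interpolation_pitch(present, target, n_subframes):
    """Equation (1): pitch = (target value - present value) / N."""
    return (target - present) / n_subframes


def interpolate_first_method(present, target, n_subframes):
    """Compute the pitch once in advance, then add it at every subframe."""
    pitch = interpolation_pitch(present, target, n_subframes)
    values = [present]
    for _ in range(n_subframes):
        values.append(values[-1] + pitch)
    return values


N_SUBFRAMES = 128  # number of subframes used in the example of the text

# C1 and C2 of frames t and t+1 in table I
c1_track = interpolate_first_method(4206, 5831, N_SUBFRAMES)
c2_track = interpolate_first_method(5854, 5904, N_SUBFRAMES)

# Pitches shown to one decimal place, as quoted in the text
print(round(interpolation_pitch(4206, 5831, N_SUBFRAMES), 1))
print(round(interpolation_pitch(5854, 5904, N_SUBFRAMES), 1))
# Values reached at the end of the frame period Tt
print(c1_track[-1], c2_track[-1])
```

With an unrounded pitch, each parameter arrives at its target value at the end of the frame period, and C1 stays below C2 throughout.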
When the operation or calculation precision is insufficient, the interpolation pitch is rounded to the nearest integer. Therefore, in the above described example, the interpolation pitch of the parameter C1 becomes "13" and the interpolation pitch of the parameter C2 becomes "0". In this case, the parameter C1 will be changed as "4206", "4219", "4232", ..., "5857", "5870", whereas the parameter C2 remains at "5854". Thus, as seen from FIG. 1, at the end of the frame period Tt, the adjacent two parameters C1 and C2 become abnormally close to each other or are reversed.
If the parameter Ci and the parameter Ci+1 (or Ci-1) being adjacent to the parameter Ci thus become abnormally close to each other or are reversed, oscillation takes place in the synthesizing filter, and therefore, a noise is superposed on the synthesized voice or speech signal being outputted therefrom. As a result, the quality of the synthesized voice or speech deteriorates.
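The effect of the rounded pitch on the example of the table I can be checked with the following short sketch. It is illustrative only; the helper names are hypothetical, and rounding half up is an assumption, since the text does not specify how ties are handled (no tie occurs in this example).

```python
import math


def rounded_pitch(present, target, n_subframes):
    """Equation (1) with the result rounded to the nearest integer.

    Rounding half up is an assumption made for this sketch.
    """
    return math.floor((target - present) / n_subframes + 0.5)


def value_at_frame_end(present, pitch, n_subframes):
    """Value reached after the pitch has been added at every subframe."""
    return present + pitch * n_subframes


N_SUBFRAMES = 128

c1_pitch = rounded_pitch(4206, 5831, N_SUBFRAMES)  # exact pitch is about 12.7
c2_pitch = rounded_pitch(5854, 5904, N_SUBFRAMES)  # exact pitch is about 0.4

c1_end = value_at_frame_end(4206, c1_pitch, N_SUBFRAMES)
c2_end = value_at_frame_end(5854, c2_pitch, N_SUBFRAMES)

print(c1_pitch, c2_pitch)  # 13 0
print(c1_end, c2_end)      # 5870 5854
print(c1_end > c2_end)     # True: C1 has overtaken C2
```

The rounding error of each pitch is multiplied by the number of subframes, so that C1 overshoots its target value "5831" by 39 while C2 falls short of its target value "5904" by 50.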
The following table II indicates another example of the LSP parameter.
TABLE II
______________________________________________________________________
frame    C1     C2     C3     C4      C5      . . .   C8      A    P
______________________________________________________________________
t        4715   6115   8209   12156   13905   . . .   27551   31   33
t + 1    4788   4810   7963   11388   14013   . . .   28377   32   34
t + 2    4797   6001   8101   11500   14776   . . .   29115   24   34
______________________________________________________________________
In the example of the table II, when the interpolation pitch is evaluated or calculated in the same manner as described above, the interpolation pitch of the parameter C1 becomes "0.6" (=(4788-4715)/128), and the interpolation pitch of the parameter C2 becomes "-10.2" (=(4810-6115)/128). Therefore, when the interpolation pitches are respectively rounded to the nearest integer, the interpolation pitch of the parameter C1 becomes "1" and the interpolation pitch of the parameter C2 becomes "-10". Accordingly, the parameter C1 is sequentially changed as "4715", "4716", "4717", ..., "4842", "4843", and the parameter C2 is changed as "6115", "6105", "6095", ..., "4845", "4835". Therefore, as shown in FIG. 2, reversal of the adjacent two parameters C1 and C2 occurs at the end of the frame period Tt.
In the case where the operation or calculation precision is insufficient, instead of rounding the interpolation pitch as described above, it is possible to omit the figures below the decimal point. In this case, in the example of the table II, the interpolation pitch of the parameter C1 becomes "0" and the interpolation pitch of the parameter C2 becomes "-11". Therefore, the parameter C1 is not changed and remains at "4715", and the parameter C2 will be changed as "6115", "6104", "6093", ..., "4718", "4707". Therefore, in this case, as seen from FIG. 3, the reversal of the adjacent two parameters C1 and C2 also takes place at the end of the frame period Tt.
On the other hand, if the interpolation pitch is evaluated in every subframe period in accordance with the second method as described above, such a problem does not take place. However, in the second method, since the calculation of the interpolation pitches and the interpolating process must be executed at every interpolation timing, a calculation circuit or a microprocessor capable of processing at high speed is required in order to increase the number of times of the interpolation, that is, the number of subframes.
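The two cases of the table II (rounding and omission of the fractional figures) can be compared with the following sketch, which also locates the subframe at which the reversal first occurs. It is illustrative only and the function names are hypothetical. Two assumptions are made: rounding is half up, and the omission of the fractional figures is modelled as rounding toward minus infinity (math.floor), which is the reading that reproduces the pitch "-11" stated in the text.

```python
import math


def pitch_rounded(present, target, n_subframes):
    """Pitch rounded to the nearest integer (half up; an assumption)."""
    return math.floor((target - present) / n_subframes + 0.5)


def pitch_truncated(present, target, n_subframes):
    """Pitch with the fractional figures omitted.

    Modelled as floor, since that yields -11 for the exact pitch -10.2.
    """
    return math.floor((target - present) / n_subframes)


def first_reversal(lower, upper, lower_pitch, upper_pitch, n_subframes):
    """Return the first subframe index at which the lower parameter is no
    longer below the upper parameter, or None if the order is preserved."""
    for k in range(1, n_subframes + 1):
        lower += lower_pitch
        upper += upper_pitch
        if lower >= upper:
            return k
    return None


N = 128
C1_T, C1_T1 = 4715, 4788  # C1 at frames t and t+1 in table II
C2_T, C2_T1 = 6115, 4810  # C2 at frames t and t+1 in table II

reversal_rounded = first_reversal(
    C1_T, C2_T, pitch_rounded(C1_T, C1_T1, N), pitch_rounded(C2_T, C2_T1, N), N)
reversal_truncated = first_reversal(
    C1_T, C2_T, pitch_truncated(C1_T, C1_T1, N), pitch_truncated(C2_T, C2_T1, N), N)

print(reversal_rounded, reversal_truncated)  # 128 128
```

In both cases the order of C1 and C2 is preserved up to the 127th subframe and is lost only at the 128th, that is, at the end of the frame period Tt.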
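The following sketch shows one possible reading of the second method. The details of the cited publication are not reproduced in this description, so the sketch rests on an assumption: in every subframe period the pitch is recalculated as the remaining difference to the target value divided by the number of remaining subframes, and is rounded to the nearest integer. The function names are hypothetical.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, ties upward (an assumption)."""
    return math.floor(x + 0.5)


def interpolate_second_method(present, target, n_subframes):
    """Recalculate the pitch in every subframe period (assumed scheme).

    One division is needed per parameter per subframe, which is the
    processing load pointed out in the text.
    """
    values = [present]
    for remaining in range(n_subframes, 0, -1):
        pitch = round_half_up((target - values[-1]) / remaining)
        values.append(values[-1] + pitch)
    return values


N = 128

# C1 and C2 of frames t and t+1 in table II
c1_track = interpolate_second_method(4715, 4788, N)
c2_track = interpolate_second_method(6115, 4810, N)

print(c1_track[-1], c2_track[-1])  # 4788 4810
print(all(a < b for a, b in zip(c1_track, c2_track)))  # True
```

Because the rounding error is corrected at every subframe instead of being accumulated, each parameter arrives exactly at its target value and the order of C1 and C2 is preserved, at the cost of one division per parameter in every subframe period.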