1. Field of the Invention
This invention relates to an efficient speech coding method in which an input speech signal is divided into blocks and coding processing is carried out in units of these blocks.
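As a rough illustration of dividing the input into blocks and coding block by block, the sketch below assumes non-overlapping blocks of 256 samples (the example block length given in the related-art discussion) and zero-pads the final partial block. The per-block coder `encode_block` is a hypothetical placeholder for whatever coding processing is applied. Practical analysis/synthesis coders often use overlapping, windowed frames, which this sketch does not attempt.

```python
import numpy as np

def split_into_blocks(signal, block_size=256):
    """Split a 1-D signal into consecutive, non-overlapping blocks.

    The final partial block is zero-padded so that every block has
    block_size samples. block_size=256 is an assumed example value.
    """
    x = np.asarray(signal, dtype=float)
    n_blocks = -(-len(x) // block_size)      # ceiling division
    padded = np.zeros(n_blocks * block_size)
    padded[:len(x)] = x
    return padded.reshape(n_blocks, block_size)

def encode(signal, encode_block, block_size=256):
    """Apply a per-block coder to every block.

    encode_block is a hypothetical callable standing in for the actual
    block coding processing; it receives one block at a time.
    """
    return [encode_block(b) for b in split_into_blocks(signal, block_size)]
```

For example, a 600-sample signal yields three blocks, the last holding 88 real samples followed by zeros.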
2. Description of the Related Art
Various coding methods are known that compress audio signals (including speech (voice) signals and acoustic signals) by exploiting the statistical properties of the signal in the time domain and the frequency domain together with the characteristics of human hearing. Such coding methods are roughly classified into time-domain coding, frequency-domain coding, analysis/synthesis coding, and so on.
Examples of efficient coding of speech signals and the like include MBE (Multiband Excitation) coding, SBE (Singleband Excitation) coding, Harmonic coding, SBC (Sub-Band Coding), LPC (Linear Predictive Coding), DCT (Discrete Cosine Transform), MDCT (Modified DCT), and FFT (Fast Fourier Transform). In such efficient coding processing, scalar quantization has conventionally been used in many cases to quantize the various information data, such as spectrum amplitudes or parameters such as LSP parameters, α parameters, and k parameters.
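A minimal sketch of scalar quantization, assuming a uniform quantizer: each parameter (for example, a single LSP value) is mapped to an index independently of the others, with no joint (vector) coding across parameters. The range `lo`/`hi`, the bit depth, and the function names are illustrative assumptions, not values from this document.

```python
import numpy as np

def scalar_quantize(params, lo, hi, bits):
    """Uniformly quantize each parameter independently.

    Values are clipped to [lo, hi] and mapped to integer indices in
    [0, 2**bits - 1]. lo, hi and bits are assumed design choices.
    """
    levels = 2 ** bits
    step = (hi - lo) / levels
    x = np.clip(np.asarray(params, dtype=float), lo, hi)
    idx = np.floor((x - lo) / step).astype(int)
    return np.minimum(idx, levels - 1)       # x == hi goes into the top cell

def scalar_dequantize(idx, lo, hi, bits):
    """Reconstruct each parameter at the midpoint of its quantization cell."""
    step = (hi - lo) / 2 ** bits
    return lo + (np.asarray(idx) + 0.5) * step
```

With this design the reconstruction error for in-range values is at most half a step, and every parameter costs the same number of bits regardless of how the parameters are correlated.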
In speech (voice) analysis/synthesis systems such as the PARCOR method, the excitation source can be switched only once per block (frame) on the time axis, so voiced sound and unvoiced sound cannot be mixed within the same frame. As a result, high-quality speech (voice) cannot be obtained.
In contrast, in the above-mentioned MBE coding, the speech signal components within one block (frame) are divided into bands, each formed by one harmonic of the frequency spectrum or by grouping 2 to 3 harmonics, or by dividing the spectrum at a fixed bandwidth (e.g., 300 to 400 Hz), and a voiced sound/unvoiced sound discrimination (V/UV discrimination) is made for each band on the basis of the spectrum shape within that band. Sound quality is thereby improved. The V/UV discrimination for each band is made chiefly according to the degree to which harmonics are present in the spectrum within that band.
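To make the per-band decision concrete, the following hypothetical sketch groups the harmonics of a known pitch `f0` into bands of three and judges a band voiced when most of its spectral energy lies within one analysis bin of a harmonic. This energy-concentration measure, the threshold of 0.6, the Hann window, and the 4x zero-padding are all assumptions chosen for illustration; it is a simple stand-in for "degree of existence of harmonics", not the decision rule of any particular MBE coder.

```python
import numpy as np

def band_vuv(block, f0, fs, harmonics_per_band=3, threshold=0.6):
    """Per-band V/UV decision from how strongly energy concentrates at harmonics.

    Hypothetical sketch: each band groups `harmonics_per_band` consecutive
    harmonics of the pitch f0 (Hz). A band is judged voiced (True) when the
    fraction of its energy lying within one analysis bin of a harmonic
    reaches `threshold`. Assumes f0 is known and well above fs / len(block).
    """
    n = len(block)
    nfft = 4 * n                      # zero-padding: 4 fine bins per analysis bin
    hw = nfft // n                    # +/- one analysis bin, in fine bins
    spec = np.abs(np.fft.rfft(block * np.hanning(n), nfft)) ** 2
    bin_hz = fs / nfft
    n_harm = int((fs / 2) // f0)      # harmonics below the Nyquist frequency
    decisions = []
    for first in range(1, n_harm + 1, harmonics_per_band):
        harms = range(first, min(first + harmonics_per_band, n_harm + 1))
        # Band edges lie halfway between neighbouring harmonics.
        lo = int(round((first - 0.5) * f0 / bin_hz))
        hi = min(int(round((harms[-1] + 0.5) * f0 / bin_hz)), len(spec) - 1)
        total = spec[lo:hi + 1].sum()
        near = 0.0
        for h in harms:
            c = int(round(h * f0 / bin_hz))   # fine bin nearest harmonic h
            near += spec[max(c - hw, lo):min(c + hw, hi) + 1].sum()
        ratio = near / total if total > 0 else 0.0
        decisions.append(bool(ratio >= threshold))
    return decisions
```

With a 256-sample block at 8 kHz and a steady 150 Hz pitch, all nine bands of a purely harmonic signal come out voiced, while a tone placed midway between two harmonics makes its band unvoiced.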
However, if, e.g., the pitch changes abruptly within one block (e.g., 256 samples), the spectrum structure may become so-called "indistinct" (blurred), particularly in the medium-to-high frequency band, as shown in FIG. 1, for example. Moreover, as shown in FIG. 2, harmonics do not always lie at frequencies that are integer multiples of the fundamental (pitch) frequency, and the pitch detection accuracy may be insufficient. Under such circumstances, when the V/UV discrimination is made for every band by the conventional method, problems arise in the spectrum matching used for the V/UV discrimination, i.e., the matching, for each band or each harmonic, between the currently input signal spectrum and the spectrum synthesized up to that point. As a result, bands or harmonics that should properly be judged V (voiced sound) may be erroneously judged UV (unvoiced sound). Namely, in the cases shown in FIG. 1 and FIG. 2, only the speech signal components on the lower frequency side are judged V (voiced sound), while those in the medium-to-high frequency band are judged UV (unvoiced sound). As a result, the quality of the synthesized sound may deteriorate.
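For reference, the spectrum matching in conventional MBE coding is commonly written as a least-squares fit followed by a normalized error; the notation below is an assumption for illustration and is not taken from this document. For a band m containing a single harmonic, let S(k) be the spectrum of the input block, E_m(k) the spectrum of the analysis window centred on that harmonic, a_m <= k < b_m the bins of the band, and xi an assumed decision threshold:

```latex
\hat{A}_m = \frac{\sum_{k=a_m}^{b_m-1} S(k)\,E_m^{*}(k)}{\sum_{k=a_m}^{b_m-1} \lvert E_m(k)\rvert^{2}},
\qquad
\varepsilon_m = \frac{\sum_{k=a_m}^{b_m-1} \bigl\lvert S(k) - \hat{A}_m E_m(k)\bigr\rvert^{2}}{\sum_{k=a_m}^{b_m-1} \lvert S(k)\rvert^{2}},
\qquad
\text{band } m \text{ is V if } \varepsilon_m < \xi, \text{ otherwise UV.}
```

A pitch change within the block spreads the upper harmonics over several bins, so S(k) no longer resembles the narrow window shape E_m(k). The error grows in the upper bands, and those bands land on the UV side of the threshold even though the sound is voiced.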
A similar problem may also arise when the voiced sound/unvoiced sound discrimination (V/UV discrimination) is made once for all of the signals (signal components) within the block as a whole.