1. Field of the Invention
The present invention relates to a speech decoding device, and particularly to a speech decoding device that can reduce power consumption when generating background noise of unvoiced sections in which speech code is not present.
2. Description of the Related Art
In speech coding devices, transmission of coded speech information is halted when no speech signal to be encoded is present, as a means of reducing power consumption. In such cases, a conspicuous discontinuity arises between the voiced and unvoiced portions of the speech signals decoded by the speech decoding device on the receiving side. To solve this problem, background noise signals are artificially generated and outputted.
The configuration and operation of a background noise generation method of a speech decoding device of the prior art is described in detail in, for example, Japanese Patent Laid-open No. 122165/93.
In addition, details regarding encoding processes and decoding processes of speech signals in speech coding devices and speech decoding devices of the prior art are provided in, for example, Chapter 5.2.1 (Speech Coding Processing) and Chapter 5.2.4 (Speech Decoding Processing) of the Digital Automobile Telephone System Standards RCR STD-27C, Volume I (Research & Development Center for Radio Systems, Nov. 10, 1994).
Here, a brief explanation will be presented of the configuration of the background noise generation system of a prior-art speech decoding device with reference to FIG. 1.
FIG. 1 is a block diagram showing the configuration of a background noise generation system of the prior art. Referring to the figure, the prior-art background noise generation system is composed of input terminal 51 for inputting received information, received information memory 52 for storing received information, code generator 53 for generating code used in the decoding process, decoding processor 54 for decoding code, and output terminal 55 for outputting output signals.
Sections in which speech signals to be coded are present on the transmission side are hereinbelow referred to as "voiced," and sections in which speech signals to be coded are not present are referred to as "unvoiced." In addition, code into which speech signals have been encoded on the encoding side is referred to simply as "code."
Received information memory 52 is provided with received code storage section 521 and voiced/unvoiced information storage section 522. Received code storage section 521 inputs received code from input terminal 51 and stores the code. Voiced/unvoiced information storage section 522 inputs information indicating whether the current section is voiced or unvoiced (hereinbelow referred to as "voiced/unvoiced information") and stores the information.
Code generator 53 is provided with background noise code generator 531, code controller c531, and code switch s531. Based on voiced/unvoiced information inputted from voiced/unvoiced information storage section 522, code controller c531 controls the operation of background noise code generator 531 and code switch s531 as follows:
During a voiced section, received code stored in received code storage section 521 is outputted, without change, to decoding processor 54. During an unvoiced section, background noise code generator 531 is activated, whereby code for background noise generation is generated from the code inputted from received code storage section 521, and is outputted to decoding processor 54.
Decoding processor 54 is provided with excited signal generator 541, synthesized signal generator 542, and postfilter section 543.
Code inputted from code generator 53 is transferred to excited signal generator 541, synthesized signal generator 542, and postfilter section 543.
Excited signal generator 541 generates and outputs excited signals from code inputted from code generator 53.
Synthesized signal generator 542 passes the inputted excited signals through a synthesizing filter to generate and output synthesized signals.
Postfilter section 543 passes synthesized signals generated at synthesized signal generator 542 through a postfilter to generate postfilter output signals, and outputs the signals from output terminal 55.
The postfilter section suppresses noise contained in the synthesized speech signals, and has the effect of improving the subjective quality of speech signals in voiced sections.
Next, referring to FIGS. 1 and 2, an explanation will be given regarding the operation of the background noise generation system of a prior-art speech decoding device.
Received code inputted from input terminal 51 is stored in received code storage section 521. In concrete terms, code is stored that indicates, for example, speech spectral envelope information, speech signal level, pitch information, and noise information. Voiced/unvoiced information inputted from input terminal 51 is stored in voiced/unvoiced information storage section 522.
Based on voiced/unvoiced information inputted from voiced/unvoiced information storage section 522, code controller c531 controls the operation of background noise code generator 531 and code switch s531 as follows (Step B1):
During a voiced section, received code stored in received code storage section 521 is outputted without alteration to decoding processor 54, and in addition, the received code is also outputted to background noise code generator 531. This is done because background noise code generator 531 generates code for background noise generation based on the code received during voiced sections. The received code is actually code indicating, for example, speech spectral envelope information, speech signal level, pitch information, and noise information.
During an unvoiced section, code controller c531 activates background noise code generator 531. Background noise code generator 531 generates code for background noise generation from the most recent of the received code inputted from received code storage section 521, and outputs this code to decoding processor 54 (Step B2). The actual methods employed to generate code for background noise generation include, for example, level attenuation of speech signals and randomization of noise information.
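The switching behavior of Steps B1 and B2 can be sketched as follows. This is only an illustrative sketch, not the method of the cited reference: the `FrameCode` container, the `CodeGenerator` class, the attenuation factor of 0.5, and the noise code range of 128 are all hypothetical choices made for the example.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameCode:
    """Hypothetical container for the code of one frame."""
    envelope: Tuple[int, ...]  # code indicating spectral envelope information
    level: int                 # code indicating speech signal level
    pitch: int                 # code indicating pitch information
    noise: int                 # code indicating noise information


class CodeGenerator:
    """Sketch of code generator 53 (controller, switch, noise code generator)."""

    def __init__(self, attenuation: float = 0.5, noise_codes: int = 128,
                 seed: int = 0) -> None:
        self.attenuation = attenuation    # assumed level attenuation factor
        self.noise_codes = noise_codes    # assumed number of noise codes
        self.rng = random.Random(seed)
        self.last_received: Optional[FrameCode] = None

    def step(self, received: Optional[FrameCode], voiced: bool) -> FrameCode:
        if voiced:
            if received is None:
                raise ValueError("voiced section requires received code")
            # Voiced: pass the received code through unchanged and
            # remember it for later background noise generation.
            self.last_received = received
            return received
        if self.last_received is None:
            raise ValueError("no received code available yet")
        # Unvoiced: derive background noise code from the most recently
        # received code by attenuating the level and randomizing the noise code.
        return replace(
            self.last_received,
            level=int(self.last_received.level * self.attenuation),
            noise=self.rng.randrange(self.noise_codes),
        )
```

In this sketch the attenuation is always applied to the last code received in a voiced section, so the level does not decay further over consecutive unvoiced frames. Whether an actual device attenuates progressively is not stated here.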
Of the code inputted from code generator 53, excited signal generator 541 generates excited signals from code indicating, for example, pitch information and noise information, and outputs the result (Step B3).
One example of a method actually employed for the generation of excited signals can be described as follows: Excited signal generator 541 holds, in advance, pitch component signals and noise component signals as databases for each of the codes indicating pitch information and noise information, and upon inputting code indicating pitch information and noise information from code generator 53, selects from each database the pitch component signals and noise component signals that correspond to each code. The selected pitch component signals and noise component signals are added and excited signals are generated. For example, if the code indicating pitch information is $L$, the selected pitch component signal corresponding to code $L$ is $b_L(n)$, the code representing noise information is $I$, and the selected noise component signal corresponding to code $I$ is $u_I(n)$, the excited signal $ex(n)$ can be calculated according to the following equation:

$$ex(n) = b_L(n) + u_I(n) \qquad (1)$$
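Equation (1) amounts to two table lookups followed by a sample-by-sample addition. A minimal sketch, assuming the two databases can be modeled as dictionaries keyed by code (the function name `excitation` and the dictionary layout are hypothetical):

```python
from typing import Dict, List


def excitation(pitch_db: Dict[int, List[float]],
               noise_db: Dict[int, List[float]],
               pitch_code: int,
               noise_code: int) -> List[float]:
    """Return ex(n) = b_L(n) + u_I(n) for pitch code L and noise code I."""
    b = pitch_db[pitch_code]   # pitch component signal b_L(n)
    u = noise_db[noise_code]   # noise component signal u_I(n)
    if len(b) != len(u):
        raise ValueError("component signals must have the same length")
    # Add the selected components sample by sample.
    return [b_n + u_n for b_n, u_n in zip(b, u)]
```

For example, with `pitch_db = {3: [1.0, 0.0, -1.0]}` and `noise_db = {7: [0.5, 0.5, 0.5]}`, the call `excitation(pitch_db, noise_db, 3, 7)` returns `[1.5, 0.5, -0.5]`.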
Of the code inputted from code generator 53, synthesized signal generator 542 forms a synthesizing filter from code indicating spectral envelope information. Synthesized signals are generated and outputted by passing excited signals inputted from excited signal generator 541 through the synthesizing filter (Step B4). An actual example of a synthesizing filter generation method employed can be explained as follows. If linear predictive code indicating spectral envelope is represented by $\alpha_i$, the transfer function $A(z)$ of the synthesizing filter in synthesized signal generator 542 can be represented by the following equation: ##EQU1##
Here, $N_P$ is the degree (for example, the tenth degree) of linear predictive code $\alpha_i$.
Of the code inputted from code generator 53, postfilter section 543 forms a postfilter from code indicating spectral envelope information of speech signals and pitch information, generates postfilter output signals by passing synthesized signals outputted from synthesized signal generator 542 through the postfilter, and outputs the signals from output terminal 55 (Step B5).
An actual example of a postfilter generation method can be described as follows. One proposed form of the construction of a postfilter for improving subjective quality of synthesized speech signals in a voiced section is a connection in series of a pitch enhancement filter that enhances the pitch component of synthesized speech signals, a high-frequency enhancement filter that enhances the high-frequency component, and a spectral shaping filter that enhances the spectral envelope.
As an example of the transfer function P(z) of a pitch enhancement filter that enhances the pitch component, the following equation can be proposed: ##EQU2##
Here, "lag" is the pitch cycle value of excited signals (for example, 20 to 146). Constant $g_c$ is a weighting coefficient (for example, 0.7).
As an example of the transfer function $B(z)$ of a high-frequency enhancement filter that enhances the high-frequency component, the following equation can be proposed:

$$B(z) = 1 - g_b \cdot z^{-1} \qquad (4)$$

Here, constant $g_b$ is a weighting coefficient (for example, 0.4).
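In the time domain, equation (4) corresponds to the difference equation y(n) = x(n) - g_b x(n-1), so the filter must carry one sample of internal state from frame to frame. A minimal sketch follows; the class name `HighFrequencyEnhancer` and the frame-based interface are assumptions made for illustration.

```python
from typing import List


class HighFrequencyEnhancer:
    """First-order FIR filter B(z) = 1 - g_b * z^-1 with persistent state."""

    def __init__(self, g_b: float = 0.4) -> None:
        self.g_b = g_b     # weighting coefficient g_b
        self.prev = 0.0    # internal state: previous input sample x(n-1)

    def process(self, frame: List[float]) -> List[float]:
        out = []
        for x in frame:
            # y(n) = x(n) - g_b * x(n-1)
            out.append(x - self.g_b * self.prev)
            self.prev = x  # update state so the next frame continues smoothly
        return out
```

Because `prev` is kept between calls, filtering a signal frame by frame gives the same result as filtering it in one pass.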
The following equation gives one possible form for a transfer function H(z) of the spectral shaping filter that enhances the spectral envelope: ##EQU3##
Here, $N_P$ is the degree of linear predictive parameter $\alpha$ (for example, the tenth degree). In addition, constants $g_n^i$ and $g_d^i$ are weighting coefficients (for example, $g_n^i = 0.5$ and $g_d^i = 0.8$).
However, the above-described speech decoding device of the prior art has the following problems:
If postfilter processing is incorporated into the speech decoding process, the filtering process of the postfilter requires a large number of sum-of-product calculations, and this results in increased power consumption.
If, on the other hand, the postfilter is not activated during unvoiced sections as a means of reducing power consumption, the internal state of the postfilter is not updated while its operation is halted. As a consequence, synthesized speech signals are degraded in the voiced section that immediately follows a change from an unvoiced to a voiced section. Moreover, switching the postfilter on and off at changes between voiced and unvoiced sections causes a perceptible discontinuity in the reproduced signals.
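The stale-state problem can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the prior-art postfilter itself: it uses only the first-order filter of equation (4) with g_b = 0.5 in place of the full postfilter, and the function names and frame values are hypothetical.

```python
from typing import List, Tuple


def first_order_fir(frame: List[float], g_b: float,
                    prev: float) -> Tuple[List[float], float]:
    """Apply y(n) = x(n) - g_b * x(n-1); return the output and updated state."""
    out = []
    for x in frame:
        out.append(x - g_b * prev)
        prev = x
    return out, prev


def decode(frames: List[List[float]], voiced_flags: List[bool],
           g_b: float = 0.5,
           bypass_when_unvoiced: bool = False) -> List[List[float]]:
    """Filter frames; optionally halt the filter during unvoiced frames."""
    prev = 0.0
    outputs = []
    for frame, voiced in zip(frames, voiced_flags):
        if bypass_when_unvoiced and not voiced:
            # Filter halted: output is unfiltered and the state is NOT updated.
            outputs.append(list(frame))
        else:
            y, prev = first_order_fir(frame, g_b, prev)
            outputs.append(y)
    return outputs


frames = [[4.0, 4.0], [1.0, 1.0], [4.0, 4.0]]
flags = [True, False, True]
always_on = decode(frames, flags)
halted = decode(frames, flags, bypass_when_unvoiced=True)
# First voiced sample after the unvoiced frame differs: 3.5 versus 2.0.
print(always_on[2][0], halted[2][0])
```

When the filter is halted, its state still holds the last sample of the earlier voiced frame, so the first output sample after the return to a voiced section is computed from stale data.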