The present invention relates to a voice encoder used in wired digital voice communication and radio communication, and in particular to a method for improving a voice encoder so that it can transmit non-voice signals that use the voice frequency band, such as dual-tone multi-frequency (DTMF) signals and push-button (PB) signals.
Reducing communication cost is the most important issue in a private (i.e., local) network. To transmit voice signals, which account for most of the communication traffic, with high efficiency, highly efficient voice encoders based on voice encoding and decoding are increasingly being applied. One example is the 8 kbit/s conjugate-structure algebraic-code-excited linear prediction (CS-ACELP) voice encoding method (compliant with ITU-T Recommendation G.729).
A voice encoding algorithm with a transmission speed of 8 kbit/s is structured specifically for voice input signals so that high-quality voice can be obtained with a small amount of information. This will be described with reference to the 8 kbit/s CS-ACELP system. FIG. 9 shows a schematic block diagram of an encoder, and FIG. 10 shows a detailed block diagram of the encoder.
This encoding method uses an encoding algorithm that models the human vocalizing mechanism. In other words, it is based on the CELP method, in which a composite filter 6 (a linear filter corresponding to the voice spectral envelope) that models human vocal tract information is driven by time-series signals (outputs of an adder 15) stored in a code book and corresponding to human vocal cord information.
The detailed description of the algorithm can be found in ITU-T Recommendation G.729, "Coding of Speech at 8 kbit/s using Conjugate-Structure Algebraic-Code Excited Linear Prediction (CS-ACELP)".
With a coding algorithm specific to voice, higher transmission efficiency tends to deteriorate the transmission characteristics of signals other than voice that use the voice frequency band (such as DTMF signals, PB signals, No. 5 signaling, and modem signals) on a transmission path that uses the highly efficient voice encoder.
As one example of this situation, the details of the LSP quantizer portion will be described with reference to FIG. 11. FIG. 11 shows an LSP quantizer portion 309 within an encoder based on the CS-ACELP method shown in FIG. 9. FIG. 11 includes an MA prediction component calculator 308 for calculating a moving average (MA) prediction component of an LSP; a multiplier 330; adders 331, 332, and 333; a quantized error weighting coefficient calculator portion 338 for calculating a weighting coefficient based on an input LSP coefficient; a least square error calculator 334 for calculating a square error between a quantized LSP vector calculated in the adder 332 and an LSP vector calculated from the input voice signal, multiplying it by the weighting coefficient calculated in the calculator portion 338, and selecting the candidate with the least square error among the quantized LSP vector candidates; a first stage LSP codebook 335; a second stage LSP codebook 336; and an MA prediction coefficient codebook 337 in which plural sets of MA prediction coefficients are stored.
The LSP quantization method using this structure is described in detail in "CS-ACELP no LSP to gain no ryoushikahou", Kataoka et al., NTT R&D, Vol. 45, No. 4, 1996, pp. 331-336, so its description is omitted here. The LSP quantization method is known to quantize voice signal spectral envelope information efficiently.
According to the CS-ACELP voice coding method, the quantization of LSP coefficients is achieved by the following three processes. That is, the LSP quantizer portion 309 has the three processing function blocks shown below:
(1) an MA (Moving Average) prediction component calculator portion 308 for subtracting a predictable component between frames in order to achieve efficient quantization;
(2) the first stage LSP quantization code book 335 for using an adaptive code book learned from voices to achieve rough quantization; and
(3) the second stage LSP quantization code book 336 for using a random number series to finely adjust the target LSP, which is quantized roughly in the first stage.
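The three processing blocks above can be pictured with a minimal sketch. It is a simplified illustration only, not the quantizer of ITU-T Recommendation G.729: the function names, the form of the MA prediction, the unweighted squared error, and the toy code books are all assumptions made for the example.

```python
def nearest(codebook, target):
    """Return the index of the codebook vector with the least squared error."""
    def sq_err(vec):
        return sum((t - v) ** 2 for t, v in zip(target, vec))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def quantize_lsp(lsp, history, ma_coeffs, stage1, stage2):
    """Quantize one LSP vector; returns (stage-1 index, stage-2 index, quantized LSP).

    history   : past frames' vectors used for prediction (assumed form)
    ma_coeffs : one MA coefficient per past frame (assumed form)
    """
    dim = len(lsp)
    # (1) Subtract the component that is predictable from previous frames.
    predicted = [sum(c * past[k] for c, past in zip(ma_coeffs, history))
                 for k in range(dim)]
    target = [x - p for x, p in zip(lsp, predicted)]
    # (2) First stage: rough quantization of the prediction remainder.
    i1 = nearest(stage1, target)
    remainder = [t - s for t, s in zip(target, stage1[i1])]
    # (3) Second stage: fine adjustment of what the first stage left over.
    i2 = nearest(stage2, remainder)
    quantized = [p + a + b
                 for p, a, b in zip(predicted, stage1[i1], stage2[i2])]
    return i1, i2, quantized
```

Only the two indices would be transmitted in such a scheme; a decoder holding the same code books and the same history can rebuild the quantized vector from them.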
The moving average (MA) prediction in (1) allows signals with few radical changes in frequency characteristics, that is, signals with strong correlation between frames, to be quantized efficiently. The adaptive code book of (2) allows the rough shape of a spectral envelope specific to voice signals to be expressed efficiently with a small amount of information. Furthermore, using the random code book of (3) in addition to the learned code book of (2) allows slight changes in the spectral envelope to be followed flexibly. For these reasons, the LSP quantizer portion 309 can be said to be well suited to coding the characteristics of voice spectral envelope information efficiently. On the other hand, in order to code non-voice signals, especially DTMF signals, the following characteristics must be considered:
Voice signals and DTMF signals differ significantly in spectral envelope;
Radical changes in spectral characteristics, and in gain, are found between the signal duration and the pause time. Within the duration of a DTMF signal, however, the changes in spectral characteristics and gain are extremely small;
Since LSP quantization distortion is reflected directly as frequency distortion of the DTMF signal, the LSP quantization distortion must be as small as possible; and
For the duration of a DTMF signal, the frequency characteristic is extremely stable.
In consideration of the above viewpoints, the LSP quantizer portion 309 cannot be said to be an effective method for coding the spectral envelope of DTMF signals.
As described in the example above, non-voice signals such as DTMF signals have characteristics that differ from those of voice signals in several respects. Thus, when the transmission bit rate is low and there is little redundancy for coding, it is not suitable to code non-voice signals by the same method as that used for voice signals.
In a private network, in-channel signalling using DTMF signals, rather than common channel signalling, is used for call set-up in telephone communication. In this case, if the allocated transmission path uses voice coding, the transmission characteristics of the DTMF signals deteriorate. As a result, call set-up frequently cannot be achieved normally.
As a first solution to this problem, the device configuration in FIG. 12, disclosed in Japanese Unexamined Patent Application Publication No. 9-81199, may be adopted. This configuration includes a unit on the transmission side for distinguishing a voice signal from a non-voice signal such as a DTMF signal, and memories on both the transmission side and the receiver side for storing patterns in which the DTMF signals are pre-coded. When the identification unit identifies a DTMF signal input, the index of the memory holding the coded pattern corresponding to the DTMF digit is sent to the receiver side, where the index is identified and a DTMF signal corresponding to the digit is generated.
As a second solution to this problem, the device configuration in FIG. 13 may be adopted, for example. An encoder 101 includes a function block optimized for coding voice signals and a function block optimized for compressively coding non-voice signals (such as DTMF signals) with less distortion. The configuration includes a unit for identifying whether a signal to be transmitted is voice or non-voice, and one of the function blocks is selected for coding processing based on the determination result from the identification unit. Further, the configuration includes a unit for folding the determination result into the encoder output so that transmission can be achieved without changing the transmission bit rate and with the least deterioration in voice quality. Furthermore, a searching unit corresponding to that of the encoder 101 is also provided on the decoder 201 side.
Next, the operation of the voice encoder and decoder will be described. On the transmission side in FIG. 13, a voice/non-voice signal identification unit 102 constantly monitors whether the input signal is a voice signal or a non-voice signal and determines the operation mode of the encoder 101 based on the determination result. If the voice/non-voice signal identification unit 102 determines the signal to be "voice", switches 103 and 104 are turned to the 103A and 104A sides, respectively. As a result, within the encoder 101, a coding processing function block 105 is selected so as to achieve an operation mode suitable for coding a voice signal efficiently ("voice mode" hereinafter).
Under this mode, the encoder 101 performs coding processing on the voice signal based on a coding algorithm and outputs a code corresponding to the input voice. On the other hand, if the voice/non-voice signal identification unit 102 determines the signal to be "non-voice", the switches 103 and 104 are turned to the 103B and 104B sides, respectively. As a result, within the encoder 101, the coding processing function block 106 is selected so as to achieve an operation mode suitable for compressively coding the non-voice signal, such as a DTMF signal, with less distortion ("non-voice mode" hereinafter).
Under this mode, the encoder 101 performs coding processing on the non-voice signal, such as a DTMF signal, based on a coding algorithm and outputs a code corresponding to the input non-voice signal. Further, a multiplexer portion 107 multiplexes the code obtained by coding the voice signal or the non-voice signal (voice/non-voice code, hereinafter) with the identification result for the input signal (voice signal or non-voice signal) output from the voice/non-voice signal identification unit 102, and sends the result to the transmission path.
On the receiver side in FIG. 13, the bit sequence received from the transmission path is first separated in a demultiplexer portion 202 into the voice/non-voice code and the determination result of the voice/non-voice signal identification unit 102. If the determination result of the voice/non-voice signal identification unit 102 taken out of the signal array is "voice", switches 203 and 204 are turned to the 203A and 204A sides, respectively. As a result, within the decoder 201, a decoding processing function block 205 is selected, which puts the decoder in an operation mode corresponding to the voice mode in the encoder 101. Under this mode, the decoder 201 performs decoding processing based on a decoding algorithm in order to decode the voice signals. Here, since both encoding and decoding are performed under the voice mode, the decoded voice signals have the quality that the coding algorithm originally provides.
If the determination result of the voice/non-voice signal identification unit 102 taken out of the signal array in the demultiplexer portion 202 is "non-voice", the switches 203 and 204 are turned to the 203B and 204B sides, respectively. As a result, within the decoder 201, a decoding processing function block 206 is selected, which puts the decoder in an operation mode corresponding to the non-voice mode in the encoder 101. Under this mode, the decoder 201 performs decoding processing based on a decoding algorithm in order to decode the non-voice signals. Here, since both encoding and decoding are performed under the non-voice mode, the decoded non-voice signals have less distortion than those encoded and decoded under the voice mode.
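The mode switching of FIG. 13 can be sketched as follows. This is an illustration under stated assumptions, not the configuration itself: the names are hypothetical, and the mode flag is simply prepended as one extra bit, whereas the configuration described above folds the determination result into the encoder output without changing the transmission bit rate (how it does so is not specified here).

```python
VOICE, NON_VOICE = 0, 1  # hypothetical encoding of the determination result


def multiplex(mode, code_bits):
    """Multiplexer portion: combine the determination result with the code."""
    return [mode] + list(code_bits)


def demultiplex(frame):
    """Demultiplexer portion: split a frame into (determination result, code)."""
    return frame[0], frame[1:]


def decode(frame, voice_decoder, non_voice_decoder):
    """Select the decoding function block that matches the encoder's mode."""
    mode, code_bits = demultiplex(frame)
    decoder = voice_decoder if mode == VOICE else non_voice_decoder
    return decoder(code_bits)
```

The sketch also makes the dependency visible: the receiver side must understand the flag and hold a non-voice decoding block of its own in order to interpret frames produced in the non-voice mode.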
According to the conventional example described above, coding and decoding are performed by a method using a general voice coding/decoding algorithm more suitable for coding voice during voice signal transmission, and by switching part of the processing function blocks to a method more suitable for coding non-voice signals during transmission of non-voice signals, especially DTMF signals. Thus, during non-voice signal transmission, high-quality non-voice signals can be transmitted without an increase in transmission bit rate.
When a voice communication system based on the conventional example is established, both the encoder 101 and the decoder 201 are required to have the non-voice mode. Improving only the transmission side (coding side) does not allow the decoder side to handle the non-voice mode. As a result, the signals cannot be decoded normally, which may cause phenomena undesirable for a caller, such as the occurrence of noise.
By the way, when an enterprise communication system is established, the voice transmission equipment on the sender and receiver sides often cannot be replaced at the same time, for various reasons. For example, assume that voice transmission devices (such as multimedia multiplexer equipment) have been installed which have a voice encoder and decoder based on the CS-ACELP method conforming to the conventional ITU-T Recommendation G.729. In this case, even if only the transmission side is replaced with a voice transmission device supporting the non-voice mode for the purpose of achieving DTMF in-channel transmission, mutual connection cannot be performed because the voice transmission device on the receiver side still has a conventional decoder. Thus, the voice transmission device on the receiver side must also be replaced. However, this requires further expensive investment by the user of the voice transmission device, which makes the replacement difficult.
The present invention was made to overcome these conventional problems. It is an object of the present invention to provide a voice encoder that improves the transmission characteristics of non-voice signals such as DTMF signals by permitting both mutual connection with a conventional decoder and in-channel transmission of non-voice signals such as DTMF signals, in a system provided with a highly efficient voice encoder and decoder, while maintaining the voice transmission quality that a given coding algorithm originally has.
A voice encoding device according to the present invention includes an encoder that compressively encodes input signals and has a first quantizing block suitable for voice encoding and a second quantizing block suitable for non-voice encoding, a voice/non-voice signal identification unit for identifying whether an input signal to the encoder is a voice signal or a non-voice signal and outputting a determination result, and a multiplexer portion for multiplexing respective outputs from the first quantizing block and the second quantizing block in order to output them to a transmission path. In this case, the encoder has a selector for selecting either the first quantizing block or the second quantizing block in accordance with the determination result from the voice/non-voice signal identification unit, and the first quantizing block and the second quantizing block compressively encode signals by using the same quantization table.
The first and second quantizing blocks are processing blocks for quantizing a line spectral pair (LSP) coefficient.
When the LSP coefficient is quantized, the first and second quantizing blocks use evaluation criteria that differ from each other for determining an appropriate quantized value.
Further, when the LSP coefficient is quantized, the evaluation criterion used for determining an appropriate quantized value is changed adaptively in accordance with a characteristic of an input voice signal in the first quantizing block, while the evaluation criterion is constant regardless of a characteristic of an input voice signal in the second quantizing block.
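The idea of two quantizing blocks that share one quantization table but apply different evaluation criteria can be sketched as below. The weighting rule in `adaptive_weights` is a hypothetical stand-in for an input-dependent criterion, and the fixed criterion is assumed here to be a plain unweighted squared error; neither is taken from the specification.

```python
def search(codebook, target, weights):
    """Return the index of the entry with the least weighted squared error."""
    def err(vec):
        return sum(w * (t - v) ** 2 for w, t, v in zip(weights, target, vec))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))


def adaptive_weights(lsp):
    """Hypothetical adaptive criterion: weight each coefficient by the inverse
    of its gap to the nearest neighbour (assumes at least two ordered,
    distinct LSP values)."""
    weights = []
    for i, x in enumerate(lsp):
        gaps = []
        if i > 0:
            gaps.append(x - lsp[i - 1])
        if i < len(lsp) - 1:
            gaps.append(lsp[i + 1] - x)
        weights.append(1.0 / min(gaps))
    return weights


def quantize(lsp, codebook, is_voice):
    """First block (voice): adaptive weights. Second block (non-voice): fixed
    weights. Both return an index into the same codebook."""
    weights = adaptive_weights(lsp) if is_voice else [1.0] * len(lsp)
    return search(codebook, lsp, weights)
```

Because both paths emit an index into the same table, a decoder that knows only that table can look up either result without knowing which criterion selected it. For example, with the code book `[[0.30, 0.32, 0.70], [0.36, 0.38, 0.90]]` and the input `[0.30, 0.32, 0.90]`, the two criteria pick different entries of the same table.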
Also, the voice/non-voice signal identification unit has a digit detector for detecting a digit of a DTMF signal, and inputs an LSP coefficient to the second quantizing block.
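A digit detector of this kind is commonly built on the Goertzel algorithm, which measures signal power at each of the eight standard DTMF frequencies. The following is a minimal sketch under assumptions that do not come from the specification: an 8 kHz sampling rate, a 400-sample analysis window, a hypothetical `min_power` threshold, and no checks for twist, harmonics, or tone duration, which a practical detector would need in order to reject voice.

```python
import math

ROWS = (697, 770, 852, 941)        # low-group DTMF frequencies in Hz
COLS = (1209, 1336, 1477, 1633)    # high-group DTMF frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, rate):
    """Power of `samples` at `freq` computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_digit(samples, rate=8000, min_power=1.0):
    """Return the DTMF digit with the strongest row and column tones, or None
    when either tone is weaker than the (hypothetical) threshold."""
    row_p = [goertzel_power(samples, f, rate) for f in ROWS]
    col_p = [goertzel_power(samples, f, rate) for f in COLS]
    r = row_p.index(max(row_p))
    c = col_p.index(max(col_p))
    if row_p[r] < min_power or col_p[c] < min_power:
        return None
    return KEYS[r][c]


def dtmf_tone(digit, n=400, rate=8000):
    """Generate n samples of the two-tone signal for `digit` (test helper)."""
    for r, row in enumerate(KEYS):
        if digit in row:
            lo, hi = ROWS[r], COLS[row.index(digit)]
            return [0.5 * math.sin(2.0 * math.pi * lo * t / rate)
                    + 0.5 * math.sin(2.0 * math.pi * hi * t / rate)
                    for t in range(n)]
    raise ValueError("not a DTMF digit: %r" % (digit,))
```

In this sketch, `detect_digit(dtmf_tone("5"))` recovers the digit from a synthesized tone, and an all-zero input yields `None`.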
The voice/non-voice signal identification unit uses a closed loop searching method as a method for searching an LSP codebook.
The second quantizing block uses a linear prediction residual signal of an input voice signal as a parameter used for determining an appropriate quantized value.