An efficient speech coding method that has been proposed processes an input signal sequence (in particular, speech) in units of sections (frames), each having a fixed duration of, for example, about 5 to 20 ms. The method separates one frame of speech into two types of information, namely, linear filter characteristics that represent the envelope characteristics of the frequency spectrum and a driving sound source signal for driving the filter, and encodes the two types of information separately. A known method of encoding the driving sound source signal is code-excited linear prediction (CELP), which separates the speech into a periodic component considered to correspond to the pitch frequency (fundamental frequency) of the speech and the remaining component (see Non-patent literature 1).
With reference to FIGS. 1 and 2, an encoding apparatus 1 according to prior art will be described. FIG. 1 is a block diagram showing a configuration of the encoding apparatus 1 according to prior art. FIG. 2 is a flow chart showing an operation of the encoding apparatus 1 according to prior art. As shown in FIG. 1, the encoding apparatus 1 comprises a linear prediction analysis part 101, a linear prediction coefficient encoding part 102, a synthesis filter part 103, a waveform distortion calculating part 104, a code book search controlling part 105, a gain code book part 106, a driving sound source vector generating part 107, and a synthesis part 108. In the following, an operation of each component of the encoding apparatus 1 will be described.
<Linear Prediction Analysis Part 101>
The linear prediction analysis part 101 receives an input signal sequence xF(n) in units of frames, which is composed of a plurality of consecutive samples of an input signal x(n) in the time domain (n=0, . . . , L−1, where L denotes an integer equal to or greater than 1). From the input signal sequence xF(n), the linear prediction analysis part 101 calculates a linear prediction coefficient a(i) that represents the frequency spectrum envelope characteristics of the input speech (i represents a prediction order, i=1, . . . , P, where P denotes an integer equal to or greater than 1) (S101). The linear prediction analysis part 101 may be replaced with a non-linear one.
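As a non-limiting illustration, Step S101 can be sketched with the autocorrelation method and the Levinson-Durbin recursion. The text above does not specify how a(i) is calculated, so the choice of this method, the function names, and the sign convention (xF(n) is predicted as the sum of a(i)·xF(n−i)) are assumptions made only for this example.

```python
def autocorrelation(frame, max_lag):
    """Return r[k] = sum_n frame[n] * frame[n - k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a(1)..a(order).

    Assumed convention: x(n) is predicted as sum_i a(i) * x(n - i).
    Returns (coefficients, residual prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[i] multiplies x(n - i)
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. silent) frame: leave remaining a(i) at zero
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err  # reflection coefficient of stage m
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= (1.0 - k * k)
    return a[1:], err
```

For a frame `xf` and prediction order `P`, `levinson_durbin(autocorrelation(xf, P), P)` returns the P coefficients together with the residual energy. Windowing and lag weighting, which practical coders commonly apply, are omitted here.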
<Linear Prediction Coefficient Encoding Part 102>
The linear prediction coefficient encoding part 102 receives the linear prediction coefficient a(i), quantizes and encodes the linear prediction coefficient a(i) to generate a synthesis filter coefficient a^(i) and a linear prediction coefficient code, and outputs the synthesis filter coefficient a^(i) and the linear prediction coefficient code (S102). Note that a^(i) means a superscript hat of a(i). The linear prediction coefficient encoding part 102 may be replaced with a non-linear one.
<Synthesis Filter Part 103>
The synthesis filter part 103 receives the synthesis filter coefficient a^(i) and a driving sound source vector candidate c(n) generated by the driving sound source vector generating part 107 described later. The synthesis filter part 103 performs a linear filtering processing on the driving sound source vector candidate c(n) using the synthesis filter coefficient a^(i) as a filter coefficient to generate an input signal candidate xF^(n) and outputs the input signal candidate xF^(n) (S103). Note that x^ means a superscript hat of x. The synthesis filter part 103 may be replaced with a non-linear one.
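As a non-limiting illustration, the linear filtering of Step S103 can be sketched as an all-pole recursion. The text above states only that a^(i) is used as the filter coefficient, so the filter form y(n) = c(n) + Σ a^(i)·y(n−i), the function name, and the optional `memory` argument (filter state carried over from the previous frame) are assumptions made for this example.

```python
def synthesis_filter(excitation, a_hat, memory=None):
    """All-pole filtering: y(n) = c(n) + sum_i a_hat(i) * y(n - i).

    `memory` holds past outputs [y(-1), y(-2), ...]; zeros if omitted.
    Returns (output frame, updated memory for the next frame).
    """
    order = len(a_hat)
    history = list(memory) if memory is not None else [0.0] * order
    out = []
    for c in excitation:
        y = c + sum(a_hat[i] * history[i] for i in range(order))
        out.append(y)
        history = ([y] + history)[:order]  # newest output first
    return out, history
```

Feeding a unit impulse through a first-order filter with a^(1) = 0.5 yields the decaying sequence 1, 0.5, 0.25, and so on, which is the impulse response of the assumed filter.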
<Waveform Distortion Calculating Part 104>
The waveform distortion calculating part 104 receives the input signal sequence xF(n), the linear prediction coefficient a(i), and the input signal candidate xF^(n). The waveform distortion calculating part 104 calculates a distortion d for the input signal sequence xF(n) and the input signal candidate xF^(n) (S104). In many cases, the distortion calculation is conducted by taking the linear prediction coefficient a(i) (or the synthesis filter coefficient a^(i)) into consideration.
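As a non-limiting illustration, the distortion d of Step S104 can be sketched as a plain squared error between the two sequences. The text above does not define the distortion measure, so this measure and the function name are assumptions; the weighting based on a(i) or a^(i) mentioned above is deliberately omitted.

```python
def waveform_distortion(target, candidate):
    """Sum of squared differences between xF(n) and the candidate xF^(n)."""
    if len(target) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(target, candidate))
```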
<Code Book Search Controlling Part 105>
The code book search controlling part 105 receives the distortion d, and selects and outputs driving sound source codes, that is, a gain code, a period code and a fixed (noise) code used by the gain code book part 106 and the driving sound source vector generating part 107 described later (S105A). If the distortion d is a minimum value or a quasi-minimum value (S105BY), the process proceeds to Step S108, and the synthesis part 108 described later starts operating. On the other hand, if the distortion d is neither the minimum value nor a quasi-minimum value (S105BN), Steps S106, S107, S103 and S104 are performed in sequence, and the process then returns to Step S105A, which is the operation performed by this component. Therefore, as long as the process takes the branch of Step S105BN, Steps S106, S107, S103, S104 and S105A are repeated, and eventually the code book search controlling part 105 selects and outputs the driving sound source codes for which the distortion d between the input signal sequence xF(n) and the input signal candidate xF^(n) is minimal or quasi-minimal (S105BY).
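As a non-limiting illustration, the loop over Steps S105A, S106, S107, S103 and S104 can be sketched as an exhaustive search that keeps the code combination with the smallest distortion. The exhaustive strategy, the squared-error distortion, and the hypothetical `synthesize` callback (standing in for the chain of Steps S106, S107 and S103) are assumptions made for this example; a practical coder would typically use a cheaper search and accept a quasi-minimal result.

```python
from itertools import product


def search_codebooks(target, gain_codes, period_codes, fixed_codes, synthesize):
    """Return (best codes, best distortion) over all code combinations.

    `synthesize(gain_code, period_code, fixed_code)` is a hypothetical
    callback that returns the input signal candidate for those codes.
    """
    best_codes, best_d = None, float("inf")
    for codes in product(gain_codes, period_codes, fixed_codes):
        candidate = synthesize(*codes)                     # S106, S107, S103
        d = sum((x - y) ** 2 for x, y in zip(target, candidate))  # S104
        if d < best_d:                                     # S105A bookkeeping
            best_codes, best_d = codes, d
    return best_codes, best_d
```

The returned tuple holds the gain code, the period code and the fixed code, in that order, which would then be passed on for multiplexing.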
<Gain Code Book Part 106>
The gain code book part 106 receives the driving sound source codes, generates a quantized gain (gain candidate) ga,gr from the gain code in the driving sound source codes and outputs the quantized gain ga,gr (S106).
<Driving Sound Source Vector Generating Part 107>
The driving sound source vector generating part 107 receives the driving sound source codes and the quantized gain (gain candidate) ga,gr and generates a driving sound source vector candidate c(n) having a length equivalent to one frame from the period code and the fixed code included in the driving sound source codes (S107). In general, the driving sound source vector generating part 107 is often composed of an adaptive code book and a fixed code book. The adaptive code book generates a candidate of a time-series vector that corresponds to a periodic component of the speech by cutting the immediately preceding driving sound source vector (one to several frames of driving sound source vectors having been quantized) stored in a buffer into a vector segment having a length equivalent to a certain period based on the period code and repeating the vector segment until the length of the frame is reached, and outputs the candidate of the time-series vector. As the “certain period” described above, the adaptive code book selects a period for which the distortion d calculated by the waveform distortion calculating part 104 is small. In many cases, the selected period is equivalent to the pitch period of the speech. The fixed code book generates a candidate of a time-series code vector having a length equivalent to one frame that corresponds to a non-periodic component of the speech based on the fixed code, and outputs the candidate of the time-series code vector. These candidates may be one of a specified number of candidate vectors stored independently of the input speech according to the number of bits for encoding, or one of vectors generated by arranging pulses according to a predetermined generation rule. The fixed code book intrinsically corresponds to the non-periodic component of the speech. 
However, in a speech section with a high pitch periodicity, in particular, in a vowel section, a fixed code vector may be produced by applying a comb filter having a pitch period or a period corresponding to the pitch used in the adaptive code book to the previously prepared candidate vector, or by cutting a vector segment and repeating the vector segment as in the processing for the adaptive code book. The driving sound source vector generating part 107 generates the driving sound source vector candidate c(n) by multiplying the candidates ca(n) and cr(n) of the time-series vectors output from the adaptive code book and the fixed code book by the gain candidates ga and gr, respectively, output from the gain code book part 106 and adding the products together. In practice, only one of the adaptive code book and the fixed code book may be used.
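As a non-limiting illustration, the generation of the driving sound source vector candidate in Step S107 can be sketched as follows: the adaptive code book contribution ca(n) is formed by repeating the most recent `period` samples of the past driving sound source, and the candidate is c(n) = ga·ca(n) + gr·cr(n). The function names, the representation of the buffer as a Python list, and the mapping of the period code directly to a sample count are assumptions made for this example.

```python
def adaptive_codebook_vector(past_excitation, period, frame_len):
    """Repeat the last `period` samples of the buffer up to the frame length."""
    if not 1 <= period <= len(past_excitation):
        raise ValueError("period must lie within the buffered excitation")
    segment = past_excitation[-period:]
    return [segment[n % period] for n in range(frame_len)]


def excitation_candidate(past_excitation, period, fixed_vector, ga, gr):
    """Form c(n) = ga * ca(n) + gr * cr(n) for one frame.

    `fixed_vector` is the fixed code book candidate cr(n); its length
    defines the frame length.
    """
    ca = adaptive_codebook_vector(past_excitation, period, len(fixed_vector))
    return [ga * a + gr * r for a, r in zip(ca, fixed_vector)]
```

With a buffer ending in the samples 1, 2, 3 and a period of 3, the adaptive contribution for a seven-sample frame is 1, 2, 3, 1, 2, 3, 1.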
<Synthesis Part 108>
The synthesis part 108 receives the linear prediction coefficient code and the driving sound source codes, and generates and outputs a synthetic code of the linear prediction coefficient code and the driving sound source codes (S108). The resulting code is transmitted to a decoding apparatus 2.
Next, with reference to FIGS. 3 and 4, the decoding apparatus 2 according to prior art will be described. FIG. 3 is a block diagram showing a configuration of the decoding apparatus 2 according to prior art that corresponds to the encoding apparatus 1. FIG. 4 is a flow chart showing an operation of the decoding apparatus 2 according to prior art. As shown in FIG. 3, the decoding apparatus 2 comprises a separating part 109, a linear prediction coefficient decoding part 110, a synthesis filter part 111, a gain code book part 112, a driving sound source vector generating part 113, and a post-processing part 114. In the following, an operation of each component of the decoding apparatus 2 will be described.
<Separating Part 109>
The code transmitted from the encoding apparatus 1 is input to the decoding apparatus 2. The separating part 109 receives the code, and separates and retrieves the linear prediction coefficient code and the driving sound source codes from the code (S109).
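As a non-limiting illustration, the combination of codes in Step S108 and their separation in Step S109 can be sketched as packing fixed-width fields into a single integer and unpacking them again. The format of the transmitted code is not specified in the text, so the fixed-width layout, the field order, and the function names are assumptions made for this example.

```python
def pack_codes(fields, widths):
    """Concatenate non-negative integer fields, first field most significant."""
    word = 0
    for value, width in zip(fields, widths):
        if not 0 <= value < (1 << width):
            raise ValueError("field does not fit in its bit width")
        word = (word << width) | value
    return word


def unpack_codes(word, widths):
    """Inverse of pack_codes for the same list of bit widths."""
    fields = []
    for width in reversed(widths):
        fields.append(word & ((1 << width) - 1))
        word >>= width
    return fields[::-1]
```

The fields might be, for example, the linear prediction coefficient code, the gain code, the period code and the fixed code, with the encoder and decoder sharing the same list of bit widths.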
<Linear Prediction Coefficient Decoding Part 110>
The linear prediction coefficient decoding part 110 receives the linear prediction coefficient code and decodes the linear prediction coefficient code into the synthesis filter coefficient a^(i) using a decoding method corresponding to the encoding method performed by the linear prediction coefficient encoding part 102 (S110).
<Synthesis Filter Part 111>
The synthesis filter part 111 operates in the same manner as the synthesis filter part 103 described above. That is, the synthesis filter part 111 receives the synthesis filter coefficient a^(i) and the driving sound source vector c(n) generated by the driving sound source vector generating part 113 described later. The synthesis filter part 111 performs the linear filtering processing on the driving sound source vector c(n) using the synthesis filter coefficient a^(i) as a filter coefficient to generate xF^(n) (referred to as a synthesis signal sequence xF^(n) in the decoding apparatus) and outputs the synthesis signal sequence xF^(n) (S111).
<Gain Code Book Part 112>
The gain code book part 112 operates the same as the gain code book part 106 described above. That is, the gain code book part 112 receives the driving sound source codes, generates ga,gr (referred to as a decoded gain ga,gr in the decoding apparatus) from the gain code in the driving sound source codes and outputs the decoded gain ga,gr (S112).
<Driving Sound Source Vector Generating Part 113>
The driving sound source vector generating part 113 operates the same as the driving sound source vector generating part 107 described above. That is, the driving sound source vector generating part 113 receives the driving sound source codes and the decoded gain ga,gr and generates c(n) (referred to as a driving sound source vector c(n) in the decoding apparatus) having a length equivalent to one frame from the period code and the fixed code included in the driving sound source codes and outputs the c(n) (S113).
<Post-Processing Part 114>
The post-processing part 114 receives the synthesis signal sequence xF^(n). The post-processing part 114 performs spectral enhancement or pitch enhancement processing on the synthesis signal sequence xF^(n) to generate an output signal sequence zF(n) in which quantization noise is less audible, and outputs the output signal sequence zF(n) (S114).