The present invention relates to a CELP (Code Excited Linear Prediction) type voice encoding device and a CELP type voice decoding device in a mobile communication system and the like which encodes and transmits a voice signal, and a mobile communication device.
The CELP type voice encoding device divides a voice into frames of a certain length, linearly predicts the voice in each frame and encodes a prediction residue (activating signal) resulting from the linear prediction for each frame by using an adaptive code vector and a noise code vector constituted of known waveforms. In some cases, as shown in FIG. 34, the adaptive code vector and the noise code vector which are stored in an adaptive code book 1 and a noise code book 2, respectively, are used as they are. In other cases, as shown in FIG. 35, the adaptive code vector from the adaptive code book 1 is used together with the noise code vector from the noise code book 2 which is synchronized with a pitch cycle L of the adaptive code book 1. FIG. 35 shows a constitution of a noise sound source vector generating portion in the CELP type voice encoding device which is disclosed in publications of Patent Application Laid-open No. Hei 5-19795 and Hei 5-19796. In FIG. 35, the adaptive code vector is selected from the adaptive code book 1, while the pitch cycle L is output. The noise code vector selected from the noise code book 2 is made periodic by a periodic unit 3 using the pitch cycle L. To make the noise code vector periodic, the vector is cut at the pitch cycle length from its top and the cut segment is repeatedly connected until a sub-frame length is reached.
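The operation of the periodic unit 3 can be illustrated by the following minimal sketch in Python. It is not the disclosed implementation: the function name periodize and its parameter names are hypothetical, and the noise code vector is assumed to hold at least one sub-frame of samples.

```python
def periodize(noise_vector, pitch_cycle, subframe_length):
    """Cut the first pitch_cycle samples and repeat them to fill one sub-frame.

    Assumption: noise_vector holds at least subframe_length samples.
    """
    if pitch_cycle <= 0:
        raise ValueError("pitch_cycle must be positive")
    if pitch_cycle >= subframe_length:
        # The pitch cycle covers the whole sub-frame, so nothing is repeated.
        return list(noise_vector[:subframe_length])
    segment = list(noise_vector[:pitch_cycle])
    # Repeat the cut segment until the sub-frame length is reached.
    return [segment[n % pitch_cycle] for n in range(subframe_length)]
```

For example, with a pitch cycle of 3 and a sub-frame length of 8, the first three samples are repeated and the last repetition is truncated.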
However, in the aforementioned conventional CELP type voice encoding device in which the noise code vector is made periodic at the pitch cycle, the pitch cycle component which remains after an adaptive code vector component is removed is merely removed by making the noise code vector periodic at the pitch cycle. Phase information which exists in one pitch waveform, that is, the information representing where a pitch pulse peak exists, is therefore not actively used. Consequently, enhancement of voice quality has been restricted.
The present invention has been developed to solve the conventional problem, and an object thereof is to provide a voice encoding device which can further enhance a voice quality.
To attain the aforementioned object, in the invention, by emphasizing an amplitude of a noise code vector which corresponds to a pitch peak position of an adaptive code vector, phase information existing in one pitch waveform is used to enhance a sound quality.
Also in the invention, by using the noise code vector which is restricted only in the vicinity of the pitch peak of the adaptive code vector, even when a small number of bits are allocated to the noise code vector, a deterioration in sound quality is minimized.
Further in the invention, by using the pitch peak position and a pitch cycle of the adaptive code vector to restrict a pulse position search range, even when there are a small number of bits indicative of pulse positions, the search range is narrowed while minimizing the deterioration in sound quality.
Also in the invention, when the pitch peak position and pitch cycle of the adaptive code vector are used to restrict the pulse position search range, especially by finely setting a pulse position searching precision in one or two pitch waveforms, sound quality is enhanced in a voiced portion of a voice with a short pitch cycle.
Also in the invention, by varying the number of pulses of the pulse sound source with the pitch cycle value, sound quality is enhanced.
Also in the invention, by determining a pulse amplitude in the vicinity of the pitch peak position of the adaptive code vector and the other portions before searching the pulse sound source, sound quality is enhanced.
Also in the invention, since a pitch gain is quantized in multiple stages and a first stage of information quantization is performed immediately after an adaptive code book is searched, the first-stage quantized information of the pitch gain can be used as mode information for switching a noise code book. Encoding efficiency is thus enhanced.
Also in the invention, by using quantized pitch cycle information or quantized pitch gain information in the immediately previous sub-frame or the present sub-frame, a control is performed to switch search positions of the pulse sound source. Therefore, voice quality is enhanced.
Also in the invention, a phase continuity between sub-frames is determined backward. Only to the sub-frame whose phase is determined to be continuous, a phase adaptation process is applied. Thereby, without increasing the quantity of information to be transmitted, the phase adaptation process is switched. Thus, voice quality is enhanced. Additionally, when the phase adaptation process is not performed, by using a fixed code book, an error in transmission line can be effectively prevented from being propagated.
Also in the invention, it is determined by a degree of centralization of signal power to the vicinity of the pitch peak position in the adaptive code vector whether or not the phase adaptation process is to be applied. Thereby, without increasing the quantity of information to be transmitted, the phase adaptation process is switched. Voice quality is thus enhanced. Additionally, when the phase adaptation process is not performed, by using the fixed code book, a transmission line error can be effectively prevented from being propagated.
Also according to the invention, in the CELP type voice encoding device in which sound source pulses are searched in positions relative to the pitch peak position, the pulse positions are indexed in order from the top of the sub-frame. Thereby, the influence of the transmission line error which occurs in some frame is prevented from being propagated to subsequent frames which have no transmission line error.
Also according to the invention, in the CELP type voice encoding device in which sound source pulses are searched in the positions relative to the pitch peak position, the pulse positions are indexed in order from the top of the sub-frame. Additionally, different pulses having the same index are numbered in order from the top of the sub-frame. Thereby, the influence of the transmission line error which occurs in some frame is prevented from being propagated to the subsequent frames which have no transmission line error.
Also according to the invention, in the CELP type voice encoding device in which sound source pulses are searched in the positions relative to the pitch peak position, not all the pulse search positions are represented by the relative positions. Only a part in the vicinity of the pitch peak is represented by the relative positions, while the remaining part is set in predetermined fixed positions. Thereby, the influence of the transmission line error which occurs in some frame is prevented from being propagated to the subsequent frames which have no transmission line error.
Also in the invention, when the pitch peak position is obtained, instead of searching all object signals for the pitch peak position, there is provided a means for searching signals in the cut pitch cycle length for the pitch peak position. Thereby, the top pitch peak position can be extracted more precisely.
Also according to the invention, in a portion in which the pitch cycle is continuous between the sub-frames, that is, a portion which is supposed to be a voiced stationary portion, the pitch peak position in the immediately previous sub-frame, the pitch cycle in the immediately previous sub-frame and the pitch cycle in the present sub-frame are used to predict the pitch peak position in the present sub-frame. Based on the predicted pitch peak position, an existence range of the pitch peak position in the present sub-frame is restricted. Thereby, the pitch peak position can be extracted in such a manner that the phase in the voiced stationary portion is prevented from being discontinuous.
Also according to the invention, a sub-frame length is about 10 ms or more, a relatively small quantity of information, i.e., about 15 bits per sub-frame, is allocated to noise code book information, and the pulse sound source is applied as the noise code book. In this case, at least two modes are provided in total: at least one mode in which the number of pulses is reduced so that each pulse has sufficient position information, and at least one mode in which the position information of each pulse is made coarse but the number of pulses is increased. With this constitution, the quality of a voiced rising portion of a voice signal is enhanced. Also, by increasing the number of pulses, the deterioration of voice quality caused by the coarser pulse position information is suppressed.
The invention provides a CELP type voice encoding device which is provided with a sound source generating portion for emphasizing an amplitude of a noise code vector corresponding to a pitch peak position of an adaptive code vector. By using phase information existing in one pitch waveform, sound quality can be enhanced.
Furthermore, in the voice generating portion, by multiplying an amplitude emphasizing window synchronized with a pitch cycle of the adaptive code vector by the noise code vector, the amplitude of the noise code vector corresponding to the pitch peak position of the adaptive code vector is emphasized. By emphasizing the amplitude of a noise sound source vector in synchronization with the pitch cycle, sound quality can be enhanced.
Also, in the voice generating portion, a triangular window centering on the pitch peak position of the adaptive code vector is used as the amplitude emphasizing window. An amplitude emphasizing window length can be easily controlled.
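The amplitude emphasizing window described above can be sketched as follows in Python. This is an illustration under stated assumptions rather than the disclosed implementation: the names emphasis_window and emphasize, the parameter half_width (which controls the window length) and the parameter floor (the weight applied away from the pitch peaks) are all hypothetical, and pitch peaks are assumed to recur exactly every pitch cycle.

```python
def emphasis_window(peak_pos, pitch_cycle, subframe_length, half_width, floor=0.5):
    """Triangular windows of height 1.0 centred on every pitch peak.

    Samples away from the peaks keep the weight `floor` (a hypothetical value).
    """
    window = [floor] * subframe_length
    # Assumption: peaks repeat every pitch_cycle samples through peak_pos.
    first_peak = peak_pos % pitch_cycle
    for center in range(first_peak, subframe_length, pitch_cycle):
        start = max(0, center - half_width + 1)
        stop = min(subframe_length, center + half_width)
        for n in range(start, stop):
            triangle = 1.0 - abs(n - center) / half_width
            window[n] = max(window[n], floor + (1.0 - floor) * triangle)
    return window


def emphasize(noise_vector, window):
    """Multiply the noise code vector by the window, sample by sample."""
    return [x * w for x, w in zip(noise_vector, window)]
```

Changing half_width alone widens or narrows the emphasized region, which reflects the ease of controlling the window length when a triangular window is used.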
The invention also provides a CELP type voice encoding device which is provided with a sound source generating portion using a noise code vector which is restricted only to the vicinity of a pitch peak of an adaptive code vector. In the voice encoding device, by using the noise code vector which is restricted only to the vicinity of the pitch peak of the adaptive code vector, even when a small number of bits are allocated to the noise code vector, a deterioration in sound quality can be minimized. In a voiced portion in which a residual power is concentrated in the vicinity of the pitch pulse, sound quality can be enhanced.
The invention also provides a CELP type voice encoding device which uses a pulse sound source as a noise code book and which is provided with a sound source generating portion for determining a pulse position search range by a pitch cycle and a pitch peak position of an adaptive code vector. Even when a small number of bits are allocated to the pulse position, a deterioration in sound quality can be minimized.
Furthermore, the sound source generating portion determines the pulse position search range in such a manner that the vicinity of the pitch peak position of the adaptive code vector becomes dense while the other portions become coarse. Since a portion in which pulses are highly likely to be placed is finely searched, voice quality can be enhanced.
In addition, the pulse position search range is switched in accordance with the pitch cycle. Since the pulse position search range is expanded or contracted based on the pitch cycle, one or two pitch waveforms can be represented more finely in the case of a short pitch cycle, and voice quality can be enhanced.
Still further, when plural pitch peaks exist in the adaptive code vector, the pulse position search range is restricted in such a manner that at least two pitch peak positions are included in the search range. The influence of an erroneously detected top pitch peak position can be reduced. Also, changes in configurations of waveforms in the vicinity of the top pitch peak and in the vicinity of the second pitch peak can be handled. Therefore, voice quality can be enhanced.
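A dense search range in the pitch peak vicinity and a coarse range elsewhere can be sketched as follows in Python. The function name pulse_search_positions and the parameters dense_radius and coarse_step are hypothetical, and the actual spacing used by the device is not specified here.

```python
def pulse_search_positions(peak_pos, subframe_length, dense_radius, coarse_step):
    """Candidate pulse positions for one sub-frame.

    Every sample within dense_radius of the pitch peak is a candidate;
    elsewhere only every coarse_step-th sample (counted from the peak) is kept.
    """
    positions = []
    for n in range(subframe_length):
        offset = n - peak_pos
        if abs(offset) <= dense_radius or offset % coarse_step == 0:
            positions.append(n)
    return positions
```

With a peak at sample 10 of a 20-sample sub-frame, a radius of 2 and a step of 4, nine candidates remain instead of twenty, so fewer bits are needed to index a pulse position.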
The invention also provides a CELP type voice encoding device which is provided with a sound source generating portion for switching a noise code book in accordance with voice analysis results. In the voice encoding device, the noise code book can be switched in accordance with features of input voice. Therefore, voice quality can be enhanced.
The invention further provides a CELP type voice encoding device which is provided with a sound source generating portion for switching a noise code book by using a transmission parameter which is extracted before the noise code book is searched. In the voice encoding device, the noise code book is changed by using information which has been already determined to be transmitted. Therefore, without increasing the quantity of information, the noise code book can be switched.
The voice encoding device is constituted to switch the number of pulses according to the analysis result of a voice signal. Since the number of pulses is switched in accordance with the features of the input voice, voice quality can be enhanced.
The voice encoding device is also constituted to switch the number of pulses by using information which is extracted before the noise code book is searched. Since the number of pulses is switched using the information which has been already determined to be transmitted, without increasing the quantity of transmitted information, the number of pulses can be switched.
The voice encoding device is further provided with the sound source generating portion for switching the number of pulses in accordance with the pitch cycle. Since the number of pulses is switched using the pitch cycle, the number of pulses can be switched without increasing the transmitted information. Also, since the optimum number of pulses varies with the pitch cycle, voice quality can be enhanced.
In addition, the number of pulses is switched in the case where a variation in pitch cycle is small between continuous sub-frames and in the case where the variation is not small. Since the number of pulses for use is switched in a rising portion and a stationary portion of a voice signal voiced portion, voice quality can be enhanced.
Further, a noise code vector generating portion using a pulse sound source as a noise sound source determines a pulse amplitude before searching a pulse position. Since the pulse sound source is allowed to have a variation in amplitude, voice quality can be enhanced. Also, since the amplitude is determined before the pulse is searched, the optimum pulse position can be determined for the amplitude.
Still further, in the noise code vector generating portion which uses the pulse sound source as the noise sound source, the pulse amplitude is changed between the vicinity of the pitch peak of the adaptive code vector and the other portions. Since the amplitude is changed between the vicinity of the pitch peak of a sound source signal and the other portions, the pitch structure configuration of the sound source signal can be efficiently represented. The enhancement of voice quality and the efficient quantization of pulse amplitude information can be achieved.
Still even further, by statistics or learning, the number of pulses in the pulse sound source for use is determined based on the pitch cycle. Since the optimum number of pulses for each pitch cycle is determined statistically or in other learning methods, voice quality can be enhanced.
The invention also provides a CELP type voice encoding device which is provided with a sound source generating portion for quantizing a pitch gain in multiple stages. In the first stage a value which is obtained immediately after an adaptive code book is searched is used as a quantized target, while in the second and subsequent stages a difference between the pitch gain which is determined through a closed loop searching after a sound source searching is completed and a value which is quantized in the first stage is used as the quantized target. In the voice encoding device, the sum of the adaptive code book and a fixed code book (noise code book) forms an operation sound source vector. In the CELP type voice encoding device, information which is obtained before the fixed code book (noise code book) is searched is quantized and transmitted. Therefore, without applying independent mode information, the switching of the fixed code book (noise code book) and the like can be performed. Voice information can be efficiently encoded.
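The two-stage quantization of the pitch gain can be sketched as follows in Python. The quantization tables STAGE1_LEVELS and STAGE2_LEVELS and all function names are hypothetical values chosen only for illustration; the sketch shows a scalar nearest-neighbour quantizer, which is an assumption and not necessarily the quantizer used by the device.

```python
STAGE1_LEVELS = [0.2, 0.5, 0.8, 1.1]        # hypothetical first-stage table
STAGE2_LEVELS = [-0.15, -0.05, 0.05, 0.15]  # hypothetical difference table


def quantize(value, levels):
    """Return (index, level) of the table entry nearest to value."""
    index = min(range(len(levels)), key=lambda i: abs(levels[i] - value))
    return index, levels[index]


def first_stage(open_loop_gain):
    """Quantize the gain obtained right after the adaptive code book search.

    The returned index is available before the noise code book is searched,
    so it can also serve as mode information.
    """
    return quantize(open_loop_gain, STAGE1_LEVELS)


def second_stage(closed_loop_gain, stage1_level):
    """Quantize the difference between the closed-loop gain and stage 1."""
    index, diff = quantize(closed_loop_gain - stage1_level, STAGE2_LEVELS)
    return index, stage1_level + diff
```

In use, first_stage is called once the adaptive code book search has finished, the noise code book is then searched, and second_stage is called with the gain obtained by the closed loop search.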
The voice encoding device is constituted to switch the fixed code book by using the quantized value of the pitch gain which is obtained immediately after the adaptive code book is searched. In the voice encoding device, the pitch gain which is obtained before the fixed code book is searched does not differ in value largely from the pitch gain which is obtained after the fixed code book is searched. By using this feature, without applying mode information the mode of the fixed code book can be switched. Voice quality can be enhanced.
The voice encoding device switches the fixed code book based on a change in pitch cycle between sub-frames. By using the continuity of the pitch cycle between the sub-frames and the like, it is determined whether or not a voiced/voiced stationary portion exists. By switching a sound source which is effective for the voiced/voiced stationary portion and a sound source which is effective for the other portions (unvoiced/rising portion and the like), voice quality can be enhanced.
The voice encoding device switches the fixed code book by using the pitch gain which is quantized in the immediately previous sub-frame. By using the continuity of the pitch gain between the sub-frames and the like, it is determined whether or not the voiced/voiced stationary portion exists. By switching the sound source which is effective for the voiced/voiced stationary portion and the sound source which is effective for the other portions (unvoiced/rising portion and the like), voice quality can be enhanced.
The voice encoding device switches the fixed code book based on the change in pitch cycle between the sub-frames and the quantized pitch gain. By using the pitch cycle and the pitch gain information as transmission parameters, it is determined whether or not the voiced/voiced stationary portion exists. By switching the sound source which is effective for the voiced/voiced stationary portion and the sound source which is effective for the other portions (unvoiced/rising portion and the like), voice quality can be enhanced.
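The switching decision described above can be sketched as follows in Python. The thresholds max_pitch_change and min_gain and the returned labels are hypothetical; the point of the sketch is only that the decision uses parameters which are transmitted anyway, so a decoder can repeat it without receiving mode information.

```python
def is_voiced_stationary(prev_pitch, cur_pitch, quantized_gain,
                         max_pitch_change=4, min_gain=0.5):
    """Judge a voiced stationary portion from already-transmitted parameters.

    Both thresholds are hypothetical illustration values.
    """
    pitch_is_continuous = abs(cur_pitch - prev_pitch) <= max_pitch_change
    gain_is_high = quantized_gain >= min_gain
    return pitch_is_continuous and gain_is_high


def select_fixed_codebook(prev_pitch, cur_pitch, quantized_gain):
    """Pick the sound source suited to the judged portion (labels hypothetical)."""
    if is_voiced_stationary(prev_pitch, cur_pitch, quantized_gain):
        return "voiced_stationary_source"
    return "other_source"
```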
The voice encoding device uses a pulse sound source code book as the fixed code book. Since the pulse sound source is used for the noise code book, the quantity of memory required for the noise code book and the quantity of arithmetic operation at the time of searching the noise code book can be reduced. Further, a representation property of rising in the voiced portion can be enhanced.
The invention also provides a CELP type voice encoding device which performs a voice encoding process for each sub-frame having a predetermined time length. It is determined whether or not a phase in the present sub-frame and a phase in the immediately previous sub-frame are continuous. A sound source is switched between the case where it is determined that they are continuous and the case where it is determined that they are not continuous. In the voice encoding device, a sound source constitution can be realized in which the voiced (stationary) portion and the other portions are cut and separated. Sound quality can be enhanced.
Also, a pitch peak position in the immediately previous sub-frame, a pitch cycle in the immediately previous sub-frame and a pitch cycle of the present sub-frame are used to predict a pitch peak position in the present sub-frame. By determining whether or not the pitch peak position in the present sub-frame obtained through the prediction is close to the pitch peak position which is obtained only from data in the present sub-frame, it is determined whether or not the phase in the immediately previous sub-frame and the phase in the present sub-frame are continuous. According to a determination result, a method of sound source encoding process is switched. Since the determination result is obtained by using the information which has been already transmitted or which is to be transmitted, the determination result does not need to be transmitted by using new transmission information.
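The prediction and the continuity decision can be sketched as follows in Python. The prediction rule shown here (stepping by the previous pitch cycle inside the previous sub-frame, then by the present pitch cycle across the boundary) is an assumption made for illustration, as are the function names and the tolerance parameter.

```python
def predict_peak(prev_peak, prev_pitch, cur_pitch, subframe_length):
    """Predict the first pitch peak position in the present sub-frame.

    prev_peak is measured from the top of the previous sub-frame; the result
    is measured from the top of the present sub-frame.
    """
    pos = prev_peak
    # Walk to the last peak inside the previous sub-frame.
    while pos + prev_pitch < subframe_length:
        pos += prev_pitch
    # Cross the sub-frame boundary using the present pitch cycle.
    while pos < subframe_length:
        pos += cur_pitch
    return pos - subframe_length


def phases_continuous(predicted_peak, measured_peak, tolerance=2):
    """Phases are judged continuous when the two peak positions are close."""
    return abs(predicted_peak - measured_peak) <= tolerance
```

Because every input is either already transmitted or about to be transmitted, a decoder can evaluate the same two functions and reach the same decision.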
The voice encoding device performs a phase adaptation process for the noise code book when it is determined that the phase in the immediately previous sub-frame and the phase in the present sub-frame are continuous, and does not perform the phase adaptation process for the noise code book when it is determined that the phase in the immediately previous sub-frame and the phase in the present sub-frame are not continuous. The phase adaptation process can be effectively performed. Also, since the continuity of the phase between the sub-frames is determined backward, switching information as to whether or not to apply the phase adaptation process does not need to be transmitted newly. Further, when the phase adaptation process is not applied, by using the fixed code book, the influence of a transmission line error can be effectively inhibited from being propagated.
The invention also provides a CELP type voice encoding device which performs a voice encoding process for each sub-frame having a predetermined time length. On the basis of a concentration degree of signal power in the vicinity of a pitch peak position of an adaptive code vector in the present sub-frame, an encoding process method of a sound source signal is switched. In the voice encoding device, without requiring new transmission information for switching a sound source constitution (encoding process method of the sound source signal), the sound source constitution can be adapted and switched.
The voice encoding device performs a phase adaptation process for a noise code book when the ratio of the signal power in the vicinity of the pitch peak of the adaptive code vector in the present sub-frame to the power of the entire signal of one pitch cycle length is equal to or larger than a predetermined value, and does not perform the phase adaptation process for the noise code book when the ratio is less than the predetermined value. In accordance with the pulse intensity of the adaptive code vector, the phase adaptation process can be adapted and controlled (switched). Voice quality can be enhanced. Also, new transmission information is unnecessary for controlling (switching) the phase adaptation process. Further, when the phase adaptation process is not performed, by using the fixed code book, the influence of the transmission line error can be effectively inhibited from being propagated.
Also, as the phase adaptation process, a pulse position search is performed densely in the pitch peak vicinity and coarsely in the portions other than the pitch peak vicinity. A pulse sound source is applied as a noise sound source. Since the pulse sound source is used as the noise code book, the quantity of memory required for the noise code book and the quantity of arithmetic operation at the time of searching the noise code book can be reduced. Further, the representation property of the rising in the voiced portion can be enhanced.
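The power concentration test can be sketched as follows in Python. The function names, the radius that defines the pitch peak vicinity, the placement of the one-cycle segment around the peak and the threshold value are all assumptions made for illustration.

```python
def peak_power_ratio(adaptive_vector, peak_pos, pitch_cycle, radius):
    """Share of the power of one pitch cycle that lies near the pitch peak."""
    # Assumption: the one-cycle segment is placed roughly around the peak.
    start = max(0, peak_pos - pitch_cycle // 2)
    end = min(len(adaptive_vector), start + pitch_cycle)
    total = sum(x * x for x in adaptive_vector[start:end])
    if total == 0:
        return 0.0
    lo = max(start, peak_pos - radius)
    hi = min(end, peak_pos + radius + 1)
    near = sum(x * x for x in adaptive_vector[lo:hi])
    return near / total


def use_phase_adaptation(adaptive_vector, peak_pos, pitch_cycle,
                         radius=1, threshold=0.5):
    """Apply phase adaptation only when the power is concentrated enough."""
    ratio = peak_power_ratio(adaptive_vector, peak_pos, pitch_cycle, radius)
    return ratio >= threshold
```

A pulse-like adaptive code vector gives a ratio near 1 and enables the phase adaptation process, while a flat, noise-like vector gives a low ratio and leaves the fixed code book in use.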
Furthermore, indexes indicative of pulse positions are arranged in order from the top of the sub-frame. The indexes indicative of the pulse positions are arranged from the top of the sub-frame in such a manner that a pulse with a smaller index number is positioned closer to the top of the sub-frame. Therefore, a deviation of the pulse position which arises when the pitch peak position is wrong can be minimized. The influence of the transmission line error can be prevented from being propagated.
Still further, in the case of the same index number, pulses are numbered in order from the top of the sub-frame. Further, each pulse search position is determined in such a manner that the vicinity of the pitch peak position becomes dense and the portions other than the pitch peak vicinity become coarse. In the case of the same index number, each pulse number is determined in such a manner that the pulse with a smaller pulse number is positioned closer to the top of the sub-frame. Therefore, in addition to the pulse indexing, the pulse numbering is defined. The deviation of the pulse position arising when the pitch peak position is wrong can further be reduced. The propagation of the influence of the transmission line error can further be reduced.
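The indexing and numbering rules can be sketched as follows in Python. The interleaved assignment of sorted positions to pulses is one possible way of satisfying both rules and is an assumption; the function name build_position_tables is hypothetical.

```python
def build_position_tables(search_positions, num_pulses):
    """Assign search positions to pulses so that order follows the sub-frame.

    tables[p][i] is the position of pulse number p at index i. Positions are
    sorted from the top of the sub-frame and dealt out in turn, so a smaller
    index, or a smaller pulse number at the same index, is always closer to
    the top of the sub-frame.
    """
    ordered = sorted(set(search_positions))
    tables = [[] for _ in range(num_pulses)]
    for k, position in enumerate(ordered):
        tables[k % num_pulses].append(position)
    return tables
```

Because the tables depend only on the sorted order, a small error in the pitch peak position shifts each decoded pulse by a small amount rather than remapping an index to a distant position.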
Still even further, a part of pulse search positions is determined by the pitch peak position, while other pulse search positions are predetermined fixed positions irrespective of the pitch peak position. Even when the pitch peak position is wrong, a probability that a sound source pulse position is wrong is reduced. Therefore, the influence of the transmission line error can be inhibited from being propagated.
The voice encoding device has a pitch peak position calculation means which, when obtaining the pitch peak position of a voice having a predetermined time length or the sound source signal, cuts out only a pitch cycle length from the relevant signal and determines the pitch peak position in the cut-out signal. To select the pitch peak from one pitch waveform, a point at which an amplitude value (absolute value) becomes maximum may be simply searched. Even when the sub-frame includes a waveform exceeding one pitch cycle, the pitch peak position can be obtained precisely.
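The search within a cut-out pitch cycle can be sketched as follows in Python. The function name is hypothetical, and the sketch assumes that the cut-out starts at the top of the signal.

```python
def pitch_peak_in_one_cycle(signal, pitch_cycle):
    """Find the sample with the largest absolute amplitude in one pitch cycle.

    Assumption: the one-cycle segment is cut from the top of the signal.
    """
    length = min(pitch_cycle, len(signal))
    if length <= 0:
        raise ValueError("signal and pitch_cycle must be non-empty/positive")
    return max(range(length), key=lambda n: abs(signal[n]))
```

When the sub-frame holds more than one pitch waveform, a search over the whole signal may return a later peak, whereas the search restricted to one cycle returns the top pitch peak.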
The voice encoding device, when cutting out only the pitch cycle length from the relevant signal, first uses the entire relevant signal without cutting out one cycle length to determine the pitch peak position, uses the determined pitch peak position as a cutting-out start point to cut out one pitch cycle length and determines the pitch peak position in the cut-out signal. A phenomenon in which a second peak in one pitch waveform is determined as the pitch peak position, which results when the pitch peak position is determined by using the entire relevant signal, can be avoided. Specifically, an error in extraction of the pitch peak position which arises when the pitch cycle is not synchronized with the sub-frame length can be avoided.
The invention also provides the CELP type voice encoding device which performs a voice encoding process for each sub-frame having a predetermined time length. When the pitch peak position in the present sub-frame is calculated and a difference between the pitch cycle in the immediately previous sub-frame and the pitch cycle in the present sub-frame is in a predetermined range, then the pitch peak position in the immediately previous sub-frame, the pitch cycle in the immediately previous sub-frame and the pitch cycle in the present sub-frame are used to predict the pitch peak position in the present sub-frame. By using the pitch peak position in the present sub-frame which is obtained through the prediction, an existence range of the pitch peak position in the present sub-frame is restricted beforehand, and the pitch peak position is searched in the range. In the voice encoding device, by considering the pitch peak position in the immediately previous sub-frame, the pitch peak position in the present sub-frame is determined. If the pitch peak position is obtained only from the present sub-frame, the second peak position in one pitch waveform may be wrongly detected. This wrong detection is avoided by the method.
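The restricted search can be sketched as follows in Python. The predicted position is taken as an input; the function name, the margin parameter and the fallback to a full search when the predicted range lies outside the signal are assumptions made for illustration.

```python
def search_peak_near(signal, predicted_pos, margin):
    """Search the pitch peak only within margin samples of the prediction."""
    lo = max(0, predicted_pos - margin)
    hi = min(len(signal), predicted_pos + margin + 1)
    if lo >= hi:
        # Assumption: fall back to the whole signal if the range is empty.
        lo, hi = 0, len(signal)
    return max(range(lo, hi), key=lambda n: abs(signal[n]))
```

In the example below the larger peak at sample 6 is ignored because it lies outside the range allowed by the prediction, so the phase stays continuous with the previous sub-frame.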
The invention also provides a CELP type voice encoding device which performs a voice encoding process for each sub-frame having a predetermined time length. A pulse sound source is used as a noise code book, and there are provided at least two modes of the noise code book. By switching the modes, the number of sound source pulses can be changed. In at least one mode, there are a sufficient quantity of each pulse position information and a small number of pulses. In the other modes, there is a shortage of each pulse position information but a large number of pulses. By transmitting mode switch information, the modes are switched. In the voice encoding device, since there is provided the mode in which there are a sufficient quantity of position information and a small number of sound source pulses, the quality of the voiced rising portion of the voice signal is enhanced. Also, the mode in which there are an insufficient quantity of position information and a large number of sound source pulses can be effectively used.
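The trade-off between the number of pulses and the position information per pulse can be made concrete with a small calculation in Python. The budget of about 15 bits per sub-frame is taken from the summary above; the single mode switch bit, the one sign bit per pulse and the equal split of the remaining bits are hypothetical assumptions, not the disclosed bit allocation.

```python
def positions_per_pulse(total_bits, mode_bits, num_pulses, sign_bits_per_pulse=1):
    """Number of distinct positions each pulse can address within the budget.

    Assumption: the bits left after mode and sign bits are split equally.
    """
    available = total_bits - mode_bits - num_pulses * sign_bits_per_pulse
    if num_pulses <= 0 or available < 0:
        raise ValueError("bit budget too small for this number of pulses")
    bits_each = available // num_pulses
    return 2 ** bits_each
```

Under these assumptions, two pulses can each address 64 positions, while four pulses can each address only 4 positions, which is why the many-pulse mode benefits from concentrating its candidate positions.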
Also, when the pitch cycle is short, by restricting a sound source pulse search range to a narrow range in accordance with the pitch cycle, the sound source pulse position information is decreased while the number of sound source pulses is increased. For the sound source signal which has a pitch periodicity with a short pitch cycle, the number of sound source pulses can be increased while keeping a sufficient quantity of sound source pulse position information per pitch cycle, and voice quality can be enhanced.
The voice encoding device determines the pulse position search range in such a manner that in the mode in which there is a shortage of each pulse position information but a large number of pulses, the search positions of sound source pulses become dense in the pitch peak position vicinity while the search positions of sound source pulses become coarse in the other portions. The position information of sound source pulses is concentrated in a portion in which sound source pulses are highly likely to be placed. Therefore, the mode in which there is an insufficient quantity of sound source pulse position information and a large number of sound source pulses can be used with an enhanced efficiency.
Also, in the sound source mode in which there are a small number of pulses and a sufficient quantity of position information, a part of the position information is allocated to an index indicative of a noise sound source code vector. Without providing a new mode, an unvoiced consonant portion or a noise input signal can be handled.
The invention also provides a recording medium which records a program for executing a function of the voice encoding device and can be read by a computer. Since the recording medium is read by the computer, the function of the voice encoding device can be realized.
The invention also provides methods which have substantially the same contents as the voice encoding devices, each providing a similar effect.
The invention also provides a recording medium which records a program for executing the voice encoding method and can be read by a computer. Since the recording medium is read by the computer, the function of the voice encoding device can be realized.
The invention also provides voice decoding devices which have the sound source generating portions, each providing a similar effect.
The invention also provides a recording medium which records a program for executing a function of the voice decoding device and can be read by a computer. Since the recording medium is read by the computer, the function of the voice decoding device can be realized.
The invention also provides voice decoding methods which have the sound source generating methods, each providing a similar effect.
The invention also provides a recording medium which records a program for executing the voice decoding method and can be read by a computer. Since the recording medium is read by the computer, the function of the voice decoding device can be realized.