1. Field of the Invention
The present invention generally relates to an audio coding technique, and more particularly to a method of and an apparatus for coding audio pitch information, and to a program storage device, readable by the audio pitch coding apparatus, on which an audio pitch coding program is recorded.
2. Description of the Related Art
To code an audio signal with high efficiency, the pitch, which is based on the long-term correlation of the audio signal arising from the periodic vibration of the human vocal cords, is extracted and coded. Namely, since waveforms similar to each other are repeated in the audio signal at a period determined by this pitch, the audio signal can be coded with high efficiency by combining pitch-based coding with short-term prediction based on the correlation between neighboring samples. In CELP (Code Excited Linear Prediction), a representative audio coding method, the content of an adaptive code book, which holds the past driving source signal of the synthesis filter, is first reproduced, and the pitch is determined so as to minimize the perceptually weighted error power with respect to the input signal. Thus, pitch extraction is an indispensable element of the technique.
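As a rough illustration of the long-term correlation described above, the following Python sketch picks the lag at which a signal best matches a delayed copy of itself. It is a simplified open-loop search using normalized correlation, not the closed-loop CELP search that minimizes the perceptually weighted error power; the function name `estimate_pitch_lag` and the lag bounds are assumptions made only for this example.

```python
import math


def estimate_pitch_lag(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized correlation.

    Hypothetical helper: a simplified stand-in for pitch extraction,
    not the perceptually weighted search used in CELP.
    """
    best_lag = min_lag
    best_score = float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlation between the signal and its copy delayed by `lag` samples.
        cross = sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
        # Energy of the delayed copy, used to normalize the correlation.
        energy = sum(signal[n - lag] ** 2 for n in range(lag, len(signal)))
        if energy == 0.0:
            continue  # a silent segment gives no usable score
        score = cross / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a sine wave that repeats every 40 samples.
periodic = [math.sin(2 * math.pi * n / 40) for n in range(320)]
lag = estimate_pitch_lag(periodic, 20, 60)
```

The search range is limited on both sides so that neither a very short lag nor a multiple of the true period is selected; in practice the bounds would depend on the sampling rate and the expected range of speaker pitch.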
In an audio coding method such as CELP, the input speech is divided into a plurality of frames, the coding process is performed for each frame, and each frame is further divided into a plurality of sub frames. The sub frame is the basic unit for processes such as vector quantization. The above-mentioned pitch extraction calculates one pitch for each sub frame, and the calculated pitches are coded within a range of one or a plurality of frames. When coding the calculated pitches, it is possible to code the pitch value itself for every sub frame in a frame. To reduce the amount of coded data, however, it is more effective to code the pitch value itself only for the head sub frame of each frame and, for each subsequent sub frame in the frame, to code the difference between its pitch and that of the previous sub frame.
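The head-value-plus-differences scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: pitches are represented as integer lags, and the function names and the `("abs", ...)` / `("diff", ...)` tuples are hypothetical, chosen only to make the two kinds of code visible. Actual bit allocation and quantization are not modeled.

```python
def encode_frame_pitches(lags):
    """Code the head sub frame absolutely and the rest as differences.

    `lags` holds one pitch lag per sub frame of a single frame.
    """
    codes = [("abs", lags[0])]
    for previous, current in zip(lags, lags[1:]):
        # Each later sub frame stores only its change from the previous one.
        codes.append(("diff", current - previous))
    return codes


def decode_frame_pitches(codes):
    """Rebuild the per-sub-frame lags by accumulating the differences."""
    lags = [codes[0][1]]
    for _, delta in codes[1:]:
        lags.append(lags[-1] + delta)
    return lags


# Usage: four sub frames whose pitch drifts slowly.
codes = encode_frame_pitches([40, 42, 41, 41])
# codes == [("abs", 40), ("diff", 2), ("diff", -1), ("diff", 0)]
restored = decode_frame_pitches(codes)
```

Because the pitch of voiced speech usually changes slowly from one sub frame to the next, the differences tend to be small and can be represented with fewer bits than the absolute value. The decoder, however, depends entirely on the head value as its starting point.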
However, an audio signal can be categorized into: voiced sound, in which input speech accompanied by vibration of the vocal cords exists; unvoiced sound, in which only input speech not accompanied by vibration of the vocal cords exists; and silence, in which no input speech exists. The audio pitch is meaningful only for the voiced portion. Thus, the category of the audio signal is judged first, and the pitch coding process is not performed if the sub frame, which is the minimum unit of the process, is judged to be unvoiced sound or silence (i.e., other than voiced sound). Accordingly, if the head sub frame of a frame is not judged to be voiced sound, the reference value for the differences to be obtained for the subsequent sub frames is not determined, and the pitch coding process is therefore not performed for the whole frame. In this case, no reproduction signal is output from the adaptive code book in CELP or the like.
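The consequence of this dependency on the head sub frame can be illustrated with a short Python sketch. It assumes, purely for illustration, that each frame is given as a list of per-sub-frame voicing flags (`True` for voiced sound); the function name `lost_voiced_sub_frames` is hypothetical. The sketch counts voiced sub frames that receive no pitch code because the head sub frame of their frame was not judged to be voiced.

```python
def lost_voiced_sub_frames(frames):
    """Count voiced sub frames left without a pitch code.

    `frames` is a list of frames; each frame is a list of booleans,
    one per sub frame, where True means "judged to be voiced sound".
    """
    lost = 0
    for flags in frames:
        if not flags[0]:
            # No reference value exists, so the whole frame is skipped,
            # including any voiced sub frames that follow the head.
            lost += sum(flags)
    return lost


# Usage: three frames of four sub frames each.
frames = [
    [True, True, True, True],      # head voiced: frame is pitch coded
    [False, True, True, True],     # head not voiced: three voiced sub frames lost
    [False, False, False, False],  # no voiced sound: nothing to lose
]
lost = lost_voiced_sub_frames(frames)
```

In the second frame, speech begins just after the frame boundary, yet none of its voiced sub frames is pitch coded.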
Therefore, in the above-mentioned audio coding method, it is difficult both to reduce the amount of coded data and to realize fine pitch coding with high fidelity to the input speech. In particular, when one frame is rather long or the number of sub frames in one frame is large, the possibility increases that the frame includes a sub frame that is not judged to be voiced sound, and the quality of the audio coding process is accordingly more likely to be degraded.