To save bandwidth when transmitting and storing speech and audio signals, speech and audio encoding technology has been widely used. The technology includes lossy encoding and lossless encoding. In lossy encoding, the reconstructed signal may differ from the original signal, but redundant information in the signal is minimized according to the features of the sound source and human auditory perception, so that little coding information is transmitted while high speech and audio quality is still achieved. In lossless encoding, the reconstructed signal is the same as the original signal, so that the final decoding quality is not degraded. Generally, lossy encoding has high compression efficiency, but the quality of the reconstructed speech and audio signal cannot be guaranteed. Lossless encoding can guarantee the speech quality because it reconstructs signals without distortion, but its compression rate is only about 50%.
The pitch is an important parameter in both lossy encoding and lossless encoding. The final encoding performance depends on the accuracy of pitch detection. In the prior art, many pitch detection methods are available, one of which includes: mapping a signal to a domain, performing search pre-processing, performing a coarse search on an open-loop basis, then performing a refined search on a closed-loop basis, and finally performing post-processing such as pitch smoothing. All these operations are performed in a single domain, for example, the time domain, frequency domain, cepstrum domain, signal domain, or residual domain.
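For illustration only, the single-domain search sequence described above may be sketched in the time domain as follows. This is a minimal sketch under stated assumptions and is not taken from any particular prior-art codec: the function names, the lag range, the coarse step of 2, the search radius, and the 3-point median smoother are all hypothetical. In particular, the refined stage here is simply a full-resolution correlation search around the coarse candidate, standing in for a true closed-loop (analysis-by-synthesis) search.

```python
import math


def normalized_autocorr(x, lag):
    """Normalized correlation between x[n] and x[n - lag]."""
    num = e0 = e1 = 0.0
    for n in range(lag, len(x)):
        num += x[n] * x[n - lag]
        e0 += x[n] * x[n]
        e1 += x[n - lag] * x[n - lag]
    den = math.sqrt(e0 * e1)
    return num / den if den > 0.0 else 0.0


def preprocess(x):
    """Search pre-processing (assumed here to be mean removal only)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def coarse_search(x, min_lag, max_lag, step=2):
    """Coarse search: evaluate only every `step`-th candidate lag."""
    return max(range(min_lag, max_lag + 1, step),
               key=lambda lag: normalized_autocorr(x, lag))


def refined_search(x, coarse_lag, min_lag, max_lag, radius=2):
    """Refined search: evaluate every lag near the coarse candidate."""
    lo = max(min_lag, coarse_lag - radius)
    hi = min(max_lag, coarse_lag + radius)
    return max(range(lo, hi + 1),
               key=lambda lag: normalized_autocorr(x, lag))


def smooth_pitch(lags):
    """Post-processing: 3-point median smoothing; end points are kept."""
    out = list(lags)
    for i in range(1, len(lags) - 1):
        out[i] = sorted(lags[i - 1:i + 2])[1]
    return out


def detect_pitch(frame, min_lag=20, max_lag=147):
    """Run pre-processing, coarse search, and refined search on one frame."""
    x = preprocess(frame)
    coarse = coarse_search(x, min_lag, max_lag)
    return refined_search(x, coarse, min_lag, max_lag)
```

In this sketch, detect_pitch is called once per frame, and smooth_pitch is then applied to the resulting sequence of per-frame lags to suppress isolated outliers such as a doubled pitch lag. Every stage operates on the same time-domain samples, which mirrors the single-domain arrangement described above.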
During the implementation of the present invention, the inventor finds that the prior art has the following problems: in an actual algorithm, many operations need to be performed in different domains, and the pitch detection algorithm shows different levels of performance and complexity in different domains. For example, in the time domain, the pitch detection complexity is low; in the frequency domain, the pitch detection accuracy is higher; in the signal domain, the pitch is prominent and is easy to detect; in the residual domain, the pitch is weak and is therefore difficult to detect.