Wireless and voice-over-internet protocol (VoIP) communications are subject to frequent degradation of packets as a result of adverse connection conditions. The degraded packets may be lost or corrupted (that is, they may have an unacceptably high error rate). Such degraded packets result in clicks, pops or other artefacts in the output voice signal at the receiving end of the connection. This degrades the perceived speech quality at the receiving end and may render the speech unrecognisable if the packet degradation rate is sufficiently high.
Broadly speaking, two approaches are taken to combat the problem of degraded packets. The first approach is the use of transmitter-based recovery techniques. Such techniques include retransmission of degraded packets, interleaving the contents of several packets to disperse the effect of packet degradation, and addition of error correction coding bits to the transmitted packets such that degraded packets can be reconstructed at the receiver. In order to limit the increased bandwidth requirements and delays inherent in these techniques, they are often employed such that degraded packets can be recovered if the packet degradation rate is low, but not all degraded packets can be recovered if the packet degradation rate is high. Additionally, some transmitters may not have the capacity to implement transmitter-based recovery techniques.
The second approach taken to combating the problem of degraded packets is the use of receiver-based concealment techniques. Such techniques are generally used in addition to transmitter-based recovery techniques to conceal any remaining degradation left after the transmitter-based recovery techniques have been employed. Additionally, they may be used in isolation if the transmitter is incapable of implementing transmitter-based recovery techniques. Low complexity receiver-based concealment techniques such as filling in a degraded packet with silence, noise, or a repetition of the previous packet are used, but result in a poor quality output voice signal. Regeneration based schemes such as model-based recovery (in which speech on either side of the degraded packet is modelled to generate speech for the degraded packet) produce a very high quality output voice signal but are highly complex, consume high levels of power and are expensive to implement. In practical situations interpolation-based techniques are preferred. These techniques generate a replacement packet by interpolating parameters from the packets on one or both sides of the degraded packet. These techniques are relatively simple to implement and produce an output voice signal of reasonably high quality.
Pitch based waveform substitution is a preferred interpolation-based packet degradation recovery technique. Voice signals appear to be composed of a repeating segment when viewed over short time intervals. This segment repeats periodically with a time period referred to as a pitch period. In pitch based waveform substitution, the pitch period of the voiced packets on one or both sides of the degraded packet is estimated. A waveform of the estimated pitch period or a multiple of the estimated pitch period is then used (or repeated and used) as a substitute for the degraded packet. This technique is effective because the pitch period of the degraded voice packet will normally be substantially the same as the pitch period of the voice packets on either side of the degraded packet.
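As a rough illustration of the substitution step, the following Python sketch fills a degraded packet by repeating the most recent pitch period of the preceding signal. The function name `substitute_packet` and its parameters are illustrative assumptions. The pitch period is assumed to have been estimated already and to be expressed in samples.

```python
def substitute_packet(history, pitch_period, packet_len):
    """Build a replacement packet from the last pitch cycle of `history`.

    history      -- samples received before the degraded packet
    pitch_period -- estimated pitch period in samples (assumed known)
    packet_len   -- number of samples in the degraded packet
    """
    if pitch_period <= 0 or pitch_period > len(history):
        raise ValueError("pitch_period must be between 1 and len(history)")
    segment = history[-pitch_period:]  # most recent pitch cycle
    # Repeat the cycle as often as needed and trim to the packet length.
    return [segment[i % pitch_period] for i in range(packet_len)]
```

For a perfectly periodic history the replacement simply continues the pattern. Real speech is only approximately periodic, which is why the boundaries of the replacement packet usually need further treatment.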
In pitch based waveform substitution techniques, discontinuities at the boundaries between the replacement packet and the remaining signal can often be detected as artefacts in the output voice signal. Cross fading the signals on either side of a boundary using an overlap add function is used to reduce such discontinuities. Pattern matching methods have also been proposed.
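A minimal sketch of cross fading with an overlap add function is given below, assuming a linear ramp over an overlap region of equal length in both signals. The name `crossfade` and the choice of a linear window are assumptions made for illustration. Practical systems may use other window shapes.

```python
def crossfade(tail, head):
    """Overlap-add two equal-length segments with linear weights.

    tail -- end of the outgoing signal (faded out)
    head -- start of the incoming signal (faded in)
    """
    n = len(tail)
    if n != len(head):
        raise ValueError("segments must have equal length")
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # fade-in weight, rising from near 0 to near 1
        out.append((1.0 - w) * tail[i] + w * head[i])
    return out
```

Because the two weights always sum to one, a signal that is identical on both sides of the boundary passes through unchanged.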
Many methods are used to estimate the pitch period of a voice signal. For a typical one of these methods, the calculations involved in estimating the pitch period account for over 90% of the algorithmic complexity in the pitch based waveform substitution technique. Although the complexity level of the calculation is low, it is significant for low-power platforms such as Bluetooth. In order to correctly determine the pitch period of a voice signal, a wide predefined range of pitch period values is analysed, for example from 2.5 ms (for a person with a high voice) to 16 ms (for a person with a low voice). For most pitch period determination algorithms, the wider the pitch period range used, the higher the computational complexity.
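To give a sense of scale, the sketch below converts a pitch period range in milliseconds into a range of sample lags. The sampling rate of 8 kHz used in the note that follows is an assumption, as no sampling rate is stated here. The helper name `lag_range` is hypothetical.

```python
def lag_range(min_ms, max_ms, sample_rate_hz):
    """Return (shortest lag, longest lag, number of candidate lags) in samples."""
    lo = round(min_ms * sample_rate_hz / 1000)
    hi = round(max_ms * sample_rate_hz / 1000)
    return lo, hi, hi - lo + 1
```

Under that assumption, `lag_range(2.5, 16, 8000)` gives lags from 20 to 128 samples, that is, 109 candidate pitch periods to be examined by an exhaustive search.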
One way to reduce the computational complexity is to reduce the number of calculations that the algorithm performs. ITU-T Recommendation G.711 Appendix I, “A high quality low-complexity algorithm for packet loss concealment with G.711”, reduces the number of calculations by using a two phase approach to pitch period estimation. In the first phase, a coarse search is performed over the entire predefined range of pitch periods to determine a rough estimate of the pitch period. In the second phase, a fine search is performed over a refined range of pitch periods encompassing the rough estimate, which yields a more accurate, refined estimate of the pitch period. The number of calculations is therefore reduced compared to an algorithm that performs a fine search over the entire predefined range of pitch periods.
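The general shape of a two phase search can be sketched as follows. This is an illustration of the coarse and fine idea only and is not the procedure specified in the Recommendation. The function name, the `coarse_step` parameter and the generic `score` callable (for example a correlation value per lag) are all assumptions.

```python
def coarse_fine_search(score, min_lag, max_lag, coarse_step=4):
    """Return (best lag, number of score evaluations).

    score -- callable mapping a lag in samples to a similarity value
             (higher is better); hypothetical, supplied by the caller
    """
    evaluated = {}

    def cached(lag):
        # Evaluate each lag at most once so the count reflects real work.
        if lag not in evaluated:
            evaluated[lag] = score(lag)
        return evaluated[lag]

    # Phase 1: coarse search over the entire range, every coarse_step lags.
    coarse = max(range(min_lag, max_lag + 1, coarse_step), key=cached)
    # Phase 2: fine search over the lags surrounding the rough estimate.
    lo = max(min_lag, coarse - coarse_step + 1)
    hi = min(max_lag, coarse + coarse_step - 1)
    best = max(range(lo, hi + 1), key=cached)
    return best, len(evaluated)
```

With a lag range of 20 to 128 samples and a coarse step of 4, this sketch evaluates 34 lags rather than 109. It only finds the best lag if the coarse grid lands close enough to the true peak, which is not guaranteed for real speech.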
U.S. patent application Ser. No. 11/734,824 proposes a two phase approach to pitch period estimation that further reduces the number of calculations that the algorithm performs. In this application a coarse search is performed on a decimated signal over the entire predefined range of pitch periods. On identifying an initial best candidate for the pitch period, a refined range of pitch periods is calculated centred on the initial best candidate. Pitch periods at the midpoints between the initial best candidate and the ends of the refined range are analysed. If one of these midpoint pitch periods is preferable to the initial best candidate, it is taken as a refined best candidate for the pitch period. Further bisectional searches may be performed to yield a more accurate estimate of the pitch period. The number of calculations is therefore reduced compared to an algorithm that performs a fine search over the entire refined range of pitch periods.
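One possible reading of the bisectional refinement is sketched below. It is not taken from the cited application. The names `bisect_refine` and `half_width` and the generic `score` callable are assumptions, and the initial candidate is assumed to have been found already by a coarse search on the decimated signal.

```python
def bisect_refine(score, candidate, half_width):
    """Refine `candidate` by repeatedly testing midpoints on either side.

    score      -- callable mapping a lag to a similarity value (higher is better)
    candidate  -- initial best lag from the coarse search (assumed given)
    half_width -- distance from the candidate to each end of the refined range
    """
    best, best_score = candidate, score(candidate)
    evaluations = 1
    step = half_width // 2  # midpoint between candidate and range end
    while step >= 1:
        for lag in (best - step, best + step):
            s = score(lag)
            evaluations += 1
            if s > best_score:  # keep a midpoint only if it is preferable
                best, best_score = lag, s
        step //= 2  # bisect again around the current best candidate
    return best, evaluations
```

In this sketch the number of evaluations grows with the logarithm of the refined range width, whereas a fine search over the same range grows linearly with it.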
Although these approaches reduce the number of calculations that the algorithms compute, computational complexity associated with estimating the pitch period remains a problem, particularly with low-power platforms such as Bluetooth.
Additionally, pitch period determination algorithms generally involve comparing portions of a signal separated by lag values. The algorithm selects the lag value associated with the most similar portions to be the estimate of the pitch period. However, portions of the signal separated by multiples of the pitch period will also be very similar. A common problem with pitch period detection algorithms is that a multiple of the pitch period is selected as the estimate of the pitch period.
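The lag comparison, and the ambiguity it creates, can be illustrated with a short sketch based on normalised correlation. This is one common similarity measure, chosen here as an assumption. The function names and the `window` parameter are illustrative.

```python
import math


def normalized_correlation(signal, lag, window):
    """Similarity between the last `window` samples and those `lag` earlier."""
    n = len(signal)
    recent = signal[n - window:]
    earlier = signal[n - window - lag:n - lag]
    num = sum(x * y for x, y in zip(recent, earlier))
    den = math.sqrt(sum(x * x for x in recent) * sum(y * y for y in earlier))
    return num / den if den > 0 else 0.0


def estimate_pitch(signal, min_lag, max_lag, window):
    """Return the lag in [min_lag, max_lag] with the highest correlation."""
    if len(signal) < window + max_lag:
        raise ValueError("signal too short for the requested lag range")
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_correlation(signal, lag, window))
```

For a strictly periodic test signal with a period of 40 samples, the correlation is essentially 1 at lags of 40, 80 and 120 samples alike, so which of these the search returns can depend on rounding error alone.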
Chu, Wai C., Speech coding algorithms: foundation and evolution of standardised coders (Wiley, 2003), discloses a method for checking for multiples of a pitch period once an estimate of the pitch period has been determined using an autocorrelation algorithm. The pitch period estimate is divided by one or more integers to form check points. If a check point yields a sufficiently high autocorrelation value, it is used as the refined estimate of the pitch period.
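A hedged sketch of such a multiple check is given below. It follows the description above rather than the text of the cited book. The threshold of 0.9 relative to the score of the original estimate, the largest divisor of 4, and the name `check_multiples` are all assumptions.

```python
def check_multiples(score, estimate, min_lag, max_divisor=4, threshold=0.9):
    """Replace `estimate` by a sub-multiple if that sub-multiple scores well.

    score     -- callable mapping a lag to an autocorrelation-like value
    estimate  -- pitch period estimate in samples, possibly a multiple
    min_lag   -- shortest pitch period considered valid
    """
    reference = score(estimate)
    best = estimate
    for divisor in range(2, max_divisor + 1):
        check_point = round(estimate / divisor)
        if check_point < min_lag:
            break  # shorter check points fall outside the valid range
        if score(check_point) >= threshold * reference:
            best = check_point  # later (shorter) qualifying points take over
    return best
```

Each check point costs one additional score evaluation, which is the source of the extra computational load that such a check adds to the estimation.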
It is desirable to use a multiple checking algorithm such as the one described above in order to increase the accuracy of the pitch period estimate. However, such checking algorithms increase the computational complexity associated with estimating the pitch period.
There is thus a need for an improved method of estimating the pitch period of a signal that increases the accuracy of the estimate by reducing the likelihood that the estimate is a multiple of the ‘true’ pitch period, but that also reduces the computational complexity associated with the estimation.