Harmonicity represents the degree of acoustic periodicity of an audio signal and is an important metric for many speech processing tasks. For example, it has been used to measure voice quality (Xuejing Sun, “Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio,” ICASSP 2002). It has also been used for voice activity detection and noise estimation. In Sun, X., K. Yen, et al., “Robust Noise Estimation Using Minimum Correction with Harmonicity Control,” Interspeech, Makuhari, Japan, 2010, harmonicity is used to control the minimum search so that the noise tracker is more robust to edge cases such as extended periods of voicing and sudden jumps in the noise floor.
Various approaches have been proposed to measure harmonicity. One is the Harmonics-to-Noise Ratio (HNR). Another, the Subharmonic-to-Harmonic Ratio (SHR), describes the amplitude ratio between subharmonics and harmonics (Xuejing Sun, “Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio,” ICASSP 2002); in that approach, the pitch and the SHR are estimated by shifting and summing linear amplitude spectra on a logarithmic frequency scale.
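The shift-and-sum idea can be sketched as follows. This is a simplified illustration, not the exact procedure of Sun (2002): the function names `shift_and_sum` and `shr_curve`, the candidate range, the number of harmonics, and the grid resolution are all assumptions made for the example. It relies on the fact that, on a log2 frequency axis, multiplying a frequency by m is a shift by log2(m), so the k-th harmonic lines up with the fundamental after a shift of log2(k) and the subharmonic between harmonics after a shift of log2(k - 0.5).

```python
import numpy as np

def shift_and_sum(amp, freqs, log_f, multiples):
    """Sum the linear amplitude spectrum read at multiples of each candidate.

    amp, freqs : amplitude spectrum and its (increasing) frequency axis in Hz.
    log_f      : candidate fundamentals on a uniform log2(Hz) grid.
    multiples  : frequency multipliers; each one is a shift of log2(m)
                 on the logarithmic axis.
    """
    total = np.zeros_like(log_f)
    for m in multiples:
        shifted = log_f + np.log2(m)  # shift on the log-frequency axis
        # Read the spectrum at the shifted positions; zero outside its range.
        total += np.interp(2.0 ** shifted, freqs, amp, left=0.0, right=0.0)
    return total

def shr_curve(amp, freqs, f_lo=50.0, f_hi=500.0, n_harm=5, bins_per_octave=96):
    """Return candidate frequencies, harmonic sum, subharmonic sum, and ratio.

    All default parameter values are illustrative assumptions.
    """
    n_bins = int(round(np.log2(f_hi / f_lo) * bins_per_octave)) + 1
    log_f = np.log2(f_lo) + np.arange(n_bins) / bins_per_octave
    k = np.arange(1, n_harm + 1)
    harm = shift_and_sum(amp, freqs, log_f, k)        # f0, 2*f0, 3*f0, ...
    sub = shift_and_sum(amp, freqs, log_f, k - 0.5)   # 0.5*f0, 1.5*f0, ...
    # Direct division in the linear domain, guarded against a zero denominator.
    shr = sub / np.maximum(harm, 1e-12)
    return 2.0 ** log_f, harm, sub, shr
```

For a spectrum whose harmonics of 200 Hz have amplitude 1 and whose in-between subharmonics have amplitude 0.2, the ratio at the 200 Hz candidate comes out close to 0.2. Note that the guarded division can still produce very large values at candidates where the harmonic sum is near zero.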
In this previous approach to estimating SHR, the calculation is performed in the linear amplitude domain, where the large dynamic range can lead to numerical instability. The linear amplitude also limits the contribution from high-frequency components, which are known to be perceptually important and crucial for classifying many kinds of audio content that are rich in high frequencies. Furthermore, the original approach (Sun, 2002) uses an approximation to calculate the subharmonic-to-harmonic ratio, because the alternative, a direct division in the linear domain, itself causes numerical issues; this approximation leads to inaccurate results.
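The limited contribution of high-frequency components in the linear domain can be illustrated with a small numerical sketch. Everything here is assumed for the example: the 200 Hz fundamental, the 12 dB per octave spectral tilt, the 2 kHz split, and the helper `high_band_share`. The dB-above-floor compression, with its -80 dB floor, is used only as a point of comparison and is not something the text above prescribes.

```python
import numpy as np

def high_band_share(values, freqs, split_hz):
    """Fraction of the summed values contributed by components above split_hz."""
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    return values[freqs > split_hz].sum() / values.sum()

# Hypothetical harmonic series: 20 harmonics of 200 Hz, falling 12 dB per octave.
harm_freqs = 200.0 * np.arange(1, 21)
level_db = -12.0 * np.log2(harm_freqs / harm_freqs[0])  # 0 dB at the fundamental
linear_amp = 10.0 ** (level_db / 20.0)

# Comparison only: levels expressed in dB above an assumed -80 dB floor.
floor_db = -80.0
compressed = np.maximum(level_db - floor_db, 0.0)

linear_share = high_band_share(linear_amp, harm_freqs, 2000.0)
compressed_share = high_band_share(compressed, harm_freqs, 2000.0)
print(f"share above 2 kHz, linear amplitude: {linear_share:.3f}")
print(f"share above 2 kHz, dB above floor:   {compressed_share:.3f}")
```

Under these assumptions, the ten harmonics above 2 kHz contribute only a small fraction of the linear amplitude sum, while they account for a much larger fraction once the levels are compressed.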