The technology disclosed relates to audio signal processing. It includes a series of modules, each of which is individually useful for solving audio signal processing problems. Among the problems addressed are buzz removal, selecting a pitch candidate among pitch candidates based on local continuity of pitch and regional octave consistency, making small adjustments in pitch, ensuring that a selected pitch is consistent with harmonic peaks, determining whether a given frame or region of frames includes a harmonic, voiced signal, extracting harmonics from voice signals, and detecting vibrato. One environment in which these modules are useful is transcribing singing or humming into a symbolic melody. Another environment that would usefully employ some of these modules is speech processing. Some of the modules, such as buzz removal, are useful in many other environments as well.
Pitch selection, given candidate pitches for a frame of sound, is widely recognized as useful. See, e.g., Kwon, Y. H., D. J. Park and B. C. Ihm. “Simplified Pitch Detection Algorithm of Mixed Speech Signals.” Circuits and Systems, 2000. Proceedings. ISCAS 2000 Geneva. The 2000 IEEE International Symposium on. Geneva, Switzerland, May 28-31, 2000; Marley, J. System and Method for Sound Recognition with Feature Selection Synchronized to Voice Pitch. U.S. Pat. No. 4,783,807. Nov. 8, 1988.
Classification of an audio signal into voiced, unvoiced and silence has been recognized as useful for a preliminary acoustic segmentation of speech and song. Qi, Y., and B. R. Hunt. “Voiced-Unvoiced-Silence Classifications of Speech using Hybrid Features and a Network Classifier.” Speech and Audio Processing, IEEE Transactions on 1.2 (1993): 250-5. This segmentation is useful in both music transcription and speech recognition.
Vibrato is an ornamentation of vocal, string and other harmonic sound that varies in pitch (and sometimes in volume) about a selected pitch. Sundberg, J. “Acoustic and Psychoacoustic Aspects of Vocal Vibrato.” Speech, Music and Hearing: Quarterly Progress and Status Report 35.2-3 (1994): 45-67. Nov. 1, 2008 <http://www.speech.kth.se/prod/publications/files/qpsr/1994/1994_35_2-3_045-068.pdf>. Vibrato suppression is also useful for acoustic segmentation and for pitch detection. Collins, N. “Using a Pitch Detector for Onset Detection.” 6th International Conference on Music Information Retrieval. London, Sep. 11-15, 2005. In many environments, flagging of frames that contain vibrato helps in pitch selection, for instance, to distinguish between a singer's ornamentation of pitch and alternation between pitches, such as trills.
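As a purely illustrative aid, the description of vibrato as a variation in pitch about a selected pitch can be sketched in Python. This is not the detection method of the technology disclosed. It is a minimal sketch that assumes a sinusoidal modulation shape and a per-frame pitch track, and every name and parameter in it (`vibrato_contour`, `describe_modulation`, the rate in Hz, the extent in cents, the frame rate) is hypothetical and chosen only for this example.

```python
import math

def vibrato_contour(center_hz, rate_hz, extent_cents, frame_rate_hz, n_frames):
    """Per-frame pitch in Hz: a center pitch modulated by +/- extent_cents.

    Assumption: the modulation is sinusoidal. Real vibrato is less regular.
    """
    contour = []
    for i in range(n_frames):
        t = i / frame_rate_hz
        cents = extent_cents * math.sin(2 * math.pi * rate_hz * t)
        contour.append(center_hz * 2 ** (cents / 1200))
    return contour

def describe_modulation(contour, frame_rate_hz):
    """Roughly estimate center pitch (Hz), extent (cents) and rate (Hz)."""
    # Work in cents so deviations above and below the center are symmetric.
    log_pitch = [1200 * math.log2(f) for f in contour]
    center_cents = sum(log_pitch) / len(log_pitch)
    deviation = [c - center_cents for c in log_pitch]
    # Extent: half of the peak-to-peak deviation.
    extent = (max(deviation) - min(deviation)) / 2
    # Rate: upward crossings of the center per second (a coarse estimate).
    crossings = sum(1 for a, b in zip(deviation, deviation[1:]) if a < 0 <= b)
    duration_s = len(contour) / frame_rate_hz
    return 2 ** (center_cents / 1200), extent, crossings / duration_s

# Two seconds of a 440 Hz note with 6 Hz, +/- 50 cent modulation at 100 frames/s.
contour = vibrato_contour(440.0, 6.0, 50.0, 100.0, 200)
center, extent, rate = describe_modulation(contour, 100.0)
print(f"center={center:.1f} Hz, extent={extent:.1f} cents, rate={rate:.1f} Hz")
```

In this sketch the estimated center stays at the selected pitch while the contour oscillates around it. An alternation between two distinct pitches, such as a trill, would tend to produce a step-like contour with a larger extent, which is one reason flagging vibrato frames can help pitch selection.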
Therefore, an opportunity arises to improve on the efficiency and accuracy of pitch selection, voicing detection and vibrato suppression or flagging. Improved audio processing components should lead to improved music transcription, voice recognition and acoustic processing.