Music melody matching, usually presented in the form of Query-by-Humming (QBH), is a content-based method of retrieving music data. Previous techniques searched melodies based on either their “continuous (frame-based)” pitch contours or their note transcriptions. The former are pitch values sampled at fixed, short intervals (usually 10 ms), while the latter are sequences of quantized, symbolic representations of melodies. For example, the former may be a sampled curve starting at 262 Hz, rising to 294 Hz and then to 329 Hz, before dropping down to and staying at 196 Hz, while the latter (corresponding to the former) may be “C4-D4-E4-G3-G3” or “Up-Up-Down-Same.” Frame-based pitch contours (hereinafter “pitch contours”) have been suggested in the past as providing more accurate match results than the predominantly used note transcriptions, because note transcription may segment and quantize dynamic pitch values too rigidly, compounding the effect of pitch estimation errors. The major drawback of pitch contours is that they hold much more data, and therefore require much more computation, than note-based representations, especially when the popular dynamic time warping (DTW) technique is used to measure the similarity between two melodies.
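By way of a non-limiting illustration, the relationship between the two representations in the example above can be sketched in Python as follows. The sketch assumes that the pitch contour has already been segmented into notes (the segment boundaries, the frame counts, and the function names `segment_to_midi`, `midi_to_name`, and `interval_directions` are hypothetical and chosen only for this illustration), and it uses the conventional equal-tempered mapping in which A4 is 440 Hz. It is not a description of any particular prior transcription system.

```python
import math
from statistics import median

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def segment_to_midi(frames_hz):
    """Quantize one note segment of pitch frames (Hz) to the nearest MIDI note.

    Uses the standard equal-tempered mapping with A4 = 440 Hz = MIDI 69.
    """
    return round(69 + 12 * math.log2(median(frames_hz) / 440.0))


def midi_to_name(midi):
    """Format a MIDI note number as a pitch name, e.g. 60 -> 'C4'."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def interval_directions(midi_notes):
    """Reduce a note sequence to Up/Down/Same steps between adjacent notes."""
    steps = []
    for prev, cur in zip(midi_notes, midi_notes[1:]):
        if cur > prev:
            steps.append("Up")
        elif cur < prev:
            steps.append("Down")
        else:
            steps.append("Same")
    return steps


# Frame-based contour from the example, pre-segmented into five notes.
# Frame counts per note are arbitrary (assumed 10 ms frames).
segments = [[262.0] * 30, [294.0] * 30, [329.0] * 30, [196.0] * 40, [196.0] * 40]

midi_notes = [segment_to_midi(seg) for seg in segments]
print("-".join(midi_to_name(m) for m in midi_notes))  # C4-D4-E4-G3-G3
print("-".join(interval_directions(midi_notes)))      # Up-Up-Down-Same
```

In this sketch the 170 sampled frames reduce to five symbols, which illustrates both the compactness of note transcriptions and the information that is discarded by segmenting and quantizing.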
No method has been reported so far that can efficiently match frame-based pitch contours at arbitrary locations of targets while adjusting for music key shifts, tempo differences, and rhythmic inconsistencies between query and target. Previous methods using pitch contours are limited in that they require the query and target to have reasonably similar tempo, or constrain the starting locations of query melodies to the beginning of specific music phrases. Some methods do not have these limitations but require far too much computation for practical use, because they perform dynamic programming over very large search spaces. Therefore, a need exists for a method and apparatus that can accurately and efficiently match an audible query to a set of audible targets and can accommodate music key shifts, tempo differences, and rhythmic inconsistencies between query and target, while also searching arbitrary locations of targets.
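To make the computational concern concrete, a textbook form of DTW over frame-based pitch contours can be sketched as follows. This is a generic illustration and not any specific prior method: the function names, the use of a mean-subtracted semitone scale to compensate for key shifts, and the absolute-difference local cost are all assumptions made for this sketch. The mean subtraction compensates for a key shift only when the query and the target span the same melodic material.

```python
import math


def to_key_normalized_semitones(contour_hz):
    """Convert Hz frames to semitones and subtract the mean.

    On a semitone (log-frequency) scale a key shift is a constant offset,
    so removing the mean cancels it for contours covering the same melody.
    """
    semitones = [12 * math.log2(f / 440.0) for f in contour_hz]
    mean = sum(semitones) / len(semitones)
    return [s - mean for s in semitones]


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences.

    Fills an (n + 1) x (m + 1) table, so time and memory grow with n * m.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # query frame repeated
                cost[i][j - 1],      # target frame repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Target melody, and a query that is two semitones higher and twice as slow.
target_hz = [262.0] * 3 + [294.0] * 3 + [329.0] * 3 + [196.0] * 6
query_hz = [f * 2 ** (2 / 12) for f in target_hz for _ in range(2)]

same_melody = dtw_distance(
    to_key_normalized_semitones(query_hz),
    to_key_normalized_semitones(target_hz),
)
other_melody = dtw_distance(
    to_key_normalized_semitones(query_hz),
    to_key_normalized_semitones(list(reversed(target_hz))),
)
print(same_melody, other_melody)
```

Because the table has one cell per pair of frames, a 10-second query sampled every 10 ms (1,000 frames) aligned against a 3-minute target (18,000 frames) would require 18,000,000 cell updates for that single target, before any search over starting locations or multiple targets is considered.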
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions and/or relative positioning of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments of the present invention. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present invention. It will further be appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. Those skilled in the art will further recognize that references to specific implementation embodiments such as “circuitry” may equally be accomplished via replacement with software instruction executions either on general purpose computing apparatus (e.g., CPU) or specialized processing apparatus (e.g., DSP). It will also be understood that the terms and expressions used herein have the ordinary technical meaning as is accorded to such terms and expressions by persons skilled in the technical field as set forth above except where different specific meanings have otherwise been set forth herein.