Field of the Invention
The invention relates to a system and method for searching for audio portions of an audio-containing file or streaming audio-containing link, and for combining occurrences of the detected audio portions into an overall relevancy score.
Description of Related Art
Searching audio material is typically a non-deterministic process, characterized by the association of a relevancy score with each possible match found. Simple queries search for occurrences of an individual word or phrase. Compound queries search for occurrences of combinations of words, phrases and other compound queries. For a compound query, the search mechanism is required to compute a compound relevancy score from the constituent relevancy scores.
When searching information stored or available in a computerized medium (e.g., stored audio files or a streaming audio communication link), there are two known approaches for combining partial matches into an overall relevancy score: (1) the use of normal Boolean operators, so that “at least 2 of 3” could be formally expressed as (A and B and not C) or (A and C and not B) or (B and C and not A) or (A and B and C); and (2) the use of weighted combinations of hit counts as reported in the art.
The specification of “at least M of N” via Boolean operators is extremely complex. Current audio search systems do not formally apply probability calculus, so the mathematically correct expression is usually simplified to reporting a hit if hits are scored for (A and B) or (B and C) or (C and A) in the “at least 2 of 3” case. As N and M increase, these expressions become longer and longer. For example, there are three ways of selecting two from three, as shown, but forty-five distinct ways of selecting two from ten.
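The growth in expression length can be checked with a short enumeration. The following Python sketch is illustrative only and is not taken from the art described; the function names are hypothetical. It counts the conjunctions in the simplified form and in the longer form with mutually exclusive terms, and confirms that, treated purely as Boolean expressions, the simplified form agrees with “at least M of N” on every truth assignment. The two forms differ only in that the terms of the longer form are mutually exclusive.

```python
from itertools import combinations, product
from math import comb


def simplified_terms(terms, m):
    """Conjunctions of the simplified form: one AND-group per choice of m terms."""
    return list(combinations(terms, m))


def exclusive_term_count(n, m):
    """Number of conjunctions in the longer form, where every term fixes
    each of the n inputs as present or negated (terms are mutually exclusive)."""
    return sum(comb(n, k) for k in range(m, n + 1))


def simplified_match(hits, m):
    """Evaluate the simplified form: OR over all m-way ANDs of the hit flags."""
    return any(all(group) for group in combinations(hits, m))


def at_least_m(hits, m):
    """Direct definition: at least m of the hit flags are true."""
    return sum(hits) >= m


if __name__ == "__main__":
    print(simplified_terms("ABC", 2))            # three pairs, as in the text
    print(len(simplified_terms(range(10), 2)))   # 45
    print(exclusive_term_count(3, 2))            # 4, matching the four-term expression
    # The simplified form is Boolean-equivalent to "at least 2 of 3".
    for hits in product([False, True], repeat=3):
        assert simplified_match(hits, 2) == at_least_m(hits, 2)
```

The sketch operates on definite true/false hits only; it does not address how relevancy scores would be combined.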
Some approaches known in the art are based on weighted combinations of hit counts and require labeled audio training data to derive the weights. The art does not specify how weights can be derived in the absence of such labeled data and does not describe a method for incorporating the individual hit relevancy scores into this process. The art thresholds individual hit scores and subsequently derives hit counts by treating each hit with a score above threshold as being a definite hit.
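The thresholding step described above can be pictured with a minimal Python sketch. It is an assumption-laden illustration, not a reproduction of any particular system in the art: the function names, the threshold value, the example scores and the per-term weights are all hypothetical, and the weights are simply given here, whereas the approaches described derive them from labeled audio training data.

```python
def hit_count(scores, threshold):
    """Count hits whose score is above the threshold.
    Each such hit is treated as a definite hit; its score is then discarded."""
    return sum(1 for s in scores if s > threshold)


def weighted_hit_score(scores_by_term, weights, threshold):
    """Weighted combination of per-term hit counts (weights are assumed given)."""
    return sum(
        weights[term] * hit_count(scores, threshold)
        for term, scores in scores_by_term.items()
    )


if __name__ == "__main__":
    # Hypothetical relevancy scores for two query terms.
    scores_by_term = {"A": [0.9, 0.4, 0.7], "B": [0.55]}
    weights = {"A": 2.0, "B": 1.0}  # hypothetical weights
    print(weighted_hit_score(scores_by_term, weights, threshold=0.5))  # 5.0
```

In this sketch the hits scored 0.9 and 0.7 contribute identically once they pass the threshold, which illustrates why the individual hit relevancy scores play no further part in the combined result.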
Other processes known in the art obtain the weights as “usefulness” scores, but these also require labeled audio training data.
A further process known in the art for audio information retrieval is to apply speech-to-text methods and then use normal text retrieval approaches. This technique is less relevant, as the final operation tends to be a support vector machine, again trained on pre-labeled data.