Various audio recognition systems and methods are known for processing an incoming audio stream (a ‘programme’) and searching an internal database of music and sound effects (‘tracks’) to identify uses of those tracks within the programme.
In the real world, music is often only one of the layers of audio of a programme. One of the challenges for audio recognition is to identify music even where other layers of audio, such as sound effects, voiceover and ambience, occur simultaneously. Further distortions that must be tolerated include equalisation (adjusting the relative overall amounts of treble and bass in a track) and changes of tempo and/or pitch.
Some audio recognition techniques are based on directly carrying out a near-neighbour search on calculated hash values using a standard algorithm. Where the space being searched has a large number of dimensions, such standard algorithms do not perform very efficiently.
An article entitled “A Highly Robust Audio Fingerprinting System” by J. Haitsma et al. of Philips Research, published in the Proceedings of the 3rd International Conference on Music Information Retrieval, 2002, describes a media fingerprinting system for comparing multimedia objects. According to the article, fingerprints of a large number of multimedia objects, along with associated meta-data (e.g. name of artist, title and album), are stored in a database such that the fingerprints serve as an index to the meta-data. Unidentified multimedia content can then be identified by computing a fingerprint and using this to query the database. The article describes a two-phase search algorithm that performs full fingerprint comparisons only at candidate positions pre-selected by a sub-fingerprint search. Candidate positions are located using a hash, or lookup, table having 32-bit sub-fingerprints as entries. Every entry points to a list of pointers to the positions in the real fingerprint lists where the respective 32-bit sub-fingerprint is located.
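By way of illustration only, the two-phase search summarised above might be sketched as follows. This is a minimal sketch and not the implementation of the cited article: the function names (`build_index`, `search`), the in-memory dictionary layout, the bit-error-rate comparison used for the full fingerprint check and the `max_ber` threshold are all assumptions introduced for this example.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of differing bits between two 32-bit sub-fingerprints."""
    return bin(a ^ b).count("1")


def build_index(tracks):
    """Build the lookup table: sub-fingerprint -> list of (track_id, position).

    `tracks` maps a track id to its list of 32-bit sub-fingerprints
    (hypothetical layout chosen for this sketch).
    """
    index = defaultdict(list)
    for track_id, sub_fps in tracks.items():
        for pos, sfp in enumerate(sub_fps):
            index[sfp].append((track_id, pos))
    return index


def search(query, tracks, index, max_ber=0.35):
    """Return (track_id, start, bit_error_rate) of the best match, or None.

    Phase 1: look up each query sub-fingerprint to obtain candidate positions.
    Phase 2: run a full fingerprint comparison only at those candidates.
    `max_ber` is an illustrative acceptance threshold, assumed for this sketch.
    """
    best = None
    seen = set()
    total_bits = 32 * len(query)
    for q_pos, sfp in enumerate(query):
        for track_id, pos in index.get(sfp, ()):
            start = pos - q_pos  # align the query against the stored track
            if start < 0 or (track_id, start) in seen:
                continue
            seen.add((track_id, start))
            block = tracks[track_id][start:start + len(query)]
            if len(block) < len(query):
                continue  # query would run past the end of the track
            errors = sum(hamming(a, b) for a, b in zip(query, block))
            ber = errors / total_bits
            if ber <= max_ber and (best is None or ber < best[2]):
                best = (track_id, start, ber)
    return best


if __name__ == "__main__":
    tracks = {
        "t1": [0x11111111, 0x22222222, 0x33333333, 0x44444444],
        "t2": [0xAAAAAAAA, 0xBBBBBBBB, 0xCCCCCCCC, 0xDDDDDDDD],
    }
    index = build_index(tracks)
    # Query taken from "t2" at position 1, with one bit flipped in the middle.
    query = [0xBBBBBBBB, 0xCCCCCCCC ^ 1, 0xDDDDDDDD]
    print(search(query, tracks, index))
```

In this sketch a candidate is found only when at least one query sub-fingerprint matches a stored sub-fingerprint exactly; a query in which every sub-fingerprint contains a bit error yields no candidates and therefore no match.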
However, there remains a need for an apparatus, system and method for more efficient and more reliable identification of audio media content.