Systems and methods of media content identification are known that employ so-called fingerprints extracted from the media content. For example, such systems and methods of media content identification can be used in video quality measurement systems to identify the video content for which the video quality is to be measured. In such systems and methods of media content identification, one or more fingerprints can be extracted from each of a plurality of reference video content items (such content items also referred to herein as a/the “reference content item(s)”), and stored in a database of reference content (such database also referred to herein as a/the “reference content database”). Moreover, one or more fingerprints can be extracted from a portion of query video content (such content also referred to herein as “query content”), and compared with the fingerprints stored in the reference content database. The query content can then be identified based on how well the fingerprints of the query content match the fingerprints stored in the reference content database. For example, fingerprints extracted from the query content or the reference content items can be suitable signatures or identifiers capable of identifying the video content.
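By way of illustration only, the extract-and-compare flow described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the fingerprinting scheme of any particular known system: the block-mean bit-vector fingerprint, the Hamming-distance score, and the names `extract_fingerprint`, `hamming`, and `identify` are all hypothetical and chosen only to make the flow concrete.

```python
from typing import Dict, List, Tuple

# Assumption: a frame is a grayscale image given as rows of pixel intensities
# (0-255), with at least `grid` rows and `grid` columns.
Frame = List[List[int]]


def extract_fingerprint(frame: Frame, grid: int = 4) -> int:
    """Hypothetical fingerprint: one bit per grid cell, set when the cell's
    mean intensity exceeds the mean of all cell means."""
    height, width = len(frame), len(frame[0])
    cell_means = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * height // grid, (gy + 1) * height // grid)
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            values = [frame[y][x] for y in rows for x in cols]
            cell_means.append(sum(values) / len(values))
    overall = sum(cell_means) / len(cell_means)
    bits = 0
    for mean in cell_means:
        bits = (bits << 1) | (1 if mean > overall else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def identify(query_fps: List[int],
             reference_db: Dict[str, List[int]]) -> Tuple[str, float]:
    """Return (item id, score) for the reference content item whose stored
    fingerprints best match the query fingerprints. The score is the mean,
    over query fingerprints, of the smallest Hamming distance to any
    fingerprint of that item; lower is better."""
    best_id, best_score = "", float("inf")
    for item_id, ref_fps in reference_db.items():
        score = sum(min(hamming(q, r) for r in ref_fps)
                    for q in query_fps) / len(query_fps)
        if score < best_score:
            best_id, best_score = item_id, score
    return best_id, best_score
```

In this sketch the reference content database is simply a dictionary mapping a reference content item identifier to its list of fingerprints; a practical database would typically need an index to avoid the exhaustive comparison shown here.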
In such known systems and methods of media content identification, the fingerprints extracted from the query content and the reference content items can be classified as spatial fingerprints or temporal fingerprints. For example, in the case of video content, one or more spatial fingerprints can be extracted from each video frame of the query content or the reference content items independently of other video frames included in the respective video content. Further, one or more temporal fingerprints can be extracted from two or more video frames of the query content or the reference content items, based on their temporal relationship within the respective video content. Because performing media content identification based solely on spatial fingerprints from a limited number of video frames can sometimes result in incorrect identification of the video content, such systems and methods of media content identification typically seek to enforce a temporal consistency of the results of fingerprint matching to improve the identification of such video content. For example, a shorter-term temporal consistency can be enforced by matching the spatial fingerprints of video frames within a temporal window of the video content, and a longer-term temporal consistency can be enforced by performing temporal fusion on the results of spatial fingerprint matching.
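The distinction between temporal fingerprints and windowed matching of spatial fingerprints can be illustrated with a short sketch. It is offered only as an example under stated assumptions: the mean-intensity-change temporal fingerprint, the integer bit-vector representation of spatial fingerprints, and the names `temporal_fingerprint` and `best_window_offset` are hypothetical, and temporal fusion is not shown.

```python
from typing import List, Tuple

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = List[List[int]]


def mean_intensity(frame: Frame) -> float:
    """Average pixel intensity of a single frame."""
    total = sum(sum(row) for row in frame)
    return total / (len(frame) * len(frame[0]))


def temporal_fingerprint(frames: List[Frame]) -> List[int]:
    """Hypothetical temporal fingerprint: 1 where the mean intensity rises
    from one frame to the next, else 0. Unlike a spatial fingerprint, it
    depends on the order of two or more frames."""
    means = [mean_intensity(f) for f in frames]
    return [1 if later > earlier else 0
            for earlier, later in zip(means, means[1:])]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two spatial fingerprints."""
    return bin(a ^ b).count("1")


def best_window_offset(query: List[int],
                       reference: List[int]) -> Tuple[int, int]:
    """Slide the query's per-frame spatial fingerprints over the reference's
    and return (offset, total Hamming distance) of the best-aligned window.
    Returns (-1, -1) if the query is longer than the reference."""
    best = (-1, -1)
    for offset in range(len(reference) - len(query) + 1):
        total = sum(hamming(q, reference[offset + i])
                    for i, q in enumerate(query))
        if best[0] < 0 or total < best[1]:
            best = (offset, total)
    return best
```

For instance, with reference fingerprints `[0b0001, 0b0011, 0b0111, 0b1111, 0b0011]`, a single query frame with fingerprint `0b0011` matches equally well at two positions, whereas the two-frame window `[0b0011, 0b0111]` matches exactly only at offset 1. This is the sense in which a temporal window can resolve an ambiguity that a single spatial fingerprint cannot.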
However, such known systems and methods of media content identification have several drawbacks. For example, such systems and methods of media content identification that seek to enforce a temporal consistency of fingerprint matching can be computationally complex. Further, such systems and methods of media content identification that perform temporal fusion to enforce such a temporal consistency typically use the results of fingerprint matching for a batch of video frames, significantly increasing memory requirements. Such systems and methods of media content identification are therefore generally unsuitable for use in applications that require real-time fingerprint matching against a large database of reference content. Moreover, due at least to their computational complexity and/or increased memory requirements, such systems and methods of media content identification are generally considered to be impractical for use in identifying query content at an endpoint, such as a mobile phone or other device.
It would therefore be desirable to have improved systems and methods of media content identification that avoid at least some of the drawbacks of the various known media content identification systems and methods described above.