1. Field of the Invention
The present application is related to content stream fingerprinting and, more particularly, to matching fingerprints to identify the content being streamed.
2. Description of the Background Art
One technique for identifying digital content (e.g., video, audio, or other media content) in the absence of dedicated identifying information (e.g., metadata) is known as fingerprinting. Fingerprinting involves deriving a small piece of data, known as a fingerprint, from the original content. The fingerprint identifies the source content. Generally, fingerprints derived from different items of content differ from one another. Fingerprints are useful for determining whether one item of content is the same as or different from another item of content by comparing their respective fingerprints, rather than comparing the data of the content items themselves. The small size of fingerprints makes them efficient to store in bulk and efficient to compare against other fingerprints.
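The idea of deriving a small fingerprint and comparing fingerprints instead of the content itself can be illustrated with a minimal sketch. This is not the fingerprinting method of any particular system. The names `toy_fingerprint`, `hamming_distance`, and `is_match`, the frame size, and the energy-comparison scheme are all assumptions chosen only to keep the example short.

```python
from typing import List, Sequence


def toy_fingerprint(samples: Sequence[float], frame_size: int = 4) -> List[int]:
    """Derive a short bit sequence from content samples (illustrative only)."""
    # Mean energy of each full, non-overlapping frame of samples.
    energies = [
        sum(x * x for x in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    # One bit per adjacent pair of frames: 1 if the energy rose, else 0.
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def hamming_distance(fp_a: Sequence[int], fp_b: Sequence[int]) -> int:
    """Count the bit positions at which two equal-length fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(a != b for a, b in zip(fp_a, fp_b))


def is_match(fp_a: Sequence[int], fp_b: Sequence[int], max_distance: int = 0) -> bool:
    """Treat two fingerprints as the same content if they differ in few bits."""
    return hamming_distance(fp_a, fp_b) <= max_distance


# Hypothetical content: an original, a louder copy of it, and unrelated content.
original = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1]
louder_copy = [2 * x for x in original]
unrelated = [3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2]
```

In this sketch, sixteen samples reduce to three bits, and the comparison touches only those bits. The assumed energy-rise scheme happens to be insensitive to uniform scaling, which is why the louder copy yields the same fingerprint as the original.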
Fingerprinting has drawbacks, however. The initial generation of fingerprints from the original content is a processing-intensive task, and is typically more processing-intensive than comparing fingerprints. As a result, a fingerprinting system can be overloaded if it must rapidly generate fingerprints for a large number of content items. Further, even though a comparison between two fingerprints is comparatively efficient from a processing standpoint, fingerprint comparisons can still be processor-intensive if a large number of comparisons must be performed. These problems may be mitigated by adding more fingerprinting systems to perform all of the necessary processing; however, this solution adds significant cost.
For these reasons, fingerprinting systems tend to work best where the fingerprints can be generated and matched offline. This reduces the need for additional hardware to support fingerprinting volume. In addition, fingerprinting systems benefit where the fingerprints can be generated once, stored, and used multiple times to identify content. In this case, the processing costs of fingerprint generation are mitigated by the fact that the fingerprints are reused multiple times and do not need to be regenerated.
There are many situations, however, where it is useful to identify the content in a real-time data stream. For example, it may be desirable to monitor a real-time content stream being received by a user to identify what programs the user is watching. One current approach to this problem relies on sampling the content being captured in the user's viewing environment and transmitting the samples back to a fingerprinting system at a remote server, which then generates fingerprints from the samples and compares them with reference fingerprints of reference content. As can be appreciated, in large-scale systems millions of users may be receiving content streams that would need to be monitored, with the samples received and the fingerprints generated by the server in this manner. Such a system would require a very large number of fingerprinting systems at the server to accommodate the expected fingerprinting volume.
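The server-side approach described above can be sketched as follows, to show where the fingerprinting work accumulates. This is a simplified illustration under stated assumptions. The class `RemoteFingerprintServer` and the function `generate_fingerprint` are hypothetical names, and a truncated BLAKE2b hash stands in for a real content fingerprint. Such a hash only matches byte-identical samples, which an actual fingerprinting system would not require.

```python
import hashlib
from typing import Dict, Optional


def generate_fingerprint(content: bytes) -> str:
    """Stand-in for fingerprint generation (assumption: an exact-match hash)."""
    return hashlib.blake2b(content, digest_size=8).hexdigest()


class RemoteFingerprintServer:
    """Hypothetical server that fingerprints every sample it receives."""

    def __init__(self, reference_content: Dict[str, bytes]) -> None:
        # Reference fingerprints are generated once and then reused.
        self.reference = {
            generate_fingerprint(data): name
            for name, data in reference_content.items()
        }
        # Counts fingerprints generated from incoming user samples.
        self.fingerprints_generated = 0

    def identify(self, sample: bytes) -> Optional[str]:
        """Fingerprint one transmitted sample and look it up in the references."""
        self.fingerprints_generated += 1
        return self.reference.get(generate_fingerprint(sample))


# Hypothetical reference content and user samples.
server = RemoteFingerprintServer({"Program A": b"aaa", "Program B": b"bbb"})
first_result = server.identify(b"aaa")
second_result = server.identify(b"zzz")
```

Each call to `identify` costs the server one fingerprint generation, so in this arrangement the server's generation workload grows with the number of samples transmitted by all users.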