Multimedia fingerprints (also commonly referred to as signatures or robust hashes) have been proposed as a way to uniquely identify multimedia content (audio, images and video). A typical fingerprinting method applies signal processing techniques to multimedia content to extract descriptors. These descriptors may represent spatial information at temporal sample points, temporal information, and/or spatio-temporal information. The descriptors are typically high-dimensional vectors of features that may be quantised to a small number of values, e.g. two (binary), three (ternary), four (quaternary), etc. The descriptors may also be projected into a lower-dimensional space, for example by Singular Value Decomposition (SVD). The important characteristics used to differentiate between multimedia fingerprinting methods include uniqueness, robustness, descriptor size, search speed and temporal granularity.
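As a rough illustration of the quantisation step mentioned above, the following Python sketch maps a real-valued descriptor to ternary values. The function name `quantise_ternary`, the symmetric threshold and the sample feature values are assumptions made purely for illustration; they are not taken from any particular fingerprinting method.

```python
def quantise_ternary(features, threshold):
    """Quantise each real-valued feature to one of three levels.

    Hypothetical rule (an assumption, not a published method):
      value < -threshold  -> 0
      |value| <= threshold -> 1
      value >  threshold  -> 2
    """
    quantised = []
    for value in features:
        if value < -threshold:
            quantised.append(0)
        elif value > threshold:
            quantised.append(2)
        else:
            quantised.append(1)
    return quantised


# Made-up descriptor values, for illustration only.
descriptor = [-0.9, 0.05, 0.4, -0.1]
ternary = quantise_ternary(descriptor, threshold=0.25)
```

A ternary descriptor of this kind needs fewer than two bits per element, which is one reason coarse quantisation keeps descriptor size small.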
Video fingerprints extracted by previous methods either sample every frame, sample at some known period, or sample at temporal locations believed to have some significance (e.g. key frames). Sampling at every frame leads to large fingerprints, making storage and transmission costs high. Sampling less frequently leads to smaller fingerprints; however, there is a loss in the temporal granularity that the method is able to achieve. It will be clear to those skilled in the art that similar limitations exist for audio fingerprints.
Lossless encoding schemes typically employ some combination of run-length coding and variable-length prefix coding. For instance, lossless encoding has application in the coding of fax machine messages and in the compression of digital image file formats. The prefix property of a coding system refers to the fact that no codeword is a prefix of any other codeword. Huffman codes constitute one particular example, where the codeword length is chosen adaptively, in accordance with the probability of the encoded symbol. To reach the entropy limit for a specific set of symbols, the codewords need to be of length $l_i = -\log_2 p_i$, where $p_i$ is the probability of the $i$-th symbol. However, when the probabilities are known by the encoder (transmitter) but not by the decoder (receiver), Huffman coding requires the overhead of signalling the assignment of each codeword to its symbol. This may be done with a small number of bits if several predetermined probability tables are used, so that only an index to a table is transmitted. Another variant of variable-length coding, arithmetic coding, can achieve the optimal codeword length for the case where the probabilities of symbols correspond to non-integer codeword lengths. Arithmetic coding is in general more complex than Huffman coding, which is in turn more complex than the universal codes, where the set of codewords is fixed. Universal codes have the property that, for monotonically decreasing distributions ($p_i \geq p_{i+1}$), the expected codeword lengths are at most a constant factor longer than the optimal codeword lengths. One commonly used universal code is Exponential-Golomb (also known as Exp-Golomb), which performs well for exponential probability distributions that have wide tails (relatively large probability for symbols with large index $i$). The codes are parameterised by a non-negative integer $s$, with the codewords of length $l_i = 1 + 2\lfloor \log_2(i + 2^s) \rfloor - s$.
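The two length formulas above can be compared numerically. The following Python sketch evaluates the ideal length −log2(p_i) and the Exp-Golomb length 1 + 2⌊log2(i + 2^s)⌋ − s for a small, made-up probability table. The function names and the example distribution are assumptions chosen for illustration only.

```python
import math


def ideal_length(p):
    """Ideal (entropy-achieving) codeword length in bits for probability p."""
    return -math.log2(p)


def exp_golomb_length(i, s=0):
    """Exp-Golomb codeword length for symbol index i and parameter s.

    For a positive integer n, floor(log2(n)) equals n.bit_length() - 1,
    which avoids floating-point rounding.
    """
    return 1 + 2 * ((i + (1 << s)).bit_length() - 1) - s


# Hypothetical monotonically decreasing distribution (illustrative only).
probs = [0.5, 0.25, 0.125, 0.125]

# Entropy: the expected ideal codeword length.
entropy = sum(p * ideal_length(p) for p in probs)

# Expected length when the same symbols are coded with Exp-Golomb, s = 0.
expected_eg0 = sum(p * exp_golomb_length(i, 0) for i, p in enumerate(probs))
```

For this particular distribution the fixed Exp-Golomb codewords cost somewhat more than the entropy on average, which is the price paid for not having to signal a codeword table to the decoder.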
The first eight codewords for s=0, 1, 2 are shown in Table 1 below.
TABLE 1
Exponential-Golomb codewords and the corresponding bit-lengths

i    s = 0          s = 1          s = 2
0    1 (1)          00 (2)         000 (3)
1    010 (3)        01 (2)         001 (3)
2    011 (3)        1000 (4)       010 (3)
3    00100 (5)      1001 (4)       011 (3)
4    00101 (5)      1010 (4)       10000 (5)
5    00110 (5)      1011 (4)       10001 (5)
6    00111 (5)      110000 (6)     10010 (5)
7    0001000 (7)    110001 (6)     10011 (5)

Run-length coding represents runs of encoded symbols with a single representation of the symbol, followed by the count (run-length) of that symbol. Run lengths themselves may be entropy coded by any of the above-mentioned methods, the selection of which would depend on the underlying probability distribution.
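The combination of run-length coding and entropy-coded run lengths can be sketched in Python as follows. The sketch uses the s = 0 Exp-Golomb codewords from Table 1. The bitstream layout (one symbol bit followed by the codeword for the run length minus one) and the function names are assumptions made for illustration, not a layout prescribed by any standard.

```python
from itertools import groupby


def exp_golomb0(n):
    """Exp-Golomb codeword (s = 0) for a non-negative integer n, as a bit string.

    The codeword is the binary form of n + 1, preceded by as many zeros
    as there are bits after its leading one.
    """
    binary = bin(n + 1)[2:]  # always starts with '1'
    return "0" * (len(binary) - 1) + binary


def run_length_encode(symbols):
    """Collapse consecutive equal symbols into (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(symbols)]


def encode_runs(bits):
    """Encode a sequence of 0/1 symbols as a bit string.

    Assumed layout (illustrative only): for each run, emit the symbol bit,
    then the Exp-Golomb codeword of (run_length - 1), since a run is
    never empty.
    """
    return "".join(
        str(symbol) + exp_golomb0(length - 1)
        for symbol, length in run_length_encode(bits)
    )


runs = run_length_encode([0, 0, 0, 1, 1, 0])
bitstream = encode_runs([0, 0, 0, 1, 1, 0])
```

Subtracting one from each run length maps the shortest possible run to the shortest codeword, which suits sources where short runs are most probable; a source dominated by long runs might be better served by a larger s or a different entropy code.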