A rapidly increasing number of videos are being stored by professionals and consumers. For professionals, the cost and technical difficulty of creating, storing and editing video content have been decreasing. For consumers, wider choice and lower prices of set-top boxes, personal video recorders, video cameras and computers have driven an increase in video content. Over the past couple of years there has been an explosion in both legal and illegal content available on the Internet, and the ability to index, search and monitor this content has become an increasingly important problem. The MPEG-7 standard was an early work in the area of content-based search and retrieval; an area lacking in the original version of the standard is near-duplicate video detection.
Near-duplicate video detection can be defined as follows: given a query video sequence, find all of its duplicates in a database. The notion and interpretation of (near-)duplicates vary. However, for this invention a duplicate is regarded as a sequence that has been created by applying common video editing or processing operations to an original. Examples of such operations include colour change, compression, transcoding, format change, frame rate change, analogue VCR recapture and camera recapture, among many others. The present invention also addresses the case in which the duplicated material forms only part of the query sequence.
In previous work in the area [T. Hoad and J. Zobel, Video similarity detection for digital rights management, Proceedings of the Australasian Computer Science Conference, pages 237-245, Adelaide, Australia, 2003], shot cuts and boundaries were used to form a signature of a video sequence. This provides a very compact representation of a video, but it performs very poorly on short sequences, which may contain few or no shot boundaries, and it is very sensitive to the shot-detection algorithm used [T. Hoad and J. Zobel, Proceedings of the 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 262-269, Berkeley, US, 2003].
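To make the weakness on short sequences concrete, the following is a minimal sketch of a generic shot-length signature; it is not the cited authors' algorithm. It assumes a simple cut detector that thresholds the difference between grey-level histograms of consecutive frames, and the functions `shot_boundaries`, `shot_length_signature`, `matches` and `synthetic_clip`, as well as the parameters `threshold`, `bins` and `tol`, are hypothetical choices made for illustration only. A query clip that contains no complete shot yields an empty signature and therefore cannot be matched at all.

```python
import numpy as np

def shot_boundaries(frames, threshold=0.5, bins=32):
    """Hypothetical cut detector: a new shot starts at frame i when the
    histogram distance to frame i-1 exceeds `threshold`."""
    cuts, prev = [], None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()  # normalise to a probability distribution
        # Half the L1 distance lies in [0, 1]; 1 means disjoint histograms.
        if prev is not None and 0.5 * np.abs(hist - prev).sum() > threshold:
            cuts.append(i)
        prev = hist
    return cuts

def shot_length_signature(cuts):
    """Lengths (in frames) of shots bounded by two detected cuts; the first
    and last shots of a clip are usually truncated, so they are dropped."""
    return [b - a for a, b in zip(cuts, cuts[1:])]

def matches(query_sig, db_sig, tol=1):
    """True if query_sig occurs as a contiguous run in db_sig (+/- tol frames)."""
    n = len(query_sig)
    if n == 0:
        return False  # no complete shot in the query: nothing to match on
    for start in range(len(db_sig) - n + 1):
        if all(abs(q - d) <= tol for q, d in zip(query_sig, db_sig[start:start + n])):
            return True
    return False

def synthetic_clip(shot_lengths, levels):
    """Hypothetical test clip: each shot is a run of flat 8x8 grey frames."""
    frames = []
    for length, level in zip(shot_lengths, levels):
        frames += [np.full((8, 8), level, dtype=np.uint8)] * length
    return frames

# Usage: a synthetic database clip with five flat-grey "shots".
db = synthetic_clip([10, 25, 40, 15, 30], [20, 100, 200, 60, 150])
db_sig = shot_length_signature(shot_boundaries(db))               # [25, 40, 15]
long_query = shot_length_signature(shot_boundaries(db[5:80]))     # [25, 40]
short_query = shot_length_signature(shot_boundaries(db[40:70]))   # [] (no cut inside)
print(matches(long_query, db_sig), matches(short_query, db_sig))  # True False
```

The signature is only a handful of integers per clip, which is why the representation is so compact; the same property means that a short query with zero or one detected cut carries too little information to match, and any missed or spurious cut shifts every subsequent shot length.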
A typical state-of-the-art feature-point approach to (near-)duplicate detection in video is given in [J. Sivic and A. Zisserman, Efficient visual search for objects in videos, Proceedings of the IEEE, 96(4), pages 548-566, April 2008] and can be outlined as: i) detect key frames, ii) detect key points in each frame, iii) extract features from regions around each point, iv) match sequences using the features, and v) apply a test for the spatial cohesion of objects in the sequences. This approach has a number of weaknesses. Firstly, the use of key frames means that the method is likely to perform less well on short clips. Secondly, feature extraction (step iii) is computationally expensive and results in large storage requirements. Thirdly, step (iv) uses a visual vocabulary learned by clustering training data, which can lead to over-fitting to a particular dataset and a failure to generalise. Related methods such as [Ondřej Chum, James Philbin, Michael Isard and Andrew Zisserman, Scalable near identical image and shot detection, Proceedings of the 6th ACM International Conference on Image and Video Retrieval, pages 549-556, Amsterdam, The Netherlands, 2007] provide fast searching at the cost of high memory requirements for the hash tables used. Whilst this may be acceptable in some scenarios, it is not suitable for consumer electronics environments, where memory resources are typically very limited.
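Step (iv) of the feature-point approach can be illustrated with a minimal bag-of-visual-words sketch. It assumes that local descriptors have already been extracted (steps i to iii), learns the vocabulary with plain k-means (Lloyd's algorithm) from a deterministic farthest-point start, and compares frames by the cosine similarity of their visual-word histograms. It is not the cited authors' implementation: it omits the weighting, indexing and spatial-verification stages a full system would use, and all function names and the toy 2-D descriptors are hypothetical.

```python
import numpy as np

def quantise(descriptors, centres):
    """Index of the nearest centre (squared Euclidean distance) per descriptor."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def learn_vocabulary(descriptors, k, iters=10):
    """k-means over training descriptors; the k centres are the visual words.
    The vocabulary depends entirely on this training data."""
    # Deterministic farthest-point initialisation.
    centres = [descriptors[0]]
    for _ in range(1, k):
        d2 = ((descriptors[:, None, :] - np.array(centres)[None]) ** 2).sum(axis=2).min(axis=1)
        centres.append(descriptors[d2.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(iters):  # Lloyd iterations
        words = quantise(descriptors, centres)
        for j in range(k):
            members = descriptors[words == j]
            if len(members):  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres

def bow_vector(descriptors, centres):
    """L2-normalised histogram of visual-word occurrences for one frame."""
    hist = np.bincount(quantise(descriptors, centres), minlength=len(centres)).astype(float)
    norm = np.linalg.norm(hist)
    return hist / norm if norm else hist

def similarity(a, b, centres):
    """Cosine similarity of two frames' bag-of-visual-words vectors."""
    return float(bow_vector(a, centres) @ bow_vector(b, centres))

# Usage with toy 2-D "descriptors" drawn around three word centres.
rng = np.random.default_rng(0)
word_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
train = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in word_centres])
vocab = learn_vocabulary(train, k=3)
frame_a = np.vstack([word_centres[0] + 0.1 * rng.standard_normal((5, 2)),
                     word_centres[1] + 0.1 * rng.standard_normal((5, 2))])
frame_b = frame_a + 0.05 * rng.standard_normal(frame_a.shape)  # lightly perturbed copy
frame_c = word_centres[2] + 0.1 * rng.standard_normal((10, 2))
print(similarity(frame_a, frame_b, vocab), similarity(frame_a, frame_c, vocab))  # approximately 1.0, then 0.0
```

Because the centres are fitted to the training descriptors, descriptors from material unlike the training set are forced onto the nearest of these learned words, which is one way the over-fitting noted above can arise.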
The present invention aims to address one or more of the limitations of such prior-art methods.