The significant growth of the World Wide Web and improvements in the speed and bandwidth of telecommunication systems have led to growth in the availability and transfer of videos. Due to the vast amount of information available, processes for identifying similar videos may be desirable. For example, a service provider may want to determine whether one video file is similar to another video file. One method of doing this is to use video signature schemes.
Current video signature schemes fall into two categories. In the first category, a single key frame is selected to represent a shot, and an image hash of that key frame is used as the shot signature. The first category takes advantage of image hashing, for which well-developed solutions exist. However, one key frame from a shot may not sufficiently represent the whole shot, since temporal information is not used in deriving the video signature.
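The first category can be illustrated with a minimal sketch. The text above does not name a particular image hash or key-frame selection rule, so the average hash, the choice of the middle frame as key frame, and all function names below are assumptions made only for illustration.

```python
def average_hash(frame):
    """Assumed image hash: one bit per pixel, 1 if the pixel exceeds the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def shot_signature(shot):
    """Hash a single key frame (assumed here to be the middle frame of the shot)."""
    key_frame = shot[len(shot) // 2]
    return average_hash(key_frame)


def hamming(a, b):
    """Number of differing bits between two equal-length signatures."""
    return sum(x != y for x, y in zip(a, b))


# Two toy shots of 2x2 grayscale frames that share only their middle frame.
frame_a = [[10, 200], [30, 220]]
frame_b = [[250, 5], [240, 15]]
shot_1 = [frame_a, frame_a, frame_a]
shot_2 = [frame_b, frame_a, frame_b]

distance = hamming(shot_signature(shot_1), shot_signature(shot_2))
```

In this toy case the two shots differ in every frame except the key frame, yet their signatures are identical, which is the limitation described above: the frames before and after the key frame do not influence the signature.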
In the second category, temporal information is used to derive a video hash. Here, a 3D transform is generally performed, and its coefficients are used as the video signature. The second category usually requires pre-processing to normalize the whole video sequence to a common spatial and temporal scale before the 3D transform is performed. If the sequence is long, however, the sequence is subsampled, and useful temporal information is lost. Consequently, the derived signature may not be a good representation of the whole video.
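The second category can be sketched in the same way. The text above does not specify which 3D transform is used, so the unnormalized separable 3D DCT, the uniform temporal subsampling, the choice of low-frequency coefficients, and all names and default parameters below are assumptions for illustration only.

```python
import math


def dct_1d(values):
    """Unnormalized DCT-II of a 1D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def subsample(frames, target_len):
    """Temporal pre-processing: pick target_len frames at a uniform stride."""
    step = len(frames) / target_len
    return [frames[int(i * step)] for i in range(target_len)]


def dct_3d(volume):
    """Separable 3D DCT over a [time][row][column] volume."""
    t_len, h, w = len(volume), len(volume[0]), len(volume[0][0])
    vol = [[dct_1d(row) for row in frame] for frame in volume]  # along width
    for t in range(t_len):  # along height
        for x in range(w):
            col = dct_1d([vol[t][y][x] for y in range(h)])
            for y in range(h):
                vol[t][y][x] = col[y]
    for y in range(h):  # along time
        for x in range(w):
            line = dct_1d([vol[t][y][x] for t in range(t_len)])
            for t in range(t_len):
                vol[t][y][x] = line[t]
    return vol


def video_signature(frames, target_len=4, keep=2):
    """Signature: the keep x keep x keep lowest-frequency 3D DCT coefficients."""
    coeffs = dct_3d(subsample(frames, target_len))
    return [coeffs[t][y][x] for t in range(keep) for y in range(keep) for x in range(keep)]


# Two 8-frame toy videos of 2x2 frames that differ only in the odd-indexed frames.
flat = [[1, 1], [1, 1]]
busy = [[9, 0], [0, 9]]
video_a = [flat] * 8
video_b = [flat if i % 2 == 0 else busy for i in range(8)]

sig_a = video_signature(video_a)
sig_b = video_signature(video_b)
```

With a target length of 4, the subsampling step keeps only frames 0, 2, 4, and 6, so the odd-indexed frames of the second video never reach the transform and the two signatures coincide. This is one concrete way the loss of temporal information described above can arise.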