1. Field
Embodiments generally relate to video stream processing and, more particularly, to a method and apparatus for indexing a video stream.
2. Description of the Related Art
Video stream processing techniques are becoming increasingly popular. In many instances, proper alignment between two video streams is an important requirement for accurate stream processing. For example, non-linear video editors (NLEs) use video stream alignment to form a continuous stream from two or more constituent streams, where each of the constituent streams may be created from a different video source. Typically, the sources are cameras viewing a scene from different angles, and the NLE seamlessly combines the constituent streams to facilitate a transition from one viewpoint to another. To achieve a seamless stream combination, common content in the constituent streams is found and aligned, and the streams are combined to transition from one stream to another. Other applications that require video stream alignment include fingerprinting of video for indexing, searching and retrieval purposes, detecting unauthorized display of copyrighted video content, determining video watermarks and/or the like. In each of these applications, a technique is used to determine common content between at least two streams, and then to compare the content or transition from stream to stream at the common content location. These applications generally require a robust alignment solution that operates well when faced with a variety of video degradations such as compression, blurring, affine transformation and global changes to the intensity, color and contrast of the stream. To facilitate development of practical applications, the alignment technique also needs to be computationally inexpensive.
There are conventional techniques for content-based video synchronization and combining, but such techniques are computationally complex and consume a significant amount of time to complete the alignment. For example, salient points in each frame of a video stream may be identified using such conventional techniques as SIFT (Scale Invariant Feature Transform), SURF (Speeded Up Robust Features), DAISY, and Harris corner processing. Next, in a frame-by-frame manner, the salient points are compared between streams until a frame match is found. The streams can then be aligned (synchronized) at the common content frame or frames, and further processing may be performed. However, because salient points must be extracted from every frame and compared across many frame pairs, this procedure is computationally expensive and may not be of practical use in most applications.
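By way of illustration only, the frame-by-frame comparison described above may be sketched as follows. The sketch assumes that salient-point descriptors have already been extracted for each frame and are represented as plain coordinate tuples. The function names, the distance threshold `max_dist` and the match ratio `min_ratio` are hypothetical and are not taken from any particular conventional technique.

```python
from math import dist


def count_matches(desc_a, desc_b, max_dist=0.1):
    """Count descriptors in desc_a with at least one neighbour in desc_b
    that lies within max_dist (Euclidean distance)."""
    return sum(
        1 for a in desc_a
        if any(dist(a, b) <= max_dist for b in desc_b)
    )


def align_brute_force(stream_a, stream_b, min_ratio=0.8):
    """Search every frame pair for the first one whose share of matched
    descriptors reaches min_ratio.

    Each stream is a list of frames; each frame is a list of descriptor
    tuples (an assumed, simplified stand-in for SIFT/SURF-style output).
    Returns (index_a, index_b, frame_pairs_compared), with both indices
    set to None if no pair matches.
    """
    frame_pairs = 0
    for i, frame_a in enumerate(stream_a):
        for j, frame_b in enumerate(stream_b):
            frame_pairs += 1
            if not frame_a:
                continue  # a frame without salient points cannot match
            ratio = count_matches(frame_a, frame_b) / len(frame_a)
            if ratio >= min_ratio:
                return i, j, frame_pairs
    return None, None, frame_pairs
```

In this sketch the number of frame pairs examined grows, in the worst case, with the product of the two stream lengths, and each pair in turn requires many descriptor comparisons, which is consistent with the computational expense noted above.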
Therefore, there is a need in the art for an improved method and apparatus for indexing a video stream to facilitate stream comparison.