The present invention relates to three-dimensional (3D) television (TV) video processing, and more particularly to the measurement of stereoscopic video temporal frame offset.
Stereoscopic three-dimensional television (3DTV) video is a sequence of stereoscopic frame-pairs. One of the frames in each frame-pair is intended for viewing by the Left Eye only, and the other frame is intended for viewing by the Right Eye only. In this manner the binocular vision of the frame-pair creates the stereoscopic illusion of depth along with the traditional image height and width. These frame-pairs may be separate video streams of the Left Eye (L) image frames and the Right Eye (R) image frames. Each stream generally is taken from a 3D camera system, which in practice is two cameras with separate video outputs that together produce a serial digital dual link. It is desirable that the two cameras be synchronized so that the L and R frame-pairs are captured at the same time.
These dual-link outputs may be sent as separate compressed video streams to a location where the image pairs are combined by one of several methods to produce a single stream for distribution. Each stream may be compressed with a separate coder/decoder (CODEC), one for the L image sequence and one for the R image sequence. However, each CODEC often has an undetermined frame delay or processing latency. As a result the decompressed output sequence of L and R frames may no longer be pair-wise frame synchronous, i.e., there may be one or more frames of temporal misalignment or temporal frame offset.
U.S. Pat. No. 6,751,360, issued Jun. 15, 2004 to Jiuhuai Lu and assigned to Tektronix, Inc. of Beaverton, Oreg. and incorporated herein by reference, describes a fast temporal alignment estimation method for temporally aligning a distorted video signal with a corresponding source video signal. A temporal signal curve (SC) is created for each of the video signals, and the resulting SCs are cross-correlated with each other to determine a match between corresponding frames of the two video signals. The maximum cross-correlation result is an indication of the amount of temporal displacement between corresponding frames of the two video signals. However, the resulting SC, as shown by the SC graph of FIG. 1a, has a large offset. The large offset does not allow robust determination of the frame offsets from the SCs, and it requires a large bit-size for digital integer implementation in hardware (HW).
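The cross-correlation approach described above can be illustrated with a minimal sketch. The following is not the method of the cited patent or of the present invention; it assumes, purely for illustration, that the per-frame feature forming the signal curve is mean luminance, and it subtracts the mean of each SC as one simple way to reduce the large-offset problem noted above. A positive result indicates the R stream lags the L stream by that many frames.

```python
import numpy as np

def signal_curve(frames):
    # Temporal signal curve: one scalar per frame. Mean luminance is an
    # illustrative stand-in; the actual feature in the cited patent may differ.
    return np.array([float(f.mean()) for f in frames])

def estimate_frame_offset(left_frames, right_frames, max_offset=10):
    sc_l = signal_curve(left_frames)
    sc_r = signal_curve(right_frames)
    # Remove each curve's mean so a large DC offset does not dominate
    # the correlation (the offset problem discussed above).
    sc_l = sc_l - sc_l.mean()
    sc_r = sc_r - sc_r.mean()
    n = min(len(sc_l), len(sc_r))
    best_offset, best_score = 0, -np.inf
    for d in range(-max_offset, max_offset + 1):
        # Positive d: test the hypothesis that R lags L by d frames,
        # i.e. right[t + d] corresponds to left[t].
        if d >= 0:
            a, b = sc_l[:n - d], sc_r[d:n]
        else:
            a, b = sc_l[-d:n], sc_r[:n + d]
        score = float(np.dot(a, b))
        if score > best_score:
            best_offset, best_score = d, score
    return best_offset
```

With synthetic frames whose brightness varies frame to frame, delaying the R sequence by three frames yields an estimate of 3.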
What is desired is a method of measuring the frame offset, or L to R temporal misalignment, without any a priori knowledge of the video content, which measurement provides a robust indication of any uncorrected temporal frame offset between the L and R image sequences.