The proliferation of digital information has created a new television industry employing the concept of a "digital studio", e.g., the HDTV (High Definition Television) or SDTV (Standard Definition Television) broadcast studio. A digital studio is an environment or system having numerous components where various sources of digital information can be selectively accessed, manipulated and delivered (in real time or in delay mode) to multiple clients.
Currently, a digital studio is required to produce an output data stream that meets the specifications set forth in the ATSC (Advanced Television Systems Committee) Digital Television Standard and the MPEG-2 systems level standards as set forth in ISO/IEC 13818-1 (ITU-T Recommendation H.222.0). The digital studio is required to dynamically switch between various program sources and to produce a compliant output stream. Program sources include, but are not limited to, file servers, tape players, encoders, satellite links, networks and other sources capable of digital storage or transmission, where these sources may contain either pre-recorded or "live" data streams. The digital studio may incorporate a switcher, e.g., a Play-To-Air Switcher, to switch, multiplex or splice the various data streams into a single output stream.
Typically, each data stream, when in transport format, carries a plurality of audio and video data streams (substreams), e.g., MPEG system layers define Packetized Elementary Streams (PES) which may carry encoded audio and video streams. Furthermore, MPEG provides a mechanism for time stamping the individual elementary stream components of a program with Presentation Time Stamps (PTS) in the PES layer for time synchronization between the video and audio components (program components) at the time of origination.
However, the presentation times of the various program components are not synchronous to each other but are synchronized to the system clock, e.g., a 27 MHz reference clock. Specifically, the audio and video presentation units have different durations. An audio presentation unit or frame is fixed at 32 msec, while the duration of a video presentation unit or frame varies with the video format and is not fixed at 32 msec. Maintaining synchronization between the video signal and the associated audio signal, i.e., "lip sync", is vital in providing high quality presentations. Lip sync is the synchronization of audio and video presentation, e.g., the synchronization of a soundtrack consisting of dialogue, music, and effects with the pictures of a program.
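By way of illustration only, the difference in presentation unit durations can be expressed in the 90 kHz units in which MPEG-2 Presentation Time Stamps are counted (the 27 MHz system clock divided by 300). The following sketch assumes three example video frame rates that are not named in the text above; the names `to_ticks` and `realignment_period_ticks` are hypothetical helpers introduced for this example.

```python
from fractions import Fraction
from math import gcd

PTS_CLOCK_HZ = 90_000  # PTS resolution: 27 MHz system clock divided by 300

AUDIO_FRAME_SEC = Fraction(32, 1000)  # 32 msec audio frame, as stated above

# Assumed example video frame rates; the text does not name specific formats.
VIDEO_RATES = {
    "29.97 fps": Fraction(30000, 1001),
    "25 fps": Fraction(25),
    "24 fps": Fraction(24),
}

def to_ticks(seconds):
    """Convert a duration in seconds to 90 kHz PTS ticks (exact rational)."""
    return seconds * PTS_CLOCK_HZ

audio_ticks = to_ticks(AUDIO_FRAME_SEC)
video_ticks = {name: to_ticks(1 / rate) for name, rate in VIDEO_RATES.items()}

def realignment_period_ticks(a, b):
    """Smallest span, in ticks, after which the frame boundaries of two
    streams with whole-tick frame durations a and b coincide again."""
    a, b = int(a), int(b)
    return a * b // gcd(a, b)

period = realignment_period_ticks(audio_ticks, video_ticks["29.97 fps"])
print(audio_ticks, video_ticks["29.97 fps"], period / PTS_CLOCK_HZ)
```

Under these assumptions, an audio frame spans 2880 ticks while a 29.97 fps video frame spans 3003 ticks, so the boundaries of the two streams coincide only once every 32.032 seconds (1001 audio frames, 960 video frames). At almost any other switch point, the audio and video boundaries do not line up.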
This requirement creates a problem when switching from one program to another program during a splicing or switching operation. The video and audio units are typically not aligned in the time domain. Namely, the presentation of a video unit may not coincide exactly with the presentation of an audio unit in the time domain, e.g., the audio signal may continue for a short duration after the display of the associated video signal. Thus, switching encoded data streams at either a video or an audio "access unit" (a coded representation of a video or an audio presentation unit) creates a partial access unit in the other associated elementary stream, which was not aligned at the switch point. For example, aligning the video access units of two data streams may cause their audio access units to overlap, and vice versa.
To illustrate, if the alignment of the video streams is used to control the switch point such that no video discontinuity occurs, the audio from the stream before the switch point may have an access unit that continues into the next video frame. Upon splicing, the audio access unit from the stream following the switch point may then overlap the audio access unit from the stream prior to the switch point.
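The size of such an overlap can be estimated with a short calculation. The sketch below is illustrative only and rests on several assumptions not stated in the text: a 29.97 fps video format (3003 ticks per frame at 90 kHz), 32 msec audio frames (2880 ticks), audio and video of each stream starting together at time zero, and a switch point placed on a video frame boundary. The function name `audio_overhang_ticks` is hypothetical.

```python
PTS_CLOCK_HZ = 90_000      # PTS ticks per second
AUDIO_FRAME_TICKS = 2880   # 32 msec audio frame
VIDEO_FRAME_TICKS = 3003   # assumed 29.97 fps video frame

def audio_overhang_ticks(video_frames_before_switch,
                         audio_frame_ticks=AUDIO_FRAME_TICKS,
                         video_frame_ticks=VIDEO_FRAME_TICKS):
    """Return how far the last audio access unit of the outgoing stream
    extends past a switch point placed on a video frame boundary."""
    switch_pts = video_frames_before_switch * video_frame_ticks
    # Count the audio frames that begin before the switch point
    # (ceiling division on integers).
    frames_started = -(-switch_pts // audio_frame_ticks)
    last_audio_end = frames_started * audio_frame_ticks
    return last_audio_end - switch_pts

overhang = audio_overhang_ticks(10)
print(overhang, round(1000 * overhang / PTS_CLOCK_HZ, 2))
```

With a switch after 10 video frames, the outgoing audio overhangs the switch point by 1650 ticks, or about 18.33 msec. If the incoming stream's audio begins exactly at the switch point, that interval is the region in which the two audio access units overlap.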
However, if one attempts to align both the video and the audio by creating a continuous flow of access units for both video and audio, the audio-to-video time relationships are disturbed, causing the audio and video to lose synchronization.
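The loss of synchronization can be made concrete with a deliberately simplified model, offered here only as a sketch. It assumes that each spliced segment contributes whole audio frames sufficient to cover its video, that video segments and audio segments are each concatenated without gaps, and that the frame durations are 3003 ticks (29.97 fps video, an assumed format) and 2880 ticks (32 msec audio) at 90 kHz. The function name `av_lag_at_segment_starts` is hypothetical.

```python
PTS_CLOCK_HZ = 90_000      # PTS ticks per second
AUDIO_FRAME_TICKS = 2880   # 32 msec audio frame
VIDEO_FRAME_TICKS = 3003   # assumed 29.97 fps video frame

def av_lag_at_segment_starts(segment_lengths_in_video_frames):
    """For each segment, return how late (in ticks) its audio starts
    relative to its video when both flows are kept continuous."""
    lags = []
    video_time = 0  # output time at which the next video segment starts
    audio_time = 0  # output time at which the next audio segment starts
    for n_frames in segment_lengths_in_video_frames:
        lags.append(audio_time - video_time)
        video_len = n_frames * VIDEO_FRAME_TICKS
        # Whole audio frames needed to cover this segment's video.
        audio_frames = -(-video_len // AUDIO_FRAME_TICKS)
        video_time += video_len
        audio_time += audio_frames * AUDIO_FRAME_TICKS
    return lags

lags = av_lag_at_segment_starts([10, 10, 10])
print(lags, [round(1000 * t / PTS_CLOCK_HZ, 2) for t in lags])
```

Under this model, three consecutive 10-frame segments start with audio lags of 0, 1650 and 3300 ticks (0, 18.33 and 36.67 msec), so the lip sync error grows with every splice rather than remaining bounded.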
Therefore, a need exists in the art for a method and apparatus for preserving audio/video lip sync when splicing data streams from multiple sources.