The present invention relates to digital television. More particularly, the present invention relates to a method and system for inhibiting audio-video synchronization delay, for fast initiation of display of interleaved video and audio by a decoder, suitable, for example, for fast switching between channels broadcasting interleaved video and audio data.
Television (TV) viewers often switch between broadcast channels (an action sometimes called “zapping”). In the days of analogue TV, the response to a switching command (input directly or using a handheld remote control) was practically immediate.
Digital Television (DTV) technology has introduced new benefits, but also new challenges, one of which is that channel switching involves delays.
Broadcast channel switching in the DTV realm is implemented, for example, in Internet Protocol (IP) networks using multicast with the Internet Group Management Protocol (IGMP), and in Hybrid Fiber Coax (HFC) networks using Switched Digital Broadcast (SDB). The switched broadcast concept, in both IP and HFC networks, introduces a delay in reception of the newly selected channel, because the network starts forwarding that channel's data to the receiver only after the switch is requested.
DTV is closely associated with video compression. In some compressed video formats, for example the MPEG formats, a Video Elementary Stream (VES) is subjected to GOP (Group Of Pictures) encoding. To exploit temporal redundancy, MPEG divides the frames into groups, each referred to as a “group of pictures,” or GOP. A VES is made up of I, P and B type pictures. An I picture (I stands for Intra-coded picture) contains the information of a whole new frame and is used as a reference in the reconstruction of P and B pictures. A P picture (P stands for Predicted picture) encodes a frame by forward prediction from a previous I or P picture. A B picture (B stands for Bi-directionally predicted picture) encodes a single intermediate frame by forward, backward or bi-directional prediction, referring to other I and P pictures.
Due to the abovementioned video compression characteristics, playback can start only at specific points along the compressed video stream, namely when an intra-coded picture (I picture) is received. Starting elsewhere means that the reference information on which motion-compensated prediction depends is missing, causing macroblock artifacts to appear on the screen.
Because the GOP structure allows playback to start only at the beginning of a GOP, it introduces a delay in a channel change operation (so-called “zapping”): the decoder has to wait for the beginning of a GOP to be received before it can start playback. The average delay is half the GOP duration, and a GOP typically spans a few seconds. The better the compression, the longer the GOP, and the greater the channel change delay.
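The relation between GOP length and average waiting time can be illustrated with a minimal Python sketch. The GOP pattern, the 25 frames-per-second rate and the function name `pictures_until_next_i` are assumptions chosen for illustration only; the sketch also assumes that the same GOP repeats cyclically and that the receiver tunes in at a uniformly random picture.

```python
from statistics import mean


def pictures_until_next_i(picture_types, tune_in):
    """Return how many pictures are skipped before the next I picture.

    picture_types is one GOP, assumed (for illustration) to repeat cyclically;
    tune_in is the index of the first picture received after the channel change.
    """
    n = len(picture_types)
    for wait in range(n):
        if picture_types[(tune_in + wait) % n] == "I":
            return wait
    raise ValueError("stream contains no I picture")


# Hypothetical 49-picture GOP: one I picture followed by B, B, P groups.
gop = "I" + "BBP" * 16
frame_rate = 25.0  # assumed frames per second

# Waiting time, in pictures, for every possible tune-in position.
waits = [pictures_until_next_i(gop, t) for t in range(len(gop))]

gop_duration_s = len(gop) / frame_rate
average_wait_s = mean(waits) / frame_rate

print(f"GOP duration: {gop_duration_s:.2f} s")
print(f"Average wait for an I picture: {average_wait_s:.2f} s")
```

Under these assumptions the average wait is (N - 1)/2 pictures for a GOP of N pictures, which approaches half the GOP duration as N grows.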
In addition to the delay caused by the GOP structure, a further delay is introduced in the multiplexing process. The multiplexer typically generates an interleaved video and audio stream, with synchronized video and audio. The multiplexer refers to each atomic component of the stream, whether a video picture or a unit of audio signal, as an Access Unit (AU); a different name may be used for such an atomic component in other video-audio formats. For brevity, the term “access unit” is used in the present application to refer to any such atomic component.
The multiplexing process takes into account the fact that the size of compressed video pictures varies, and the multiplexer therefore maintains a video buffer to cope with picture size variations. Audio, on the other hand, may be characterized by a constant bitrate and does not require a large buffer compared to video. As a result, audio AUs in the interleaved video and audio data are delayed until the video buffer is full enough. When the de-multiplexer at the end-user playback device starts de-multiplexing the interleaved stream, it must wait until the audio information corresponding to the already-received video pictures arrives, and only then can the matched video and audio data be forwarded to the video and audio decoders. This process introduces an additional delay, called Audio/Video synchronization delay (or A/V synch delay for short). The A/V synch delay may reach a few seconds.
A known approach to solving the A/V synch delay is based on full transcoding of the entire stream. This approach tries to reduce the picture size variation attributed to encoding, so as to reduce the required video buffer and thereby reduce the delay between the corresponding video and audio data.
Another approach to solving the A/V synch delay is based on playing the video in slow motion to allow the audio information to catch up with the video information, until audio and video are synchronized. This method enables the video buffer to be filled while video is shown on screen immediately, in slow motion.
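How long such slow-motion playback would have to last can be estimated with a small Python sketch. The model is an assumption made for illustration, not part of the described approach: the audio is taken to lag the video by a fixed delay D, the stream arrives in real time, and the video is played at a constant fraction r of normal speed (0 ≤ r < 1). After t seconds the video has presented r·t seconds of content while audio is available up to t − D, so the two meet when t = D / (1 − r).

```python
def catch_up_time_s(av_delay_s, playback_rate):
    """Seconds of slow-motion playback needed for the audio to catch up.

    av_delay_s    -- assumed initial lag of audio behind video, in seconds
    playback_rate -- assumed video speed as a fraction of normal (0 <= rate < 1)
    """
    if av_delay_s < 0:
        raise ValueError("av_delay_s must be non-negative")
    if not 0 <= playback_rate < 1:
        raise ValueError("playback_rate must be in [0, 1) for audio to catch up")
    return av_delay_s / (1.0 - playback_rate)


# Hypothetical figures: 1.5 s lag, video shown at 75% or 50% of normal speed.
for rate in (0.75, 0.5):
    print(f"rate {rate:.2f}: synchronized after {catch_up_time_s(1.5, rate):.1f} s")
```

The sketch makes the trade-off visible: the closer the playback rate is to normal speed, the less noticeable the slow motion, but the longer the viewer waits for synchronized audio.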