Audiovisual data, for example broadcast television or on-demand movies, is extremely large. Hence, methods of digitally compressing audiovisual data have been developed, for example Moving Picture Experts Group (MPEG)-2 and MPEG-4 Part 10/H.264.
A basic compression system 100, as deployed in broadcast television, consists of an encoder (compressor) and a decoder (decompressor), as illustrated in FIG. 1.
In FIG. 1, the video to be encoded is input to a video encoder 110, which produces an elementary stream of digitally compressed video data 130. The encoder includes a Presentation Time Stamp (PTS) insertion unit 115, which inserts PTSs into the video elementary stream according to the requirements of the digital video encoding standard in use with the system 100 (e.g. MPEG-4).
Meanwhile, the audio corresponding to the video is input to an audio encoder 120, again having a PTS insertion unit 125, to produce an elementary stream of digitally compressed audio data 135.
A multiplexing unit 140 multiplexes the video and audio elementary streams together to form a Transport Stream 145, ready for transmission to a receiver 155 through a network 150, or for storage on a storage device for later transmission.
A typical receiver 155 comprises the units required to carry out the reverse process of decoding. In particular, there is a de-multiplexer 160, which provides the de-multiplexed elementary streams to the video decoder 170 and the audio decoder 180 respectively. To enable the PTS values to be used, a PTS detection unit (175/185) is generally included in each of the video and audio decoders.
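The role of the PTS values can be illustrated with a short sketch (the function names are illustrative, not part of the system of FIG. 1). The coding of the 33-bit PTS within a five-byte field of the PES header is defined in ISO/IEC 13818-1; a receiver can decode the video and audio PTS values and compare them against the common 90 kHz system clock:

```python
def decode_pts(b: bytes) -> int:
    """Decode the 33-bit PTS from the 5-byte PES header field
    (ISO/IEC 13818-1). The interleaved marker bits are skipped."""
    return (((b[0] >> 1) & 0x07) << 30 |   # PTS[32..30]
            b[1] << 22 |                    # PTS[29..22]
            ((b[2] >> 1) & 0x7F) << 15 |    # PTS[21..15]
            b[3] << 7 |                     # PTS[14..7]
            ((b[4] >> 1) & 0x7F))           # PTS[6..0]

def av_offset_ms(video_pts: int, audio_pts: int) -> float:
    """Relative presentation offset in milliseconds.
    PTS values are in units of a 90 kHz clock."""
    return (video_pts - audio_pts) / 90.0
```

Note that a comparison of this kind reflects only the relative timing signalled by the encoders; it does not by itself reveal where in the chain any misalignment arose.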
The bit stream 145 out of the multiplexer 140 may be stored at several stages within the network 150 and/or may also be re-multiplexed several times with other audiovisual data streams.
The inputs to an encoder (110, 120) are generally uncompressed audio or video data, and corresponding pairs of audio and video encoders (110, 120), operating on the same source audiovisual material, are expected to receive source material that is correctly synchronized at source.
However, for various reasons, it is possible for these inputs to arrive at their respective encoder with relative delays, so that the audio and video of a corresponding pair of encoders are not always correctly synchronized.
Furthermore, the encoding systems themselves introduce delays which are not necessarily the same for the video and the audio, because the audio and video are separated during both the encode and decode processes (as shown in FIG. 1).
Moreover, the decoder and encoder are also geographically separated. This separation may lead to the audio and video no longer being aligned when they reach the viewer, despite their arrival at the respective encoders in adequately synchronized form.
Ideally, the encoder system would be able at least to detect this misalignment, measure its value and then correct it. However, no industry-agreed method exists for this, and the various manufacturers of compression systems, and the customers who operate them, use a variety of means to deal with the problem of audiovisual synchronization, ranging from doing nothing to regular system calibration. Nevertheless, no fully satisfactory method currently exists to assure operators of correct audiovisual alignment at all times.
One simple way to check the alignment of audio and video at the output of a compression system is to use a known pair of video and audio test signals. One example of such a scheme is to use a "flash" and "beep" signal, which is a derivative of the well-known method used in the film industry of a clapper board that links a given film frame with a well-defined sound.
Such a scheme is described in British Patent 2,341,758 A. Marking a given video frame with a flash and, simultaneously, the audio with a short "beep" enables a measurement of how far apart the audio and video are at the output of a test decoder.
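The principle of the flash-and-beep measurement can be sketched as follows. The function name, thresholds and inputs (per-frame average luma values and decoded PCM audio samples) are hypothetical and serve only to illustrate the idea, not any particular implementation:

```python
def flash_beep_offset_ms(frame_luma, frame_rate,
                         audio, sample_rate,
                         luma_threshold=200.0, audio_threshold=0.5):
    """Return (beep time - flash time) in milliseconds at the decoder
    output. A positive value means the beep arrives after the flash.

    frame_luma  : per-frame average luma of the decoded video
    audio       : decoded PCM samples, normalized to [-1.0, 1.0]
    """
    # First frame bright enough to count as the "flash".
    flash_frame = next(i for i, y in enumerate(frame_luma)
                       if y >= luma_threshold)
    # First sample loud enough to count as the "beep".
    beep_sample = next(i for i, s in enumerate(audio)
                       if abs(s) >= audio_threshold)
    flash_t = flash_frame / frame_rate
    beep_t = beep_sample / sample_rate
    return (beep_t - flash_t) * 1000.0
```

As the passage above notes, a measurement of this kind must be performed off-line on a decoded test signal, which is precisely the inconvenience the present discussion is concerned with.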
However, this is clearly an inconvenient method as it has to be done off-line (i.e. whilst the system is not broadcasting to users) either manually, or using a PC package that automates this process.
There are several audiovisual (AV) synchronization measurement software packages available, but these still rely on decoding the compressed audio and video before measuring the audiovisual synchronization, and the decoding step itself may affect the result. These methods measure only the relative synchronization, and manual intervention is still required to perform the measurement and then to take any remedial steps to correct any defect.
One key issue is to establish whether the audio and video become misaligned during the encoding processes themselves, for example due to a lack of calibration of the respective encoders, or whether the misalignment is due to misaligned signals arriving at the encoders. Whilst the former can be corrected by proper calibration of the encoders under test, the latter cannot be corrected without escalating the measurement to the preceding transmission system.
To be able to check the audiovisual synchronization at the output of an encoder, a decoder is required, but a decoder (hardware or software) will also separate the audio and video components and may introduce its own audiovisual synchronization error. This means that the overall system audiovisual synchronization is being measured, and not that of just the encoder, and it is impossible to isolate how much of the audiovisual synchronization error is due to the encoding process and how much to the decoder. Accordingly, it would be desirable to have a method and apparatus for testing audiovisual synchronization that can measure the delay of the encoder only, and which works whilst the broadcast/transmission system is online (i.e. operating to provide audiovisual data to end viewers).
Furthermore, once compressed, the data stream from the encoder system typically uses the MPEG-2 Transport Stream standard (ISO/IEC 13818-1: Information technology—Generic coding of moving pictures and associated audio information: Systems) to convey the combined compressed audio and video data, as well as other related signals, to the decoder. This MPEG-2 Transport Stream standard is also used in more recent video encoding standards, such as H.264.
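The Transport Stream referred to above consists of fixed 188-byte packets, each beginning with a four-byte header carrying a sync byte (0x47) and the 13-bit Packet Identifier (PID) by which the de-multiplexer separates the elementary streams. A minimal parsing sketch, assuming a well-formed packet and using only fields defined in ISO/IEC 13818-1:

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47

def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte MPEG-2 Transport Stream packet header
    (ISO/IEC 13818-1)."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a valid 188-byte TS packet")
    return {
        # 13-bit PID: low 5 bits of byte 1, all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        # Set when a PES packet (and hence possibly a PTS) starts here.
        "payload_unit_start": bool(packet[1] & 0x40),
        # 4-bit counter used to detect lost packets.
        "continuity_counter": packet[3] & 0x0F,
    }
```

Because the PTS values travel inside these packets, any method of estimating audiovisual synchronization directly from the Transport Stream could operate at this level, without fully decoding the audio and video.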
Accordingly, it would also be desirable to have a method of estimating the state of audiovisual synchronization within a Transport Stream, because this would remove the need to use a separate offline physical or PC-based decoder during measurement.