1. Field of the Invention
This invention relates to coding of audio signals into a data stream such that it can be edited at points synchronised to another data stream. It has particular, but not exclusive, application to a digital television transmission scheme requiring non-destructive splicing of the audio in the compressed domain at the associated video frame boundaries.
Digital Television (DTV) systems allow several programmes to be broadcast over a channel of limited bandwidth. Each of these programmes has video and audio content. Some of these programmes may contain high-quality multichannel audio (e.g., 5 channels that can be reproduced by home cinema systems). DTV production sites, networks and affiliates typically use video tape recorders and transmission lines for carrying all audio content. Much of this infrastructure has capacity for only two uncompressed audio channels, so multiple channels are normally lightly compressed and formatted before recording or transmission. Prior to emission (i.e., broadcasting to the end user) the programme streams are strongly compressed.
In contribution and distribution stages of DTV production, original streams must be spliced for programme editing or programme switching (e.g., for insertion of local content into a live network feed). Such splicing is performed at video frame boundaries within the content stream.
The audio content of the broadcast stream must meet several requirements. DTV viewers may expect received programmes to have a high perceived audio quality, particularly when the programmes are to be reproduced using high-quality reproduction equipment such as a home cinema system. For example, there should be no audible artefacts due to cascading of multiple encoding and decoding stages, and there should be no perceptible interruption in sound during programme switching. Most importantly, the reproduced programmes must be in lip sync; that is to say, the audio stream must be synchronous with the corresponding video stream. To achieve these ends at a reasonable cost, i.e., using the existing (2-channel) infrastructure, one must splice the audio programme in the compressed domain.
2. Summary of the Prior Art
Existing mezzanine encoding schemes include Dolby E (r.t.m.), defined in Dolby Digital Broadcast Implementation Guidelines Part No. 91549, Version 2 1998 of Dolby Laboratories, for distribution of up to 8 channels of encoded audio and multiplexed metadata through an AES-3 pair. The soon-to-be-introduced (NAB 1999) DP571 Dolby E Encoder and DP572 Dolby E Decoder should allow editing and switching of encoded audio with a minimum of mutes or glitches. Moreover, they allow cascading without audible degradation. Dolby E uses a 20-bit sample size and provides a bitrate reduction of between 2:1 and 5:1.
The British Broadcasting Corporation and others are proposing, through the ACTS ATLANTIC project, a flexible method for switching and editing of MPEG-2 video bitstreams. This seamless concatenation approach uses decoding and re-encoding with side information to avoid cascading degradation. However, this scheme is limited to MPEG-2 Layer II audio and the AES/EBU interface. Moreover, the audio data is allowed to slide with respect to edit points, introducing a time offset. Successive edits can therefore result in a large time offset between the audio and video information.
Throughout the broadcasting chain, video and audio streams must be maintained in lip sync. That is to say, the audio must be kept synchronous with the corresponding video. Prior to emission, distribution sites may splice (e.g., switch, edit or mix) audio and video streams (e.g., for inclusion of local content). After splicing, if video and audio frame boundaries do not coincide, which is the case for most audio coding schemes, lip sync cannot be guaranteed automatically because the audio slips with respect to the video. In extreme cases, when no special measures are taken, this could lead to audio artefacts, such as mutes or glitches. Glitches may result from an attempt to decode a non-compliant audio stream, while mutes may be applied to avoid these glitches. An aim of this invention is to provide an encoding scheme for an audio stream that can be spliced without introducing audio artefacts such as mutes, glitches or slips.
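The slip described above can be quantified with a short sketch. It assumes, purely for illustration, that a compressed audio stream can only be cut at an audio frame boundary, and uses parameters not taken from this document: 48 kHz sampling, 1152-sample audio frames (the MPEG-1 Layer II frame length) and video at 25 or 30000/1001 frames/s. The function names are hypothetical. For each video frame boundary the sketch reports how far it lies past the last audio frame boundary, and how many video frames pass before the two boundaries coincide again.

```python
from fractions import Fraction


def audio_frame_ms(fs_hz, samples_per_frame):
    # Duration of one compressed audio frame, kept exact as a Fraction (ms).
    return Fraction(samples_per_frame * 1000, fs_hz)


def video_frame_ms(fps):
    # Duration of one video frame (ms); fps may be a Fraction such as 30000/1001.
    return Fraction(1000) / fps


def splice_misalignment_ms(k, fps, fs_hz, samples_per_frame):
    # Time between the k-th video frame boundary and the last audio frame
    # boundary at or before it. A non-zero value means a cut at this video
    # boundary cannot also fall on an audio frame boundary (assumed constraint).
    t_video = k * video_frame_ms(fps)
    return t_video % audio_frame_ms(fs_hz, samples_per_frame)


def alignment_period(fps, fs_hz, samples_per_frame, limit=100_000):
    # Smallest number of video frames after which both boundaries coincide.
    for k in range(1, limit + 1):
        if splice_misalignment_ms(k, fps, fs_hz, samples_per_frame) == 0:
            return k
    return None


if __name__ == "__main__":
    for k in range(1, 4):
        print(k, float(splice_misalignment_ms(k, Fraction(25), 48000, 1152)))
    print(alignment_period(Fraction(25), 48000, 1152))
    print(alignment_period(Fraction(30000, 1001), 48000, 1152))
```

With these assumed parameters, 40 ms video frames and 24 ms audio frames realign only every third video frame, so a cut at the intervening boundaries leaves a residual offset of 16 ms or 8 ms; at 30000/1001 frames/s the boundaries realign only every 720 video frames.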
Another aim of this invention is to provide an encoding scheme that can be subject to cascading compression and decompression with a minimal loss of quality.