Audio and video frame rates (or frame frequencies) used in most commercial applications available today follow separate established industry standards, which manifest themselves in recording and playback software products, in hardware components and in agreed formats for transmitting audio and video between communicating parties. Audio frame rates are typically specific to different coding algorithms and associated with audio sampling frequencies, such as 44.1 and 48 kHz, which are as well established as the video frame rates 29.97 fps (NTSC) and 25 fps (PAL) in their respective geographical areas; further standard video frame rates include 23.98, 24 and 30 fps, or in a more generalized form 24, 25, 30 fps and (24, 25, 30)×1000/1001 fps. Attempts to unite or harmonize audio frame rates have not yet been successful despite the shift from analogue to digital distribution, which implies that an audio frame (e.g., a packet or a coding unit suitable for transmission over a network) does not in general correspond to an integer number of video frames.
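The mismatch described above can be illustrated with a short calculation. The following is a minimal sketch, not taken from any standard: it assumes a 48 kHz sampling frequency and a hypothetical audio frame length of 1536 samples, and the names `AUDIO_FRAME_SAMPLES` and `video_frames_per_audio_frame` are introduced here for illustration only. Exact rational arithmetic is used so that the 1000/1001 frame rates are not rounded.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 48_000
# Assumption for illustration: one audio frame carries 1536 samples.
AUDIO_FRAME_SAMPLES = 1536

# The generalized set of video frame rates named in the text.
VIDEO_FRAME_RATES = {
    "23.98": Fraction(24_000, 1001),
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30_000, 1001),
    "30": Fraction(30),
}


def video_frames_per_audio_frame(video_fps,
                                 sample_rate=SAMPLE_RATE_HZ,
                                 audio_frame_samples=AUDIO_FRAME_SAMPLES):
    """Return how many video frames one audio frame spans, as an exact fraction."""
    audio_frame_duration = Fraction(audio_frame_samples, sample_rate)  # seconds
    return audio_frame_duration * video_fps


if __name__ == "__main__":
    for label, fps in VIDEO_FRAME_RATES.items():
        ratio = video_frames_per_audio_frame(fps)
        kind = "integer" if ratio.denominator == 1 else "not an integer"
        print(f"{label} fps: {ratio} video frames per audio frame ({kind})")
```

Under these assumptions, none of the listed video frame rates yields a whole number of video frames per audio frame.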
The need to synchronize audiovisual data streams arises continually, as a result of clock drift or when several streams are received from different sources for common processing, editing or splicing in a server, a situation frequently encountered in broadcast stations. In the situation illustrated in FIG. 3, where audio frames (A11, A12, . . . in stream S1 and A21, A22, . . . in stream S2) and video frames (V11, V12, . . . in stream S1 and V21, V22, . . . in stream S2) do not match, an attempt to improve the video-to-video synchronicity between the streams by duplicating or rejecting video frames in one of the streams (e.g., in order to splice the streams) typically leads to an audio-to-video asynchronicity within that stream. In general, the asynchronicity persists, at least to some extent, even if corresponding audio frames are deleted or duplicated.
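The residual asynchronicity can be estimated as follows. This is a rough sketch under stated assumptions: the function name `residual_av_offset` is hypothetical, the audio frame length of 1536 samples at 48 kHz is an illustrative choice, and the compensation strategy (removing the nearest whole number of audio frames) is only one conceivable approach, not a method described in the text.

```python
from fractions import Fraction


def residual_av_offset(video_fps, audio_frame_samples, sample_rate,
                       dropped_video_frames=1):
    """Estimate the A/V offset left after dropping whole video frames.

    The audio is compensated by removing the nearest whole number of
    audio frames. Returns (audio_frames_removed, residual_seconds);
    a positive residual means more video time than audio time was removed.
    """
    video_gap = Fraction(dropped_video_frames) / video_fps       # seconds of video removed
    audio_frame_duration = Fraction(audio_frame_samples, sample_rate)
    audio_frames_removed = round(video_gap / audio_frame_duration)
    residual = video_gap - audio_frames_removed * audio_frame_duration
    return audio_frames_removed, residual


if __name__ == "__main__":
    for label, fps in (("25", Fraction(25)), ("29.97", Fraction(30_000, 1001))):
        n, residual = residual_av_offset(fps, 1536, 48_000)
        print(f"{label} fps: remove {n} audio frame(s), "
              f"residual offset {float(residual) * 1000:.3f} ms")
```

With these illustrative numbers, dropping one video frame at 25 fps and one audio frame leaves an offset of 8 ms; at 29.97 fps the offset is about 1.367 ms. The offset is nonzero in both cases, which is the persistence noted above.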
At the cost of more processing, more room for maneuver could be created by temporarily decoding the audio during synchronization into a low-level format that is independent of the division into frames, e.g., baseband format, or pulse-code modulation (PCM) resolved at the original sampling frequency. Such decoding however blurs the exact anchoring of metadata to specific audio segments and creates an information loss that cannot be remedied by decoding into a ‘perfect’ intermediate format. As one example, dynamic range control (DRC) is typically mode-dependent and equipment-dependent, and can therefore be consumed only at the moment of actual playback; a data structure governing the characteristics of DRC throughout an audio packet is difficult to restore faithfully after synchronization has taken place. Hence, preserving metadata of this type past consecutive decoding, synchronization and encoding stages is no simple task if subjected to complexity constraints.
Even more serious difficulties may arise in connection with legacy infrastructure that is designed to carry two-channel PCM signals and is therefore capable of handling multi-channel content only in coded form.
It is certainly more convenient to encode audio and video data frame-synchronously in the sense that data in a given frame exactly correspond to the same time segment in the recorded and coded audiovisual signal. This preserves audio-to-video synchronicity under frame-wise manipulation of an audiovisual stream, i.e., duplication or rejection of one or more entire independent coding units in the stream. The frame lengths available in the Dolby E™ audio format match video frame lengths. With a typical bit rate of 448 kbps, this format was designed primarily for the purpose of professional production, with hard media like digital videocassettes as its preferred storage modality.
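To make the notion of frame-synchronous coding concrete, the sketch below computes how many audio samples fall within one video frame period. It assumes a 48 kHz sampling frequency, and `samples_per_video_frame` is a name introduced here for illustration; it does not describe how any particular format, including the one mentioned above, lays out its frames.

```python
from fractions import Fraction


def samples_per_video_frame(video_fps, sample_rate=48_000):
    """Return the exact number of audio samples spanning one video frame."""
    return Fraction(sample_rate) / video_fps


if __name__ == "__main__":
    rates = {
        "23.98": Fraction(24_000, 1001),
        "24": Fraction(24),
        "25": Fraction(25),
        "29.97": Fraction(30_000, 1001),
        "30": Fraction(30),
    }
    for label, fps in rates.items():
        n = samples_per_video_frame(fps)
        print(f"{label} fps: {float(n):g} samples per video frame")
```

At an assumed 48 kHz, a 25 fps video frame spans exactly 1920 samples, whereas a 29.97 fps frame spans 1601.6 samples, so an audio frame of a fixed whole number of samples cannot match every video frame length at a single sampling frequency.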
There is a need for an alternative audio format suitable for distribution purposes as part of a frame-synchronous audiovisual format (or format family), as well as coding and decoding equipment suitable for use therewith.
All the figures are schematic and generally only show parts which are necessary in order to elucidate the invention, whereas other parts may be omitted or merely suggested. Unless otherwise indicated, like reference numerals refer to like parts in different figures.