Audio and video frame rates (or frame frequencies) used in most commercial applications available today follow separate, established industry standards. These standards manifest themselves in recording and playback software, in hardware components, and in the agreed formats for transmitting audio and video between communicating parties. Audio frame rates are typically specific to the coding algorithm and tied to particular audio sampling frequencies, such as 44.1 and 48 kHz, which are as well established as the video frame rates 29.97 fps (NTSC) and 25 fps (PAL) in their respective geographical areas; further standard video frame rates include 23.98, 24 and 30 fps, or in a more generalized form 24, 25, 30 fps and (24, 25, 30)×1000/1001 fps. Attempts to unify or harmonize audio frame rates have not yet succeeded, despite the shift from analogue to digital distribution. As a consequence, an audio frame (e.g., a packet or a coding unit suitable for transmission over a network) does not in general correspond to an integer number of video frames in an audiovisual data stream.
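The mismatch can be made concrete with exact rational arithmetic. The sketch below computes how many audio samples one video frame spans at the sampling frequencies and video frame rates named above, and how long a coded audio frame is when measured in video frames. The codec frame length of 1024 samples (`CODEC_FRAME_LEN`) is an assumption chosen for illustration only, and the function names are hypothetical.

```python
from fractions import Fraction

# Video frame rates named in the text: 24, 25, 30 fps and the x1000/1001 variants.
VIDEO_RATES = {
    "23.976": Fraction(24000, 1001),
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}

# Hypothetical codec frame length in samples; 1024 is assumed for illustration only.
CODEC_FRAME_LEN = 1024


def samples_per_video_frame(fs: int, fps: Fraction) -> Fraction:
    """Exact number of audio samples spanned by one video frame."""
    return Fraction(fs) / fps


def video_frames_per_audio_frame(fs: int, fps: Fraction,
                                 frame_len: int = CODEC_FRAME_LEN) -> Fraction:
    """Duration of one coded audio frame, measured in video frames."""
    return Fraction(frame_len) / samples_per_video_frame(fs, fps)


if __name__ == "__main__":
    for fs in (44100, 48000):
        for name, fps in VIDEO_RATES.items():
            spf = samples_per_video_frame(fs, fps)
            ratio = video_frames_per_audio_frame(fs, fps)
            print(f"{fs} Hz, {name} fps: {float(spf):.2f} samples per video frame, "
                  f"one audio frame = {float(ratio):.4f} video frames")
```

Under these assumptions no combination yields an audio frame that is a whole number of video frames (for example, 1024/1920 = 8/15 of a PAL frame at 48 kHz), and at 29.97 fps even the number of samples per video frame (1601.6 at 48 kHz) is not an integer.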
The need to synchronize audiovisual data streams arises frequently, for example as a result of clock drift, or when several streams are received from different sources for common processing, editing or splicing in a server, a situation often encountered in broadcast stations. An attempt to improve video-to-video synchronicity between two audiovisual data streams by duplicating or dropping video frames in one of the streams (e.g., to prepare the streams for splicing) typically introduces an audio-to-video lag within that stream if the durations of the audio frames and the video frames do not match. In general, a lag of some non-zero duration persists even if the audio frames corresponding to the video edit are deleted or duplicated.
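The residual lag can be illustrated with a small calculation. The sketch below assumes 25 fps video (40 ms frames) and hypothetical 1024-sample audio frames at 48 kHz (about 21.33 ms), and picks the whole number of audio frames whose total duration is closest to that of the dropped or duplicated video frames. The function name `best_audio_edit` and the nearest-integer rule are illustrative assumptions, not a prescribed procedure.

```python
from fractions import Fraction


def best_audio_edit(n_video: int, video_dur: Fraction, audio_dur: Fraction):
    """Pick how many whole audio frames to drop (or duplicate) alongside
    n_video whole video frames, and return (k, lag) where lag is the signed
    residual audio-to-video lag in seconds (positive when the audio edit is
    longer than the video edit)."""
    # Nearest whole number of audio frames; Fraction rounding is exact.
    k = round(n_video * video_dur / audio_dur)
    lag = k * audio_dur - n_video * video_dur
    return k, lag


# Assumed example: 25 fps video and hypothetical 1024-sample frames at 48 kHz.
VIDEO_DUR = Fraction(1, 25)
AUDIO_DUR = Fraction(1024, 48000)

for n in range(1, 9):
    k, lag = best_audio_edit(n, VIDEO_DUR, AUDIO_DUR)
    print(f"drop {n} video frame(s) -> drop {k} audio frame(s), "
          f"residual lag {float(lag) * 1000:+.3f} ms")
```

With these assumed numbers, dropping a single video frame leaves a lag of 1/375 s (about 2.67 ms) even after the best audio edit, and the lag vanishes only when the number of edited video frames is a multiple of 8 (8 × 40 ms = 15 × 21.33 ms = 320 ms).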
At the cost of additional processing, more room for maneuver could be created by temporarily decoding the audio during synchronization into a low-level format that is independent of the division into frames, e.g., baseband format, or pulse-code modulation (PCM) at the original sampling frequency. Such decoding, however, blurs the exact anchoring of metadata to specific audio segments and causes an information loss that cannot be remedied by decoding into a ‘perfect’ intermediate format. As one example, dynamic range control (DRC) typically depends on the playback mode and the playback equipment, and can therefore be applied only at the moment of actual playback; a data structure governing the characteristics of DRC throughout an audio packet is difficult to restore faithfully after synchronization has taken place. Hence, preserving metadata of this type through consecutive decoding, synchronization and encoding stages is far from simple under complexity constraints.
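The loss of anchoring can be sketched as follows, under loudly stated assumptions: metadata such as a DRC gain is carried once per coded frame of N samples, the original frames start at samples 0, N, 2N, and so on, and a PCM-domain edit removes the first `cut` samples before the audio is re-framed. The helper `source_frames` and the gain values in `drc_gain_db` are hypothetical. The sketch only shows which original frames each re-encoded frame overlaps; it does not propose a way to merge their metadata.

```python
def source_frames(new_index: int, cut: int, frame_len: int):
    """For re-encoded frame `new_index`, list the original frames it overlaps
    as (original_frame_index, overlap_in_samples) pairs.

    Assumes the original stream was framed at samples 0, N, 2N, ... and that
    a PCM-domain edit removed the first `cut` samples before re-framing.
    """
    start = cut + new_index * frame_len  # first original sample in the new frame
    end = start + frame_len              # one past the last original sample
    pairs = []
    old = start // frame_len
    while old * frame_len < end:
        lo = max(start, old * frame_len)
        hi = min(end, (old + 1) * frame_len)
        pairs.append((old, hi - lo))
        old += 1
    return pairs


# Hypothetical per-frame DRC gains (dB) carried by the original coded frames.
drc_gain_db = [0.0, -3.0, -6.0, -2.0, 0.0]

for j in range(2):
    parts = source_frames(j, cut=1602, frame_len=1024)
    print(j, [(i, n, drc_gain_db[i]) for i, n in parts])
```

Unless `cut` is a multiple of the frame length, every re-encoded frame straddles two original frames that carry different gains, so no single original metadata value applies to it; whatever value is written into the new frame is a fresh decision rather than the original data.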
Even more serious difficulties may arise in connection with legacy infrastructure that is designed to carry two-channel PCM signals and is therefore capable of handling multi-channel content only in coded form.
It is more convenient to encode audio and video data frame-synchronously, in the sense that the data in a given frame correspond exactly to the same time segment of the recorded and coded audiovisual signal. This preserves audio-to-video synchronicity under frame-wise manipulation of an audiovisual stream, i.e., duplication or removal of one or more entire independent coding units in the stream. The Dolby E™ audio format offers frame lengths that match video frame lengths. With a typical bit rate of 448 kbps, however, this format was designed primarily for professional production, with physical media such as digital videocassettes as its preferred storage modality.
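Frame-synchronous audio framing can be sketched by letting audio frame k cover exactly the samples that fall within video frame k. When the number of samples per video frame is not an integer (1601.6 at 48 kHz and 29.97 fps), the frame lengths must then vary in a repeating pattern. The floor-based boundary rule and the function name below are assumptions for illustration; they are not claimed to be the framing used by Dolby E™ or by any particular standard.

```python
import math
from fractions import Fraction


def frame_synchronous_lengths(fs: int, fps: Fraction, n_frames: int):
    """Audio frame lengths (in samples) for n_frames consecutive video frames,
    where frame k covers samples [floor(k*fs/fps), floor((k+1)*fs/fps))."""
    spf = Fraction(fs) / fps  # exact samples per video frame
    bounds = [math.floor(k * spf) for k in range(n_frames + 1)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


print(frame_synchronous_lengths(48000, Fraction(25), 5))
# [1920, 1920, 1920, 1920, 1920]
print(frame_synchronous_lengths(48000, Fraction(30000, 1001), 5))
# [1601, 1602, 1601, 1602, 1602]
```

Because each audio frame's boundaries lie within one sample of its video frame's boundaries, dropping or duplicating a video frame together with its audio frame leaves audio and video aligned to within one sample. Over five NTSC frames the pattern sums to exactly 8008 samples, after which it repeats.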
In the applicant's co-pending, not yet published application PCT/EP2014/056848, systems and methods are proposed which are compatible with an audio format suitable for distribution purposes as part of a frame-synchronous audiovisual format.
There is a need for an alternative audio format suitable for distribution purposes as part of a frame-synchronous audiovisual format, with improved scaling behaviour for high frame rates. There is also a need for coding and decoding equipment suitable for use therewith.
All the figures are schematic and generally only show parts which are necessary in order to elucidate the invention, whereas other parts may be omitted or merely suggested.