MXF (Material Exchange Format) is a file format for the interchange of audio-visual (AV) material with associated data and metadata; it acts as a wrapper for these data. It is packet-based and can be used, e.g., to store data together with associated metadata, to store files in a streamable format, i.e. a format that allows viewing while the file is still being transferred, or to wrap any compressed or uncompressed data.
Various applications, e.g. professional video cameras for digital cinematography, record or handle uncompressed video. For a given video raster, frame rate and colour resolution per pixel, the amount of data needed for each frame in progressive systems, or for each field in interlaced systems, is constant. This still holds if constant-size headers are added, as is done e.g. in MXF wrapping.
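The constancy of the per-frame size can be illustrated with a small calculation. The sketch below assumes, purely for illustration, a 1920x1080 progressive raster with 10-bit 4:2:2 sampling (on average 20 bits per pixel) and a hypothetical 64-byte constant header; none of these values come from the text. The frame rate does not enter the per-frame size at all; it only determines how many such frames occur per second.

```python
# Hypothetical format parameters (assumed, not taken from the text):
# 1920x1080 raster, 10-bit 4:2:2 sampling, i.e. one 10-bit luma sample plus,
# on average, one 10-bit chroma sample per pixel.
WIDTH, HEIGHT = 1920, 1080
BITS_PER_PIXEL = 2 * 10
HEADER_BYTES = 64  # hypothetical constant-size wrapper header


def picture_item_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed payload size of one progressive frame, in bytes."""
    total_bits = width * height * bits_per_pixel
    return total_bits // 8


def wrapped_frame_bytes(frame_index: int) -> int:
    """Size of one wrapped frame; frame_index is intentionally unused
    because nothing in the size depends on the position in the sequence."""
    return HEADER_BYTES + picture_item_bytes(WIDTH, HEIGHT, BITS_PER_PIXEL)


sizes = {wrapped_frame_bytes(i) for i in range(100)}
print(sizes)  # {5184064}
```

Every one of the 100 frames yields the same size, so the set collapses to a single value.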
In addition to the video information, i.e. the actual picture item, other information is stored together with each video frame or field. This information may comprise, e.g., a system item containing information such as the video raster, a sound item containing the audio that accompanies the video, and a data item containing any kind of metadata, in particular structural metadata as opposed to descriptive metadata. The sequence of these four items, i.e. system item, picture item, sound item and data item, together with a corresponding header, contains the information of one frame or field and is repeated within an MXF file for every new incoming frame or field. Thus the picture item, having a constant amount of data, has a constant duration defined by the employed frame rate and video raster, e.g. 1/24 s per frame. The associated sound item also has a constant duration, which is defined by the audio sample rate, e.g. 1/96000 s per sample at a sample rate of 96 kHz. In this case 96000/24 = 4000 audio samples belong to each frame.
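A minimal sketch of one frame's sequence of items, assuming 24 fps video, 96 kHz audio stored as 24-bit stereo PCM (3 bytes per sample, 2 channels), and hypothetical fixed sizes for the system and data items. The `ContentPackage` class and all item sizes are illustrative assumptions, and the per-item headers are ignored. Because 96000/24 is an integer, the sound item, and with it the whole package, has the same size for every frame.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class ContentPackage:
    """Hypothetical model of one frame's items; all sizes in bytes."""
    system_item: int
    picture_item: int
    sound_item: int
    data_item: int

    def total(self) -> int:
        return self.system_item + self.picture_item + self.sound_item + self.data_item


def samples_per_frame(sample_rate: int, frame_rate: Fraction) -> Fraction:
    # Frame duration is 1/frame_rate, so samples per frame = sample_rate / frame_rate.
    return Fraction(sample_rate) / frame_rate


spf = samples_per_frame(96_000, Fraction(24))
assert spf.denominator == 1  # integer ratio: exactly 4000 samples per frame

BYTES_PER_SAMPLE, CHANNELS = 3, 2  # assumed 24-bit stereo PCM
package = ContentPackage(
    system_item=100,          # hypothetical size
    picture_item=5_184_000,   # hypothetical uncompressed picture payload
    sound_item=int(spf) * BYTES_PER_SAMPLE * CHANNELS,
    data_item=200,            # hypothetical size
)
print(spf, package.sound_item, package.total())  # 4000 24000 5208300
```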
Since various standard video frame rates and audio sample rates are defined and can be combined independently, the number of audio samples per video frame or field depends on the chosen combination.
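The dependence on the combination can be tabulated with exact rational arithmetic. The selection of rates below is an assumption for illustration; the values 30000/1001 and 60000/1001 are the exact rates behind the nominal 29.97 and 59.94 fps.

```python
from fractions import Fraction

# Illustrative selection of frame rates (exact values) and audio sample rates.
FRAME_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "50": Fraction(50),
    "59.94": Fraction(60000, 1001),
}
SAMPLE_RATES = [48_000, 96_000]


def samples_per_frame(sample_rate: int, frame_rate: Fraction) -> Fraction:
    """Exact number of audio samples spanning one video frame."""
    return Fraction(sample_rate) / frame_rate


for name, fps in FRAME_RATES.items():
    for sr in SAMPLE_RATES:
        spf = samples_per_frame(sr, fps)
        kind = "integer" if spf.denominator == 1 else "non-integer"
        print(f"{name:>6} fps @ {sr} Hz: {float(spf):9.3f} samples/frame ({kind})")
```

The integer rates yield whole numbers (e.g. 1920 samples per frame for 25 fps at 48 kHz), whereas the 1001-based rates yield values such as 1601.6 or 3203.2 samples per frame.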
For some combinations of video frame rates and audio sample rates, however, the ratio is not an integer. In particular, conversion between different video systems may lead to non-integer frame rates. For example, a video frame rate of 29.97 fps (frames per second) may be employed, corresponding to a frame duration of 1/29.97 s. In this case the number of audio samples matching this duration is 96000/29.97 = 3203.203…, which is not an integer. A common solution for aligning audio and video data is to distribute the audio samples over a sequence of consecutive frames, thus achieving the required average audio rate over multiple frames. However, the number of audio samples, and thus the amount of audio data, then varies from frame to frame, and the frames therefore have variable sizes.
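One simple way to distribute the samples is sketched below: frame k receives floor((k+1)·r) − floor(k·r) samples, where r is the exact samples-per-frame ratio, so the running total never drifts more than one sample from the ideal. This rule is an assumption chosen for illustration; standards that specify audio cadences may prescribe a different ordering of the same per-frame counts. The example uses the exact rate 30000/1001 behind the nominal 29.97 fps, for which r = 3203.2 and the pattern repeats every 5 frames (taking 29.97 literally gives r = 3200000/999, which repeats only after 999 frames). The sound item sizes assume 24-bit stereo PCM.

```python
import math
from fractions import Fraction


def audio_cadence(sample_rate: int, frame_rate: Fraction, n_frames: int) -> list[int]:
    """Audio samples assigned to each of n_frames consecutive frames.

    Frame k gets floor((k+1)*r) - floor(k*r) samples, r being the exact
    (possibly non-integer) samples-per-frame ratio.
    """
    r = Fraction(sample_rate) / frame_rate
    # math.floor on a Fraction returns an exact int, so no rounding error accumulates.
    return [math.floor((k + 1) * r) - math.floor(k * r) for k in range(n_frames)]


NTSC = Fraction(30000, 1001)  # exact rate behind the nominal 29.97 fps
cadence = audio_cadence(96_000, NTSC, 5)
print(cadence)       # [3203, 3203, 3203, 3203, 3204]
print(sum(cadence))  # 16016, i.e. 5 * 3203.2

BYTES_PER_SAMPLE, CHANNELS = 3, 2  # assumed 24-bit stereo PCM
print([n * BYTES_PER_SAMPLE * CHANNELS for n in cadence])  # [19218, 19218, 19218, 19218, 19224]
```

The average over the five frames matches the required audio rate exactly, but the last frame's sound item is six bytes larger than the others, which is precisely the variable frame size described above.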