The present application is concerned with audio splicing.
Coded audio usually comes in chunks of samples, often 1024, 2048 or 4096 samples per chunk. Such chunks are called frames in the following. In the context of MPEG audio codecs like AAC or MPEG-H 3D Audio, these chunks/frames are called granules, the encoded chunks/frames are called access units (AU) and the decoded chunks are called composition units (CU). In transport systems, the audio signal is accessible and addressable only at the granularity of these coded chunks (access units). It would be favorable, however, to be able to address the audio data at some finer granularity, especially for purposes such as stream splicing or changing the configuration of the coded audio data in a manner synchronous and aligned to another stream, for example a video stream.
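The granularity problem can be illustrated with a short calculation. The following is a minimal sketch and not part of any codec or transport specification: the function name locate_splice_point, the 48 kHz sample rate, the frame length of 1024 samples and the video frame rate of 30000/1001 Hz are assumptions chosen for illustration only. It shows that a splice instant dictated by a video frame boundary generally falls inside an access unit rather than on its boundary.

```python
from fractions import Fraction


def locate_splice_point(splice_time_s, sample_rate_hz, frame_len):
    """Map a splice instant to (access unit index, sample offset in that unit).

    Hypothetical helper for illustration; splice_time_s is a non-negative
    time in seconds, ideally a Fraction so that no rounding error occurs.
    """
    # Absolute sample position of the splice instant, rounded down.
    sample_index = int(splice_time_s * sample_rate_hz)
    # Index of the frame containing that sample and the offset inside it.
    au_index, offset = divmod(sample_index, frame_len)
    return au_index, offset


# Assumed example: start of video frame 100 at 30000/1001 frames per second.
video_time = Fraction(100 * 1001, 30000)
print(locate_splice_point(video_time, 48000, 1024))  # (156, 416)
```

Under these assumptions the splice instant lies 416 samples into access unit 156, so addressing whole access units alone cannot hit the video frame boundary exactly.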
What is known so far is the discarding of some samples of a coding unit. The MPEG-4 file format, for example, has so-called edit lists that can be used to discard audio samples at the beginning and the end of a coded audio file/bitstream [3]. Disadvantageously, this edit list method works only with the MPEG-4 file format, i.e. it is file-format specific and does not work with stream formats like MPEG-2 transport streams. Beyond that, edit lists are deeply embedded in the MPEG-4 file format and accordingly cannot be easily modified on the fly by stream splicing devices. In AAC [1], truncation information may be inserted into the data stream in the form of extension_payload. Such an extension_payload in a coded AAC access unit is, however, disadvantageous in that the truncation information is deeply embedded in the AAC AU and cannot be easily modified on the fly by stream splicing devices.
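The effect of discarding samples at the beginning and the end of a decoded signal can be sketched as follows. This is a generic illustration only; it does not reproduce the MPEG-4 edit list syntax or the AAC extension_payload syntax, and the function name trim_decoded_units and its parameters are hypothetical.

```python
def trim_decoded_units(units, skip_front, skip_back):
    """Concatenate decoded chunks and drop samples at both ends.

    units      -- list of decoded chunks, each a list of samples
    skip_front -- number of samples to discard at the beginning
    skip_back  -- number of samples to discard at the end
    (Hypothetical helper; names are assumptions for illustration.)
    """
    if skip_front < 0 or skip_back < 0:
        raise ValueError("discard counts must be non-negative")
    # Flatten the decoded chunks into one sample sequence.
    samples = [s for unit in units for s in unit]
    if skip_front + skip_back > len(samples):
        raise ValueError("cannot discard more samples than were decoded")
    # Keep only the samples between the two discarded regions.
    return samples[skip_front:len(samples) - skip_back]


# Two decoded chunks of four samples each; drop 1 leading and 2 trailing.
print(trim_decoded_units([[0, 1, 2, 3], [4, 5, 6, 7]], 1, 2))  # [1, 2, 3, 4, 5]
```

In this sketch the discard counts are supplied from outside the coded data, which is where the known approaches differ: they carry the corresponding information inside the file format or inside the access unit.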