For audio coding, transmission and decoding, in particular for Internet applications, the audio coding standards ISO/IEC 11172-3 Layer III, ISO/IEC 13818-3 Layer III (MPEG audio layer III) and ISO/IEC 13818-7, for example, are used for data reduction. A widely used abbreviation for this type of coding/transmission/decoding is ‘mp3’.
A common feature of these and other well-known audio coding standards is that the encoded data are formatted into a sequence of fixed-length data frames to be transferred as data streams or to be stored as data files. Every frame contains data for a certain temporal length (e.g. 24 ms) of a section of the original audio signal. The data frames include headers, data fields with particularly important information (side information), data fields with strongly variable information (main information) and, in many cases, a remaining data field without generally defined information. The latter data field is not specifically defined in the ISO/IEC standards, is denoted as ‘ancillary data’ and can be utilised freely for various purposes.
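The fixed frame geometry described above follows directly from the sample rate and the total bit rate. A minimal sketch of the standard MPEG-1 Layer III relations (1152 samples per frame), assuming integer truncation as used by common decoders:

```python
def frame_duration_ms(samples_per_frame: int, sample_rate_hz: int) -> float:
    """Temporal length of one frame in milliseconds."""
    return 1000.0 * samples_per_frame / sample_rate_hz

def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int) -> int:
    """Fixed byte length of an MPEG-1 Layer III frame at a given bit rate.
    'padding' is the one-byte padding flag carried in the frame header."""
    return 144 * bitrate_bps // sample_rate_hz + padding
```

For instance, 1152 samples at 48 kHz yield the 24 ms frame duration mentioned above, and a 128 kbit/s stream at 44.1 kHz yields 417-byte frames (418 with padding).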
A reason for having in the data frames data fields without particular information is that the amount of information initially coded for a data frame varies strongly depending on the current characteristic of the original audio signal and, although the coder control basically aims at outputting a constant data rate per data frame, never corresponds exactly to the fixed length of the data frame. In other words, one of the tasks of an encoder is controlling the encoding such that the encoded data just fits into the frames at a given total data rate (and thereby at a given absolute length of a data frame in bits). This goal is usually achieved by adapting the encoding quality, e.g. the coarseness of the quantisation. By these means the encoder can not only be ordered to persistently fill but not overload the data frames, but can also be ordered to persistently reserve at least a certain amount of data per data frame for ‘ancillary data’.
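The control described above can be sketched as a simple rate loop: the quantisation is coarsened step by step until the encoded data fits into the frame budget minus a reserve kept for ‘ancillary data’. The `encode_at` callable is a hypothetical interface standing in for a real re-quantisation pass:

```python
def rate_control(frame_budget_bits, ancillary_reserve_bits, encode_at):
    """Coarsen the quantisation until the encoded data fits the frame,
    while persistently keeping a reserve for 'ancillary data'.

    encode_at(step) -> number of bits produced at quantiser step 'step'
    (hypothetical interface; a real encoder would re-quantise the
    subband coefficients at each step)."""
    target = frame_budget_bits - ancillary_reserve_bits
    step = 0
    while encode_at(step) > target:
        step += 1  # coarser quantisation -> fewer bits
    return step, encode_at(step)
```

With a toy model in which each coarser step saves 100 bits, a 1000-bit first attempt against an 800-bit frame with a 100-bit ancillary reserve settles at step 3 with 700 bits of audio data.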
FIG. 1 shows the typical configuration of an mp3 data stream with a plurality of frames n . . . n+3. Each frame starts with a header and a side information field. The side information comprises, for example, the sampling frequency, scale factors, quantisation information and stereo/mono information. The main information fields 1 to 4 contain the coded audio signal coefficients. The side information fields can also contain pointers ‘main_data_begin’ stating the address of the corresponding first bit of the main information field. Also depicted are the positions of the ‘ancillary data’ fields that, if present, follow the main information field.
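The header at the start of each frame is a 32-bit word beginning with an 11-bit sync pattern, followed among other things by the bit rate index, sampling frequency index and padding flag. A minimal sketch of extracting these fields, assuming an MPEG-1 Layer III header (the full standard also encodes version, layer, channel mode, etc.):

```python
# Lookup tables for MPEG-1 Layer III (indices 0 and 15 are unused/invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]

def parse_mp3_header(header: bytes):
    """Return (bitrate_bps, sample_rate_hz, padding) from a 4-byte
    MPEG-1 Layer III frame header."""
    word = int.from_bytes(header[:4], "big")
    if (word >> 21) & 0x7FF != 0x7FF:          # 11-bit sync pattern
        raise ValueError("not an mp3 frame header")
    bitrate_index = (word >> 12) & 0xF
    sample_rate_index = (word >> 10) & 0x3
    padding = (word >> 9) & 0x1
    return BITRATES_KBPS[bitrate_index] * 1000, \
           SAMPLE_RATES_HZ[sample_rate_index], padding
```

For example, the common header bytes FF FB 90 00 decode to 128 kbit/s at 44.1 kHz with no padding.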
FIG. 3 illustrates the basic structure of an audio encoder that provides a bitstream according to FIG. 1. Input audio samples are fed into the encoder. The mapping and filter bank stage MFB creates a filtered and subsampled short-term frequency domain representation of the original input signal, i.e. transformed subband samples or coefficients. A psychoacoustic model stage PAMC is used for calculating a set of data (e.g. the signal to mask ratio) to control the bit allocator/quantiser and coder stage BAQC. The bitstream formatter BF assembles the actual bitstream from the output data of the other blocks, adds other information (e.g. error correction) and forms constant-length data frames.
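The data flow through the FIG. 3 blocks can be sketched as a simple function chain; the stage functions `mfb`, `pamc`, `baqc` and `bf` are supplied by the caller, and their names merely mirror the block labels above:

```python
def encode_frame(samples, mfb, pamc, baqc, bf):
    """One frame of input samples through the FIG. 3 encoder pipeline
    (stage functions are illustrative parameters, not a real API)."""
    subbands = mfb(samples)       # mapping & filter bank: time -> subband samples
    smr = pamc(samples)           # psychoacoustic model: e.g. signal-to-mask ratios
    quantised, side_info = baqc(subbands, smr)  # bit allocation/quantiser/coder
    return bf(side_info, quantised)             # bitstream formatter: fixed-length frame
```

The point of the sketch is the ordering: the psychoacoustic model works on the same input as the filter bank, and only the bitstream formatter assembles the constant-length data frames.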
THOMSON multimedia and Coding Technologies have recently introduced the ‘mp3PRO’ format, which is an extension of the mp3 format. The additional mp3PRO data required are transferred as ‘ancillary data’ in the corresponding data frame fields. Encoded mp3PRO bitstreams are compatible with encoded mp3 bitstreams, so that older mp3 players or decoders can decode and reproduce mp3PRO bitstreams or files simply by ignoring the ‘ancillary data’.