1. Field of the Invention
The present invention relates to audio data streams that code audio signals and, more specifically, to improved handling of audio data streams in a file format where the audio data associated with a time mark can be distributed among different data blocks, as is the case in the MP3 format.
2. Description of the Related Art
MPEG audio compression is a particularly effective way to store audio signals, such as music or the sound for a film, in digital form while, on the one hand, requiring as little memory space as possible and, on the other hand, keeping the audio quality as high as possible. In recent years, MPEG audio compression has proved to be one of the most successful solutions in this field.
By now, several versions of MPEG audio compression methods exist. Generally, the audio signal is sampled at a certain sample rate, and the resulting sequence of audio samples is divided into overlapping time periods, or time marks. Each time mark is then supplied individually to, for example, a hybrid filter bank consisting of a polyphase filter bank and a modified discrete cosine transform (MDCT), which suppresses aliasing effects. The actual data compression takes place during quantization of the MDCT coefficients. The quantized MDCT coefficients are then converted into Huffman code words, which yields further compression by assigning shorter code words to more frequently occurring coefficients. Overall, MPEG compression is therefore lossy, but the “audible” losses are limited, since psychoacoustic knowledge is incorporated into the way the MDCT coefficients are quantized.
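The principle of assigning shorter code words to more frequent coefficient values can be illustrated with a minimal sketch. Note the assumptions: MP3 coders select from predefined Huffman tables rather than building a code from the signal, so the function `huffman_code_lengths` and the sample values below are hypothetical and serve only to show the effect.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code word length in bits} for a Huffman code
    built from the symbol frequencies (illustrative only; MP3 itself
    uses predefined tables)."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth}).
    heap = [(count, i, {sym: 0}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees puts every contained symbol one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical quantized coefficients: zero is by far the most frequent value.
quantized = [0] * 8 + [1] * 4 + [-1] * 2 + [3]
lengths = huffman_code_lengths(quantized)
```

In this example the most frequent value receives the shortest code word and the rarest values receive the longest ones.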
A widely used MPEG standard is the so-called MP3 standard, as described in ISO/IEC 11172-3 and 13818-3. This standard allows the information loss generated by compression to be adapted to the bit rate at which the audio information is to be transmitted in real time. Other MPEG standards also provide for transmission of the compressed data signal over a channel with constant bit rate. In order to ensure that the listening quality at the receiving decoder remains sufficient even at low bit rates, the MP3 standard provides for an MP3 coder having a so-called bit reservoir, which works as follows. Normally, due to the fixed bit rate, the MP3 coder would have to code every time mark into a block of code words of the same size; each block could then be transmitted at the given bit rate within one period of the time mark repetition rate. However, this would not accommodate the fact that, at constant quality, some parts of an audio signal, such as the sounds following a very loud sound in a piece of music, require less exact quantization than other parts, such as passages with a plurality of different instruments. Thus, an MP3 coder does not generate a simple bit stream format in which every time mark is coded in one frame and all frames have the same length. Such a self-contained frame would consist of a frame header, side information and the main data associated with the time mark of the frame, namely the coded MDCT coefficients, wherein the side information tells the decoder how the MDCT coefficients are to be decoded, for example how many subsequent MDCT coefficients are 0, thereby indicating which MDCT coefficients are successively included in the main data. Rather, a backpointer is included in the side information or in the header, pointing to a position within the main data in one of the previous frames. 
This position is the beginning of the main data pertaining to the time mark of the frame in which the backpointer is included. The backpointer indicates, for example, the number of bytes by which the beginning of the main data is offset in the bit stream. The end of these main data can be in any frame, depending on how high the compression rate for this time mark is. The length of the main data of the individual time marks is thus no longer constant, so that the number of bits with which a block is coded can be adapted to the properties of the signal. At the same time, a constant bit rate can be achieved. This technique is called “bit reservoir”. Generally, the bit reservoir is a buffer of bits that can be used to provide more bits for coding a block of time samples than the constant output data rate would otherwise allow. The bit reservoir technique accommodates the fact that some blocks of audio samples can be coded with fewer bits than specified by the constant transmission rate, so that these blocks fill the bit reservoir, while other blocks of audio samples have psychoacoustic properties that do not allow such a high compression, so that the available bits would not be sufficient for low-interference or interference-free decoding of these blocks. The additional bits required are taken from the bit reservoir, so that the bit reservoir empties during such blocks. The bit reservoir technique is also described in the above-indicated MPEG layer 3 standard.
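The interplay of fixed frame size, variable main data length and backpointer can be sketched as follows. This is a simplified model under stated assumptions, not the coder of the standard: every frame is assumed to offer the same number of main data bytes (`slot_capacity`), the backpointer is counted in main data bytes only, the main data of a time mark are assumed to end no later than the end of their own frame, and the reservoir is capped at a hypothetical `max_reservoir` of 511 bytes, the largest value a 9-bit backpointer field can hold. All names are illustrative.

```python
def compute_backpointers(main_sizes, slot_capacity, max_reservoir=511):
    """Return the backpointer (in bytes) of each frame.

    main_sizes    -- main data bytes needed by each time mark
    slot_capacity -- main data bytes each fixed-size frame can carry (assumed constant)
    max_reservoir -- largest backpointer the format can express (assumption)
    """
    pointers = []
    reservoir = 0  # unused bytes lying directly before the current frame
    for size in main_sizes:
        if size > reservoir + slot_capacity:
            raise ValueError("time mark needs more bits than reservoir plus frame provide")
        # The main data start 'reservoir' bytes before this frame's own slot.
        pointers.append(reservoir)
        # Bytes left unused by this time mark refill the reservoir;
        # anything beyond the cap would have to be stuffed with padding.
        reservoir = min(reservoir + slot_capacity - size, max_reservoir)
    return pointers


# Two cheap time marks fill the reservoir, the expensive third one drains it.
pointers = compute_backpointers([300, 300, 500, 400, 420], slot_capacity=400)
```

Here the third time mark needs 500 bytes although its frame carries only 400; the missing 100 bytes come from the 200 bytes saved by the first two frames.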
Although the backpointers give the MP3 format advantages on the coder side, there are undeniable disadvantages on the decoder side. If, for example, a decoder receives an MP3 bit stream not from the beginning but starting from a certain frame in the middle, the coded audio signal at the time mark associated with this frame can only be played instantly when the backpointer happens to be 0, which would indicate that the main data of this frame begin immediately after the header or side information, respectively. However, this is normally not the case. Thus, playing the audio signal at this time mark is not possible when the backpointer of the frame that was received first points to a previous frame which has not (yet) been received. In that case, (at first) only the next frame can be played.
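The decoder-side problem can be made concrete with a small sketch. It assumes, for illustration only, that every frame carries the same number of main data bytes (`slot_capacity`) and that backpointers are counted in main data bytes; the function name and the example values are hypothetical.

```python
def first_playable_frame(pointers, slot_capacity, first_received):
    """Return the index of the first frame whose main data are fully
    available when reception starts at frame 'first_received',
    or None if no such frame exists."""
    for i in range(first_received, len(pointers)):
        # Main data bytes received in the frames preceding frame i.
        available = (i - first_received) * slot_capacity
        if pointers[i] <= available:
            return i
    return None


# Hypothetical backpointers of five consecutive frames, 400 main data bytes each.
example_pointers = [0, 100, 200, 100, 100]
playable = first_playable_frame(example_pointers, 400, first_received=2)
```

Joining the stream at frame 2 yields frame 3 as the first playable frame, because the backpointer of frame 2 refers to 200 bytes that were never received.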
Further problems occur on the receiver side when handling the frames in general, since they are interconnected by the backpointers and are thus not self-contained. A further problem of bit streams with backpointers for a bit reservoir is that, when different channels of an audio signal are individually MP3 coded, main data that belong together in the two bit streams because they are associated with the same time mark might be offset from each other, with an offset that varies across the sequence of frames, so that combining these individual MP3 streams into a multi-channel audio data stream is again impeded.
Additionally, there is a need for a simple way of generating easily manageable MP3-compliant multi-channel audio data streams. Multi-channel MP3 audio data streams according to ISO/IEC standard 13818-3 require matrix operations on the decoder side for retrieving the input channels from the transmitted channels, as well as the use of several backpointers, and are thus complicated to manipulate.
MPEG 1/2 layer 2 audio data streams correspond to MP3 audio data streams in their composition of subsequent frames and in the structure and arrangement of the frames, namely the structure of header, side information and main data part, and the arrangement with a quasi-static frame distance that depends on the sample rate and on the bit rate, which is variable from frame to frame. However, they differ from MP3 audio data streams in that no backpointers or bit reservoir are used during coding. Time periods of the audio signal that are expensive to code and those that are inexpensive to code are coded with the same frame length. The main data pertaining to a time mark are located in the respective frame together with the respective header.
US 2003/009246 A1 describes a trick playing and/or editing apparatus which allows MP3 data streams to be edited in a simpler way. After an MP3 file has been read into an MP3 provider, it is proposed to convert the file in a converter such that an intermediate MP3 stream results in which the frame data of each frame immediately follow the respective determination block, so that the back pointers are 0. During conversion, first, for a certain frame, the corresponding determination block is read out from the original MP3 file stream, and in it the bitrate is set to a maximum possible value or a minimum possible value, taking into account the resulting frame length in the intermediate MP3 stream. Further, the padding bit is set or not set, depending on what is required in the resulting intermediate MP3 stream with self-contained frames. Other fields in the frame headers are not altered. The back pointer value is, of course, set to 0. Then, the frame data for the respective current frame are read out from the original MP3 data stream and appended to the newly generated determination block, and fill data are then added to the frame payload data to bring the length of the resulting self-contained frame to the one determined by the altered bitrate. The resulting intermediate MP3 data stream is then supplied to a trick playing and/or editing unit that can perform simple manipulations on it, since the frames are now self-contained. The intermediate MP3 data stream altered in that way is passed on to a common MP3 decoder.
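The kind of conversion described above can be sketched as follows. This is a hedged illustration, not the procedure of the cited application: it assumes MPEG-1 layer 3 frames, whose length in bytes is 144 × bitrate / sample rate (rounded down) plus one byte if the padding bit is set, and it simply picks the smallest standard bitrate whose frame can hold the data, which is only one possible selection rule. The function names and the returned dictionary are hypothetical.

```python
# Standard MPEG-1 layer 3 bitrates in kbit/s.
MPEG1_L3_BITRATES_KBPS = (32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320)


def frame_length(bitrate_kbps, sample_rate_hz, padding):
    """Length in bytes of an MPEG-1 layer 3 frame."""
    return 144 * bitrate_kbps * 1000 // sample_rate_hz + (1 if padding else 0)


def make_self_contained(header_and_side_len, main_data_len, sample_rate_hz=44100):
    """Choose bitrate and padding bit so that header, side information and
    all main data of one time mark fit into a single frame (assumed rule:
    smallest frame that is large enough), and report the fill bytes needed."""
    needed = header_and_side_len + main_data_len
    for kbps in MPEG1_L3_BITRATES_KBPS:
        for padding in (False, True):
            length = frame_length(kbps, sample_rate_hz, padding)
            if length >= needed:
                return {
                    "bitrate_kbps": kbps,
                    "padding": padding,
                    "frame_length": length,
                    "fill_bytes": length - needed,
                    "backpointer": 0,  # main data now start in their own frame
                }
    raise ValueError("main data do not fit into the largest possible frame")


# 4 header bytes + 32 bytes of stereo side information, 382 bytes of main data.
frame = make_self_contained(header_and_side_len=36, main_data_len=382)
```

With these values the 128 kbit/s frame at 44.1 kHz is one byte too short without padding, so the sketch selects 128 kbit/s with the padding bit set and no fill bytes.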
In Finlayson R., “A more loss tolerant RTP payload format for MP3 audio”, June 2001, URL: http://www.faqs.org/rfcs/rfc3119.html, a conversion of an MP3 data stream into a real-time protocol (RTP) payload data format is described, which is better suited to the case of packet loss. In this conversion, the MP3 frames become MP3 application data units, ADU frames for short. An ADU descriptor precedes every ADU frame. An ADU frame differs from the original MP3 frame in that the full sequence of coded audio data and any ancillary data for the ADU, i.e. the data beginning in the original MP3 data stream at the position to which the back pointer included in the corresponding original MP3 frame points, and ending at the position to which the back pointer in the next MP3 frame points, is included in the same ADU frame. Otherwise, the ADU frames made self-contained in this way differ from the original MP3 frames merely in the optional replacement of the first 11 synchronization bits in the MP3 frame header by an interleaving sequence number, which makes it possible to re-sort the sequence of ADU frames for transmission in deviation from the original time sequence. The ADU descriptors added to the ADU frames formed in that way contain three fields, namely a continuation flag, a descriptor type flag and an ADU size indication indicating the size of the ADU frame following the respective ADU descriptor. These pairs of ADU frame and ADU descriptor are packed into RTP packets having RTP headers. If such a pair of ADU frame and ADU descriptor does not fit into such a packet, it is distributed among two subsequent RTP packets. In that case, the continuation flag is set in the ADU descriptor of the following ADU frame. The descriptor type flag only indicates how many bits the ADU size indication in the ADU descriptor includes. 
The RTP header fields comprise, among others, a time mark indication indicating the replay time of the first ADU packed into the respective packet. This RTP packet data stream with possibly interleaved ADU frames could then easily be converted back into a common MP3 data stream, namely the original MP3 data stream.
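The three fields of the ADU descriptor can be illustrated with a short sketch. The bit layout used here is an assumption that should be checked against RFC 3119: the continuation flag is taken to be the most significant bit of the first byte, the descriptor type flag the next bit, and the ADU size to occupy the remaining 6 bits (one-byte descriptor) or 14 bits (two-byte descriptor). The function names are hypothetical.

```python
def pack_adu_descriptor(adu_size, continuation=False):
    """Build a one- or two-byte ADU descriptor (assumed layout: C, T, size)."""
    if not 0 <= adu_size < (1 << 14):
        raise ValueError("ADU size must fit into 14 bits")
    c_bit = 0x80 if continuation else 0x00
    if adu_size < (1 << 6):
        # Type flag 0: the size fits into the 6 bits of a single byte.
        return bytes([c_bit | adu_size])
    # Type flag 1: 14-bit size spread over two bytes.
    return bytes([c_bit | 0x40 | (adu_size >> 8), adu_size & 0xFF])


def unpack_adu_descriptor(data):
    """Return (continuation flag, ADU size, descriptor length in bytes)."""
    first = data[0]
    continuation = bool(first & 0x80)
    if first & 0x40:
        return continuation, ((first & 0x3F) << 8) | data[1], 2
    return continuation, first & 0x3F, 1


short_descriptor = pack_adu_descriptor(40)
long_descriptor = pack_adu_descriptor(418)
```

A receiver reading the first byte can thus tell from the type flag alone whether a second size byte follows before the ADU frame begins.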