The present invention relates in general to the analysis of signals that are coded in arbitrary manner and decoded again, and in particular to the analysis of a decoded signal that has been processed using a coding algorithm that is based on a spectral representation of the original signal.
It is generally known to code audio and/or video signals using a specific coding method in order to obtain a coded version of the original signal; the coded version should differ from the original signal in that the data quantity of the coded signal is smaller than the data quantity of the original signal. In this case, the coding algorithm for obtaining the coded signal from the original signal, together with the decoding algorithm, which is in essence the inverse of the coding algorithm, is referred to as a data-reducing coding algorithm.
For data reduction of audio signals, there are various coding algorithms that are subject matter of a number of international standards, such as e.g. MPEG-1, MPEG-2, MPEG-4 or also MPEG-2 AAC (AAC=Advanced Audio Coding), with the latter coding algorithm being described in detail, for example, in international standard ISO/IEC 13818-7.
In the following, reference will be made to FIG. 7 illustrating a block diagram of an MPEG audio coding method. Such an audio coder typically comprises an audio input 70 for inputting a stream of time-discrete sampling values which are, e.g. PCM sampling values having e.g. a width of 16 bits. In an analysis filter bank 71, the stream of time-discrete sampling values is divided into coding blocks or frames of sampling values using a corresponding window function, and is then converted to a spectral representation e.g. by a filter bank or by a Fourier transform or a modified Fourier transform, such as e.g. a modified discrete cosine transform (MDCT). At the output of the analysis filter bank 71, there are thus present consecutive coding blocks or frames of spectral coefficients, with a block of spectral coefficients being the spectrum of a coding block of audio sampling values. Often, a 50% overlap of consecutive coding blocks is employed so that, for each block, a window of e.g. 2048 audio sampling values is observed and 1024 new spectral coefficients are created by such processing.
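The block formation and conversion performed by the analysis filter bank 71 may be illustrated by the following minimal sketch. It assumes a sine window and the direct form of the MDCT; the function names `sine_window`, `mdct` and `analysis_filter_bank` are hypothetical and are not taken from any standard or from the coder of FIG. 7.

```python
import math

def sine_window(length):
    """Sine window of the given length (one common choice, assumed here)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(frame):
    """Direct O(N^2) MDCT: maps 2N windowed samples to N spectral coefficients."""
    half = len(frame) // 2
    shift = 0.5 + half / 2.0
    return [
        sum(x * math.cos(math.pi / half * (n + shift) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(half)
    ]

def analysis_filter_bank(samples, hop):
    """Split samples into 50%-overlapping frames of length 2*hop and transform each."""
    window = sine_window(2 * hop)
    blocks = []
    for start in range(0, len(samples) - 2 * hop + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(2 * hop)]
        blocks.append(mdct(frame))
    return blocks
```

With `hop=1024`, each frame covers 2048 sampling values and yields 1024 new spectral coefficients, as in the example given above; a small `hop` keeps this direct form fast enough for experimentation.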
The time-discrete audio signal at input 70, moreover, is fed into a psychoacoustic model 72 in order to obtain a data reduction, such that, as is known, the masking threshold of the audio signal is calculated as a function of the frequency in order to carry out, in a block 73, designated quantizing and coding, a quantization of the spectral coefficients that is dependent upon the masking threshold.
In other words, the quantization of the spectral coefficients is carried out just coarsely enough that the quantization noise introduced thereby remains below the psychoacoustic masking threshold calculated by the psychoacoustic model 72, so that this quantization noise is not audible in the ideal case. This procedure has the effect that typically a specific number of spectral coefficients, which are still not equal to 0 at the output of the analysis filter bank 71, are set to 0 after quantization since the psychoacoustic model 72 has determined that these are masked by adjacent spectral coefficients and are therefore inaudible.
Also independently of a psychoacoustic or psychooptic model, each quantizer has a specific quantization step width, with spectral values smaller than the step width being set to zero by the quantization. Depending on the quantizer, there is also the possibility that just values that are clearly smaller than the step width are set to zero, whereas values slightly below the step width are rounded up. In most cases, each quantizer sets at least some values to zero, thereby already achieving a data reduction.
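The two quantizer behaviors just described may be sketched as follows. This is an illustration under the assumption of a uniform quantizer with a single step width; the names `quantize_truncating`, `quantize_rounding` and `dequantize` are hypothetical.

```python
def quantize_truncating(values, step):
    """Every value whose magnitude is below the step width becomes index 0."""
    return [int(v / step) for v in values]  # int() truncates toward zero

def quantize_rounding(values, step):
    """Only values clearly below the step width become 0; values slightly
    below it are rounded up to index 1 (or -1)."""
    return [round(v / step) for v in values]

def dequantize(indices, step):
    """Map quantizer indices back to reconstructed spectral values."""
    return [i * step for i in indices]

spectrum = [0.3, 0.9, -0.9, 2.4]
truncated = quantize_truncating(spectrum, 1.0)
rounded = quantize_rounding(spectrum, 1.0)
```

In both variants at least one of the four example values is mapped to index 0, which is the source of the zero-valued spectral coefficients discussed above.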
After quantization, there is provided a spectral representation of the coding block of time-discrete sampling values in which the quantization noise should possibly be below the psychoacoustic masking threshold. These spectral values that are quantized in data-reducing manner may then be coded, depending on the coder employed, in loss-free manner using entropy coding, which may be e.g. Huffman coding. Due to this, a stream of code words is obtained, to which is added, in a bit stream multiplexer 74, side information that is still required by a decoder, such as information concerning the analysis filter bank, information concerning the quantization, such as e.g. scale factors, or side information concerning additional functional blocks. In case of MPEG-2 AAC, such additional functional blocks are, for example, TNS processing, intensity stereo processing, mid/side stereo processing or a prediction from spectrum to spectrum.
At an output 75 of the coder, which is also referred to as bit stream output, the signal coded in accordance with the coding algorithm illustrated in FIG. 7 is then present in the form of blocks.
With respect to the decoder, the coded signal at the output 75 of the coder shown in FIG. 7 is fed to a bit stream input 80 of a decoder illustrated in FIG. 8 which first carries out a bit stream demultiplexing operation in a block 81, referred to as bit stream demultiplexer, in order to separate the spectral data from the side information. At the output of block 81, there are again available the code words representing the individual spectral coefficients. Using a corresponding table, the code words are decoded in order to obtain quantized spectral values. These quantized spectral values are then processed in a block 82 designated "inverse quantization" in order to calculate back the quantization introduced in block 73 (FIG. 7). At the output of block 82, there are available once more dequantized spectral coefficients which are now transformed to the time domain by means of a synthesis filter bank 83 operating in inverse manner to the analysis filter bank 71 (FIG. 7), in order to obtain the decoded signal at an audio output 84.
When considering the coding/decoding concept illustrated in FIGS. 7 and 8, it becomes clear that a block-oriented method is involved here in which the block generation is effected by the analysis filter bank block 71 of FIG. 7 and in which the block formation is cancelled again only at the audio output 84 of the decoder illustrated in FIG. 8.
It becomes clear furthermore that a lossy coding concept is involved here since the decoded signal present at audio output 84 in general contains less information than the original signal present at audio input 70. By way of the quantizer 73 controlled by the psychoacoustic model 72, information is removed from the original signal present at audio input 70; this information is not restored in the decoder but is discarded. From a purely subjective point of view, this loss of information ideally does not impair quality, owing to the psychoacoustic model 72 that is matched to the properties of the human ear, but merely yields the desired data compression.
It is to be pointed out here that the coding concept described with reference to FIG. 7 and FIG. 8 by way of an audio signal is also applied correspondingly to image or video signals in which, instead of the temporal audio signal, a video signal is present and in which the spectral representation is not a spectrum of sound but a spatial spectrum. Otherwise, video signal compression also involves an analysis filter bank, a psychooptic model, quantization and redundancy coding controlled thereby, with the entire coding/decoding concept taking place blockwise as well.
The decoded signal (in case of the example of FIG. 8, the decoded audio signal at audio output 84) typically is again a stream of time-discrete sampling values based on an underlying coding block raster which, however, is generally not visible in the decoded signal, unless specific precautions are taken.
While the process of decoding is the normal case in the application, namely the transfer and storage of audio and/or image signals, there are nevertheless cases in which it is of interest "to re-translate" a given decoded signal into a bit stream representation. This is of interest in particular in the following cases, in which only the decoded signal is available.
Furthermore, it is often necessary to examine coding systems by way of the signals coded and decoded again by the same, for example, to find out why a coder that is not yet known has such a good sound.
In addition thereto, there is a demand in the field of copyright protection to furnish evidence without any doubt that a piece of music or an image was coded originally using a specific coder.
Finally, in the field of transmission, for example over a plurality of networks of different bandwidth, there is the requirement of coding a decoded signal again, for example in order to convert it to a different bandwidth. In that event, the coder/decoder concept illustrated in FIG. 7 and FIG. 8 is applied to an original audio signal several times in succession. In this regard, there are problems to the effect that so-called tandem coding distortions are introduced by subsequent codec stages if these operate on the basis of a different coding block raster than the preceding codec stages. It is understandable that the use of a different coding block raster in a subsequent codec stage introduces audible distortions into the audio signal if the coding block formation was not carried out in exactly the same manner as in the first codec stage, since the concept is based on the formation of short-time spectra and since in particular the psychoacoustic masking threshold of a coding block is dependent on the time-discrete sampling values of the coding block raster.
The technical publication "NMR Measurements on Multiple Generations Audio Coding", Michael Keyhl, Jürgen Herre, Christian Schmidmer, 96th AES Convention, Feb. 26 to Mar. 1, 1994, Amsterdam, Preprint 3803, suggests overcoming tandem coding distortions by introducing an identification mark into a decoded signal, which may be accessed by subsequent coder stages in order to carry out, on the basis of this identification mark, their coding block partitioning of the decoded signal to be coded anew, such that all codec stages in a chain of codec stages make use of the same coding block raster.
Although this method has considerably reduced the tandem coding distortions, it is nevertheless disadvantageous to the effect that the identification mark must be introduced by a decoder and must be extracted again and interpreted by a subsequent coder. Thus, changes are necessary both in a decoder and in a coder. Furthermore, this concept of course is applicable to tandem coding only of such decoded signals that have this identification mark of the coding block raster. For signals that do not have this identification mark, a codec stage in a chain of codec stages of course cannot access an identification mark.
Similar problems or restrictions in flexibility result also in case of the MOLE concept described in "ISO/MPEG Layer 2 - Optimum re-Encoding of Decoded Audio using a MOLE Signal", John Fletcher, 104th AES Convention, May 16 to 19, Preprint No. 4706. Generally speaking, additional data are introduced into the decoded audio signal, which describe in detail in what way the decoded audio signal concerned has been coded and decoded. These data are referred to as MOLE signal. If the decoded audio signal has to be coded again, a specifically designed coder will extract this MOLE signal from the signal to be coded and carry out the individual coding steps on the basis of this signal.
Similar to the concept of the identification mark, a disadvantage here also resides in that the decoder which decodes a coded original signal for the first time has to introduce the determination signal into the decoded audio signal. Such a decoder thus differs from the usual standard decoders. In addition thereto, a coder that again codes a decoded signal has to extract the determination signal in order to operate accordingly. This second coder, so to speak, also has to be modified such that it can read and interpret the determination signal. Finally, this concept too is unfortunately effective only for decoded signals having such a determination signal, but not for signals having no such determination signal.
Both the identification mark and the MOLE determination signal provide information as to which coding block raster is underlying the decoded signal having the identification mark or the MOLE determination signal associated therewith. However, these signals have to be introduced explicitly, thus entailing the flexibility disadvantages described hereinbefore.
It is the object of the present invention to provide a device and a method for determining a coding block raster, on which a decoded signal is based, for a decoded signal having no explicit hint towards a coding block raster.
In accordance with a first aspect of the present invention, this object is achieved by a device for determining a coding block raster on which a decoded signal is based, in which the decoded signal is produced from an original signal by coding and decoding according to a coding algorithm including a coding block generating step, a conversion step and a data reducing step, said coding block generating step of the coding algorithm including partitioning the original signal according to the coding block raster into coding blocks with a specific number of time-discrete signal values, said conversion step including generating from a coding block a spectral representation of the same, and said data reducing step including removing information from the spectral representation of the original signal, said device comprising: a picker for picking out a segment of the decoded signal, said segment beginning at an output sampling value of the decoded signal; a processor for performing the conversion step on said segment of the decoded signal so as to provide a spectral representation of said segment; an evaluator for evaluating the spectral representation of said segment with respect to a predetermined criterion in order to obtain an evaluation result for the segment, said device for determining a coding block raster being further arranged to pick out, convert and evaluate a plurality of segments of the decoded signal that begin at different output sampling values in order to obtain a plurality of evaluation results; and a searcher for searching the evaluation results and for outputting an identification for the coding block raster underlying the decoded signal, on the basis of the segment that has an extreme evaluation result with respect to other evaluation results.
In accordance with a second aspect of the present invention, this object is achieved by a method for determining a coding block raster on which a decoded signal is based, in which the decoded signal is produced from an original signal by coding and decoding according to a coding algorithm including a coding block generating step, a conversion step and a data reducing step, said coding block generating step of the coding algorithm including partitioning the original signal according to the coding block raster into coding blocks with a specific number of time-discrete signal values, said conversion step including generating from a coding block a spectral representation of the same, and said data reducing step including removing information from the spectral representation of the original signal, said method comprising: picking out a segment of the decoded signal, said segment beginning at an output sampling value of the decoded signal; performing the conversion step on said segment of the decoded signal so as to provide a spectral representation of said segment; evaluating the spectral representation of said segment with respect to a predetermined criterion in order to obtain an evaluation result for the segment, said steps of picking out, performing and evaluating being carried out a plurality of times in order to pick out, convert and evaluate a plurality of segments of the decoded signal that begin at different output sampling values in order to obtain a plurality of evaluation results; and searching the evaluation results and outputting an identification for the coding block raster underlying the decoded signal, on the basis of the segment that has an extreme evaluation result with respect to other evaluation results.
The present invention is based on the finding that the coding block raster, which is defined in virtually random fashion by a block-oriented coder, has a decisive influence on the spectral representation of the signal. Even minimum deviations or coding block raster offsets have the effect that the spectral representation of the decoded signal has a completely different appearance than would actually be expected of a spectral representation of the decoded signal when the same is based on the same coding block raster on which the decoded signal as such is based. In case of data-reducing coding algorithms operating on the basis of a psychoacoustic model or psychooptic model, it is known from the very beginning that, on the basis of quantization using a psychooptic or psychoacoustic masking threshold, a certain number of spectral coefficients is zero.
It is pointed out that also independently of a quantization controlled by a psychoacoustic or psychooptic model, there are usually specific values that are always set to zero, namely those values that are considerably smaller than the quantization step width.
If, however, the coding block raster partitioning used for generating a spectral representation of the decoded signal does not conform to the coding block raster partitioning on which the decoded signal as such is based, this property no longer appears in the spectral representation of the decoded signal. Moreover, even with coding concepts that are not necessarily data-reducing, or with concepts which, although data-reducing in principle, do not have a significant data-reducing effect due to the input signal, a coding block raster offset already has the effect that the spectrum of the decoded signal is based on a different coding block raster partitioning than the one on which the decoded signal is based. This results in a changed spectral structure having a highly "smeared" appearance, which in particular makes itself felt in that the individual spectral components can no longer be separated well from each other.
This characteristic of the spectrum can be utilized as a criterion for finding out whether a coding block raster offset is involved. In case of a spectrum with raster offset, the fluctuation of the e.g. logarithmic amplitude of the spectral coefficients is slower or less abrupt than in case of a spectrum without raster offset in which a rapid or very abrupt fluctuation of the amplitude of the spectral coefficients can be noted.
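One possible way of turning this characteristic into a number is sketched below. The measure used here, namely the mean absolute difference between the logarithmic amplitudes of adjacent spectral coefficients, is merely an assumed example of such a criterion; the function name `log_spectral_fluctuation` and the amplitude floor of 1e-12 are hypothetical choices, and the two example spectra are made-up illustration values.

```python
import math

def log_spectral_fluctuation(coeffs, floor=1e-12):
    """Mean absolute jump in log10 amplitude between neighboring coefficients.

    The floor avoids log(0) for coefficients that are exactly zero; its
    value is an arbitrary assumption and scales the result.
    """
    logs = [math.log10(max(abs(c), floor)) for c in coeffs]
    jumps = [abs(b - a) for a, b in zip(logs, logs[1:])]
    return sum(jumps) / len(jumps)

# Illustrative spectra: one with isolated lines and zeros (no raster offset
# assumed), one with a "smeared" structure (raster offset assumed).
sharp = [4.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 1.0]
smeared = [4.0, 2.5, 1.8, 3.0, 1.6, 1.1, 0.9, 1.0]
```

For these example values, `log_spectral_fluctuation(sharp)` is larger than `log_spectral_fluctuation(smeared)`, reflecting the more abrupt fluctuation of a spectrum without raster offset.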
Generally speaking, a short-time spectrum of the decoded signal generated using a coding block raster partitioning corresponding to the coding block raster partitioning on which the decoded signal is based, has a specific appearance, for example with respect to the separation of the spectral lines, with respect to the number of spectral lines that are equal to zero or are very small, etc.
According to the invention, a segment of the decoded signal is thus picked out for determining a coding block raster, whereupon the segment picked out is converted into a spectral representation thereof. Thereafter, the spectral representation of the segment picked out is examined with respect to at least one predetermined criterion in order to obtain an evaluation result for the segment. This concept is carried out for various segments, using each time a different coding block raster as basis, so that various evaluation results are obtained for different coding block raster partitionings and thus coding block raster offsets. A coding block raster offset that corresponds best to the predetermined criterion, i.e. that has an evaluation result that is extreme compared to the other evaluation results, will then be ascertained among the evaluation results generated by evaluating the spectral representations of the various segments picked out, and will be output. The coding block raster partitioning on which a decoded signal is based thus can be reconstructed unequivocally without the use of an auxiliary signal explicitly contained in the decoded signal.
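The search over raster offsets may be illustrated by the following toy sketch. It is a strong simplification and not the coder of FIG. 7: it assumes non-overlapping blocks, an orthonormal DCT-II in place of a windowed MDCT, a uniform rounding quantizer without psychoacoustic model, and the number of near-zero spectral coefficients as the predetermined criterion. All names (`toy_codec`, `find_raster_offset`, `demo`) and all parameter values are hypothetical.

```python
import math
import random

def dct(block):
    """Orthonormal DCT-II of one block (direct O(n^2) form)."""
    n = len(block)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, v in enumerate(block))
            for k in range(n)]

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]

def toy_codec(signal, n, step):
    """Code and decode blockwise: transform, quantize, dequantize, invert."""
    decoded = []
    for start in range(0, len(signal) - n + 1, n):
        coeffs = dct(signal[start:start + n])
        decoded.extend(idct([round(c / step) * step for c in coeffs]))
    return decoded

def find_raster_offset(decoded, n, num_blocks, eps=1e-9):
    """Evaluate every candidate offset and return (best_offset, scores).

    The score of an offset is the number of near-zero coefficients found in
    num_blocks consecutive segments starting at that offset; the offset with
    the extreme (here: largest) score is output.
    """
    scores = []
    for offset in range(n):
        zeros = 0
        for m in range(num_blocks):
            segment = decoded[offset + m * n: offset + (m + 1) * n]
            zeros += sum(1 for c in dct(segment) if abs(c) < eps)
        scores.append(zeros)
    best = max(range(n), key=lambda o: scores[o])
    return best, scores

def demo():
    """Decode a test tone, cut off 5 samples, and search for the raster."""
    rng = random.Random(0)
    n = 16
    original = [math.sin(2 * math.pi * 0.11 * t) + rng.uniform(-0.01, 0.01)
                for t in range(12 * n)]
    decoded = toy_codec(original, n, step=1.0)
    shifted = decoded[5:]  # block boundaries now lie at 11, 27, 43, ...
    return find_raster_offset(shifted, n, num_blocks=8)
```

In this toy setting, `demo()` is expected to report offset 11, since only segments aligned with the original block boundaries reproduce the zeros introduced by the quantizer. A real analysis would have to use the window and transform of the coder under examination and a tolerance suited to the word width of the decoded signal.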
This concept basically makes it possible to determine, from any decoded signal, the coding block raster underlying it, and thus provides considerable flexibility in that all decoded signals can be processed, not only decoded signals that already have an identification mark or a MOLE determination signal. It is thus possible to analyze almost any decoded signal in order to perform distortion-free tandem coding, to obtain further information on the coding algorithm on which the decoded signal is based, or to furnish evidence as to which coder was originally used for coding the decoded signal.
Preferably, the coding block raster underlying the decoded signal, as determined according to the invention, can be introduced into the decoded signal proper in order to thus match arbitrary decoded signals for existing codec stages based on the identification mark or the MOLE determination signal.
In addition thereto, the concept according to the invention permits the determination of almost all coding parameters, all the more so as, on the basis of the knowledge of the coding block raster and using corresponding iteration algorithms, virtually all coder functionalities can, so to speak, be "calculated back". The prerequisite for this, however, is the determination of the coding block raster as such, as the coding block raster influences all ensuing parameters of a coding algorithm that is based on a spectral representation of a signal to be coded. The determination of the coding block raster thus is, so to speak, the "entrance gate" for completely analyzing a decoded signal with regard to the coding/decoding concept underlying the same.