In modern filterbank-based audio encoding methods, such as MPEG Layer-3 or MPEG AAC, a psychoacoustic model is used in the encoder. In this psychoacoustic model, the total spectrum of the audio signal transformed into the frequency domain is divided into individual frequency groups of varying widths and/or varying numbers of frequency lines per frequency group. For the calculation of the psychoacoustic listening thresholds, for the decision whether center/side (mid/side) stereo encoding should be used, and for the evaluation and/or calculation of the scale factors in the quantization module of the audio encoder, the signal energies of the audio signal portions in the individual frequency groups are calculated in the psychoacoustic model. This is done by squaring each individual frequency line, which yields the line energies, and subsequently summing all line energies in a frequency group to form the band energy of that frequency group. In the case of MPEG AAC, for example, there may be about 40 to 60 such frequency groups per audio channel.
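The band energy calculation described above may be sketched as follows. This is only a minimal illustration under stated assumptions: the function name band_energy, the half-open index range [start, end) and the use of a 64-bit accumulator are hypothetical choices and are not taken from any particular encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Band energy of one frequency group: the sum of the squared frequency
 * lines with indices start .. end-1 (hypothetical layout).
 * Assumption: the lines are fixed-point values of at most 24 bits stored
 * in int32_t, so that each square fits easily into 64 bits. */
int64_t band_energy(const int32_t *lines, size_t start, size_t end)
{
    assert(start <= end);
    int64_t energy = 0;
    for (size_t i = start; i < end; ++i) {
        /* line energy = square of the frequency line */
        energy += (int64_t)lines[i] * lines[i];
    }
    return energy;
}
```

For the lines {3, -4, 1, 2}, for example, a group covering the first two lines has the band energy 9 + 16 = 25, and a group covering the last two lines has the band energy 1 + 4 = 5.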
In the following, the special application of an implementation of such a method in a fixed-point processor shall be considered.
In fixed-point representation, the frequency lines are represented with a word width of, for example, 24 bits or 32 bits. A word width of only 16 bits is not sufficient. A global scaling factor or shifting factor is used, which applies to all frequency lines of an audio channel or even to all frequency lines of all audio channels processed in an encoder, and which determines by how many positions each frequency line value must be shifted to the left or to the right so that the original scaling is obtained again. This is referred to as block-floating-point representation. That is to say, all frequency lines of at least one audio channel are scaled equally and are therefore on the same scaling level. Given the generally high dynamic range of the individual amplitudes of the frequency lines, this means that some of the frequency lines are represented with relatively high accuracy, for example with 22 valid bits at a word width of 24 bits or with 30 valid bits at a word width of 32 bits, whereas others are represented with only a few valid bits and therefore rather inaccurately.
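The block-floating-point representation may be illustrated by the following sketch. It rests on assumptions: the names valid_bits and block_shift are hypothetical, and the policy of bringing the largest line to the word width minus two bits is chosen merely to match the exemplary figures given above (22 valid bits at a word width of 24 bits).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of magnitude bits occupied by a fixed-point value (0 for zero). */
int valid_bits(int32_t v)
{
    uint32_t mag = (v < 0) ? (uint32_t)(-(int64_t)v) : (uint32_t)v;
    int bits = 0;
    while (mag != 0) {
        ++bits;
        mag >>= 1;
    }
    return bits;
}

/* Common shift for a whole block of frequency lines. A positive result
 * means a left shift, a negative result a right shift. The largest line
 * is brought to word_bits - 2 magnitude bits (assumed policy). */
int block_shift(const int32_t *lines, size_t n, int word_bits)
{
    assert(word_bits >= 3 && word_bits <= 32);
    int max_bits = 0;
    for (size_t i = 0; i < n; ++i) {
        int b = valid_bits(lines[i]);
        if (b > max_bits) {
            max_bits = b;
        }
    }
    /* an all-zero block needs no shift */
    return (max_bits == 0) ? 0 : (word_bits - 2) - max_bits;
}
```

For the lines {1024, 3, -5} and a word width of 24 bits, the common shift is 11. The largest line then occupies 22 bits, whereas the line with the value 3 still carries only its two original valid bits, as the same shift applies to all lines.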
In the implementation of the above-mentioned filterbank-based audio encoding method in a fixed-point processor, problems as described below will arise.
Firstly, the problems concern the representation of the signal energies themselves. Because of the squaring, the signal energies, that is the summed-up squares of the frequency lines, have twice the dynamic range of the frequency lines if the full accuracy is to be maintained.
There are various possibilities for representing the signal energies. One possibility is to represent the signal energies by a data type having a word width double that of the data type used for the frequency lines, that is, for example, a data type with a width of 48 bits or 64 bits. Imagine, for example, a frequency line with 22 valid bits represented by a 24-bit data type. After the summation, the signal energy would, together with the other frequency lines, have at least 44 valid bits and would be represented in a 48-bit data type. This procedure, however, is not feasible at least for 64-bit energies, that is in cases in which the frequency lines are represented in a 32-bit data type, as most conventional fixed-point processors either do not support a 64-bit data type at all, or else memory access operations and calculations using a 64-bit data type are extremely slow compared to, for example, 32-bit access operations and calculations. In addition, memory consumption is significantly higher in the case of 64-bit data.
Another possibility is to represent the signal energies by a floating-point data type by means of mantissa and exponent. Again assume the above-mentioned exemplary case that a frequency line with 22 valid bits is represented in a 24-bit data type. The signal energy of the respective group would then be represented in a standardized or proprietary floating-point data type with a 16-bit mantissa including the sign bit and an 8-bit exponent. Here, it is immaterial whether a standardized floating-point data type, such as IEEE 754, or a proprietary floating-point data type with arbitrarily chosen mantissa and exponent widths is concerned. On a fixed-point processor without a floating-point calculating unit, calculations with floating-point data types have to be emulated by several calculation steps and are therefore extremely slow, so that this procedure is not feasible.
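The following sketch illustrates why an emulated floating-point data type costs several calculation steps per operation. The data type soft_float_t and the function soft_float_from_energy are hypothetical and represent merely one conceivable proprietary format with a 16-bit mantissa including the sign bit and an 8-bit exponent; no standardized format is implemented here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical proprietary format: value = mantissa * 2^exponent. */
typedef struct {
    int16_t mantissa; /* 16 bits including the sign bit */
    int8_t exponent;  /* 8-bit exponent */
} soft_float_t;

/* Converts a non-negative energy into the format above. Each loop
 * iteration is one shift, one increment and one comparison, which
 * indicates the cost of emulation on a fixed-point processor. */
soft_float_t soft_float_from_energy(int64_t energy)
{
    assert(energy >= 0);
    soft_float_t f = {0, 0};
    while (energy > INT16_MAX) {
        energy >>= 1; /* discard the least significant bit */
        ++f.exponent; /* and compensate in the exponent */
    }
    f.mantissa = (int16_t)energy;
    return f;
}
```

An energy of 2^40, for example, is converted into the mantissa 16384 and the exponent 26 after 26 loop iterations.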
A further problem area in the implementation of the above-mentioned filterbank-based audio encoding method in a fixed-point processor is the further processing of the signal energies in the course of the encoding method. The signal energies and the listening thresholds derived therefrom are used at numerous different places in the further course of the audio encoder algorithm, for example in order to calculate ratios or quotients, such as the ratio between signal energy and listening threshold. The division required for this is not easy to perform on a fixed-point processor.
One possibility of performing a division on a fixed-point processor is the use of single-bit division commands, which are implemented in some fixed-point processors and supply one additional bit of accuracy in the quotient per call. For a division with an accuracy of, for example, 48 bits, 48 individual division commands would therefore be required for one single division of two signal energies or of a signal energy and a listening threshold. This is very inefficient and not feasible because of the high calculating time expenditure involved.
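The principle of a division that supplies one quotient bit per step may be sketched in software as follows. This is an illustration under assumptions only: the function divide_bitwise is hypothetical, it models a restoring division of a fraction num/den with num smaller than den, and it does not reproduce the division command of any specific processor.

```c
#include <assert.h>
#include <stdint.h>

/* Computes floor(num * 2^bits / den) for 0 <= num < den, that is the
 * quotient num/den as a fraction with 'bits' fractional bits.
 * One quotient bit is produced per loop iteration. */
uint64_t divide_bitwise(uint64_t num, uint64_t den, int bits)
{
    assert(den != 0 && den <= ((uint64_t)1 << 63));
    assert(num < den);
    assert(bits > 0 && bits <= 48);
    uint64_t rem = num;
    uint64_t quotient = 0;
    for (int i = 0; i < bits; ++i) {
        rem <<= 1;      /* next bit position */
        quotient <<= 1;
        if (rem >= den) {
            rem -= den; /* subtraction succeeds: quotient bit is 1 */
            quotient |= 1;
        }
    }
    return quotient;
}
```

A call such as divide_bitwise(1, 3, 8) runs through eight steps and returns 85, that is 85/256 or approximately 0.332. An accuracy of 48 bits accordingly requires 48 such steps per division.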
Another possibility of implementing a division in a fixed-point processor is the use of tables, possibly in connection with subsequent iteration steps for increasing the accuracy of the division results. This procedure, however, is often not feasible, as for the required accuracy of the division result either a very large table must be used, or the subsequent iteration steps will in turn demand a large amount of calculating time.
Both methods mentioned may be used in a fixed-point processor in connection with fixed-point data types or with floating-point data types emulated in software. In neither case, however, do they provide a sufficiently efficient implementation with respect to calculating time and memory consumption while at the same time maintaining the accuracy of the result.
The above-mentioned problems would not occur if a general-purpose processor (GPP) were used. For many applications, however, the use of processors having a higher performance than fixed-point processors without a floating-point calculating unit is ruled out from the outset because of the high pricing pressure and the high production volumes. Examples of such applications are mobile phones and PDAs.
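The trade-off between table size and iteration steps may be illustrated by the following sketch, which approximates a reciprocal by means of a small table and subsequent Newton-Raphson steps. All specifics are assumptions made for illustration: the function reciprocal_q16, the Q16 number format, the table with eight entries and the two iteration steps are hypothetical and are not taken from a specific encoder.

```c
#include <assert.h>
#include <stdint.h>

/* Reciprocal of d, where d is a Q16 value in [0.5, 1.0), that is
 * 0x8000 <= d <= 0xFFFF. The result is 1/d in Q16, in the range (1, 2].
 * Start values: reciprocals of the midpoints of eight equal intervals. */
uint32_t reciprocal_q16(uint32_t d)
{
    static const uint32_t table[8] = {
        123362, 110376, 99864, 91181, 83886, 77672, 72316, 67650
    };
    assert(d >= 0x8000u && d <= 0xFFFFu);
    uint32_t x = table[(d - 0x8000u) >> 12]; /* coarse table lookup */
    for (int i = 0; i < 2; ++i) {
        /* Newton-Raphson step: x = x * (2 - d * x), all values in Q16 */
        uint32_t t = (uint32_t)(((uint64_t)d * x) >> 16);
        uint32_t u = (2u << 16) - t;
        x = (uint32_t)(((uint64_t)x * u) >> 16);
    }
    return x;
}
```

With eight table entries, the start value has a relative error of up to about 6 percent, and each iteration step roughly squares this error at the cost of two multiplications. A smaller error with fewer steps would require a correspondingly larger table.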
U.S. Pat. No. 6,754,618 B1 addresses the problem of the SMR calculation, that is the calculation of the ratio between signal energy and listening threshold, and does so against the background of the use of fixed-point DSP chips. In accordance with the procedure proposed therein, first the usual windowing and subsequent Fourier transformation for the decomposition of an audio signal into its spectral constituents are performed, following which the energy of each input signal and/or frequency line signal, that is the line energy, is calculated from the real and imaginary portions of the respective frequency line value. Although the document does not go further into the formation of signal energies of groups of frequency lines, the method could also be continued based on the signal energies of these groups. The document attempts to remove the problem that the input data, that is the energies, mostly have a dynamic range that is too large, as most fixed-point DSP chips comprise a data width of only 16 to 24 bits, whereas the MPEG standard would require a data width of 34 bits, that is a dynamic range of 101 dB. Therefore, the energies would first have to be scaled. In particular, it is proposed to depart from the former procedure and to use two different scaling values. More precisely, in accordance with this document, the energy is compared to a threshold and scaled upward or downward, respectively, so that, at the transition into the logarithmic range, 16 bits are sufficient to represent the result of taking the logarithm, and so that the SMR ratio can be calculated in the logarithmic range with 16 bits. Depending on whether an upward or a downward scaling is performed, a different table for the thresholds is used. For taking the logarithm, the common logarithm times 10 is used, so that the unit dB is obtained.
If the result of taking the logarithm of the upwardly scaled line energies is zero, the SMR ratio will be calculated by taking the logarithm of the upwardly scaled line energy minus the logarithm of the threshold energy times 10. Otherwise, the results of the upwardly scaled line energy and of the downwardly scaled line energy are combined with each other.
The procedure proposed in U.S. Pat. No. 6,754,618 B1 avoids some of the above-mentioned problems with respect to the further processing of the signal energies by proposing to calculate the SMR ratio in the logarithmic range. This removes the complex division calculation. This procedure, however, is disadvantageous in that the logarithm calculation is still relatively complex, as the value range is adapted to a 16-bit fixed-point representation, as suitable for 16-bit fixed-point DSP processors, only after the logarithm calculation, while the taking of the logarithm as such is still performed on the energies present with a high dynamic range. This makes as many as two logarithm calculations per energy value necessary.
It is therefore desirable to simplify the transition into the logarithmic range as well, without a loss in dynamic range occurring.
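That a division turns into a subtraction in the logarithmic range may be illustrated by the following sketch. It is expressly not the procedure of U.S. Pat. No. 6,754,618 B1: the functions log2_q8 and smr_db_q8, the Q8 number format and the coarse linear approximation of the binary logarithm are hypothetical and serve only to show the principle.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate binary logarithm of x > 0 in Q8 (8 fractional bits).
 * The integer part is the position of the most significant set bit;
 * the fractional part is a linear interpolation from the eight bits
 * below it (assumed approximation, error up to about 0.09). */
int32_t log2_q8(uint64_t x)
{
    assert(x != 0);
    int msb = 63;
    while (((x >> msb) & 1u) == 0) {
        --msb;
    }
    uint64_t frac = (msb >= 8) ? ((x >> (msb - 8)) & 0xFFu)
                               : ((x << (8 - msb)) & 0xFFu);
    return (int32_t)(msb * 256 + (int)frac);
}

/* Ratio of signal energy to listening threshold in dB (Q8), obtained
 * by a subtraction of logarithms instead of a division.
 * 771/256 approximates 10 * log10(2) = 3.0103. */
int32_t smr_db_q8(uint64_t energy, uint64_t threshold)
{
    int32_t diff = log2_q8(energy) - log2_q8(threshold);
    return diff * 771 / 256;
}
```

For an energy of 1024 and a threshold of 256, that is a ratio of 4, the sketch returns 1542 in Q8, which corresponds to approximately 6.02 dB.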
U.S. Pat. No. 5,608,663 deals with the fast execution of parallel multiplications of floating-point numbers by means of conversion into a logarithmic fixed-point format, addition in the logarithmic range and subsequent back conversion.
U.S. Pat. No. 5,197,024 generally deals with an exponential/logarithm calculation and a respective apparatus.
U.S. Pat. No. 6,732,071 deals with an efficient solution for rate control in audio encoding. For the determination of the quantization parameter value, it uses a loop iteration with a termination condition, according to which the quantization parameter value is compared to a term derived from a binary logarithm (logarithm to the base 2) of a term depending on a maximum frequency line value.
U.S. Pat. No. 6,351,730 describes the use of a binary logarithm for a gain calculation within audio encoding. The gain values are used for the bit allocation in an MDCT-encoded audio codec.
U.S. Pat. No. 5,764,698 describes the use of a natural logarithm for the representation of audio signal energies. A more detailed description of the transition into the logarithmic range is not given.