The invention is directed to a process and a device for generating a bit rate scalable audio data stream. The invention is applicable in the field of data communications, particularly in the field of audio data communications.
A known problem in the field of data communications is that the data signal to be transferred is present in a data source at a high data rate, for example, 64 kbit per second, but the data channel available for transfer or processing can only carry data at a lower data rate, for example, 32 kbit per second. In this case, the data must first be decoded at the higher data rate and then coded again at the lower data rate. This entails a high expenditure on hardware and computing, especially because the data rate in modern data networks is not constant but variable and is adapted to the particular load situation of the data network. It would be more beneficial to supply a bit rate scalable data stream, of which only a part of the available data bits is transferred depending on the data rate offered by the transfer channel. Processes for generating bit rate scalable audio data streams are currently being developed across the world, in particular within the framework of standardization efforts, for example, the MPEG-4 (Moving Picture Experts Group) standardization. In particular, the CODECs (COder/DECoders) developed within the framework of the MPEG-4 standardization must guarantee the functionality of bit rate scalability.
A process for the generation of a bit rate scalable audio data stream in which the audio data stream is compressed in a core codec accompanied by determination of parameters is known from WO 97/15983 A. The coding is enhanced in subsequent enhancement stages. The enhancement stages are controlled in dependence on the core parameters.
The invention is therefore based on the problem of providing a process and a device for generating a bit rate scalable audio data stream which can be used in a versatile manner, which ensure a good transfer quality even when the available transfer rate is low, and which achieve a high degree of flexibility with respect to the available transfer rate in an economical manner.
In accordance with the present invention, a process for generating a bit rate scalable audio data stream includes: compression of the audio data stream in a core codec accompanied by the determination of core parameters; and enhancement of the coding in at least one downstream enhancement stage, wherein the enhancement in the enhancement stage is controlled by the core parameters, wherein the audio data stream is frequency-transformed, wherein a synthesized audio signal produced by the core codec is likewise frequency-transformed, and wherein the frequency-transformed synthesized audio signal is combined with the frequency-transformed audio data stream.
For a process for generating a bit rate scalable audio data stream with the step of compression of the audio input data stream in a core codec accompanied by determination of core parameters and the step of enhancement of the coding in at least one downstream enhancement stage, the problem is solved in that the enhancement in the enhancement stage is controlled by means of the core parameters. In the process according to the invention, the core codec forms the core assembly and codes the arriving input data stream at a low bit rate of, for example, 2, 4 or 6 kbit per second. The core codec is followed by any number of improvement stages, referred to as enhancement stages, which code at a data rate of 1, 2, 3 or 4 kbit per second depending on the application. An advantage of this process is that omission of any one enhancement stage has no effect on the other parts of the bit stream. It is an essential condition that the transfer system provided guarantees at least the bit rate of the core codec. The core codec parameterizes the incoming audio signal and determines, for example, parameters such as pitch, voiced/unvoiced sounds, or the volume of sound. A core codec according to ITU-T G.723.1 (ITU, International Telecommunication Union), for example, can be used. It is particularly advantageous in the process according to the invention that the core parameters determined by the core codec control the subsequent enhancement stages, because a considerable increase in the efficiency of the enhancement stages is achieved in this way.
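By way of illustration only, the layering of a core bit rate and enhancement-stage bit rates can be sketched as follows. The function name achievable_rate and the rule of keeping stages in order until the channel rate is exhausted are assumptions made for this sketch and are not taken from the description.

```python
def achievable_rate(core_kbps, stage_kbps, channel_kbps):
    """Return (enhancement stages kept, resulting bit rate in kbit/s).

    Assumption for this sketch: stages are kept in their stored order
    until the next one would exceed the channel rate.
    """
    if channel_kbps < core_kbps:
        # The transfer system must guarantee at least the core bit rate.
        raise ValueError("channel must carry at least the core bit rate")
    rate = core_kbps
    kept = 0
    for stage in stage_kbps:
        if rate + stage > channel_kbps:
            break
        rate += stage
        kept += 1
    return kept, rate


# Core at 6 kbit/s, four stages of 2 kbit/s each, channel offers 11 kbit/s.
print(achievable_rate(6, [2, 2, 2, 2], 11))  # (2, 10)
```

In this sketch a channel that offers exactly the core bit rate still yields a decodable stream with no enhancement stages.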
In a particular embodiment type of the invention, the process is characterized in that a vector coding is effected in the enhancement stage and in that the core parameters control the selection of code books. This is advantageous because the code books for vector coding used in periodic audio segments are different from those used in non-periodic audio segments. Also, the energy parameters of the core codec are used directly for the coding of the signal energy (volume of sound), which results in a considerable bit rate saving. The use of core parameters is possible because they have to be transferred to the receiver in any case.
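A minimal sketch of code book selection controlled by core parameters is given below. The toy code books, the boolean voiced flag, and the use of the square root of a core energy value as a gain are all assumptions chosen for illustration. The description states only that the core parameters select the code books and that the energy parameters are reused for coding the signal energy.

```python
import math

# Hypothetical toy code books, one per signal class (not from the description).
CODE_BOOKS = {
    True: [(1.0, 0.0, 1.0, 0.0), (0.0, 1.0, 0.0, 1.0)],     # periodic (voiced)
    False: [(0.5, 0.5, 0.5, 0.5), (1.0, -1.0, 1.0, -1.0)],  # non-periodic
}


def encode(vector, voiced, core_energy):
    """Return the index of the best entry in the book chosen by `voiced`.

    The gain is derived from the core energy parameter, so no extra bits
    are spent on the signal energy in the enhancement stage.
    """
    book = CODE_BOOKS[voiced]
    gain = math.sqrt(core_energy)

    def squared_error(entry):
        return sum((v - gain * e) ** 2 for v, e in zip(vector, entry))

    return min(range(len(book)), key=lambda i: squared_error(book[i]))


def decode(index, voiced, core_energy):
    """Rebuild the vector from the index and the already transferred core parameters."""
    gain = math.sqrt(core_energy)
    return tuple(gain * e for e in CODE_BOOKS[voiced][index])


idx = encode((2.1, 0.1, 1.9, -0.1), voiced=True, core_energy=4.0)
print(idx, decode(idx, voiced=True, core_energy=4.0))  # 0 (2.0, 0.0, 2.0, 0.0)
```

Only the index needs to be transferred for the enhancement stage in this sketch, since the receiver already holds the voiced flag and the energy value from the core codec.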
In a particular embodiment form, the process is characterized by the steps of transforming the audio input data stream, transforming a synthesized audio signal generated by the core codec, and a combination of the transformed synthesized audio signal with the transformed audio data stream. In this way, the difference between the audio signal compressed by the core codec and the original signal is determined advantageously in an economical manner and with high precision. In the simplest case, the combination can be a subtraction; however, more complex operations such as adapting the core spectrum for better matching with the original spectrum can also be involved. In this latter particular embodiment of the invention, the combination parameters for the adaptation must be transferred to the receiver.
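The simplest combination, a subtraction of the two transformed signals, can be sketched as follows. The use of a plain discrete Fourier transform and the function names dft and residual_spectrum are assumptions for this sketch; the description does not fix a particular transformation.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame (illustrative, O(n^2))."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def residual_spectrum(original, synthesized):
    """Subtract the spectrum of the core codec output from the input spectrum."""
    return [a - b for a, b in zip(dft(original), dft(synthesized))]


# The residual is what the enhancement stages would go on to code.
residual = residual_spectrum([1.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0])
print([round(abs(c), 6) for c in residual])  # [0.5, 0.5, 0.5, 0.5]
```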
In a particular embodiment of the invention, the process is characterized in that the core codec divides the input signal into at least two subframes; in that the transformation is a frequency transformation running synchronously with the subframes of the core codec; in that, by means of the frequency transformation, a transformation is effected per subframe which generates a spectrum vector in each instance; in that each spectrum vector is divided into at least two partial vectors corresponding to two partial bands; and in that each enhancement stage enhances one of these partial bands. The frequency transformation used and the division into partial bands have the advantage that the process according to the invention not only enables a high efficiency for the bit rate scaling according to objective criteria, but also takes into consideration subjective criteria such as the acoustic and physiological boundary conditions of human hearing. A resource allocation unit determines which partial band is to be enhanced. This determination can be effected using a psycho-acoustic model which determines which frequency bands are subjectively important, or by measurements of signal-to-noise ratios, for example.
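The division of a spectrum vector into partial bands and a simple allocation decision can be sketched as follows. Choosing the band with the largest residual energy is an assumed stand-in for the psycho-acoustic model or signal-to-noise measurement named in the description, and the function names are hypothetical.

```python
def split_into_bands(spectrum, num_bands):
    """Divide one spectrum vector into contiguous partial vectors."""
    if not 2 <= num_bands <= len(spectrum):
        raise ValueError("need at least two bands and one coefficient per band")
    base, extra = divmod(len(spectrum), num_bands)
    bands, start = [], 0
    for b in range(num_bands):
        # The first `extra` bands take one additional coefficient.
        end = start + base + (1 if b < extra else 0)
        bands.append(spectrum[start:end])
        start = end
    return bands


def pick_band(residual_bands):
    """Return the index of the band with the largest residual energy.

    Assumption: energy stands in for a real importance measure.
    """
    energies = [sum(abs(c) ** 2 for c in band) for band in residual_bands]
    return max(range(len(energies)), key=energies.__getitem__)


print(split_into_bands([1, 2, 3, 4, 5, 6, 7, 8], 4))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(pick_band([[0.1, 0.1], [0.9, 0.2], [0.3, 0.3]]))  # 1
```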
In a particular embodiment type of the invention, the process is characterized in that, for each enhancement stage, one set of parameters and one address for the enhanced partial band are transferred. Since the allocation unit enhances the partial bands in the order of their importance, this embodiment of the invention has the advantage that the enhanced bits are stored in the bit stream in the order of their importance. Since every stage has been provided with an address, allocation can be effected in the receiver without problems and in a manner that is reliably correct. Scaling is now made possible without problems and without any additional expenditure merely by suppressing a corresponding number of enhancement stages, beginning with the last, least significant stage. It is further advantageous that this scaling can be effected at any point on the transfer line. An additional modification of the remaining bit stream is not necessary for this.
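A minimal sketch of this addressed format is given below, assuming each enhancement stage is represented as a pair of band address and parameter set. The Stage type and the function names scale and assign_to_bands are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    band_address: int  # which partial band this stage enhances
    params: bytes      # parameter set for that stage (opaque here)


def scale(stages, keep):
    """Scale the stream by dropping stages from the end, least important first."""
    return stages[:keep]


def assign_to_bands(stages, num_bands):
    """Receiver side: route each parameter set to its band via the address."""
    per_band = [[] for _ in range(num_bands)]
    for stage in stages:
        per_band[stage.band_address].append(stage.params)
    return per_band


# Stages are stored in order of importance, most important first.
stream = [Stage(2, b"a"), Stage(0, b"b"), Stage(2, b"c")]
print(assign_to_bands(scale(stream, 2), num_bands=3))  # [[b'b'], [], [b'a']]
```

Because truncation touches only the tail of the list, the remaining stages are left unmodified, which corresponds to scaling at any point on the transfer line.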
In a particular embodiment of the invention, the process is characterized by storing for every partial band as many enhancement stages, arranged one behind the other in a bit stream to be transferred, as are present for the respective partial band; storing additional information for the determination of the relative importance of the individual enhancement stages of the partial bands; and bringing together the bit stream and the additional information before transfer in a bit stream manipulation unit. In so doing, it is advantageous that the overhead generated is lower compared with the above-mentioned storage and transfer formatting due to the fact that addressing of the separate enhancement stages is no longer necessary. This advantage is particularly relevant for database access.
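The band-grouped format can be sketched as follows. The description does not specify how the additional information is encoded; this sketch assumes it is a list of band indices, one entry per stored stage, ordered from most to least important. The function name scale_grouped is hypothetical.

```python
def scale_grouped(band_stages, importance_order, keep):
    """Keep the `keep` most important stages of a band-grouped stream.

    band_stages: one list of stage payloads per partial band, in stage order.
    importance_order: assumed side information, a band index per stage,
        most important first.
    """
    counts = [0] * len(band_stages)
    for band in importance_order[:keep]:
        counts[band] += 1
    # Within a band, stages build on one another, so keep a prefix of each.
    return [stages[:n] for stages, n in zip(band_stages, counts)]


stored = [["a0", "a1"], ["b0"], ["c0", "c1", "c2"]]
order = [2, 0, 2, 1, 0, 2]
print(scale_grouped(stored, order, keep=3))  # [['a0'], [], ['c0', 'c1']]
```

In this sketch no per-stage address is stored with the payloads; the side information is consulted only when the stream is scaled.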
In another particular embodiment of the invention, the process is characterized by the generation of partial bands on the receiver side by calculation from neighboring partial bands that have been received, in particular by interpolation. Other mathematical methods aside from interpolation can be taken into consideration, for example, statistical methods under consideration of the characteristics of the transferred data stream. The advantage therein consists in that partial bands which do not reach the receiver in time, or at all, because of faulty transfer or interrupted transfer can be reconstructed, or that partial bands can even be calculated in advance when individual partial bands or data packets are delayed along the transfer route.
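By way of example, a missing partial band can be estimated from its nearest received neighbors as sketched below. Averaging the two neighbors coefficient by coefficient is a deliberately simple assumption; the function name conceal and the use of None to mark a missing band are likewise hypothetical.

```python
def conceal(bands):
    """Fill missing partial bands (None) from the nearest received neighbors.

    Assumes all bands have the same length. A band with neighbors on both
    sides is set to their coefficient-wise mean; at the edges the single
    available neighbor is copied.
    """
    filled = []
    for i, band in enumerate(bands):
        if band is not None:
            filled.append(list(band))
            continue
        left = next((bands[j] for j in range(i - 1, -1, -1) if bands[j] is not None), None)
        right = next((bands[j] for j in range(i + 1, len(bands)) if bands[j] is not None), None)
        if left is None and right is None:
            raise ValueError("no partial band was received")
        if left is None:
            filled.append(list(right))
        elif right is None:
            filled.append(list(left))
        else:
            filled.append([(a + b) / 2 for a, b in zip(left, right)])
    return filled


print(conceal([[1.0, 2.0], None, [3.0, 6.0]]))  # [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```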
The teaching of the invention also includes a device for generating a bit rate scalable audio data stream with a core codec which compresses the audio input data stream while also determining core parameters, and at least one enhancement stage downstream of the core codec, characterized in that the core codec is connected with the enhancement stage and the core parameters control the enhancement stage. Such a device is covered by the teaching of the present invention particularly when it carries out one of the processes described above, especially a process in which the device has an allocation control which controls which partial band is enhanced. The advantages indicated above within the framework of the description of the process according to the invention apply in a corresponding manner to the device according to the invention.
The teaching of the invention also covers the use of core parameters of a core codec for controlling the enhancement of the coding in an enhancement stage. This has the particular advantage that the core parameters which have been determined and transferred by the core codec and which represent the parameterization of the input signal can be used effectively for controlling the enhancement stages and distinctly improve the subjective transfer quality, in particular from a first enhancement stage on.
The teaching of the invention also covers a data carrier on which control information has been stored, characterized in that the control information controls the flow of one of the processes described above in an electronic computing device or in one of the devices described above. The data carrier can store the control information in any form, in particular in mechanical, optical, magnetic or electronic form. It is particularly advantageous that the control information stored in this manner is portable, can be implemented easily, is inexpensively reproducible and can be maintained with little expenditure. Implementing the control information in an electronic computing installation is possible by using prior art measures.
Further advantages, features and details of the invention are indicated in the subclaims and in the subsequent description in which a plurality of embodiment examples are described in detail with reference to the drawings. In this case, the features mentioned in the claims and in the description can be essential for the invention by themselves or in any combination. A mode for carrying out the claimed invention is subsequently discussed in detail with reference to the drawings.