The present invention relates generally to the compression of multichannel audio streams, i.e. streams including a plurality of audio signals, intended to be processed by an audio system including a plurality of loudspeakers in order to reproduce a spatialized sound scene. In particular, the compression means are applied to audio streams encoded according to a multichannel coding format of the 5.1, 6.1, 7.1, 10.2 or 22.2 type, or according to an ambisonic coding format commonly known as “HOA” for “Higher-Order Ambisonics”. The HOA ambisonic encoding format is detailed in particular in the document Daniel, J., “Acoustic Field Representation, Application to the Transmission and the Reproduction of Complex Sound Environments in a Multimedia Context”, PhD Thesis, University of Paris 6, Paris, 2000. The compression applied to the audio streams can in particular be introduced prior to a step of transmission, broadcasting or storage, for example on an optical disk.
In order to reduce the quantity of information required to represent a multichannel audio stream, it is possible to encode the different signals constituting said stream separately, according to a conventional audio stream compression scheme that generally exploits the frequency masking properties observed in the perception of a sound signal by a listener. Reference may be made by way of example to “MPEG-1/2 Audio Layer 3” coding, more generally denoted by its acronym MP3, or to “Advanced Audio Coding” or “AAC”. As the signals are considered separately, the redundancies between them are hardly exploited. This solution is suited to high bit-rate multichannel audio stream encoding, typically at a bit rate greater than or equal to 128 kbit/s per channel in the case of MP3 and 64 kbit/s per channel in the case of AAC. Thus, separate encoding of the signals of a stream is not suited to the production of streams typically having a bit rate of the order of 64 kbit/s for 5 to 7 channels without a significant reduction in sound quality.
Another alternative consists in mixing the different signals in order to obtain a mono or stereo signal. This technique is used in particular in low bit-rate “MPEG Surround” encoding, i.e. in which the bit rate is typically of the order of 64 kbit/s for 5 to 7 channels. This operation is conventionally known as “downmix”. The mono or stereo signal can then be coded according to a conventional compression scheme in order to obtain a compressed stream. Spatial information is moreover calculated, then added to the compressed stream. This spatial information is for example the time difference between two channels (“ICTD” for “Inter-Channel Time Difference”), the energy difference between two channels (“ICLD” for “Inter-Channel Level Difference”), or the correlation between two channels (“ICC” for “Inter-Channel Coherence”).
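The spatial cues listed above can be illustrated by the following minimal sketch (the function name and its full-band, single-frame formulation are illustrative assumptions; actual parametric coders such as MPEG Surround estimate these cues per frequency sub-band and per time frame):

```python
import numpy as np

def spatial_cues(left, right, max_lag=32):
    """Estimate ICLD, ICTD and ICC for one frame of a channel pair.

    Simplified full-band illustration; parametric coders compute these
    cues per frequency sub-band rather than on the whole signal.
    """
    e_l = np.sum(left ** 2)
    e_r = np.sum(right ** 2)
    # ICLD: inter-channel level (energy) difference, in dB
    icld = 10.0 * np.log10((e_l + 1e-12) / (e_r + 1e-12))

    # Normalized cross-correlation over a range of lags
    # (np.roll wraps around; acceptable for this illustration)
    norm = np.sqrt(e_l * e_r) + 1e-12
    lags = range(-max_lag, max_lag + 1)
    corr = [np.sum(left * np.roll(right, lag)) / norm for lag in lags]

    # ICTD: lag (in samples) maximizing the correlation magnitude;
    # with this convention, a negative lag means the second channel
    # is delayed with respect to the first
    best = int(np.argmax(np.abs(corr)))
    ictd = best - max_lag
    # ICC: correlation magnitude at that lag
    icc = abs(corr[best])
    return icld, ictd, icc
```

For example, if the right channel is an attenuated (half-amplitude) copy of the left channel delayed by 5 samples, this sketch yields an ICLD of about 6 dB, an ICTD of -5 samples and an ICC close to 1.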
Coding the mono or stereo signal originating from the “downmix” operation is carried out under a hypothesis of monophonic or stereophonic perception that is unsuitable here, and thus does not take account of the characteristics specific to the spatial perception of the multichannel signal, in particular where the audio stream includes a significant number of channels, typically greater than or equal to 7.
Thus, degradation that is inaudible on the signal originating from the “downmix” operation can become audible when the multichannel stream resulting from the “upmix” processing is reproduced on a multi-loudspeaker device, in particular on account of binaural unmasking, described in particular in the document Saberi, K., Dostal, L., Sadralodabai, T., and Bull, V., “Free-field release from masking”, Journal of the Acoustical Society of America, vol. 90, 1991, pp. 1355-1370.
A need therefore exists for a more efficient compression of spatialized audio streams that retains a perceived sound quality at least equivalent to that of state-of-the-art techniques.