The present invention relates to a method and a device for reducing multi-channel acoustic echo and restoring auditory perspective or “adapting sound to space”. It finds a particularly important application in systems for digitally transmitting sound signals on several transmission channels between a local site and one or more remote sites, each site having several microphones and several loudspeakers. Such systems are referred to as “multi-channel” systems with regard to the sound take (the sound is picked up by several microphones), transmission (on several channels) and sound restoration (by means of several loudspeakers).
By way of example, although this is not restrictive, the invention will be described in an application involving video conference rooms, both on the basis of transmission between two rooms, in what is referred to as a “point to point” configuration (where the image from the remote room is displayed on a screen or several viewpoints of the remote room are displayed on several screens) and with regard to transmission within a network comprising more than two rooms in what is referred to as a “multi-point” configuration (where the images of the remote rooms are displayed on several screens).
In a room having N microphones and M loudspeakers, there are N×M acoustic echo paths. On the one hand, conventional systems for eliminating acoustic echo do not allow echo to be controlled at a reasonable cost in such a context. On the other hand, connecting each microphone in a local room to a loudspeaker of a remote room for transmission purposes in order to obtain the best possible distribution of the sound would multiply the number of transmission channels, making the transmission cost prohibitive in terms of commercial development.
Apart from the conventional systems for eliminating acoustic echo, systems for reducing acoustic echo are known, based on variations in the level of the sound signals. A major disadvantage generally encountered in such systems is the constraint placed on interactivity, i.e. a reduction in reception quality, in particular a considerable variation in sound level on reception in what are referred to as “double speech” situations, i.e. where, in a given local room, there is an effective local sound signal present simultaneously with an effective sound signal from the remote room.
The objective of the present invention is to provide a method and a device for reducing multi-channel acoustic echo and adapting sound to space, which will allow the echo to be reduced whilst maintaining interactivity and which, although using a relatively small number of transmission channels, will guarantee restoration of the auditory perspective. In its application to video-conferencing, the present invention will enable the remote meeting to be conducted as a natural communication situation.
To this end, the invention specifically proposes a method of reducing acoustic echo and adapting sound to space in a system for digitally transmitting sound signals on P transmission channels between a local site and at least one remote site, each having N microphones and M loudspeakers, N, M, P being integers, it being possible for the values of N and M to differ depending on the sites, whereby:
(a) a cumulative distribution function is computed for each microphone signal xi(n) from the local site, i being an integer ranging between 1 and N and n denoting the time rank of the samples, and a cumulative distribution function is computed for each loudspeaker signal zj(n) from the local site, j being an integer ranging between 1 and M (the concept of cumulative distribution function will be defined below); then, for every i, 1≤i≤N,
(b) a first attenuation factor Gmic(i,n) is computed for the microphone signal xi(n) from the local site on the basis of a ratio between the cumulative distribution functions obtained previously;
(c) the first attenuation factor Gmic(i,n) is adjusted so as to obtain a second attenuation factor G′mic(i,n) defined as follows:
G′mic(i,n)=S1(Gmic(i,n))
where S1(Gmic(i,n))=s if Gmic(i,n)≤s,
S1(Gmic(i,n))=Gmic(i,n) if s<Gmic(i,n)<1 and
S1(Gmic(i,n))=1 if Gmic(i,n)≥1,
s being a predetermined minimum threshold which is strictly less than 1;
(d) on the basis of the cumulative distribution functions of microphone and loudspeaker signals computed previously, it is determined
whether the microphone signal xi(n) is an echo signal only or a signal coming solely from the local site in the case of a first situation or
if the microphone signal xi(n) contains components from the local site and other components from the remote site in the case of a second situation;
(e) a third attenuation factor G″mic(i,n) is computed which, in the first situation, is equal to the second attenuation factor G′mic(i,n) and in said second situation is equal to the second attenuation factor G′mic(i,n) but in which the minimum threshold s in the computation used to obtain it is increased by a predetermined value;
(f) a fourth attenuation factor Γ(i,n) is computed on the basis of a ratio between the cumulative distribution functions of microphone signals;
(g) the fourth attenuation factor Γ(i,n) is adjusted in order to obtain a fifth attenuation factor Γ′(i,n) defined as follows:
Γ′(i,n)=S2(Γ(i,n))
where S2(Γ(i,n))=s′ if Γ(i,n)≤s′,
S2(Γ(i,n))=Γ(i,n) if s′<Γ(i,n)<1 and
S2(Γ(i,n))=1 if Γ(i,n)≥1,
s′ being a predetermined minimum threshold strictly less than 1;
(h) the product of the third and fifth attenuation factors G″mic(i,n) and Γ′(i,n) obtained previously is computed so as to obtain a global attenuation factor G*mic(i,n) defined by:
G*mic(i,n)=G″mic(i,n)·Γ′(i,n)
(i) the global attenuation factor G*mic(i,n) is adjusted so as to obtain a weighting factor βi(n) defined as follows:
βi(n)=S4(G*mic(i,n))
where S4(G*mic(i,n))=s″ if G*mic(i,n)≤s″ and
S4(G*mic(i,n))=G*mic(i,n) if s″<G*mic(i,n)≤1,
s″ being a predetermined minimum threshold strictly less than 1;
(j) a signal yk(n) is transmitted on each transmission channel, k being an integer between 1 and P, in the form of a linear combination of the weighted microphone signals xi(n), defined as follows:
$$y_k(n) = \sum_{i=1}^{N} \alpha_{k,i}(n) \cdot \beta_i(n) \cdot x_i(n)$$
where αk,i(n) denotes the predetermined real coding coefficients and βi(n) denotes the weighting factors obtained previously; then, for every integer j, 1≤j≤M:
(k) a sixth attenuation factor GHP(j,n) is computed for the loudspeaker signal zj(n) from the remote site on the basis of cumulative distribution functions calculated for each transmitted signal yk(n) from the local site;
(l) the sixth attenuation factor GHP(j,n) is adjusted so as to obtain a weighting factor λj(n) defined as follows:
λj(n)=S3(GHP(j,n))
where S3(GHP(j,n))=s* if GHP(j,n)≤s*,
S3(GHP(j,n))=GHP(j,n) if s*<GHP(j,n)<1 and
S3(GHP(j,n))=1 if GHP(j,n)≥1,
s* being a predetermined minimum threshold strictly less than 1;
(m) the loudspeaker signal zj(n) of the remote site is determined on the basis of a linear combination of the weighted transmitted signals yk(n), defined as follows:
$$z_j(n) = \lambda_j(n) \cdot \sum_{k=1}^{P} \gamma_{j,k}(n) \cdot y_k(n)$$
where γj,k(n) denotes the predetermined real decoding coefficients and where λj(n) denotes the weighting factors obtained previously; and
(n) the loudspeaker signal zj(n) thus obtained is emitted on the j-th loudspeaker of the remote site.
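By way of illustration only, the transmit-side steps (c) to (j) can be sketched in Python for a single sample rank n. This is a minimal sketch under stated assumptions, not the claimed implementation: the raw factors Gmic(i,n) and Γ(i,n) and the situation flag of step (d) are taken as given inputs, since they depend on the cumulative distribution functions defined later in the text; the function names, the parameter names and every numeric threshold are hypothetical.

```python
def clamp_to_floor(value, floor):
    """Limiter shaped like S1 and S2: floor if value <= floor, 1 if value >= 1."""
    if value <= floor:
        return floor
    if value >= 1.0:
        return 1.0
    return value


def microphone_weight(g_mic, gamma, second_situation,
                      s=0.1, s_increase=0.2, s_prime=0.3, s_second=0.05):
    """Steps (c) to (i): derive the weight beta_i(n) from Gmic(i,n) and Gamma(i,n).

    All threshold defaults are arbitrary illustration values (assumptions).
    """
    # Steps (c) and (e): in the second situation the floor s is raised
    # by a predetermined value before the limiter is applied.
    floor = s + s_increase if second_situation else s
    g_third = clamp_to_floor(g_mic, floor)        # G''mic(i,n)
    gamma_fifth = clamp_to_floor(gamma, s_prime)  # Gamma'(i,n), step (g)
    g_global = g_third * gamma_fifth              # G*mic(i,n), step (h)
    # Step (i), limiter S4: the product of two factors <= 1 cannot exceed 1,
    # so only the lower floor s'' needs to be enforced.
    return max(g_global, s_second)


def encode(mic_samples, betas, alpha):
    """Step (j): y_k(n) = sum over i of alpha[k][i] * beta[i] * x[i]."""
    return [
        sum(a_ki * b_i * x_i for a_ki, b_i, x_i in zip(row, betas, mic_samples))
        for row in alpha
    ]


# Hypothetical example: N = 2 microphones encoded onto P = 2 channels.
x = [0.5, -0.2]
betas = [microphone_weight(0.05, 0.8, False),
         microphone_weight(1.4, 0.2, True)]
alpha = [[1.0, 0.0], [0.5, 0.5]]
y = encode(x, betas, alpha)
```

In this sketch the coding coefficients are held in a P-by-N nested list, so the same function applies whatever the values of N and P at a given site.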
The operations outlined above are performed in a similar manner in all the rooms of the network in question, both on transmission and on reception. Throughout this text, in order to simplify the description, we will look at a given room which will be referred to as the “local” room and one or several “remote” rooms and only the operations performed in the local room on transmission and the operations performed in the remote room(s) on reception will be described, although similar operations are also performed in the remote room(s) on transmission and in the local room on reception respectively.
For the same purpose described above, the present invention also proposes a device for reducing acoustic echo and adapting sound to space in a system for digitally transmitting sound signals on P transmission channels between a local site and at least one remote site, each having N microphones and M loudspeakers, N, M and P being integers, it being possible for the values of N and M to differ depending on the sites, characterised in that it comprises:
a module for encoding sound signals, receiving at its input N digital signals xi(n) respectively originating from the N microphones of the local site, where i is an integer between 1 and N and n is an integer which denotes the time rank of the samples, this encoding module supplying at its output P digital signals yk(n), where k is an integer ranging between 1 and P, defined by the following formula:
$$y_k(n) = \sum_{i=1}^{N} \alpha_{k,i}(n) \cdot \beta_i(n) \cdot x_i(n)$$
where αk,i(n) denotes the predetermined real coding coefficients and
where βi(n) denotes the weighting factors which depend on cumulative distribution functions of the signals received by the microphones of the local site and cumulative distribution functions of the signals transmitted by the loudspeakers of the local site, the P signals yk(n) being transmitted respectively on the P transmission channels between the local site and the remote site; and
a module for decoding the sound signals, receiving at its input the P signals yk(n) and supplying M digital signals zj(n) at its output to be emitted respectively by the M loudspeakers of each site, where j is an integer between 1 and M, defined by the following formula:
$$z_j(n) = \lambda_j(n) \cdot \sum_{k=1}^{P} \gamma_{j,k}(n) \cdot y_k(n)$$
where γj,k(n) denotes the predetermined real decoding coefficients and where λj(n) denotes the weighting factors which depend on the signals yk(n).
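As an illustration of the decoding module, the following Python sketch applies the limiter on the loudspeaker factor and the decoding formula for a single sample rank n. It is a sketch under stated assumptions rather than the device itself: the raw factor GHP(j,n) is taken as a given input, because its computation from cumulative distribution functions is described later in the text, and the function names, the default threshold and the example values are hypothetical.

```python
def loudspeaker_weight(g_hp, s_star=0.1):
    """Limiter on GHP(j,n): s* if GHP <= s*, 1 if GHP >= 1, otherwise GHP.

    The default s_star is an arbitrary illustration value (assumption).
    """
    return min(1.0, max(g_hp, s_star))


def decode(channel_samples, gammas, lambdas):
    """z_j(n) = lambda[j] * sum over k of gamma[j][k] * y[k]."""
    return [
        lam_j * sum(g_jk * y_k for g_jk, y_k in zip(row, channel_samples))
        for lam_j, row in zip(lambdas, gammas)
    ]


# Hypothetical example: P = 2 received channels decoded onto M = 3 loudspeakers.
y_received = [0.04, -0.01]
gammas = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
lambdas = [loudspeaker_weight(0.02),
           loudspeaker_weight(0.6),
           loudspeaker_weight(1.7)]
z = decode(y_received, gammas, lambdas)
```

Holding the decoding coefficients in an M-by-P nested list keeps the number of loudspeakers independent of the number of transmission channels, which is consistent with a small P feeding a larger M.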
Other features and advantages of the invention will become clear from the detailed description of specific embodiments which are given below by way of example but are not restrictive in any respect.