Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.
Video and audio teleconferencing systems where multiple parties interact remotely to carry out a conference are an important resource.
Many such systems are known. Most rely on a central or distributed server resource to ensure that each participant is able to hear and/or see the other participants using, for example, dedicated teleconferencing devices, standard computer resources with audio input/output facilities, or smartphone-type devices. The server resource is responsible for appropriately mixing together the uplinked audio signals from each conference participant and downlinking the mixed audio signals for playback by each audio output device.
By way of background, in a typical (known) teleconferencing system a mixer receives a respective ‘uplink stream’ from each of the telephone endpoints, which carries an audio signal captured by that telephone endpoint, and sends a respective ‘downlink stream’ to each of the telephone endpoints. Thus each telephone endpoint receives a downlink stream which is able to carry a mixture of the respective audio signals captured by the other telephone endpoints. Accordingly, when two or more participants in a telephone conference speak at the same time, the other participant(s) can hear all of those participants speaking.
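Purely by way of illustration, the mixing described above may be sketched as follows. The sketch assumes that each uplink stream has already been decoded into a frame of floating-point samples of equal length; the function name `mix_downlinks` and the endpoint labels are hypothetical and are not taken from any particular known system. Practical mixers would typically also handle jitter, level control and clipping, none of which is shown.

```python
from typing import Dict, List


def mix_downlinks(uplinks: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Return, for each endpoint, the sum of the frames of all OTHER endpoints.

    Assumption: every uplink frame has the same number of samples.
    """
    if not uplinks:
        return {}

    frame_len = len(next(iter(uplinks.values())))

    # Sum every uplink frame once, sample by sample.
    total = [0.0] * frame_len
    for frame in uplinks.values():
        for i, sample in enumerate(frame):
            total[i] += sample

    # Each downlink is the total minus that endpoint's own contribution,
    # so that a participant does not hear his or her own voice returned.
    return {
        endpoint: [t - s for t, s in zip(total, frame)]
        for endpoint, frame in uplinks.items()
    }


# Hypothetical example with three endpoints and two samples per frame.
example_uplinks = {
    "A": [1.0, 2.0],
    "B": [10.0, 20.0],
    "C": [100.0, 200.0],
}
example_downlinks = mix_downlinks(example_uplinks)
```

In this example the downlink for endpoint A carries the sum of the signals captured at endpoints B and C, so that if B and C speak at the same time, A hears both.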
It is known (and usually desirable) for the mixer to employ an adaptive approach whereby it changes the mixing in response to perceiving certain variations in one or more of the audio signals. For example, an audio signal may be omitted from the mixture in response to determining that it contains no speech (i.e. only background noise). However, changing the mixing at the wrong time may lead to disconcerting artefacts being heard by the participants.
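As a minimal sketch of such an adaptive approach, the following example omits from a listener's mixture any uplink frame judged to contain no speech. The speech decision is reduced here to a crude energy threshold, which is an assumption made only for illustration; known systems may use considerably more elaborate voice activity detection. The names `frame_rms`, `adaptive_downlink` and the threshold value are hypothetical. Because the decision is taken independently for each frame, a signal hovering around the threshold would be switched in and out of the mixture abruptly, which is one way in which audible artefacts of the kind mentioned above could arise.

```python
from typing import Dict, List

# Hypothetical threshold, chosen for illustration only.
SPEECH_RMS_THRESHOLD = 0.01


def frame_rms(frame: List[float]) -> float:
    """Root-mean-square level of one frame of samples."""
    if not frame:
        return 0.0
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def adaptive_downlink(
    uplinks: Dict[str, List[float]],
    listener: str,
    threshold: float = SPEECH_RMS_THRESHOLD,
) -> List[float]:
    """Mix the other endpoints' frames, omitting those judged to lack speech.

    Assumption: every uplink frame has the same number of samples.
    """
    if not uplinks:
        return []

    frame_len = len(next(iter(uplinks.values())))
    mix = [0.0] * frame_len

    for endpoint, frame in uplinks.items():
        if endpoint == listener:
            continue  # a participant's own signal is not returned to them
        if frame_rms(frame) < threshold:
            continue  # treated as background noise only; omitted from the mix
        for i, sample in enumerate(frame):
            mix[i] += sample

    return mix


# Hypothetical example: endpoint B carries only low-level noise.
example_frames = {
    "A": [0.5, -0.5],
    "B": [0.001, -0.001],
    "C": [0.25, 0.25],
}
downlink_for_a = adaptive_downlink(example_frames, "A")
downlink_for_b = adaptive_downlink(example_frames, "B")
downlink_for_c = adaptive_downlink(example_frames, "C")
```

In this example the low-level signal from endpoint B is omitted from the downlinks for A and C, whereas B itself still receives the mixture of A and C.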