In teleconferencing, audio from two or more different sources is reproduced in at least a third location, preferably with each of three or more locations being able to reproduce audio from the others. Teleconferencing involving four or more participants is also known, although many previous systems had a relatively low limit on the number of participants owing to limited bandwidth of the transmission medium. Accordingly, it would be useful to provide a teleconferencing system in which the bandwidth of the medium is less restrictive on the number of participants than in many previous systems.
In a number of previous systems, the only indication of which participant or participants were speaking (or otherwise providing audio information) at a given time was information inherent in the audio signal itself, such as a recognizable tone of voice or the like. In particular, many previous systems summed the audio input from various participants into a single audio signal for monaural reproduction, such that spatialization information was not provided for helping to distinguish participants. Accordingly, it would be useful to provide a system to enhance the ability to recognize participants, such as by providing location or spatialization information in reproducing audio signals in a teleconference, especially where this can be achieved with little or no impact on the number of participants permitted and/or bandwidth required.
Some previous systems that have attempted to provide stereophonic panning (but, typically, not three-dimensional spatialization) in the reproduction of remote audio signals have required installation of special equipment, such as phase analyzers, to achieve this goal. Some such systems require transmission, across the transmission medium, of information indicating the relative position, at a single source, of audio signals, thus decreasing the bandwidth available for the audio signal itself compared to the bandwidth used for normal (non-stereo-panning) transmissions. Accordingly, it would be useful to provide a system for teleconferencing with the ability to provide spatial indications but without requiring installation of special hardware and without diminishing the amount of bandwidth otherwise available, on the transmission medium, for audio signals.
A number of audio transmission protocols currently in use couple audio information with information indicative of the identity of the source. One example is packet-switched audio protocols, in which each packet, in addition to containing a certain amount of audio information (typically digitized), also includes information (typically digital in form) indicative of the source (and, typically, the destination) of the signal. This information regarding source is used for a number of purposes, such as permitting concatenation of several packets from the same source to permit substantially continuous reproduction of a packetized audio signal. However, because such source information was not previously used for providing location cues during audio reproduction, previous systems made only a single use of such source information. Accordingly, it would be useful to provide a system in which source information can be used for more than one purpose (such as both concatenating packets and spatializing audio reproduction), thus effectively avoiding reduction in bandwidth when achieving such spatialization, since use would be made of data already being provided for another purpose.
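The dual use of source information described above can be illustrated with a minimal sketch. It is not taken from any particular protocol or from the system described here: the `Packet` fields (`source_id`, `seq`, `samples`), the helper names, and the choice of constant-power stereo panning are all assumptions made for illustration. The same source identifier is used first to concatenate packets into per-source streams and then to select a reproduction position, so no additional position data needs to be transmitted.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    source_id: str   # hypothetical field: identifies the sending station
    seq: int         # hypothetical field: order of the packet within its stream
    samples: list    # mono audio samples carried by this packet


def reassemble(packets):
    """First use of the source ID: concatenate packets per source, in order."""
    streams = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.seq):
        streams[p.source_id].extend(p.samples)
    return dict(streams)


def pan_gains(index, count):
    """Constant-power (left, right) gains for source number `index` of `count`."""
    position = 0.5 if count == 1 else index / (count - 1)  # 0 = left, 1 = right
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


def spatialize(streams):
    """Second use of the source ID: choose a stereo position per source and mix."""
    if not streams:
        return [], []
    sources = sorted(streams)  # assumed rule: positions follow sorted source IDs
    length = max(len(s) for s in streams.values())
    left = [0.0] * length
    right = [0.0] * length
    for i, src in enumerate(sources):
        gain_l, gain_r = pan_gains(i, len(sources))
        for n, x in enumerate(streams[src]):
            left[n] += gain_l * x
            right[n] += gain_r * x
    return left, right


packets = [
    Packet("B", 1, [1.0]),
    Packet("A", 0, [1.0, 1.0]),
    Packet("A", 1, [0.5]),
    Packet("B", 0, [2.0]),
]
streams = reassemble(packets)
left, right = spatialize(streams)
```

In this sketch, source "A" is reproduced fully to the left and source "B" fully to the right, although the packets carry nothing beyond the audio samples, a sequence number, and the source identifier.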
In certain previous systems, stereo panning or other identification cues were provided in a fashion that was predetermined or otherwise out of the control of the receiving station or listener (such as being determined by the actual physical location of audio sources with respect to one another). Accordingly, it would be useful to provide a system in which audio location cues or other identification cues could be established at the site of the sound reproduction, such as automatically by the reproduction equipment or in a fashion selectable or adjustable by a listener, preferably arbitrarily in any desired or convenient three-dimensional configuration, and preferably independently of the actual, physical relative location of the audio sources.
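One way to picture location cues established at the reproduction site is a table, kept by the receiving station, that maps each source to a virtual position. The sketch below is only an illustration under stated assumptions: the class name `ListenerLayout`, its methods, the default frontal arc, and the azimuth/elevation convention are all hypothetical. Positions are assigned automatically by default, and the listener can override any of them without reference to where the sources physically are.

```python
import math


class ListenerLayout:
    """Receiver-side table mapping each source ID to a virtual position."""

    def __init__(self):
        self._auto = []     # sources in order of first appearance
        self._manual = {}   # listener overrides: source -> (azimuth, elevation)

    def place(self, source_id, azimuth_deg, elevation_deg=0.0):
        """Listener override; it need not match any physical location."""
        self._manual[source_id] = (azimuth_deg, elevation_deg)

    def position(self, source_id):
        """Return (azimuth, elevation) in degrees for a source."""
        if source_id in self._manual:
            return self._manual[source_id]
        if source_id not in self._auto:
            self._auto.append(source_id)
        # Automatic default (assumed): spread known sources over a frontal arc.
        n = len(self._auto)
        i = self._auto.index(source_id)
        azimuth = 0.0 if n == 1 else -90.0 + 180.0 * i / (n - 1)
        return (azimuth, 0.0)

    def unit_vector(self, source_id):
        """Direction of the source as (x, y, z): x right, y front, z up."""
        az, el = (math.radians(a) for a in self.position(source_id))
        return (math.cos(el) * math.sin(az),
                math.cos(el) * math.cos(az),
                math.sin(el))


layout = ListenerLayout()
first = layout.position("A")       # only one source known: straight ahead
second = layout.position("B")      # second source joins: far right of the arc
layout.place("A", 45.0, 30.0)      # listener moves "A" up and to the right
moved = layout.position("A")
direction = layout.unit_vector("A")
```

Note that, in this sketch, automatically assigned positions shift as further sources join, whereas positions set by the listener stay fixed; a practical system might choose a different default.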