The invention relates to audio conferencing, and in particular, to audio conferencing including encoding conferee audio with positional data relative to a listening position and mixing the encoded conferee audio streams for transmission to other conferees.
It is a problem in the field of audio conferencing to prevent mistaking the identity of a conferee that is speaking while also providing a method for mixing the audio streams received from two or more conferees and transmitting the mixed audio stream back to each conferee.
In an analog network, conference calls are established by merely adding individual signals together using a conference bridge. If two or more people talk at once, their speech is superposed. Furthermore, an active talker can hear if another conferee begins talking. Naturally, the same technique is used in an early digital switch, where the signals are first converted to analog, added, and then converted back to digital.
The process of combining multiple analog signals to form a conference call or function as multiple extensions on a single line can be accomplished by merely bridging the wired pairs together to superimpose the signals. When digitized voice signals are combined to form a conference, either the signals must be converted to analog so they can be combined on two-wire analog bridges, or the digital signals must be routed to a digital conference bridge. The digital conference bridge selectively adds the signals together using digital signal processing and routes separate sums back to the conferees. When a conference includes a larger number of conferees, the voices are summed together, making it difficult to distinguish who is talking unless each conferee knows every other conferee well enough to distinguish between their voices.
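The routing of separate sums by a digital conference bridge can be illustrated with a minimal sketch. It assumes 16-bit PCM samples and a "sum of all other conferees" rule for each listener; the function name `bridge_mix`, the dictionary layout, and the clipping step are hypothetical choices made for illustration and are not taken from any particular bridge.

```python
from typing import Dict, List

# Assumed sample format: signed 16-bit PCM.
INT16_MIN, INT16_MAX = -32768, 32767


def bridge_mix(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return, for each conferee, the sum of every other conferee's samples."""
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    if any(len(f) != length for f in frames.values()):
        raise ValueError("all frames must have the same length")

    # Sum all inputs once per sample position.
    total = [sum(column) for column in zip(*frames.values())]

    out: Dict[str, List[int]] = {}
    for name, own in frames.items():
        # Subtract the listener's own samples so that the listener does not
        # hear an echo, then clip to the 16-bit range.
        out[name] = [
            max(INT16_MIN, min(INT16_MAX, t - s)) for t, s in zip(total, own)
        ]
    return out
```

Because every listener receives a plain sum, the result carries no indication of which conferee contributed which part of the signal, which is the difficulty described above.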
A known method of resolving the problem requires active participation of the conferees. One such method requires conferees to introduce themselves at the beginning of the conference call. Each of the other conferees listens to the introductions and is required to remember the individual voices in order to later distinguish between conferees during the conference. This method fails to distinguish between conferees that have similar sounding voices. Another method requiring active participation requires each conferee to state his or her name before speaking. Even when each conferee remembers to state his or her name prior to speaking, this method fails to distinguish between conferees that have the same name. The problems associated with active participation are compounded as the number of conferees in the conference increases.
A telephone conferencing arrangement apparatus is disclosed in Celli (U.S. Pat. No. 5,020,098), wherein the transmitter and receiver sections of a telephone employ circuitry for an audio signal and a phase signal. Digitized phase data and digitized audio output are multiplexed to produce a single 64 kb/s data stream. At the receiver, a de-multiplexer separates the audio output from the phase data, and the audio and the phase data are converted to analog signals. The receiver includes an audio panning amplifier that feeds two audio speakers, such as a left speaker and a right speaker. The phase signal provides the control voltage for the panning amplifier, such that the phase signal determines the amount of signal proportionally flowing to the left and the right speaker, thus providing a positional representation of each conferee.
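The proportional split performed by a panning amplifier can be sketched as follows. This is an illustrative model only: it assumes a linear pan law and a control value normalized to the range 0.0 (fully left) to 1.0 (fully right). The function name `pan` and the normalization are hypothetical and do not describe the actual circuitry of the cited apparatus.

```python
from typing import Tuple


def pan(sample: float, control: float) -> Tuple[float, float]:
    """Split one audio sample between a left and a right speaker.

    `control` stands in for the control voltage derived from the phase
    signal, normalized here (as an assumption) to 0.0 .. 1.0.
    """
    if not 0.0 <= control <= 1.0:
        raise ValueError("control must be between 0.0 and 1.0")
    left = sample * (1.0 - control)
    right = sample * control
    return left, right
```

Under this model, two conferees who produce the same control value are rendered at the same apparent position, and a control value that varies while a conferee speaks moves the apparent position of that conferee.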
While the telephone conferencing arrangement apparatus disclosed in Celli overcomes the problems associated with requiring active participation from the conferees, it produces a phase signal relative to the conferee's position with respect to the telephone being used. A problem arises when more than one conferee is located at the same position relative to his or her telephone as another conferee. Both will produce the same phase signal, requiring the other conferees to again recognize the voice to distinguish between the two conferees. Another problem arises when one or more conferees change their position relative to the telephone they are using during the conference or when a speaker changes position while speaking. In this scenario, the proportion of the audio signal flowing to the left and the right speaker changes during the conference or while the participant is speaking.
The methods of distinguishing conferees just described fail to provide a method or apparatus to distinguish conferees without requiring active conferee participation. One method requires conferees to introduce themselves one or more times during the conference, while the telephone conferencing arrangement apparatus requires the conferees to remain in one position throughout the duration of the conference.
For these reasons, a need exists for a method of distinguishing between conferees without requiring active participation from the conferees.
The present audio conferencing with three-dimensional audio encoding overcomes the problems outlined above and advances the art by providing a method for assigning a distinct conference position to each conferee and then using the distinct position to encode the audio stream from the corresponding conferee for use with equipment that is capable of reproducing a three-dimensional or a stereo audio stream.
As each conferee is connected to the conference, the conferee is assigned a listening position relative to the other conferees in a first audio image. The conferee is then assigned a three-dimensional position with respect to each other conferee, that other conferee being the listener in another audio image. The number of audio images required is equal to the number of conferees, each audio image having a different one of the conferees in the listening position with the remaining conferees assigned three-dimensional positions around the listener.
An audio mixer produces an audio stream that is different for each conferee, using the three-dimensional position assigned for each audio image. For a conference having three conferees, three audio images are assigned. The first conferee is the listener in the first audio image and the second and third conferees are assigned three-dimensional positions relative to the first conferee as listener. The second audio image has the second conferee as listener and the first and the third conferees assigned three-dimensional positions relative to the second conferee as listener. The third audio image is likewise configured with the third conferee as listener.
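The assignment of audio images can be sketched as a mapping from each listener to the positions of the remaining conferees. The sketch below is a simplification made under stated assumptions: a position is reduced to a single azimuth angle in degrees (negative to the listener's left, positive to the right), and the talkers are spread evenly across a frontal arc. The function name `build_audio_images` and the even-spacing rule are hypothetical; the description above does not prescribe how positions are chosen.

```python
from typing import Dict, List


def build_audio_images(conferees: List[str]) -> Dict[str, Dict[str, float]]:
    """Build one audio image per conferee.

    Each image maps every *other* conferee to an assumed azimuth in degrees
    relative to the listener of that image.
    """
    images: Dict[str, Dict[str, float]] = {}
    for listener in conferees:
        others = [c for c in conferees if c != listener]
        n = len(others)
        # Spread the talkers evenly between -90 and +90 degrees, leaving the
        # two extreme positions unused.
        images[listener] = {
            talker: -90.0 + 180.0 * (i + 1) / (n + 1)
            for i, talker in enumerate(others)
        }
    return images
```

For three conferees this yields three audio images, each placing the two remaining conferees at distinct positions around a different listener, so that the number of images equals the number of conferees.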
During the conference, three mixed audio streams are generated following the audio images. The first mixed audio stream includes audio from the second and third conferees, each encoded with the three-dimensional position assigned in the first audio image. Likewise, a mixed audio stream is generated for the second conferee by mixing encoded audio from the first and the third conferees, and so on.
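The generation of a mixed audio stream for one listener can be sketched as follows. The sketch assumes that positional encoding is approximated by constant-power stereo panning from an azimuth angle, which is a stand-in for a full three-dimensional encoder. The names `encode_position` and `mix_for_listener`, and the representation of an audio image as a mapping from talker to azimuth, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Stereo = List[Tuple[float, float]]  # (left, right) sample pairs


def encode_position(samples: List[float], azimuth_deg: float) -> Stereo:
    """Encode a mono stream with an assumed azimuth (-90 left, +90 right)."""
    # Map the azimuth onto a quarter circle for constant-power panning.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    gain_left, gain_right = math.cos(theta), math.sin(theta)
    return [(s * gain_left, s * gain_right) for s in samples]


def mix_for_listener(
    image: Dict[str, float], streams: Dict[str, List[float]]
) -> Stereo:
    """Mix the encoded streams of every talker named in the listener's image.

    The listener does not appear in his or her own image, so the listener's
    own audio is left out of the mix.
    """
    length = len(next(iter(streams.values()))) if streams else 0
    mixed: Stereo = [(0.0, 0.0)] * length
    for talker, azimuth in image.items():
        encoded = encode_position(streams[talker], azimuth)
        mixed = [
            (ml + el, mr + er) for (ml, mr), (el, er) in zip(mixed, encoded)
        ]
    return mixed
```

Calling `mix_for_listener` once per audio image produces one mixed stream per conferee, each containing only the other conferees' audio at the positions assigned in that image.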
The mixed audio streams that are generated each place one of the conferees in a listening position. In other words, all conferees will listen as though they were located within the center of the conference with the other conferees located in positions around the center. Each conferee receives a mixed audio stream comprising a mix of encoded audio streams from the other conferees, and each conferee listens to the corresponding mixed audio stream relative to the listening position.