Audio teleconferencing is a widely used technology that allows multiple remote individuals or groups to jointly engage in a conversation by telephone. Conventionally, teleconferencing was provided using a speakerphone that included both one or more speakers for outputting the voices of the remote participants and one or more microphones for receiving the voices of the local participants. In this configuration, there is no graphical user interface associated with the audio conferencing system. As a result, it is often difficult to determine which remote participants wish to speak next, and for the current speaker to yield to those individuals.
In face-to-face conversation, turn taking is based largely on non-verbal cues, including body positioning, eye contact, and physical indications between the speakers. Because audio conferencing has no visual component, turn taking is typically based on waiting long enough after the other speaker has stopped talking to be sure that no one else is going to speak, and then speaking up. Alternatively, a speaker simply has to interrupt another speaker in order to gain the floor. Which speakers feel able to interrupt others depends heavily on the organizational hierarchy and power relationships between the participants. Either of these approaches results in inefficient and unnatural turn taking.
More recently, audio conferencing has been supported on personal computers that include a desktop audio conferencing client application, for example Apple Computer's QuickTime® conferencing client and Microsoft Corp.'s NetMeeting® client. In these systems, the conferencing client is used to set up and establish an audio conference, encode and decode the audio data (e.g., using an H.323-compliant codec), and transmit the data over a computer network. The user interface of these clients typically provides controls for dialing, muting, volume control, hanging up, looking up directory information, and establishing default preferences and parameters (e.g., local phone number, IP address, and so forth). However, during an actual audio conference, the user interface often provides little or no information that conveys the non-verbal social cues necessary for normal (e.g., face-to-face) turn taking behavior. In some audio conferencing systems, a single “moderator” can control which individuals can speak at any given time; but this type of imposed turn taking does not provide the same social dynamics as the non-verbal cues present in natural conversations, which by and large are unmoderated.
Accordingly, there continues to be a need for audio conferencing applications that provide a user interface supporting non-verbal cues among a plurality of remote participants, thereby allowing natural turn taking behavior among the participants as would be present in face-to-face dialogues.