The present invention relates to conferencing systems and, more specifically, to a system for identifying a speaker in a multi-party conference.
Telephone conferencing systems provide multi-party conferences by sending the audio from the speaking participants in the conference to all of the participants in the conference. Traditional connection-based telephone systems set up a conference by establishing a connection to each participant. During the conference, the telephone system mixes the audio from each speaking participant in the conference and sends the mixed signal to all of the participants. Depending on the particular implementation, this mixing may involve selecting the audio from one participant who is speaking or it may involve combining the audio from all of the participants who may be speaking at the same moment in time. Many conventional telephone conferencing systems had relatively limited functionality and did not provide the participants with anything other than the mixed audio signal.
Telephone conferencing also may be provided using a packet-based telephony system. Packet-based systems transfer information between computers and other equipment using a data transmission format known as packetized data. The stream of data from a data source (e.g., a telephone) is divided into fixed-length "chunks" of data (i.e., packets). These packets are routed through a packet network (e.g., the Internet) along with many other packets from other sources. Eventually, the packets from a given source are routed to the appropriate data destination where they are reassembled to provide a replica of the original stream of data.
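By way of illustration only, the packetizing and reassembly described above may be sketched as follows. The names Packet, packetize and reassemble are hypothetical, and the per-packet sequence number is an assumption made here so that packets arriving out of order can be reordered; the final packet of a stream may be shorter than the others in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str     # hypothetical identifier of the data source
    seq: int        # assumed sequence number, used only for reordering
    payload: bytes  # one chunk of the original stream


def packetize(source: str, stream: bytes, size: int) -> list[Packet]:
    """Divide a stream into chunks of `size` bytes (the last may be shorter)."""
    return [
        Packet(source, seq, stream[offset:offset + size])
        for seq, offset in enumerate(range(0, len(stream), size))
    ]


def reassemble(packets: list[Packet], source: str) -> bytes:
    """Select the packets from one source, reorder them, and join the payloads."""
    own = sorted((p for p in packets if p.source == source), key=lambda p: p.seq)
    return b"".join(p.payload for p in own)
```

In this sketch, packets from several sources may be interleaved and reordered in transit; the destination recovers each stream by filtering on the source and sorting on the sequence number.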
Most packet-based telephony applications are for two-party conferences. Thus, the audio packet streams are simply routed between the two endpoints.
Some packet-based systems, such as those based on the H.323 protocol, may support conferences for more than two parties. H.323 is a protocol that defines how multimedia (audio, video and data) may be routed over a packet-switched network (e.g., an IP network). The H.323 standard specifies which protocols may be used for the audio (e.g., G.711), video (e.g., H.261) and data (e.g., T.120). The standard also defines control (H.245) and signaling (H.225) protocols that may be used in an H.323-compliant system.
The H.323 standard defines several functional components as well. For example, an H.323-compliant terminal must contain an audio codec and support H.225 signaling. An H.323-compliant multipoint control unit, an H.323-compliant multipoint processor and an H.323-compliant multipoint controller provide functions related to multipoint conferences.
Through the use of these multipoint components, an H.323-based system may provide audio conferences. For example, the multipoint control unit provides the capability for two or more H.323 entities (e.g., terminals) to participate in a multipoint conference. The multipoint controller controls the terminals participating in a multipoint conference (e.g., by providing capability negotiation). The multipoint processor receives audio streams (e.g., G.711 streams) from the terminals participating in the conference and mixes these streams to produce a single audio signal that is broadcast to all of the terminals.
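As a minimal sketch of the mixing performed by a multipoint processor, the following example sums corresponding samples from each terminal and clips the result. It assumes the incoming streams have already been decoded to 16-bit linear PCM frames of samples; the function name mix_frames is hypothetical, and an actual multipoint processor may mix differently.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: list[list[int]]) -> list[int]:
    """Mix decoded PCM frames by summing samples and clipping to 16 bits."""
    if not frames:
        return []
    # Only mix as many samples as the shortest frame provides.
    length = min(len(frame) for frame in frames)
    mixed = []
    for i in range(length):
        total = sum(frame[i] for frame in frames)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Clipping is used here only to keep the sum within the 16-bit range; other approaches, such as scaling the sum, are equally possible.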
Traditionally, conferencing systems such as those discussed above do not identify the speaking party. Instead, the speaking party must identify himself or herself. Alternatively, the listening participants must determine who is speaking. Consequently, the participants may have difficulty identifying the speaking party. This is especially true when there are a large number of participants or when the participants are unfamiliar with one another. In view of the above, a need exists for a method of identifying speakers in a multi-party conference.
A multi-party conferencing method and system in accordance with our invention identify the participants who are speaking and send an identification of the speaking participants to the terminals of the participants in the conference. When more than one participant is speaking at the same moment in time, the method and system analyze the audio streams from the terminals and identify a terminal associated with a dominant party. When multiple participants are using the terminal associated with the dominant party, the method and system identify the speaking participant within the dominant party based on an indication received from the speaker.
In one embodiment, the system is implemented in an H.323-compliant telephony environment. A multipoint control unit controls the mixing of audio streams from H.323-compliant terminals and the broadcasting of an audio stream to the terminals. A speaker identifier service cooperates with the multipoint control unit to identify a speaker and to provide the identity of the speaker to the terminals.
Before commencing the conference, the participants register with the speaker identifier service. This involves identifying which terminal the participant is using, registering the participant's name and, for those terminals that are used by more than one participant, identifying which speaker indication is associated with each participant.
During the conference, the multipoint processor in the multipoint control unit identifies the terminal associated with the dominant speaker and broadcasts the audio stream associated with that terminal to all of the terminals in the conference. In addition, the multipoint processor sends the dominant speaker terminal information to the speaker identifier service.
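The manner in which the dominant speaker's terminal is selected is not limited to any one technique. Purely as an assumed example, the following sketch picks the terminal whose current frame has the highest mean-square energy, provided that energy exceeds a silence threshold. The names frame_energy and dominant_terminal and the threshold value are hypothetical.

```python
from typing import Optional


def frame_energy(samples: list[int]) -> float:
    """Mean-square energy of one frame of decoded PCM samples."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def dominant_terminal(
    frames_by_terminal: dict[str, list[int]],
    silence_threshold: float = 1000.0,  # assumed value, for illustration only
) -> Optional[str]:
    """Return the terminal with the loudest frame, or None if all are silent."""
    best_id, best_energy = None, silence_threshold
    for terminal_id, samples in frames_by_terminal.items():
        energy = frame_energy(samples)
        if energy > best_energy:
            best_id, best_energy = terminal_id, energy
    return best_id
```

The identifier returned by such a function corresponds to the dominant speaker terminal information that the multipoint processor sends to the speaker identifier service.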
The speaker identifier service compares the dominant speaker terminal information with the speaker identification information that was previously registered to obtain the identification information for that speaker. If more than one speaker is associated with the dominant terminal, the speaker identifier service compares the speaker indication (provided it was sent by the actual speaker) with the speaker identification information that was previously registered. From this, the speaker identifier service obtains the identification information of the speaker who sent the speaker indication.
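The registration and comparison steps described above may be sketched as a simple lookup. This is an illustrative sketch only: the class and method names are hypothetical, and the speaker indication is modeled as an arbitrary string (e.g., a key code), which is an assumption.

```python
from typing import Optional


class SpeakerIdentifierService:
    """Hypothetical registry mapping terminals and indications to names."""

    def __init__(self) -> None:
        # terminal_id -> {speaker indication (or None): participant name}
        self._by_terminal: dict[str, dict[Optional[str], str]] = {}

    def register(self, terminal_id: str, name: str,
                 indication: Optional[str] = None) -> None:
        """Record a participant, with an indication if the terminal is shared."""
        self._by_terminal.setdefault(terminal_id, {})[indication] = name

    def identify(self, dominant_terminal: str,
                 indication: Optional[str] = None) -> Optional[str]:
        """Resolve the dominant terminal (and indication, if any) to a name."""
        speakers = self._by_terminal.get(dominant_terminal, {})
        if len(speakers) == 1:
            # Only one participant uses this terminal; no indication is needed.
            return next(iter(speakers.values()))
        if indication is None:
            # Shared or unknown terminal and no indication was received.
            return None
        return speakers.get(indication)
```

For example, after registering "Alice" on terminal "t1" and registering "Bob" and "Carol" on shared terminal "t2" with indications "key-1" and "key-2", calling identify("t2", "key-2") yields "Carol", while identify("t2") without an indication yields no identification.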
Once the speaker identification information has been obtained, the speaker identifier service sends this information to each of the terminals over a secondary channel. In response, the terminals display a representation of this information. Thus, each participant will have a visual indication of who is speaking during the course of the conference.