1. Field of the Invention
The present invention relates to multipoint conference systems, and, more particularly, to a method for selecting and switching the primary transmission site in a multipoint conference system based on voiced audio level.
2. Description of the Related Art
A multipoint conference environment typically includes a plurality of conference sites which are geographically separated but electronically linked together to enhance collaboration between and among individuals at the various conference sites. A multipoint conference system attempts to replicate the interpersonal communication and information sharing which would occur if all the participants were together in the same room at the same time. Such a multipoint conference system typically processes conference information (e.g., audio, video and/or data information) communicated between the conference sites during a multipoint conference. With respect to the audio signals, the multipoint conference system can analyze audio signals received from conference equipment located at the conference sites to determine whether the sites are in a "talking" or "listening" state (e.g., whether a speaker at one site is attempting to communicate information to other sites or whether the participants at the one site are listening for communication from the other sites). Specifically, when a multipoint videoconference system determines that a unique site is in a "talking" state, that site becomes the video source for the remaining conference sites.
As used herein, the site that is selected to be the video source for the remaining conference sites is called the primary transmission site. Although other sites may be transmitting video information, the video information transmitted from the primary transmission site is viewed at other sites. A multipoint videoconference system may display simultaneous views of multiple sites on a screen while identifying a "talking" site to manage the screen views. The selection of a primary transmission site from among a plurality of conference sites is called switching. The automatic selection of a primary transmission site according to audio levels received from the plurality of conference sites is referred to herein as sound-activated switching.
Because the microphones of conventional multipoint conference systems do not discriminate human voice from other sounds, the primary transmission site is typically selected based on the amplitude of the sound detected by the microphones, without regard to the type of sound. Although much of the prior art uses the term "talking" and often refers to "voice-activated" switching, the terms "talking" and "voice" in the prior art typically refer to the detected sound level at a particular input device, without regard to whether the sound is actual speech or merely background noise.
For example, conventional multipoint conference systems determine talk and listen states depending on the sound level received from each station. Thus, although the selection of a primary transmission site according to such a "talk/listen" determination is often referred to as "voice-activated" switching in the prior art, such switching may be more accurately described as sound-activated switching according to a loud/quiet determination. Sound-activated switching provides a useful but limited approximation of actual voice-activated switching.
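The loud/quiet determination described above can be illustrated with a minimal sketch. It is not taken from any particular prior-art system: the function names, the use of root-mean-square amplitude as the level measure, and the threshold value are all assumptions chosen for illustration.

```python
import math


def rms_level(samples):
    """Root-mean-square amplitude of one block of PCM samples (assumed level measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def select_primary_site(blocks_by_site, current_site, talk_threshold=500.0):
    """Pick the loudest site if it exceeds the threshold; otherwise keep the current site.

    blocks_by_site maps a site identifier to its latest block of samples.
    talk_threshold is a hypothetical value, not one given in the text.
    """
    if not blocks_by_site:
        return current_site
    levels = {site: rms_level(block) for site, block in blocks_by_site.items()}
    loudest = max(levels, key=levels.get)
    # The decision depends on amplitude only; the type of sound is never examined.
    if levels[loudest] >= talk_threshold:
        return loudest
    return current_site
```

Because only amplitude is compared, a slammed door at one site outranks a person speaking at another, which is the limitation noted above.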
Another limited approximation of actual voice-activated switching is the use of a circuit or method that prevents a short-duration audio signal above a certain threshold from switching the primary transmission site from the site of the speaker to the site of the short-duration audio signal (e.g., a cough delay). Again, although such a circuit or method may be referred to as voice-activated switching, it is really a limited approximation of the behavior of an actual voice-activated switching method. Such a circuit or method is limited in that relatively long-duration but non-voiced sounds may still switch the primary transmission site to an incorrect conference site. Furthermore, legitimate video switching may be delayed by such a circuit or method.
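A cough delay of this kind can be sketched as a hold-off counter. The class name, the per-block update interface, and the default of three consecutive blocks are hypothetical; the sketch only assumes that some upstream loud/quiet determination reports which site, if any, is loud in each block.

```python
class HoldOffSwitcher:
    """Switch the primary site only after a challenger stays loud for hold_blocks blocks."""

    def __init__(self, initial_site, hold_blocks=3):
        self.primary = initial_site
        self.hold_blocks = hold_blocks  # hypothetical hold-off length, in audio blocks
        self._candidate = None
        self._count = 0

    def update(self, loud_site):
        """loud_site is the site judged loud in this block, or None if all sites are quiet."""
        if loud_site is None or loud_site == self.primary:
            # The challenger fell silent or the speaker resumed: discard the candidate.
            self._candidate = None
            self._count = 0
            return self.primary
        if loud_site == self._candidate:
            self._count += 1
        else:
            self._candidate = loud_site
            self._count = 1
        if self._count >= self.hold_blocks:
            # The sound lasted long enough to switch, whether or not it was voice.
            self.primary = loud_site
            self._candidate = None
            self._count = 0
        return self.primary
```

With hold_blocks=3, a one-block cough at site B leaves site A selected, but three consecutive loud blocks from B switch the primary site to B regardless of whether the sound is speech, and a genuine new speaker at B is not selected until the third block.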
The audio signals received by a control unit of a multipoint conference system can vary greatly in volume and ambient noise depending on, for example, the conference room, conference equipment and/or audio compression algorithms used. Also, background noises such as computer keystrokes, the rustling of papers, the sounds of eating during a lunch conference, coughing, sneezing, and/or the opening and closing of doors often trigger a switch of the primary transmission site from the site of the speaker to the site of the background noises. Air conditioner fan noises and/or other continuous machine noises can also cause erroneous switching of the transmission site. When background noises are coupled with variations in speaker volume, the effectiveness of a multipoint conference system using sound-activated switching can be substantially degraded.