1. Field of the Invention
The present invention relates generally to computer-based telephony networks and more particularly to servers that manage telephony conferencing.
2. Related Art
In today's technological environment, there exist many ways for people in multiple geographic locations to communicate with one another simultaneously. One such way is audio conferencing. Audio conferencing applications serve the needs of both business users (e.g., a national sales force meeting) and leisure users (e.g., audio chat room participants) who are geographically distributed.
Traditional audio conferencing involved a central conferencing server that hosted an audio conference. Participants would use their telephones to dial in to the conferencing server over the Public Switched Telephone Network (PSTN), also called the Plain Old Telephone System (POTS).
In recent years, it has been recognized that voice (i.e., audio) can be transmitted over the worldwide public Internet. As will be appreciated by those skilled in the relevant art(s), the connectivity achieved by the Internet is based upon a common protocol suite used by the computers connecting to it. Part of this common protocol suite is the Internet Protocol (IP), defined in Internet Standard (STD) 5, Request for Comments (RFC) 791 (Internet Architecture Board). IP is a network-level, packet-switching protocol, where a packet is a unit of transmitted data.
Transmitting voice over IP (VoIP) began with computer scientists experimenting with exchanging voice using personal computers (PCs) equipped with microphones, speakers, and sound cards. VoIP has since developed further with the adoption of the H.323 Internet Telephony Standard, developed by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), and the Session Initiation Protocol (SIP), developed within the Internet Engineering Task Force (IETF) Multiparty Multimedia Session Control (MMUSIC) Working Group.
Conferencing servers, also called multipoint control units (MCUs), were developed to host audio conferences in which participants connect to a central MCU using PC-based equipment and the Internet, or using a telephone through a gateway, rather than using traditional telephone equipment over the PSTN.
One problem, however, is common to both MCUs that support Internet-based telephony and conferencing servers that support traditional PSTN-based telephony. This problem is described below, with conferencing servers and MCUs referred to collectively herein as MCUs.
MCUs, in general, enable multipoint communications between two or more participants in a voice conference. An MCU may support many conferences at one time, each of which has many participants. Each participant in a given conference hears a mix of up to n active speakers, except for the active speakers themselves, who each hear the mix minus their own audio (this is, in essence, an "echo suppression" function so that a party will not "hear themselves speak" during the audio conference). For ease of explanation herein, and as will be appreciated by those skilled in the relevant art(s), the module in an MCU that performs the active speaker detection, mixing or multiplexing, switching, and streaming of the audio is referred to herein as the "Mixer."
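The "mix minus" behavior described above can be illustrated with a minimal sketch. It assumes each participant contributes one frame of integer PCM samples per mixing interval and that active speakers are chosen as the n participants with the highest measured energy; the function names and the energy-based selection rule are hypothetical illustrations, not the Mixer's actual method, and sample saturation (clipping to the codec's range) is omitted for brevity.

```python
from typing import Dict, List, Set


def select_active_speakers(energy: Dict[str, float], n: int) -> Set[str]:
    """Hypothetical selection rule: the (up to) n participants with the highest energy."""
    ranked = sorted(energy, key=lambda p: energy[p], reverse=True)
    return set(ranked[:n])


def mix_minus(frames: Dict[str, List[int]], active: Set[str]) -> Dict[str, List[int]]:
    """Build one output frame per participant from the active speakers' frames.

    Non-active participants hear the full mix of all active speakers; each
    active speaker hears that mix with their own samples subtracted out.
    """
    frame_len = len(next(iter(frames.values())))
    # Sum the active speakers' samples once to form the full mix.
    total = [0] * frame_len
    for speaker in active:
        for i, sample in enumerate(frames[speaker]):
            total[i] += sample
    out: Dict[str, List[int]] = {}
    for participant, own in frames.items():
        if participant in active:
            # Remove the speaker's own contribution so they do not hear themselves.
            out[participant] = [t - s for t, s in zip(total, own)]
        else:
            out[participant] = list(total)
    return out
```

Computing one full mix and subtracting each active speaker's own samples avoids re-summing n-1 streams separately for every active speaker.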
When the Mixer needs to mix multiple audio streams or accept different packet sizes from different participants, it needs a buffer (i.e., a memory storage area) in which to receive audio data. This buffer may be large if it must also accommodate jitter, the random variation in packet arrival times. From a memory standpoint, it would be most efficient to assign buffers only to the active speakers rather than to all participants in a conference, and to reassign the buffers as the active speakers change. However, collecting data only for the active speakers has a drawback. Often, the active speaker update event within a Mixer does not detect a new active speaker until enough "loud" packets have gone by to trigger the selection of that speaker as a new active speaker. This can cause the first word of the new active speaker's audio stream to be partially lost.
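The drawback described above can be made concrete with a small sketch of a buffer pool that is assigned only to active speakers. It assumes a hypothetical detector that promotes a participant after a fixed number of consecutive "loud" packets; the class name `ActiveSpeakerBufferPool` and the parameters `LOUD_THRESHOLD` and `TRIGGER_COUNT` are illustrative assumptions, since the text says only that "enough" loud packets are required. Jitter handling within each buffer is omitted.

```python
from typing import Dict, List

# Hypothetical tuning values; the text does not specify them.
LOUD_THRESHOLD = 0.5  # normalized energy at or above which a packet counts as "loud"
TRIGGER_COUNT = 3     # consecutive loud packets needed before promotion to active speaker


class ActiveSpeakerBufferPool:
    """Fixed pool of receive buffers assigned only to active speakers."""

    def __init__(self, num_buffers: int) -> None:
        self.free: List[int] = list(range(num_buffers))  # unassigned buffer ids
        self.assigned: Dict[str, int] = {}               # participant -> buffer id
        self.buffers: Dict[int, List[bytes]] = {i: [] for i in range(num_buffers)}
        self.loud_run: Dict[str, int] = {}               # current run of loud packets
        self.unbuffered: Dict[str, int] = {}             # packets not stored, per participant

    def receive(self, participant: str, payload: bytes, energy: float) -> None:
        # Track how many consecutive loud packets this participant has sent.
        if energy >= LOUD_THRESHOLD:
            self.loud_run[participant] = self.loud_run.get(participant, 0) + 1
        else:
            self.loud_run[participant] = 0
        # Promote to active speaker only after TRIGGER_COUNT loud packets,
        # and only if a buffer is free.
        if (participant not in self.assigned
                and self.loud_run[participant] >= TRIGGER_COUNT
                and self.free):
            self.assigned[participant] = self.free.pop()
        if participant in self.assigned:
            self.buffers[self.assigned[participant]].append(payload)
        else:
            # No buffer yet, so the packet is discarded. Loud packets received
            # before promotion are the clipped start of the speaker's first word.
            self.unbuffered[participant] = self.unbuffered.get(participant, 0) + 1

    def release(self, participant: str) -> None:
        # Return a former active speaker's buffer to the pool for reassignment.
        buffer_id = self.assigned.pop(participant)
        self.buffers[buffer_id].clear()
        self.free.append(buffer_id)
```

With `TRIGGER_COUNT = 3`, a participant who starts talking loses their first two loud packets before a buffer is assigned, and a participant who starts talking while every buffer is in use loses packets until a buffer is released.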
Therefore, given the above, what is needed is a method and computer program product for the efficient allocation of buffers for current and predicted active speakers in voice conferencing systems.