Video conferencing allows conference participants who are at different locations to participate in a conference. Typically, each conference participant has a computer-based video conferencing system that includes a video camera, a microphone, a display device, and a speaker. The video conferencing system of a conference participant captures the video and audio of that conference participant using the video camera and microphone and transmits the video and audio to the video conferencing systems of the other conference participants. When a video conferencing system receives the video and audio from the other conference participants, it presents the video on the display device and outputs the audio to the speaker. A video conferencing system may display each video in a different window on the display device or in a different area of a window. Thus, the conference participants can view the video and hear the audio of the other conference participants.
In order to send and receive content, the computer systems of the conference participants need to be connected. The computer systems can be directly or indirectly connected to each other. Based on whether these connections are direct or indirect, conferences can be classified as “full mesh” or “full mix.” In a full mesh conference, each participant computer system has a direct connection to each other participant computer system. Thus, each participant computer system transmits its video and audio directly to each other participant computer system, and each participant computer system receives video and audio directly from each other participant computer system. In a full mix conference, each participant computer system only has a direct connection to a distinguished or hub computer system. Each participant computer system transmits its video and audio directly to the hub computer system, which in turn transmits the video and audio it receives from one participant computer system to the other participant computer systems. Thus, each participant computer system only needs to establish a connection to the hub computer system and is indirectly connected to the other participant computer systems.
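The difference between the two connection types can be made concrete by counting direct connections and listing where a participant sends its own content. The following is a minimal sketch, not a described implementation; it assumes (beyond what the text states) that in a full mix conference the hub is a separate computer system, and the names `Topology`, `direct_connections`, and `send_targets` are hypothetical.

```python
from enum import Enum
from typing import List


class Topology(Enum):
    FULL_MESH = "full mesh"
    FULL_MIX = "full mix"


def direct_connections(num_participants: int, topology: Topology) -> int:
    """Count the direct connections a conference needs.

    Assumption: in a full mix conference the hub is a separate computer
    system, so each participant has exactly one link, to the hub.
    """
    if topology is Topology.FULL_MESH:
        # Every pair of participants is directly connected.
        return num_participants * (num_participants - 1) // 2
    # Full mix: one link per participant, all terminating at the hub.
    return num_participants


def send_targets(sender: str, participants: List[str], topology: Topology,
                 hub: str = "hub") -> List[str]:
    """Return the systems to which `sender` transmits its own video and audio."""
    if topology is Topology.FULL_MESH:
        return [p for p in participants if p != sender]
    # In full mix, the hub forwards the content to everyone else.
    return [hub]
```

For ten participants, a full mesh needs 45 direct connections, whereas a full mix (under the separate-hub assumption) needs 10; the price of the smaller count is that all content passes through the hub.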
To support a video and audio conference, each connection has a video channel and an audio channel. Each channel includes a send stream and a receive stream for sending and receiving the content of the channel. Each endpoint of a channel includes a source and a sink that are connected to the streams of the channel. For example, the source and the sink of an audio channel are a microphone and a speaker, respectively. For both streams of an audio channel, a microphone is connected at one endpoint and a speaker is connected at the other endpoint.
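The relationship among connections, channels, streams, sources, and sinks described above can be modeled as a small data structure. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, and the "camera" and "display" labels for the video channel are taken from the video camera and display device mentioned earlier rather than from any defined interface. Each stream runs from a source at one endpoint to a sink at the other, so one endpoint's send stream is the other endpoint's receive stream.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical device names for the source and sink of each channel type.
DEVICES = {
    "audio": ("microphone", "speaker"),
    "video": ("camera", "display"),
}


@dataclass(frozen=True)
class Stream:
    """One direction of a channel: a source at one endpoint feeds a sink at the other."""
    media: str
    source: Tuple[str, str]  # (endpoint, device)
    sink: Tuple[str, str]    # (endpoint, device)


@dataclass(frozen=True)
class Channel:
    media: str
    a_to_b: Stream  # endpoint A's send stream and endpoint B's receive stream
    b_to_a: Stream  # endpoint B's send stream and endpoint A's receive stream


@dataclass(frozen=True)
class Connection:
    video: Channel
    audio: Channel


def make_channel(media: str, a: str, b: str) -> Channel:
    source_dev, sink_dev = DEVICES[media]
    return Channel(
        media,
        Stream(media, (a, source_dev), (b, sink_dev)),
        Stream(media, (b, source_dev), (a, sink_dev)),
    )


def make_connection(a: str, b: str) -> Connection:
    # Each connection carries one video channel and one audio channel.
    return Connection(make_channel("video", a, b), make_channel("audio", a, b))
```

In `make_connection("A", "B")`, the audio stream from A to B has A's microphone as its source and B's speaker as its sink, and the reverse stream pairs B's microphone with A's speaker.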
Each endpoint of each stream of a channel may have a media stack of components that implement the functions of the stream. The components of the media stack of an audio channel for a source may receive audio content in PCM format from the local source, convert the audio content from PCM format to G.722 format, packetize the audio content that is in G.722 format, and transmit the packetized content to the other endpoints. The components of the media stack of an audio channel for a sink may receive packetized audio content from the other endpoints, de-packetize the received content, convert the de-packetized content from G.722 format to PCM format, and provide the content in PCM format to the local sink.
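A media stack can be pictured as an ordered list of components, each transforming the content and handing it to the next. The sketch below is illustrative only. The codec is a placeholder: `encode_placeholder` and `decode_placeholder` merely serialize 16-bit little-endian PCM samples and do not implement G.722 compression. The 160-byte payload size and the 2-byte sequence-number header are hypothetical choices, and the transmit step is omitted so the two stacks can be connected directly.

```python
import struct
from typing import Any, Callable, Iterable, List

Stage = Callable[[Any], Any]


def run_stack(stages: Iterable[Stage], content: Any) -> Any:
    """Pass content through each component of a media stack, in order."""
    for stage in stages:
        content = stage(content)
    return content


# Placeholder codec standing in for PCM <-> G.722 conversion. It only packs
# 16-bit PCM samples into bytes; it performs no G.722 compression.
def encode_placeholder(pcm: List[int]) -> bytes:
    return struct.pack(f"<{len(pcm)}h", *pcm)


def decode_placeholder(data: bytes) -> List[int]:
    return list(struct.unpack(f"<{len(data) // 2}h", data))


PAYLOAD_SIZE = 160  # hypothetical number of encoded bytes per packet


def packetize(data: bytes) -> List[bytes]:
    # Prefix each chunk with a 16-bit sequence number (hypothetical header);
    # assumes fewer than 65,536 packets so the numbers do not wrap.
    return [struct.pack("!H", seq) + data[i:i + PAYLOAD_SIZE]
            for seq, i in enumerate(range(0, len(data), PAYLOAD_SIZE))]


def depacketize(packets: List[bytes]) -> bytes:
    # Restore the original order by sequence number, then strip the headers.
    ordered = sorted(packets, key=lambda p: struct.unpack("!H", p[:2])[0])
    return b"".join(p[2:] for p in ordered)


SOURCE_STACK = [encode_placeholder, packetize]  # then transmit (omitted)
SINK_STACK = [depacketize, decode_placeholder]  # then provide to the local sink
```

Composing the stacks as lists keeps each component independent, so a real G.722 codec could replace the placeholder stage without touching the packetizing stages.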
To conduct a conference, the content generated by each video source and each audio source needs to be routed to the video sinks and audio sinks, respectively, of the other participants. In a full mesh conference, each participant computer system receives content from the other participant computer systems and mixes the content so that it can be presented to the participant. For example, the video received from the other participant computer systems may be mixed by simultaneously displaying each video in a separate area of a window. The audio received from the other participant computer systems may be mixed by taking the average of the audio samples provided by the participant computer systems for each sampling time. In a full mix conference, the hub computer system may perform the mixing and then send the mixed content to each of the participant computer systems.
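Both mixing approaches mentioned above can be sketched in a few lines. `mix_audio` averages the samples for each sampling time, assuming time-aligned, equal-length streams of integer PCM samples (the average is truncated toward zero). `tile_layout` assigns each video its own area of a window; the near-square grid is a hypothetical layout choice, and both function names are illustrative.

```python
import math
from typing import List, Tuple


def mix_audio(streams: List[List[int]]) -> List[int]:
    """Mix audio by averaging the samples for each sampling time.

    Assumes the streams are time-aligned and of equal length; zip() would
    silently drop trailing samples of longer streams.
    """
    if not streams:
        return []
    # zip(*streams) groups together the samples taken at the same sampling time.
    return [int(sum(samples) / len(samples)) for samples in zip(*streams)]


def tile_layout(num_videos: int, width: int,
                height: int) -> List[Tuple[int, int, int, int]]:
    """Give each video a separate (x, y, w, h) area of a window, in a near-square grid."""
    if num_videos <= 0:
        return []
    cols = math.ceil(math.sqrt(num_videos))
    rows = math.ceil(num_videos / cols)
    w, h = width // cols, height // rows
    return [((i % cols) * w, (i // cols) * h, w, h) for i in range(num_videos)]
```

For example, `mix_audio([[100, -200, 300], [300, 0, -100]])` yields `[200, -100, 100]`, and four videos in a 1280×720 window each receive a 640×360 area.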
The routing and mixing of content can be complex when a conference includes many participants and the sources and sinks of various endpoints can be connected in different ways. For example, a certain sink at one endpoint may want to receive and mix content from only some of the sources. It would be desirable to have a technique that allows content to be routed and mixed efficiently in such situations.
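To illustrate the situation in which a sink wants content from only some sources, the following naive sketch gives each sink its own list of selected sources and averages only those. It is not the technique this document goes on to describe; the `subscriptions` table, the `route_and_mix` name, and the one-frame-per-source input are all hypothetical. Because it recomputes a separate mix for every sink, its work grows with the number of sinks times the number of selected sources, which hints at why efficient routing and mixing matter as conferences grow.

```python
from typing import Dict, List, Set


def route_and_mix(frames: Dict[str, List[int]],
                  subscriptions: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """For each sink, average only the audio frames of the sources it selected.

    `frames` maps a source id to one frame of time-aligned PCM samples;
    `subscriptions` maps a sink id to the source ids it wants to receive.
    Both structures are hypothetical illustrations.
    """
    mixes: Dict[str, List[int]] = {}
    for sink, wanted in subscriptions.items():
        # Keep only the selected sources that actually produced a frame.
        selected = [frames[s] for s in sorted(wanted) if s in frames]
        if not selected:
            mixes[sink] = []  # nothing to present to this sink
            continue
        # Average the selected samples for each sampling time.
        mixes[sink] = [int(sum(col) / len(col)) for col in zip(*selected)]
    return mixes
```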