Modern conferencing systems facilitate communication among multiple participants over telephone lines, Internet Protocol (IP) networks, and other data networks. Conferencing systems are becoming more prevalent, especially as the cost of transmission over IP networks has dropped. As usage has increased, so has the number of participants attending a given conference. One consequence is that audio mixers must now be capable of processing a large number of Real-time Transport Protocol (RTP) audio packet streams from the participants in a conference. This increase in the number of packet streams input to the audio mixer (or bridge) increases the number of computations and processing steps the mixer must perform. The larger number of participants also increases the overall noise sent to the audio mixer.
Many conferencing mixers are configured to identify and mix only the few loudest speakers in a conference session. Discarding or ignoring all but the loudest streams improves conference quality by eliminating extraneous noise from the audio mix. In a typical secure conferencing application, however, the audio mixer must first decrypt the received Secure Real-time Transport Protocol (SRTP) packets and then partially or fully decode the audio payload of every incoming stream before it can determine each stream's average power level. Even in a regular RTP application with no encryption, the average power level must still be computed. Once the streams have been decrypted and decoded and their power levels determined, the mixer must compare all of the audio streams to identify the loudest speakers. For relatively large conferences, where numerous RTP streams are input to the audio mixer, this is a highly compute-intensive process that can overwhelm the mixer's processing capacity and bandwidth.
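The power-measurement and loudest-speaker selection steps described above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the method of any particular mixer: it assumes each stream's payload has already been decrypted (if SRTP) and decoded into 16-bit linear PCM samples, and it takes mean-square power in dB relative to full scale as the "average power level." The function names `average_power_dbfs` and `select_loudest`, and the stream IDs in the example, are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence

# Assumption: samples are 16-bit linear PCM, already decrypted and decoded.
FULL_SCALE = 32768.0


def average_power_dbfs(samples: Sequence[int]) -> float:
    """Mean-square power of one frame, in dB relative to full scale (hypothetical helper)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))


def select_loudest(frames: Dict[str, Sequence[int]], n: int) -> List[str]:
    """Return the IDs of the n streams with the highest average power, loudest first."""
    # Every stream must be measured before the comparison can run,
    # so the work grows linearly with the number of incoming streams.
    powers = {sid: average_power_dbfs(pcm) for sid, pcm in frames.items()}
    return heapq.nlargest(n, powers, key=powers.__getitem__)


if __name__ == "__main__":
    frames = {
        "alice": [1000, -1000] * 80,
        "bob": [200, -200] * 80,
        "carol": [0] * 160,
        "dave": [5000, -5000] * 80,
    }
    print(select_loudest(frames, 2))
```

Because every frame must be decoded and measured before the comparison can run, the cost of this sketch grows with the number of participants, which is the bottleneck described above.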