Embodiments of the present invention generally relate to telecommunications and more specifically to techniques for optimizing voice quality for a media stream in a conference.
In a conference, participants may use end-points that send media using different compression schemes, such as G.729, G.723, etc. The various media streams from the participants are mixed, and the mixed stream is sent to each participant in the conference. Mixers may use algorithms to select the media to be mixed. For example, mixers may mix the top speakers, the last N speakers, etc.
The mixer typically mixes the various media streams in the pulse code modulation (PCM) domain, i.e., G.711. Thus, if a media stream from a user is compressed using a G.729 CODEC, the G.729 media stream, when received for mixing, is transcoded to G.711. The G.711 media stream is then mixed with other media streams that have been transcoded to G.711, and the mixed stream is then transcoded back to G.729 (or any other compression scheme). The G.729 mixed media stream is subsequently sent to the participants.
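The decode, mix, and re-encode sequence described above can be illustrated with a minimal Python sketch. It is not an implementation of any particular mixer: the names mix_pcm and mix_conference are hypothetical, the decode and encode callables stand in for real CODEC routines that are not shown, and the sketch assumes each stream has already been decoded to a frame of 16-bit PCM samples that is mixed by sample-wise summation with saturation.

```python
from typing import Callable, List, Sequence

# Assumption: decoded samples are signed 16-bit PCM values.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm(frames: Sequence[Sequence[int]]) -> List[int]:
    """Sum decoded PCM frames sample by sample, saturating to 16 bits."""
    if not frames:
        return []
    # Only mix as many samples as the shortest frame provides.
    frame_len = min(len(f) for f in frames)
    mixed = []
    for i in range(frame_len):
        total = sum(f[i] for f in frames)
        # Clamp instead of wrapping so loud passages do not overflow.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


def mix_conference(
    encoded_streams: Sequence[object],
    decode: Callable[[object], Sequence[int]],
    encode: Callable[[List[int]], object],
) -> object:
    """Decode every stream, mix in the PCM domain, then re-encode.

    `decode` and `encode` are hypothetical placeholders for CODEC
    routines (for example, G.729 decoding and encoding).
    """
    decoded = [decode(stream) for stream in encoded_streams]
    return encode(mix_pcm(decoded))
```

Each pass through decode and encode in this sketch corresponds to one transcoding step applied to the media stream.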
The transcoding from G.729 to G.711 and back to G.729 degrades the voice quality. For example, the mean opinion score (MOS) may be reduced. The MOS provides a numerical indication of the perceived quality of received voice. Typically, the MOS is expressed as a number in the range of 1 to 5, where 1 is the lowest perceived quality and 5 is the highest perceived quality. For example, G.729 may have a MOS of 3.92. However, when the media stream is transcoded from G.729 to G.711 and back to G.729, the MOS may be 3.27. Thus, the MOS with the transcoding is 0.65 lower than the MOS of the G.729 media stream without the transcoding. On average, most humans can notice a 0.2 difference in MOS. Thus, this difference is more than 3 times the smallest degradation in voice quality that humans can typically perceive. Accordingly, the degradation in voice quality in a conference where media streams are transcoded to another compression scheme may be very noticeable to participants.
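The arithmetic behind this comparison can be checked with a short Python sketch. The constant and function names are hypothetical and chosen only for illustration; the numeric values are the example figures given above.

```python
# Example figures from the text; the names are illustrative assumptions.
MOS_G729_DIRECT = 3.92      # G.729 without transcoding
MOS_G729_TRANSCODED = 3.27  # G.729 -> G.711 -> G.729
NOTICEABLE_MOS_DELTA = 0.2  # difference most listeners can notice


def mos_degradation(direct: float, transcoded: float) -> float:
    """Return the MOS drop caused by transcoding, rounded to 2 decimals."""
    return round(direct - transcoded, 2)


def noticeable_multiples(delta: float,
                         threshold: float = NOTICEABLE_MOS_DELTA) -> float:
    """Express a MOS drop as a multiple of the noticeable difference."""
    return delta / threshold


drop = mos_degradation(MOS_G729_DIRECT, MOS_G729_TRANSCODED)
ratio = noticeable_multiples(drop)
```

With these figures, drop is 0.65 and ratio is about 3.25, which is where the "more than 3 times" comparison comes from.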