Communication technologies increasingly use computer equipment and computer networks for conveying voice and other audio content in real time between parties. For example, commercially available computers, tablets, smart phones, and the like often include some form of audio chat, video chat, and/or web conferencing application, which is built into the devices' operating systems and/or is readily available for download and installation on user devices. These communication technologies rely upon high-quality audio for their success.
Many factors can degrade audio quality in real-time communications, impairing user experience. For example, long network delays can cause perceptible lag in conversation and can make echo more noticeable. Ambient and electronic noise can impair intelligibility. Dropped network packets can introduce pops, crackles, and robotic-sounding speech. Damaged or improperly placed microphones and speakers can cause distortion and insufficient volume.
Various approaches are known in the art for estimating audio quality in electronic communications. For example, the ETSI (European Telecommunications Standards Institute) has developed the E-model for estimating conversational quality from the mouth of a speaker to the ear of a listener over an electronic medium. The E-model includes terms that specify various impairments, e.g., delays, low bit-rate codecs (coder/decoders), packet losses, and the like. Additional information about the E-model may be found in ITU-T Recommendation G.107, “G.107: The E-model: a computational model for use in transmission planning,” which may be found online at https://www.itu.int/rec/T-REC-G.107-201506-I/en. In addition, PESQ (Perceptual Evaluation of Speech Quality) provides a family of standards for automated assessment of speech quality as experienced by a user of a telephony system. PESQ is standardized as ITU-T Recommendation P.862. Further, MOS (Mean Opinion Score) provides an assessment of audio quality based on scores provided by human subjects. Terminology for the various forms of MOS is standardized in ITU-T Recommendation P.800.1.
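By way of illustration only, the following Python sketch shows how impairment terms of the kind used by the E-model may be combined into a single transmission rating R and then mapped to an estimated MOS. The formulas follow the general form published in ITU-T Recommendation G.107 (R = Ro - Is - Id - Ie-eff + A, together with that Recommendation's R-to-MOS mapping). The function names, parameter names, and any numeric inputs a caller supplies are assumptions made for this sketch and are not part of the Recommendation; the sketch does not compute the individual terms Ro, Is, and Id, which G.107 derives from many further parameters.

```python
def effective_equipment_impairment(ie, bpl, ppl_percent, burst_r=1.0):
    """Ie-eff: codec impairment Ie increased by packet loss.

    ie          -- equipment impairment factor of the codec at zero loss
    bpl         -- packet-loss robustness factor of the codec
    ppl_percent -- packet-loss probability, in percent
    burst_r     -- burst ratio (1.0 means random, non-bursty loss)
    """
    return ie + (95.0 - ie) * ppl_percent / (ppl_percent / burst_r + bpl)


def transmission_rating(ro, i_s, i_d, ie_eff, advantage=0.0):
    """R = Ro - Is - Id - Ie-eff + A (all terms supplied by the caller)."""
    return ro - i_s - i_d - ie_eff + advantage


def rating_to_mos(r):
    """Map a transmission rating R to an estimated conversational MOS."""
    if r <= 0.0:
        return 1.0
    if r >= 100.0:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7.0e-6


if __name__ == "__main__":
    # Hypothetical input values chosen only to exercise the functions.
    ie_eff = effective_equipment_impairment(ie=0.0, bpl=10.0, ppl_percent=10.0)
    r = transmission_rating(ro=95.0, i_s=1.5, i_d=0.0, ie_eff=ie_eff)
    print(f"Ie-eff={ie_eff:.1f}  R={r:.1f}  MOS={rating_to_mos(r):.2f}")
```

In this sketch, a higher packet-loss percentage raises Ie-eff, which lowers R and, in turn, the estimated MOS; the mapping is clamped to the range 1.0 to 4.5 outside 0 < R < 100.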