In VoIP communication systems, resultant speech quality may be adversely affected by many types of noise. However, most research in this area has been directed at stationary or near-stationary noise, and little attention has been paid to impulsive (i.e., impulse-like) noise. Although current models for measuring speech quality predict degradation due to stationary or near-stationary noise with acceptable accuracy, the accuracy of such models for speech corrupted by impulsive noise has not been addressed. As used herein, impulsive (or impulse-like) noise comprises the noise which results from the corruption of an isolated speech sample or of a small number of successive speech samples within the speech signal.
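By way of illustration only, the definition of impulsive noise given above may be sketched in Python as follows. The function name `corrupt_with_impulses`, its parameters, the test tone, and the choice of full-scale replacement values are all assumptions made for this sketch and are not taken from any standard or from the techniques discussed herein. The sketch simply overwrites a few isolated samples, or short runs of successive samples, in an otherwise clean signal.

```python
import math
import random


def corrupt_with_impulses(samples, n_events=3, max_burst=4, amplitude=1.0, seed=0):
    """Return (corrupted_copy, corrupted_indices).

    Hypothetical helper: each event overwrites one sample, or a short run of
    up to `max_burst` successive samples, with a full-scale value of random sign.
    """
    if len(samples) < max_burst:
        raise ValueError("signal is shorter than the longest allowed burst")
    rng = random.Random(seed)          # seeded so the example is repeatable
    out = list(samples)                # leave the caller's signal untouched
    hit = set()
    for _ in range(n_events):
        burst = rng.randint(1, max_burst)              # 1 = isolated sample
        start = rng.randrange(0, len(out) - burst + 1)
        for i in range(start, start + burst):
            out[i] = rng.choice((-amplitude, amplitude))
            hit.add(i)
    return out, sorted(hit)


# Assumed test signal: 20 ms of a 440 Hz tone sampled at 8 kHz (160 samples).
fs = 8000
clean = [0.3 * math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]
noisy, corrupted_idx = corrupt_with_impulses(clean)
```

Only the samples listed in `corrupted_idx` differ from the clean signal; every other sample is left unchanged, which is what distinguishes this kind of degradation from stationary background noise spread over the whole signal.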
Speech quality assessment can be divided into two categories:
(1) double-ended (or intrusive) measurements, whereby a reference signal is passed through the transmission channel and the received signal is subsequently compared to the reference signal, and
(2) single-ended (or non-intrusive) measurements, whereby only the received signal is accessible and used for assessment of the speech quality.
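The practical difference between the two categories lies in which signals the measurement has access to. The following Python sketch illustrates this with two deliberately simple stand-in metrics, a signal-to-noise ratio and a peak-to-RMS ratio (crest factor). Both function names and both metrics are assumptions chosen for illustration; neither corresponds to the computation performed by any ITU-T Recommendation.

```python
import math


def double_ended_snr_db(reference, received):
    """Intrusive toy metric: requires both the reference and the received signal."""
    if len(reference) != len(received):
        raise ValueError("signals must be time-aligned and of equal length")
    signal_power = sum(r * r for r in reference)
    error_power = sum((x - r) ** 2 for r, x in zip(reference, received))
    if error_power == 0:
        return math.inf                # received signal is identical to reference
    if signal_power == 0:
        return -math.inf               # silent reference, non-zero error
    return 10.0 * math.log10(signal_power / error_power)


def single_ended_crest_factor(received):
    """Non-intrusive toy metric: uses the received signal only."""
    rms = math.sqrt(sum(x * x for x in received) / len(received))
    peak = max(abs(x) for x in received)
    return peak / rms if rms > 0 else 0.0


# Assumed example: a simple periodic reference with one corrupted sample.
reference = [0.0, 1.0, 0.0, -1.0] * 10
received = list(reference)
received[5] = 3.0                      # a single impulse-like corruption

snr_db = double_ended_snr_db(reference, received)
crest_clean = single_ended_crest_factor(reference)
crest_received = single_ended_crest_factor(received)
```

The double-ended function cannot be evaluated without the reference signal, whereas the single-ended function must infer any degradation from the received signal alone, which is the situation faced when monitoring live traffic.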
The most prominent methods for objective speech quality assessment are embodied in certain standards (i.e., “Recommendations”) promulgated by the International Telecommunication Union, in particular, ITU-T Recommendation P.862, a double-ended measurement method, and ITU-T Recommendation P.563, its single-ended counterpart, each of which is fully familiar to those of ordinary skill in the art. In addition, at least one method for non-intrusive measurement of impulsive noise in telephone-type networks has previously been proposed, but that particular method assesses the presence of impulsive noise only during speech pauses (i.e., portions of the signal which do not include speech), and thus cannot be used during speech activity.
To monitor real-time voice traffic, VoIP service providers typically run a single-ended speech quality assessment technique, such as, for example, ITU-T Recommendation P.563, which provides not only an overall value for predicted speech quality, typically represented by a “Mean Opinion Score” (MOS) value on a scale from 1 to 5 (representing bad to excellent speech quality), but also detailed statistics of speech quality and accompanying noise. (The use of Mean Opinion Scores is fully familiar to those of ordinary skill in the art.) For example, ITU-T Recommendation P.563 assesses local and global background noise, among other characteristics, but it neither measures nor even detects the presence of impulsive noise (e.g., the corruption of an isolated speech sample or of a small number of successive speech samples), even though such noise can severely bias speech quality results. In fact, certain experiments have shown that ITU-T Recommendation P.563 often gives a higher MOS (indicating better speech quality) in the presence of impulsive noise than in its absence, a result which is clearly inconsistent with its underlying purpose. Human listeners, by contrast, will invariably find such impulsive noise extremely disturbing, despite the failure of ITU-T Recommendation P.563 to properly measure its presence. Therefore, what is needed is a speech quality assessment technique that detects and measures the presence of impulsive noise during speech activity in a received speech signal, for use in speech quality assessment within a speech communications system.