Equipment employed for speech-signal transmission is conventionally evaluated, as far as possible, by objective measurements, i.e. measurements carried out without human speakers or listeners.
The results of subjective measurements, performed with human speakers and/or listeners, depend too much on the type of voice, on the speaker and/or listener, and even on the text used for the test; sufficiently reliable results could be obtained only by using a great number of speakers and/or listeners and texts of a certain length, which would make the tests long and hence costly.
In general, the procedure for performing objective measurements consists in sending a suitable input signal into the apparatus to be tested, and in calculating, at the output of the system, the signal-to-noise ratio for the received or reconstructed signal, evaluated as the ratio between input-signal power and error-signal power (the error signal may be defined as the difference between the input and output signals). The higher the ratio, the better the quality attributed to the system under evaluation.
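The ratio just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `snr_db`, the 8 kHz sampling rate and the toy uniform quantizer standing in for the apparatus under test are all hypothetical and are not taken from the text above.

```python
import math

def snr_db(reference, output):
    """Signal-to-noise ratio in dB: input-signal power over error-signal power."""
    if len(reference) != len(output):
        raise ValueError("signals must have the same length")
    n = len(reference)
    signal_power = sum(x * x for x in reference) / n
    # Error signal = difference between input and output signals.
    error_power = sum((x - y) ** 2 for x, y in zip(reference, output)) / n
    if error_power == 0.0:
        return math.inf  # output identical to input
    return 10.0 * math.log10(signal_power / error_power)

def toy_quantizer(x, step=0.05):
    """Hypothetical device under test: a uniform quantizer with the given step."""
    return step * round(x / step)

fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(fs)]  # 1000 Hz test tone
decoded = [toy_quantizer(x) for x in tone]
print(f"SNR = {snr_db(tone, decoded):.1f} dB")
```

A single figure of this kind is easy to compute, which is part of its appeal for simulation, but it lumps every source of error into one number.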
The input signals most frequently used are sinusoidal signals of various frequencies, in the range of 800 to 1000 Hz, or white Gaussian or Laplacian noise; these signals are easy to process and are therefore particularly useful for tests carried out through simulation techniques.
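For simulation purposes, test signals of these three kinds might be generated as in the following sketch. The helper names, the 8 kHz sampling rate and the unit-variance normalization are assumptions made for illustration; the Laplacian samples are obtained here as the difference of two independent exponential variates, which is one standard construction among several.

```python
import math
import random

def sine(freq_hz, n, fs=8000, amplitude=1.0):
    """n samples of a sinusoid at freq_hz, sampled at fs (assumed 8 kHz)."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * k / fs) for k in range(n)]

def gaussian_noise(n, sigma=1.0, rng=random):
    """White Gaussian noise with zero mean and standard deviation sigma."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def laplacian_noise(n, sigma=1.0, rng=random):
    """White Laplacian noise with zero mean and standard deviation sigma.

    A Laplace variable with scale b has variance 2*b**2, and the difference
    of two independent exponentials of mean b is Laplace-distributed.
    """
    b = sigma / math.sqrt(2.0)
    return [rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for _ in range(n)]

def power(signal):
    """Mean power of a sampled signal."""
    return sum(x * x for x in signal) / len(signal)

rng = random.Random(0)  # fixed seed so that runs are repeatable
for name, sig in [("sine 1000 Hz", sine(1000.0, 8000)),
                  ("Gaussian", gaussian_noise(8000, rng=rng)),
                  ("Laplacian", laplacian_noise(8000, rng=rng))]:
    print(f"{name}: mean power = {power(sig):.3f}")
```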
However, the use of signals of this kind, whose spectral and amplitude characteristics are not those of vocal signals, may entail considerable differences between objective performance evaluations and subjective ones, i.e. those obtained with a real listener receiving real speech signals.
The difference between objective and subjective measurements is greater in digital transmission systems. Recent studies have demonstrated that in such systems the simple signal-to-noise ratio is no longer a sufficiently meaningful parameter: it is necessary to distinguish at least between the effects of quantization noise and the effects of the distortion due to amplitude overload (or slope overload in the case of differential systems), also taking into account the relative magnitudes of these two factors. However, owing to their statistical characteristics, neither white noise nor a sinusoidal signal makes it possible to distinguish exactly between these two noise components, as is easy to demonstrate and has been experimentally verified.
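To make the two noise components concrete, the following sketch splits the error of a toy clipping quantizer into a granular (quantization) part and an amplitude-overload part. It is a simplified illustration, not a measurement method described in the text: the quantizer, its step and clipping level, and the rule that a sample counts as overloaded when its magnitude exceeds the clipping level are all assumptions, and the split is possible here only because the simulated device is fully known.

```python
import math
import random

def quantize(x, step, x_max):
    """Hypothetical uniform quantizer that clips its input to [-x_max, x_max]."""
    clipped = max(-x_max, min(x_max, x))
    return step * round(clipped / step)

def split_noise(signal, step=0.1, x_max=1.0):
    """Return (granular_power, overload_power) of the quantization error.

    Errors on samples with |x| > x_max are attributed to overload; all
    other errors are attributed to granular quantization noise.
    """
    granular = overload = 0.0
    for x in signal:
        err = x - quantize(x, step, x_max)
        if abs(x) > x_max:
            overload += err * err
        else:
            granular += err * err
    n = len(signal)
    return granular / n, overload / n

fs = 8000  # assumed sampling rate in Hz
tone = [0.9 * math.sin(2.0 * math.pi * 820.0 * k / fs) for k in range(fs)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.5) for _ in range(fs)]

for name, sig in [("sine", tone), ("Gaussian noise", noise)]:
    g, o = split_noise(sig)
    print(f"{name}: granular = {g:.2e}, overload = {o:.2e}")
```

In this toy setting a sinusoid whose peak stays below the clipping level never exercises the overload region at all, while Gaussian noise exercises both regions in proportions fixed by its own statistics rather than by those of speech.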
On the other hand, it is not feasible to employ for quality tests an artificial signal obtained by voice synthesis, since such an artificial signal would present all the drawbacks inherent in the use of a real signal, i.e. a dependency not only on the synthesis method, but also on the speaker, the text and the language; furthermore, signal generation by voice synthesis is a very complex and delicate process.