1. Field of the Invention
This invention relates to the assessment of an audio signal carrying speech. It is of particular application to the assessment of the condition of telecommunications systems whilst in use.
2. Related Art
Signals carried over telecommunications links can undergo considerable transformations, such as digitisation, data compression, data reduction, amplification, and so on. All of these processes can distort the signals. For example, in digitising a waveform whose amplitude is greater than the maximum digitisation value, the peaks of the waveform will be converted to a flat-topped form (a process known as peak clipping). This adds unwanted harmonics to the signal. Distortions can also be caused by electromagnetic interference from external sources.
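By way of illustration only, the effect of peak clipping on harmonic content can be sketched as follows. The sample count, tone frequency, amplitude and clipping limit are arbitrary assumptions chosen for the example, and the function names are hypothetical; nothing here is taken from any particular item of equipment.

```python
import math

def clip(samples, limit):
    """Hard-limit each sample to the range [-limit, +limit] (peak clipping)."""
    return [max(-limit, min(limit, s)) for s in samples]

def harmonic_magnitude(samples, k):
    """Magnitude of the k-th DFT bin, scaled so that a unit-amplitude
    sine lying exactly on bin k gives 1.0."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

# Assumed test tone: 4 whole cycles in 256 samples, amplitude 1.5,
# digitised by a converter whose maximum value is taken to be 1.0.
N = 256
CYCLES = 4
tone = [1.5 * math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
clipped = clip(tone, 1.0)

# Energy at three times the tone frequency, before and after clipping.
third_before = harmonic_magnitude(tone, 3 * CYCLES)
third_after = harmonic_magnitude(clipped, 3 * CYCLES)
print(third_before, third_after)
```

The unclipped tone has essentially no component at the third harmonic, whereas the clipped tone has a clearly non-zero one; this is the unwanted harmonic content referred to above.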
The distortions introduced by the processes described above are non-linear, so that a simple test signal may not be distorted in the same way as a complex waveform such as speech, or may not be distorted at all. For a telecommunications link carrying data it is possible to test the link using all possible data characters (e.g. the two characters 1 and 0 for a binary link, or the twelve tone-pairs used in DTMF (dual tone multi-frequency) systems). However, speech does not consist of a limited number of well-defined signal elements, but is a continuously varying signal, whose elements vary according to not only the content of the speech (and the language used) but also the physiological and psychological characteristics of the individual speaker, which affect characteristics such as pitch, volume, characteristic vowel sounds etc.
It is known to test telecommunications equipment by running test sequences using samples of speech. Comparison between the test sequence as modified by the equipment under test and the original test sequence can be used to identify distortion introduced by the equipment under test. For example, Edmund Quincy, in the IEEE International Conference on Communications 87; Session 33.3; vol 2 (pages 1164-1171) describes a method of analysing such a signal using a "rule-based" system (also known as an "expert" system), in which predetermined objective rules are used to generate, for a given input signal, an appropriate output indicative of the quality of the signal.
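The general principle of comparing a known test sequence with the same sequence after passage through the equipment under test can be sketched as below. This is a plain signal-to-distortion ratio, offered only as an assumed, simplified stand-in; it is not the rule-based method of the cited reference, and the function name and example signals are hypothetical.

```python
import math

def signal_to_distortion_db(reference, degraded):
    """Ratio, in dB, of the reference energy to the energy of the
    sample-by-sample difference between degraded and reference signals.
    Both sequences are assumed to be time-aligned and of equal length."""
    if len(reference) != len(degraded):
        raise ValueError("sequences must have the same length")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0:
        raise ValueError("reference sequence carries no energy")
    error_energy = sum((d - r) ** 2 for r, d in zip(reference, degraded))
    if error_energy == 0:
        return math.inf  # no measurable difference
    return 10 * math.log10(signal_energy / error_energy)

# Assumed example: the equipment merely scales the test sequence by 0.9,
# so the difference signal is one tenth of the reference amplitude.
reference = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]
degraded = [0.9 * r for r in reference]
example_sdr_db = signal_to_distortion_db(reference, degraded)
print(example_sdr_db)
```

A measure of this kind depends on the original sequence being available at the point of measurement, which is the limitation discussed in the paragraphs that follow.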
The arrangement described above requires the use of a pre-arranged test sequence, which means it cannot be used on a live telecommunications link--that is, a link currently in use for revenue-earning traffic--because the test sequence would interfere with the traffic being carried and be audible to the users, and because conversely the live traffic itself (whose content cannot be predetermined) would be detected by the test equipment as distortion of the test signal.
In order to carry out tests on equipment in use, without interfering with the signals being carried by the equipment (so-called non-intrusive testing), it is desirable to carry out the tests using the live speech signals themselves as the test signals. However, a problem with using live speech as the test signal is that there is no instantaneous way of obtaining, at the point of measurement, a sample of the original signal. Any means by which the original signal might be transmitted to the measurement location would be likely to be subject to similar distortions as the link under test.
The present Applicant's co-pending International Patent applications WO96/06495 and WO96/06496 (both published on 29th February 1996, and U.S. Pat. No. 5,940,792) propose two possible solutions to this problem. WO96/06495 (now also U.S. application Ser. No. 08/765,697) describes the analysis of certain characteristics of speech which are talker-independent in order to determine how the signal has been modified by the telecommunications link. It also describes the analysis of certain characteristics of speech which vary in relation to other characteristics, not themselves directly measurable, in a way which is consistent between individual talkers, and which may therefore be used to derive information about these other characteristics. For example, the spectral content of an unvoiced fricative varies with volume (amplitude), but in a manner which is largely independent of the individual talker. The spectral content can thus be used to estimate the original signal amplitude, which can be compared with the received signal amplitude to estimate the attenuation between the talker and the measurement point.
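The final comparison step, in which an estimated original amplitude is set against the received amplitude, can be sketched as follows. The relationship between fricative spectral content and original amplitude is not given here, so the estimated original amplitude is simply taken as an input; the function name and the example values are hypothetical.

```python
import math

def attenuation_db(estimated_original_amplitude, received_amplitude):
    """Attenuation in dB between talker and measurement point, given an
    estimate of the original amplitude (assumed to have been derived
    elsewhere from the spectral content) and the measured amplitude."""
    if estimated_original_amplitude <= 0 or received_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    return 20 * math.log10(estimated_original_amplitude / received_amplitude)

# Assumed example: the received signal has half the estimated amplitude.
example_loss_db = attenuation_db(1.0, 0.5)
print(example_loss_db)
```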
In WO96/06496, the content of a received signal is analysed by a speech recogniser and the results of this analysis are processed by a speech synthesiser to regenerate a speech signal having no distortions. The signal is normalised in pitch and duration to generate an estimate of the original speech signal which can be compared with the received speech signal to identify any distortions or interference, e.g. using perceptual analysis techniques as described in International Patent Applications WO94/00922 and WO95/15035 (issued as U.S. Pat. Nos. 5,621,854 and 5,794,188, respectively).
Typically speech transmission over a limited bandwidth employs data reduction. Linear predictive codecs (LPCs) are based on an approximation to the human vocal tract and represent segments of speech waveform as the parameters required to excite equivalent behaviour in a "vocal tract model". For many applications the speech content of a signal can be analysed by identifying parameters of the speech in such a vocal tract model. However, such models cannot model elements which were not generated in the vocal tract. Consequently, conventional vocal tract models cannot readily analyse distortions.
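As a minimal sketch of how a segment of waveform may be reduced to a small set of model parameters, the following derives linear prediction coefficients by the autocorrelation method with the Levinson-Durbin recursion. This is one common textbook approach, assumed here for illustration; it is not asserted to be the method of any particular codec, and the function names and the synthetic test frame are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of a frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] such that
    x_hat[n] = sum_k a[k] * x[n - k], returning (coefficients, residual energy)."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0:
            break  # frame is silent or already perfectly predicted
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1 - k * k)
    return a[1:], error

# Assumed synthetic frame: a decaying exponential, which a single
# predictor coefficient of 0.9 reproduces almost exactly.
frame = [0.9 ** i for i in range(200)]
r = autocorrelation(frame, 2)
coeffs, residual = levinson_durbin(r, 2)
print(coeffs, residual)
```

The residual energy is the part of the frame that the predictor does not account for. A component that does not fit the model, such as a distortion introduced after the vocal tract, is not captured by the coefficients and remains in that residual.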