1. Field of the Invention
This invention relates generally to methods and apparatus for detecting certain events in signals, and particularly to a continuous adaptation and compensation mechanism that mitigates untreated distortions propagating through the detection system.
2. Description of Background
Currently, detection systems generally consist of multiple components whose precise specification depends on the nature of the detection problem. The task of detection involves automatically verifying a hypothesis about the contents of an observed signal with respect to a reference signal. For example, given an excerpt of a speech recording (the signal), a hypothesis might be: “the excerpt is spoken in German,” where the class German is represented by a reference recording (the reference signal). In other words, two input signals are examined under the hypothesis that they contain the same relevant information, so the example can be reworded as “Is the test excerpt spoken in the same language as the reference recording?” Every detection task has two possible outcomes: “acceptance” or “rejection” of the hypothesis.
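The accept/reject decision described above can be sketched as a matcher that compares a test signal with a reference signal and a threshold test on the resulting score. This is a minimal illustration only: the feature vectors, the cosine-similarity matcher, and the names `cosine_score` and `decide` are hypothetical assumptions, not the matcher of the invention.

```python
import math
from typing import Sequence


def cosine_score(test: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical matcher: cosine similarity between test and reference feature vectors."""
    dot = sum(t * r for t, r in zip(test, reference))
    norm_t = math.sqrt(sum(t * t for t in test))
    norm_r = math.sqrt(sum(r * r for r in reference))
    return dot / (norm_t * norm_r)


def decide(score: float, threshold: float) -> str:
    """Accept the hypothesis when the matcher score reaches the threshold, otherwise reject it."""
    return "accept" if score >= threshold else "reject"


# Identical feature vectors score 1.0 and are accepted; orthogonal ones score 0.0 and are rejected.
print(decide(cosine_score([1.0, 0.0], [1.0, 0.0]), threshold=0.8))  # accept
print(decide(cosine_score([1.0, 0.0], [0.0, 1.0]), threshold=0.8))  # reject
```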
Detection systems in real-world applications face a variety of challenges. A major challenge, and the subject of interest in the present invention, is mismatch due to variable noise conditions. Because of various real-world phenomena, incoming signals are distorted by noise to a greater or lesser degree. Besides the adverse impact that noise has on the processing of a particular signal, the difference in noise from one signal to another (i.e. noise causing mismatch) is just as problematic. For instance, in the above example, the reference speech recording (for German) might have been recorded on a landline telephone with relatively little background noise, while the test excerpt might have been recorded over a cellular telephone network in an acoustically noisy environment. In that case the mismatch between the two recording conditions makes comparing the two signals considerably harder. Mismatched conditions have been identified as one of the major research challenges in pattern recognition and detection, for example in speaker detection.
A variety of techniques address the effects of noise, distortion, and mismatch between the test and reference signals in detection technology (e.g. in speaker detection). These may be categorized according to the system component on which they act, i.e. the functional block (see FIG. 1) in which their effect applies: 1) the feature extraction level (e.g. transforming the features with a non-linear transform to mitigate mismatch), 2) the modeling level (e.g. transforming model parameters to reduce variations caused by mismatch), and 3) the matcher (score) level.
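As an illustration of compensation at the matcher (score) level, the sketch below applies z-normalization, a widely used score normalization that centers and scales a raw score using the statistics of scores from a cohort of non-matching signals. It is offered only as a generic example of the category; it is an assumption for illustration and is not described in this section as the method of the invention. The function name `z_normalize` and the cohort values are hypothetical.

```python
import statistics
from typing import Sequence


def z_normalize(raw_score: float, cohort_scores: Sequence[float]) -> float:
    """Center and scale a raw matcher score by the mean and sample standard
    deviation of scores obtained against a (hypothetical) non-matching cohort."""
    mu = statistics.fmean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)
    return (raw_score - mu) / sigma


# A constant offset added to every score (e.g. by a mismatched channel) cancels out.
print(z_normalize(5.0, [1.0, 2.0, 3.0]))     # 3.0
print(z_normalize(15.0, [11.0, 12.0, 13.0]))  # 3.0
```

The normalization removes only a constant shift and a scale change; a non-linear reshaping of the score distribution, as discussed below, is not undone by it.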
In spite of the various techniques addressing linear and non-linear distortions, a certain (and typically considerable) degree of residual distortion remains in the processing pipeline due to unpredictable conditions and propagates through the system. Its effect appears as an undesirable distortion in the resulting test score (at the level of Matcher 13). This distortion is in general non-linear and is viewed here as a stochastic process.
In most practical systems it is desirable to maintain a single common decision threshold that is applied to the matcher score. However, distortions (viewed here as a stochastic process) change the overall score distribution: in the simplest case they shift it, and in more complex cases they reshape it. Either way, the threshold no longer lies at its correct operating point, which increases error rates.
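The simplest case above, a pure shift of the score distribution against a fixed threshold, can be quantified with a small calculation. The sketch assumes Gaussian target and non-target score distributions with a common standard deviation and models the distortion as an additive shift of all scores; these distributional choices, the numeric values, and the function names are illustrative assumptions only.

```python
import math


def normal_cdf(x: float, mean: float = 0.0, std: float = 1.0) -> float:
    """Cumulative distribution function of a Gaussian, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def error_rates(threshold: float, target_mean: float, nontarget_mean: float,
                std: float, shift: float = 0.0) -> tuple[float, float]:
    """Miss and false-alarm rates at a fixed threshold when every score is
    displaced by an additive shift (assumed Gaussian score models)."""
    miss = normal_cdf(threshold, target_mean + shift, std)  # target score below threshold
    false_alarm = 1.0 - normal_cdf(threshold, nontarget_mean + shift, std)  # non-target at or above
    return miss, false_alarm


# Threshold 0.0 sits midway between target (mean 2) and non-target (mean -2) scores.
nominal = error_rates(0.0, 2.0, -2.0, 1.0)
shifted = error_rates(0.0, 2.0, -2.0, 1.0, shift=1.0)
print(sum(nominal))  # about 0.0455 (miss and false alarm each about 0.0228)
print(sum(shifted))  # about 0.160 (false alarms rise to about 0.159)
```

Under these assumptions a shift of one standard deviation lowers the miss rate slightly but raises the false-alarm rate sharply, so the total error at the unchanged threshold more than triples.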