The present invention relates to a concept for determining information in order to temporally align two information signals, in particular a disturbed information signal and an undisturbed information signal, which may be employed, for example, for performing so-called objective measurements for evaluating the quality of signals.
Nowadays, standardized perception-based measurement techniques (perceptual measurements) are employed for metrologically assessing the quality of encoded audio or video signals. Known methods include, for example, the so-called PESQ (perceptual evaluation of speech quality) technique, which is described in the standardization document ITU-T P.862. Another known measurement technique for evaluating the quality of audio signals is the so-called PEAQ (perceptual evaluation of audio quality) technique, which is described in the standardization document ITU-R BS.1387-1. A measurement technique for evaluating video signals is described in A. P. Hekstra et al., "PVQM: A perceptual video quality measure", in Signal Processing: Image Communication, 2002, Vol. 17, pp. 781-798, Elsevier.
These and other methods for evaluating the quality of audio or video signals have in common that a signal under test, which typically is the output signal of a system, a network or, generally, an element to be examined, is compared to an original or reference signal, which typically is the signal input into the element to be examined.
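As a minimal illustration of the full-reference principle, the following Python sketch compares a signal under test with its reference using the peak signal-to-noise ratio (PSNR). PSNR is a swapped-in, deliberately simple metric chosen for brevity; it is not PESQ, PEAQ or PVQM and contains no perceptual model. The function name `psnr` and the default peak value of 255 (8-bit samples) are assumptions made for illustration only.

```python
import math


def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    Illustrative full-reference metric only; it is not PESQ, PEAQ or PVQM.
    """
    if not reference or len(reference) != len(degraded):
        raise ValueError("signals must be non-empty and of equal length")
    # Mean squared error, computed sample by sample (assumes index i in both
    # sequences refers to the same instant, i.e. the signals are aligned).
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no measurable distortion
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    ref = [10, 20, 30, 40]
    print(round(psnr(ref, [11, 19, 30, 42]), 2))  # 46.37 (aligned, mild noise)
    print(round(psnr(ref, [20, 30, 40, 40]), 2))  # 29.38 (shifted by one sample)
```

The second call feeds in a copy of the reference shifted by one sample, which scores far worse than the noisy but aligned copy. This is the effect described below for non-corresponding frames: a sample-wise comparison is only meaningful once the two signals are temporally aligned.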
In the past, tests were performed using test persons in order to assess or evaluate a specific transmission technique or encoder. Depending on the application, these tests are auditory tests, for example for testing hearing-adapted digital encoding techniques, or visual tests for testing digital video encoding techniques. Even though these tests provide relatively reliable results on average, they nevertheless retain a subjective component. In addition, such subjective tests require a panel of test persons and therefore involve a relatively large amount of effort and cost. For these reasons, objective measurement techniques for assessing the quality of encoded speech, audio or video signals have been developed.
Part of the setup of such an objective measurement technique is depicted in FIG. 7. The original or reference signal Sref(t), 104, is fed into a system 100 having a transmission characteristic H. At the output of the system 100, a signal Sdeg(t), 102, is provided whose signal properties or characteristics have been modified by the system 100 relative to the original signal Sref(t). The first information signal Sdeg(t) and the second information signal Sref(t) are fed to a block 110, which temporally aligns, or matches, the two signals to each other. In this manner it can be ensured that, for example with video signals, only those images or frames are compared to one another which temporally correspond to one another. The temporal alignment, or sequence, of the two signals may be disturbed, for example, by a delay, a frame loss or a frame repetition. For the quality evaluation of the disturbed or impaired signal Sdeg(t), it is important that the temporal alignment of Sdeg(t) with respect to Sref(t) be performed with high accuracy and precision, since a subsequent comparison of two non-corresponding frames of Sdeg(t) and Sref(t) will generally lead to an underestimation of the video quality of the disturbed signal Sdeg(t). The correlation of such an objective quality evaluation with a subjective quality evaluation performed by, e.g., human viewers would be correspondingly low.
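The task of block 110 can be illustrated with a simple greedy frame matcher. The sketch below is an assumption for illustration and not the method of the present invention: each frame is modelled as a flat list of pixel values, and each frame of Sdeg(t) is mapped to the frame of Sref(t) with the smallest mean squared difference. The search runs only from the previously matched reference index up to a hypothetical `max_skip` frames ahead; the names `frame_distance`, `align_frames` and `max_skip` are likewise hypothetical. This monotonic window tolerates an initial delay, frame repetitions (the same reference index is matched again) and frame losses (reference indices are skipped).

```python
def frame_distance(a, b):
    """Mean squared difference between two flattened frames of equal size."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def align_frames(reference, degraded, max_skip=5):
    """Map each degraded frame to the index of its best-matching reference frame.

    Hypothetical greedy scheme: the search for each degraded frame is limited
    to reference indices [prev, prev + max_skip], where prev is the previous
    match. Staying at prev models a repeated frame; jumping ahead models a
    delay or lost frames.
    """
    if not reference:
        raise ValueError("reference sequence must not be empty")
    matches = []
    prev = 0
    for frame in degraded:
        candidates = range(prev, min(len(reference), prev + max_skip + 1))
        best = min(candidates, key=lambda j: frame_distance(frame, reference[j]))
        matches.append(best)
        prev = best
    return matches


if __name__ == "__main__":
    ref = [[10 * i] * 4 for i in range(10)]  # ten distinct 4-pixel frames
    # Degraded stream: delayed by 2 frames, frame 3 repeated, frame 4 lost,
    # and every pixel offset by +1 to mimic coding noise.
    deg = [[v + 1 for v in ref[j]] for j in (2, 3, 3, 5, 6)]
    print(align_frames(ref, deg))  # [2, 3, 3, 5, 6]
```

A greedy matcher is cheap, but a single wrong decision can propagate, since later searches start from the wrong index. More robust alternatives, such as an exhaustive search or dynamic programming over all frame pairings, cost considerably more computing time.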
Modern transmission techniques for, e.g., video, audio or speech signals frequently change the temporal structure of the information contained within a data stream. This may sometimes be intentional, but more frequently it is caused by transmission disturbances. Additionally, the signals are frequently distorted by transmission and source encoding. Numerous applications, for example in metrology, involve a comparison of the transmitted signal Sdeg(t) with the undisturbed signal Sref(t). As already described above, however, this comparison requires a correct temporal association of the individual signal portions of the undisturbed signal Sref(t) and the disturbed signal Sdeg(t). With small disturbances and information streams of relatively simple structure, such as speech signals, simple techniques based on a direct correlation of the two signals may be employed. With more complex signals, such as video signals, and with severe disturbances as occur, e.g., in mobile radio communication or internet telephony, such methods cannot be applied reliably and, furthermore, require an extremely large amount of computing time.
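A simple technique based on direct correlation can be sketched as a brute-force cross-correlation delay search. The Python below is an illustrative assumption; the names `cross_correlation`, `estimate_delay` and the search bound `max_lag` are hypothetical. It evaluates the unnormalised cross-correlation at every lag in the range from `-max_lag` to `max_lag` and returns the lag with the largest value. It assumes roughly zero-mean signals and estimates only a single constant delay, so on its own it cannot resolve frame losses or repetitions.

```python
def cross_correlation(reference, degraded, lag):
    """Sum of reference[n - lag] * degraded[n] over all valid indices n.

    Unnormalised; assumes roughly zero-mean signals so that a DC offset
    combined with a large overlap does not dominate the result.
    """
    total = 0.0
    for n, d in enumerate(degraded):
        m = n - lag
        if 0 <= m < len(reference):
            total += reference[m] * d
    return total


def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    A positive result means the degraded signal lags the reference, i.e.
    degraded[n] is approximately reference[n - lag].
    """
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(reference, degraded, lag))


if __name__ == "__main__":
    ref = [0, 0, 1, 3, -2, 1, 0, 0, 0, 0]
    deg = [0, 0, 0] + ref[:-3]  # reference delayed by 3 samples
    print(estimate_delay(ref, deg, max_lag=5))  # 3
```

The cost is proportional to the signal length times the number of lags searched. For video, each "sample" becomes an entire frame, and losses or repetitions break the single-delay assumption. This illustrates why such direct-correlation approaches become unreliable and computationally expensive for the signals discussed above.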