In modern transmission technology, specific data-reducing encoding methods are available for transmitting audio and video signals. These encoding methods are used to provide the end user with the best possible quality for the current capacity of the transmission channel.
FIG. 11 shows a typical transmission link to an end user as currently used. It includes a database storing all kinds of media data to be transmitted, a streaming server handling delivery of the data over a network, the network itself, and the end user/client receiving the desired data. The media data may be a video, for example. A frequently asked question concerns the quality of the video actually perceived by the end user. With the growing share of adaptive video streaming techniques such as DASH (Dynamic Adaptive Streaming over HTTP) or HLS (HTTP Live Streaming), in which the media data is provided at different quality levels on the server side, a stable, or uniform, quality measure (quality metric) may be needed that can assess different frame sizes and/or quality levels in as standardized a manner as possible, in order to evaluate the quality of the media content as perceived at the receiver's side.
Generally, quality concepts may be divided into three categories:
So-called full-reference (FR) quality measurement techniques compare the original media content, which has not been degraded by compression, to the media content whose quality is to be determined. The disadvantage here is that access to the original version of the media content is required. So-called no-reference (NR) quality measurement techniques determine the quality exclusively on the basis of the received media content, or of the received data stream representing that media content. This may involve merely detecting transmission artifacts and quantifying them to determine the quality measure. So-called reduced-reference (RR) quality measurement techniques are a kind of intermediate solution between the FR and NR techniques: they do not rely exclusively on the received data stream, or the received media content, to determine the receiver-side quality; instead, intermediate results determined in real time on the transmitter side also contribute to determining the receiver-side quality. These parameters or intermediate results are typically co-transmitted (co-coded) in the transmitted media data stream.
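The difference between the three categories lies in which data each side needs. The following minimal sketch illustrates this on one-dimensional lists of sample values. The choice of metrics is purely illustrative and is not taken from the text: PSNR stands in for an FR metric, mean and standard deviation stand in for co-transmitted RR features, and a packet-loss ratio derived from sequence-number gaps stands in for an NR artifact indicator. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def psnr(reference, received, peak=255.0):
    """FR: needs the undegraded original alongside the received samples."""
    if len(reference) != len(received):
        raise ValueError("reference and received must have the same length")
    mse = mean((r - d) ** 2 for r, d in zip(reference, received))
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def rr_features(samples):
    """RR: a few summary values computed on the transmitter side and co-transmitted."""
    return (mean(samples), pstdev(samples))


def rr_distance(sent_features, received_samples):
    """RR: the receiver recomputes the same features and compares them to the sent ones."""
    m, s = rr_features(received_samples)
    return abs(sent_features[0] - m) + abs(sent_features[1] - s)


def nr_loss_ratio(received_seq_numbers):
    """NR: uses only the received stream, here gaps in packet sequence numbers.

    Sequence-number wrap-around is ignored in this simplified sketch.
    """
    if not received_seq_numbers:
        raise ValueError("no packets received")
    expected = max(received_seq_numbers) - min(received_seq_numbers) + 1
    return 1 - len(set(received_seq_numbers)) / expected
```

Only the FR function requires the original, which is why it is hard to apply at the receiver. The RR function requires only the small feature tuple produced by the transmitter, and the NR function requires nothing beyond the received data.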
In mobile applications in particular, FR quality measurement techniques can hardly be implemented. A solution to this problem is described in US 2009/0153668 A1. On the transmitter side, quality analysis results obtained there, typically from an FR analysis of the transmitted media content, are inserted into the transmitted data stream, for example into the RTP extension header. On the receiver side, a check is performed to determine whether the media data stream has been transmitted free of artifacts. During phases in which this is the case, the quality information transmitted within the data stream itself is used to determine the received quality. During phases in which faulty transmission has occurred, i.e., in which transmission artifacts are present, the quality is estimated at the receiver's side. Finally, the receiver-side quality is derived from a combination of both quality measurements, i.e., the one obtained from the quality information transmitted during interference-free phases and the one estimated at the receiver's side during interference-prone phases. Although this approach means that the reference media content need not be present at the receiver's side in order to apply an FR method, the method presented in the above-cited reference is disadvantageous in many respects and does not provide a satisfactory solution for adaptive-streaming methods. Adaptive-streaming methods provide the individual clients with the media content at varying quality levels. Naturally, the quality varies differently for each client, depending on the bandwidth currently available to that client. However, in order to provide varying qualities to a multitude of clients at the same time, adaptive-streaming methods typically rely on precoded data.
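The receiver-side logic attributed to US 2009/0153668 A1 above selects, for each phase, either the in-band quality value or a local estimate, and then combines the two. The text does not say how the combination is done. The sketch below therefore assumes a duration-weighted average, and the `Phase` record, its field names, and the score scale are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phase:
    duration_s: float
    artifact_free: bool
    transmitted_score: Optional[float]  # FR result carried in-band (e.g. RTP extension header)
    estimated_score: Optional[float]    # receiver-side estimate, needed only for faulty phases


def received_quality(phases):
    """Combine per-phase scores into one value (duration weighting is an assumption)."""
    weighted = 0.0
    total = 0.0
    for p in phases:
        # Artifact-free phase: trust the transmitted value; otherwise use the local estimate.
        score = p.transmitted_score if p.artifact_free else p.estimated_score
        if score is None:
            raise ValueError("phase lacks the score required for its state")
        weighted += score * p.duration_s
        total += p.duration_s
    if total == 0:
        raise ValueError("total duration must be positive")
    return weighted / total
```

For example, six artifact-free seconds with a transmitted score of 4.5 followed by two faulty seconds with an estimated score of 2.5 yield a combined value of 4.0 under this weighting.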
For example, a video is divided into time slots, and precoded versions at widely differing quality levels are created for each time slot. A predetermined protocol enables the clients to load the video at varying quality levels by switching between the individual quality levels at the time slot boundaries. These time slots may be two to four seconds long, for example, and are sometimes also referred to as chunks. However, FR quality measurement techniques such as ITU-T J.247, which have been calibrated against subjective tests, may require a measurement duration longer than the chunk duration, i.e., one that extends over several chunks. Thus, in order to realize the method described in US 2009/0153668 A1, a quality measurement would have to be performed on the transmitter side specifically for each client and made available to the receiver side by means of extension headers, namely online and/or in real time for all clients. For many applications with many simultaneous clients, however, such an approach is unfeasible because of the large expenditure of time and energy it entails.
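The following sketch shows why a transmitter-side FR measurement would have to be client-specific. Each client follows its own path through the precoded quality levels, so an FR window spanning several chunks sees a client-specific sequence of levels. The bitrate ladder and the simple rate-based selection rule are hypothetical stand-ins and are not the switching logic of DASH or HLS.

```python
# Hypothetical bitrate ladder (kbit/s): one precoded version per quality level.
LADDER_KBPS = [500, 1200, 2500, 5000]


def select_levels(bandwidth_kbps):
    """Per chunk, pick the highest level whose bitrate fits the measured bandwidth."""
    levels = []
    for bw in bandwidth_kbps:
        fitting = [i for i, rate in enumerate(LADDER_KBPS) if rate <= bw]
        levels.append(max(fitting) if fitting else 0)  # fall back to the lowest level
    return levels


def window_sequences(levels, chunks_per_window):
    """Level sequences covered by each sliding FR measurement window for one client."""
    return [tuple(levels[i:i + chunks_per_window])
            for i in range(len(levels) - chunks_per_window + 1)]


def window_combinations(num_levels, chunks_per_window):
    """Number of distinct level sequences a multi-chunk window could cover."""
    return num_levels ** chunks_per_window
```

With four levels and a window of four chunks, there are already 4 ** 4 = 256 possible level sequences per window position. Precomputing FR results for every sequence, or computing them live per client, therefore grows quickly with the number of levels and the window length.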
US 2012/0278441 A1 describes a method of estimating the quality at the receiver's side, i.e., the quality actually perceived by the end user. One stated advantage of this method is that it requires only little computing power at the receiver's side and can be performed at any point in time. In particular, this allows the receiver-side measurements to be used to influence the transfer of media data. The method proposed in that document starts by providing the media content at different quality levels on the transmitter side. Optionally, a signature representing the media content is created on the transmitter side; this signature depends, to a greater or lesser extent, on the entire picture content and is more or less representative of it. The signature is transmitted to the receiver side along with the picture content in such a way that at least the signature is received free of artifacts. On the receiver side, a signature is then produced in the same manner from the received media content and compared to the signature transmitted from the transmitter side, so as to obtain a quality value QoE from the comparison. To map the comparison result to the QoE value, a classification function is used which is trained continuously and/or known in advance. The QoE value then indicates the quality at the receiver's side, for example in the categories "excellent", "good", "adequate", and "poor". According to that document, the QoE value can be transmitted back from the receiver side to the transmitter side, where the media server can use it to bring the quality actually obtained at the receiver's side into line with the expected quality, by measures such as re-routing the transmission path or changing the playback quality.
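The signature-based scheme described above consists of computing a signature on both sides, comparing the two, and mapping the deviation to a QoE category. A minimal sketch of that flow follows, on a flat list of sample values. The block-mean signature, the mean absolute deviation, and the fixed thresholds are all hypothetical. In particular, the thresholds merely stand in for the trained or predefined classification function mentioned in the text.

```python
from statistics import mean

# Hypothetical thresholds standing in for the trained or predefined classifier.
CATEGORIES = [(2.0, "excellent"), (5.0, "good"), (10.0, "adequate")]


def signature(frame, block=4):
    """Hypothetical signature: mean of each run of `block` consecutive samples."""
    return [mean(frame[i:i + block]) for i in range(0, len(frame), block)]


def qoe_class(sent_signature, received_frame, block=4):
    """Recompute the signature at the receiver and map the deviation to a category."""
    received_signature = signature(received_frame, block)
    if len(received_signature) != len(sent_signature):
        raise ValueError("signatures must have the same length")
    deviation = mean(abs(a - b) for a, b in zip(sent_signature, received_signature))
    for threshold, label in CATEGORIES:
        if deviation < threshold:
            return label
    return "poor"
```

In use, the transmitter would send `signature(original)` alongside the content, and the receiver would call `qoe_class` on what it actually decoded. The resulting category is the value that could then be reported back to the media server.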