With video coding technologies, it is often desired to compress a video sequence into a coded video sequence. The video sequence may for example have been captured by a video camera. A purpose of compressing the video sequence is to reduce the size, e.g. in bits, of the video sequence. In this manner, the coded video sequence will require less memory when stored and/or less bandwidth when transmitted from e.g. the video camera. A so-called encoder is often used to perform compression, or encoding, of the video sequence. Hence, the video camera may comprise the encoder. The coded video sequence may be transmitted from the video camera to a display device, such as a television set (TV) or the like. In order for the TV to be able to decompress, or decode, the coded video sequence, it may comprise a so-called decoder. This means that the decoder is used to decode the received coded video sequence. In other scenarios, the encoder may be comprised in a network node of a cellular communication system and the decoder may be comprised in a wireless device, such as a cellular phone or the like, or vice versa.
A known video coding technology is called High Efficiency Video Coding (HEVC), which is a new video coding standard, currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC). JCT-VC is a collaborative project between the Moving Picture Experts Group (MPEG) and the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T).
Furthermore, VP8 is a known proprietary video coding technology.
Common to the above two video coding technologies is that they use a previously decoded picture for reference when decoding a current picture. The encoder and the decoder each keep bit-exact versions of some of these decoded pictures, called reference pictures, so that no difference arises between the state of the encoder and the state of the decoder. Such a difference is also known as drift.
Pictures that do not use previously decoded pictures for reference are called intra pictures. Intra pictures can be used to reset the state of the decoder, for example in order to recover from errors that have occurred in previously decoded pictures. In HEVC, intra pictures that do not allow pictures following the intra picture in output order to reference pictures preceding the intra picture are called IRAP (Intra Random Access Point) pictures.
An HEVC bitstream, e.g. in the form of a Coded Video Sequence (CVS), includes one or more Network Abstraction Layer (NAL) units. A picture may be included in one or more NAL units. A bitstream consists of a sequence of concatenated NAL units. A NAL unit type is transmitted in the nal_unit_type codeword in the NAL unit header, which is 2 bytes long in HEVC. The NAL unit type indicates how the NAL unit should be parsed and decoded. There exist two types of NAL units: VCL (video coding layer) NAL units and non-VCL NAL units. One example of non-VCL NAL units is Supplemental Enhancement Information (SEI) messages. These messages contain metadata that does not affect the actual decoding process of video data. A particular SEI message is called the Decoded Picture Hash (DPH) SEI message, which is specified in the HEVC specification, sections D.2.19 and D.3.19. The DPH SEI message defines three methods for calculating a hash, or DPH value, over a decoded picture, which in the encoder is also referred to as the reconstructed picture: Message-Digest algorithm 5 (MD5), Cyclic Redundancy Check (CRC) and checksum. Which of them is used is indicated by the syntax element hash_type of the DPH SEI message.
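As a non-limiting illustration of the 2-byte NAL unit header described above, the following Python sketch extracts nal_unit_type and the other header fields and classifies a NAL unit as VCL or non-VCL. The function and class names are hypothetical and chosen for illustration only. The sketch assumes the input begins directly at the NAL unit header, i.e. any start code or length prefix has already been removed.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int      # 1 bit
    nal_unit_type: int           # 6 bits
    nuh_layer_id: int            # 6 bits
    nuh_temporal_id_plus1: int   # 3 bits


def parse_hevc_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into the header fields."""
    if len(data) < 2:
        raise ValueError("an HEVC NAL unit header is 2 bytes long")
    value = int.from_bytes(data[:2], "big")
    return NalHeader(
        forbidden_zero_bit=(value >> 15) & 0x1,
        nal_unit_type=(value >> 9) & 0x3F,
        nuh_layer_id=(value >> 3) & 0x3F,
        nuh_temporal_id_plus1=value & 0x7,
    )


def is_vcl(nal_unit_type: int) -> bool:
    """In HEVC, nal_unit_type values 0..31 are VCL and 32..63 are non-VCL."""
    return 0 <= nal_unit_type < 32


if __name__ == "__main__":
    header = parse_hevc_nal_header(bytes([0x4E, 0x01]))
    print(header, "VCL" if is_vcl(header.nal_unit_type) else "non-VCL")
```

In this sketch the two bytes 0x4E 0x01 decode to nal_unit_type 39, which is a non-VCL type used for SEI messages.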
Video transport is done in various ways. Two commonly used methods are the Real-Time Transport Protocol (RTP) for real-time, often interactive, communication, and Hypertext Transfer Protocol (HTTP) based video streaming for content services, such as video on demand. The real-time methods are discussed first, followed by the HTTP based video streaming methods.
RTP, together with the RTP Control Protocol (RTCP), see Request For Comments (RFC) 3550, is a real-time media transport framework that provides various tools suitable for low-delay interactive video communication. These tools include feedback that is explicitly related to video coding and is used to influence the video encoder. Some of these existing tools are briefly discussed below:
Reference Picture Selection Indication (RPSI), see RFC4585, is used by a receiver to request that the video encoder use a particular video picture as the reference picture when encoding the next picture. It is used in the HEVC RTP payload format for the purpose of selecting another reference picture when a picture has been detected as erroneous. The VP8 RTP payload format also uses RPSI in an acknowledgment (ACK) mode, where it ACKs the reception of VP8's “golden” pictures, i.e. key reference pictures.
Slice Loss Indication (SLI), see RFC4585, is used by the receiver to report that a part of the video picture's encoded data, i.e. a slice, is missing or lost. This enables the video encoder either to repair the loss immediately or to take the video decoder's concealment into account when encoding the next video picture.
Picture Loss Indication (PLI), see RFC4585, is used by the receiver to inform the encoder that it has an error in, or is missing, an unspecified part of the video picture. The encoder is expected to repair this in as timely a manner as possible, by a method of its own choice.
Full Intra Request (FIR), see RFC5104, is a request to receive a video picture (intra) that isn't dependent on any previous state, i.e. a picture without any inter picture dependencies.
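As a non-limiting illustration of the feedback tools listed above, the following Python sketch builds a PLI message as an RTCP payload-specific feedback packet. The function name and the example SSRC values are hypothetical. The sketch assumes the common feedback packet layout of RFC4585: version 2, no padding, a 5-bit feedback message type (FMT), packet type 206, a length field counted in 32-bit words minus one, the SSRC of the packet sender and the SSRC of the media source. A PLI carries no additional feedback control information.

```python
import struct

RTCP_VERSION = 2
RTCP_PT_PSFB = 206  # payload-specific feedback packet type

# Feedback message type (FMT) values for payload-specific feedback.
FMT_PLI = 1   # Picture Loss Indication
FMT_SLI = 2   # Slice Loss Indication
FMT_RPSI = 3  # Reference Picture Selection Indication
FMT_FIR = 4   # Full Intra Request


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Return a 12-byte RTCP PLI packet (header plus two SSRC fields)."""
    first_byte = (RTCP_VERSION << 6) | FMT_PLI  # padding bit left at 0
    length_in_words_minus_one = 2               # 3 words in total
    return struct.pack(
        "!BBHII",
        first_byte,
        RTCP_PT_PSFB,
        length_in_words_minus_one,
        sender_ssrc,
        media_ssrc,
    )


if __name__ == "__main__":
    packet = build_pli(sender_ssrc=0x11111111, media_ssrc=0x22222222)
    print(packet.hex())
```

The other feedback messages share the same header but add message-specific feedback control information, which is omitted from this sketch.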
An RTP payload format is a specification of how the output of a particular real-time media encoder is packetized within RTP.
Examples of HTTP based streaming media methods are progressive download and adaptive streaming, such as Dynamic Adaptive Streaming over HTTP (DASH). As an example, a client, e.g. using JavaScript running in a browser, requests to download a video file. In progressive download, the client starts decoding the video file before the entire video file has been downloaded. Adaptive streaming is similar and is either range based or chunk based. In the range based case, the client continues to download a file as long as the adaptation logic does not change which representation, i.e. encoding, of the media content the client should retrieve. When changing representation, the client uses meta information to calculate the byte offset into the file containing the representation from which it needs to continue downloading. In the chunk based case, the entire video file is partitioned into chunks, each typically a few seconds long, e.g. 10 s. Each chunk is encoded into different representations and each representation of the chunk is commonly stored in a separate file. By requesting different representations of the chunks, the adaptive client can change the bit-rate as well as other properties, such as the codec, for which encoded representations exist.
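As a non-limiting illustration of the chunk based case, the following Python sketch shows one possible adaptation rule that an adaptive client might apply before requesting the next chunk. The function name, the safety factor and the example bit-rates are assumptions made for illustration and are not taken from DASH or from any particular client.

```python
from typing import Sequence


def select_representation(
    bitrates_bps: Sequence[int],
    throughput_bps: float,
    safety: float = 0.8,
) -> int:
    """Pick the bit-rate of the representation to request for the next chunk.

    The highest bit-rate that fits within a fraction (safety) of the measured
    throughput is chosen. If none fits, the lowest bit-rate is used.
    """
    if not bitrates_bps:
        raise ValueError("at least one representation is required")
    budget = throughput_bps * safety
    affordable = [rate for rate in bitrates_bps if rate <= budget]
    return max(affordable) if affordable else min(bitrates_bps)


if __name__ == "__main__":
    available = [500_000, 1_500_000, 4_000_000]  # hypothetical representations
    for measured in (6_000_000, 2_500_000, 100_000):
        print(measured, "->", select_representation(available, measured))
```

Because each chunk is stored per representation, the client can apply such a rule independently at every chunk boundary.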
Thus, in a first scenario, a video system uses RTP for transfer of video data between one or more encoder(s) and one or more decoder(s) of the system. The encoder(s) and decoder(s) are assumed to be compliant and compatible with e.g. an HEVC standard specification. However, errors are likely to occur. A reason for this is the high demand on compression efficiency of modern codecs, which consequently leaves very little redundancy in the coded data.
The errors in video decoding have two major sources.
The first source is data errors in the encoded video data fed as input to the decoder. These errors can be introduced at any stage between the encoder's output and the decoder's input. Transport over an IP network, like the Internet, is one potential cause of these errors, primarily when the data's integrity is not verified by the receiver. These errors occur even over “reliable” transports like TCP, which has some (but weak) error detection capabilities. Another cause of errors is the hardware, where memory or storage without strong verification can introduce bit errors or other modifications.
The second source of errors is errors in the encoding process or in the decoding process. These are not supposed to exist, but both an encoder and a decoder are implementations of a standard. This means that there may be errors both in the interpretation of the standard and in the implementation.
Some of the errors are detected by the decoder itself, for example when a value is not allowed or when output values are out of bounds. This indicates to the decoder that the decoding of this video picture failed to some degree. However, in other cases the decoder does not detect the errors, outputs a video picture and continues to use the corrupted state to decode further pictures. These latter errors can thus pass through the decoding undetected and significantly lower the video quality for significant durations.
HEVC does provide a tool, namely the DPH SEI message, that, if used, allows the decoder to detect whether the decoded video picture includes errors or matches what the encoder intended it to be. When an error has been detected by means of the DPH SEI message, a problem may be how to improve performance of the video system.
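As a non-limiting illustration of how a decoder might use a DPH value, the following Python sketch compares MD5 digests computed over the decoded colour planes with digests received from the encoder. The function names are hypothetical. The sketch is a simplification that assumes 8-bit samples, one byte per sample in raster order, and one digest per colour plane. Parsing of the SEI message itself and the CRC and checksum methods are outside its scope.

```python
import hashlib
from typing import Sequence

# hash_type values of the DPH SEI message.
HASH_TYPE_MD5 = 0
HASH_TYPE_CRC = 1
HASH_TYPE_CHECKSUM = 2


def plane_md5(samples: bytes) -> bytes:
    """Return the 16-byte MD5 digest over one decoded colour plane."""
    return hashlib.md5(samples).digest()


def picture_matches(
    decoded_planes: Sequence[bytes],
    signalled_digests: Sequence[bytes],
) -> bool:
    """True if every decoded plane hashes to the digest sent by the encoder."""
    if len(decoded_planes) != len(signalled_digests):
        return False
    return all(
        plane_md5(plane) == digest
        for plane, digest in zip(decoded_planes, signalled_digests)
    )


if __name__ == "__main__":
    # Hypothetical 4x4 luma plane and two 2x2 chroma planes.
    reconstructed = [bytes([16] * 16), bytes([128] * 4), bytes([128] * 4)]
    digests = [plane_md5(plane) for plane in reconstructed]  # encoder side
    print(picture_matches(reconstructed, digests))           # decoder side
```

A mismatch tells the decoder that its decoded picture differs from the encoder's reconstructed picture, even when the decoding process itself raised no error.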
Systems using video are frequently monitored so that the operator of the system can verify its function and detect faults or issues. This monitoring is done using a large set of different tools, some standardized, some proprietary. The fundamental function is that clients or servers monitor key performance indicators (KPIs). Each KPI is one or several properties that can be measured and for which snapshot values, or statistically processed summary values, can be determined over intervals or usage sessions. This statistical processing includes averages and standard deviations. For an RTP based system, some common KPIs include packet loss rate, burst loss sizes and round-trip times.
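As a non-limiting illustration of such statistical processing, the following Python sketch derives a packet loss rate and summary values for a set of round-trip time samples. The function names and the sample values are hypothetical. The sketch assumes that the number of expected and received packets, and the round-trip time samples, have already been collected for the interval in question.

```python
import statistics
from typing import Dict, Sequence


def packet_loss_rate(expected: int, received: int) -> float:
    """Fraction of expected packets that were not received in the interval."""
    if expected <= 0:
        return 0.0
    return max(expected - received, 0) / expected


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Average and (population) standard deviation of a KPI over an interval."""
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),
    }


if __name__ == "__main__":
    print(packet_loss_rate(expected=1000, received=950))
    print(summarize([10.0, 20.0, 30.0]))  # hypothetical round-trip times in ms
```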
These values are then usually gathered and stored in a database to allow follow-up of communication sessions that indicate sub-standard KPI values. This gathering can be done in various ways, such as by means of a Management Information Base or the WebRTC JavaScript Statistics API, or in central servers or network functions, such as RTP mixers or Session Border Gateways.
In systems using RTP and RTCP, there exist a number of KPIs that can be provided over RTCP to the peer or to an RTP mixer/translator. The basic KPIs are included in the RTCP Receiver Report Block, see RFC3550, while more detailed statistics can be provided using RTCP Extended Reports, see RFC3611, which have an extensible model for additional performance indicators.
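As a non-limiting illustration of the basic KPIs carried over RTCP, the following Python sketch parses one 24-byte report block as laid out in RFC3550. The function name and the dictionary keys are hypothetical. The sketch assumes the input is exactly one report block, i.e. the RTCP header and the SSRC of the reporting party have already been stripped.

```python
import struct
from typing import Dict, Union


def parse_report_block(block: bytes) -> Dict[str, Union[int, float]]:
    """Decode the fields of a single RTCP report block."""
    if len(block) != 24:
        raise ValueError("an RTCP report block is 24 bytes long")
    ssrc, lost_word, ext_seq, jitter, lsr, dlsr = struct.unpack("!IIIIII", block)
    # Upper 8 bits: fraction lost as a fixed-point number (value / 256).
    fraction_lost = (lost_word >> 24) / 256.0
    # Lower 24 bits: cumulative number of packets lost, a signed value.
    cumulative_lost = lost_word & 0xFFFFFF
    if cumulative_lost & 0x800000:
        cumulative_lost -= 1 << 24
    return {
        "ssrc": ssrc,
        "fraction_lost": fraction_lost,
        "cumulative_lost": cumulative_lost,
        "extended_highest_seq": ext_seq,
        "jitter": jitter,
        "last_sr": lsr,
        "delay_since_last_sr": dlsr,
    }


if __name__ == "__main__":
    example = struct.pack("!IIIIII", 0x1234, (64 << 24) | 10, 5000, 7, 0, 0)
    print(parse_report_block(example))
```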
Thus, in a second scenario, a video system comprises one or more encoders and one or more decoders. It is desired to test and verify compatibility of the video system with respect to a certain standard specification, such as HEVC mentioned above. This kind of testing and verification is very time consuming and costly, since large amounts of input need to be run through both the encoder(s) and decoder(s). The testing is thus highly dependent on the input and, given the enormous number of possible permutations of video input, state, introduced errors etc., it is very difficult, or even almost impossible, to fully verify that the video system is 100% compatible and compliant. A problem may hence be how to reduce time and cost for testing and verification of compatibility.
The capabilities of both the encoders and the decoders are commonly signalled, either to let the counterpart know the capabilities or to request or require that particular features be enabled in the counterpart as well as in the media transport. For RTP based systems, the most commonly used signalling is based on the Session Description Protocol (SDP), see RFC 4566. The SDP may be part of Session Initiation Protocol (SIP) messages, see RFC 3261.
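As a non-limiting illustration of SDP based signalling, the following Python sketch extracts the codecs announced in the rtpmap attributes of a session description. The function name and the example session description are hypothetical. The sketch assumes attribute lines of the form "a=rtpmap:<payload type> <encoding name>/<clock rate>" and ignores all other SDP lines, including any format-specific parameters.

```python
from typing import Dict, Tuple


def parse_rtpmap(sdp: str) -> Dict[int, Tuple[str, int]]:
    """Map each RTP payload type to (encoding name, clock rate)."""
    prefix = "a=rtpmap:"
    codecs: Dict[int, Tuple[str, int]] = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        payload_type, _, encoding = line[len(prefix):].partition(" ")
        parts = encoding.strip().split("/")
        if len(parts) < 2:
            continue  # malformed attribute, skipped in this sketch
        codecs[int(payload_type)] = (parts[0], int(parts[1]))
    return codecs


if __name__ == "__main__":
    example_sdp = (
        "v=0\r\n"
        "m=video 49170 RTP/AVP 96 97\r\n"
        "a=rtpmap:96 H265/90000\r\n"
        "a=rtpmap:97 VP8/90000\r\n"
    )
    print(parse_rtpmap(example_sdp))
```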