In modern communications systems a video call may be conducted over a medium such as a wired and/or wireless network, which may for instance comprise a packet-based network such as the Internet. The call comprises at least one video stream being transmitted from one user terminal to another, and often a video stream in both directions. The two terminals establish a communication channel between one another over the network or other medium, allowing frames of video data captured by a camera at the transmit side to be transmitted to the receive side over the channel. The frames of the video are typically encoded by an encoder on the transmitting terminal in order to compress them for transmission over the channel. A corresponding decoder at the receiving terminal decodes the frames of the received video in order to decompress them for output to a screen. A generic term that may be used to refer to an encoder and/or decoder is a codec.
The encoding commonly comprises prediction coding in the form of intra-frame prediction coding, inter-frame prediction coding, or more usually a combination of the two (e.g. occasional intra-frame encoded “key” frames interleaved between sequences of inter-frame encoded frames). According to intra-frame encoding, blocks are encoded relative to other blocks in the same frame. In this case a target block is encoded in terms of a difference (the residual) between that block and another block in the same frame, e.g. a neighbouring block. The residual is generally smaller than the absolute value of the block and so requires fewer bits to encode, and the smaller the residual the fewer bits are incurred in the encoding. According to inter-frame encoding, blocks in the target frame are encoded relative to corresponding portions in a preceding frame, typically based on motion prediction. In this case a target block is encoded in terms of a motion vector identifying an offset between the block and the corresponding portion from which it is to be predicted, and a difference (the residual) between the block and the corresponding portion from which it is predicted. Inter-frame encoding usually results in an even smaller residual than intra-frame encoding, and hence incurs even fewer bits.
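By way of illustration only, the inter-frame case described above can be sketched as follows. The sketch assumes frames held as lists of rows of integer sample values and an exhaustive search minimising the sum of absolute differences (SAD); neither assumption comes from the text, and all function names are hypothetical. It is not how any particular codec performs motion prediction.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def residual(target, prediction):
    """Sample-wise difference between the target block and its prediction."""
    return [[t - p for t, p in zip(row_t, row_p)]
            for row_t, row_p in zip(target, prediction)]


def motion_search(ref, target_block, top, left, size, search_range):
    """Exhaustively search the preceding frame `ref` around (top, left).

    Returns the motion vector (dy, dx) with the lowest SAD, together with
    the residual between the target block and the portion it points to.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidate portions that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            candidate = get_block(ref, y, x, size)
            cost = sad(target_block, candidate)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx), candidate)
    _, motion_vector, prediction = best
    return motion_vector, residual(target_block, prediction)


def reconstruct(ref, motion_vector, res, top, left, size):
    """Decoder side: prediction from the preceding frame plus the residual."""
    dy, dx = motion_vector
    prediction = get_block(ref, top + dy, left + dx, size)
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, res)]
```

When the content of the target block appears unchanged at a shifted position in the preceding frame, the residual in this sketch is all zeros and only the motion vector carries information, which is the sense in which inter-frame encoding tends to incur fewer bits.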
A video call is by its nature a “live” communication. That is, an outgoing video stream of the call continues to be captured in real-time at the transmitting terminal even while other, previously-encoded data of that same stream is received and played out at the receiving terminal (as opposed to a whole video file being encoded in one go and then subsequently transmitted). “Live” or “real-time” as used herein do not necessarily imply zero delay. Nonetheless, the user does expect the video to be encoded, transmitted and decoded (on average) at least as quickly as the event being captured actually occurs, and at least as quickly as the video is intended to play out.
When considering video coding, particularly in real-time applications, one issue is the resolution of the video. The term resolution as used herein refers to the pixel resolution, i.e. the size of a frame or image in terms of number of pixels in two dimensions (as opposed to resolution in the sense of pixels per unit area). The pixel resolution is typically expressed in terms of a number of pixels wide and high, i.e. number of columns and rows, e.g. 1280×720 (720p) or 640×480 (VGA). A lower resolution frame will be perceived as worse quality by the receiving user. On the other hand, a higher resolution frame incurs a higher bitrate in the encoded bitstream (and therefore more bandwidth). It also incurs more processing resource to encode (e.g. more processor cycles and/or memory resources), and more processing resource to decode. This means that sending a higher resolution frame than the transmitting terminal, channel or receiving terminal can handle in real-time is liable to result in other issues such as delay or packet loss.
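To give a sense of scale for the two example resolutions, the following minimal sketch counts the pixels per frame. The function name is hypothetical, and pixel count is used only as a rough indicator of relative cost; the actual encoded bitrate depends on the content and the encoder.

```python
def pixels_per_frame(width, height):
    """Number of pixels in one frame of the given pixel resolution."""
    return width * height


hd = pixels_per_frame(1280, 720)   # 720p: 921600 pixels
vga = pixels_per_frame(640, 480)   # VGA: 307200 pixels
ratio = hd / vga                   # 3.0, i.e. three times as many pixels
print(hd, vga, ratio)
```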
The resolution at which a frame is encoded is an intrinsic property of the encoder. In order to accommodate factors such as the capacity of the network or processing power of a user terminal, conventional codecs such as those based on the H.264 and HEVC standards allow the encoder to be set to operate at one of a plurality of different discrete resolutions, e.g. 1280×720 (720p) or 640×480 (VGA). The resolution is signalled to the decoder as side information in the bitstream so that the frame can be decoded accordingly.
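One way such a setting might be chosen is sketched below, purely as an illustration. It assumes a hypothetical budget expressed in pixels per second, standing in for whatever the network and terminals can handle, and picks the largest of the discrete resolutions that fits within it. The function name, the budget model and the fallback rule are all assumptions rather than anything defined by H.264 or HEVC.

```python
def pick_resolution(candidates, max_pixels_per_second, frame_rate):
    """Pick the largest discrete resolution that fits the assumed budget.

    candidates: list of (width, height) tuples the encoder can be set to.
    max_pixels_per_second: hypothetical capacity figure (an assumption).
    frame_rate: frames per second of the stream.
    """
    def area(resolution):
        return resolution[0] * resolution[1]

    affordable = [r for r in candidates
                  if area(r) * frame_rate <= max_pixels_per_second]
    if affordable:
        return max(affordable, key=area)
    # Nothing fits: fall back to the smallest available resolution.
    return min(candidates, key=area)


resolutions = [(1280, 720), (640, 480)]
print(pick_resolution(resolutions, 30_000_000, 30))  # (1280, 720)
print(pick_resolution(resolutions, 10_000_000, 30))  # (640, 480)
```

The chosen width and height would then be the values signalled to the decoder as side information.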