When video is streamed over the Internet and played back through a Web browser or media player, the video is delivered in digital form. Digital video is also used when video is delivered through many broadcast services, satellite services and cable television services. Real-time videoconferencing often uses digital video, and digital video is used during video capture with most smartphones, Web cameras and other video capture devices.
Digital video can consume an extremely large number of bits. Engineers use compression (also called source coding or source encoding) to reduce the bitrate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bitrate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A "codec" is an encoder/decoder system.
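To give a sense of scale, the following sketch computes the uncompressed bitrate of a video signal from its dimensions, frame rate, sample bit depth and chroma subsampling. By default it assumes 8-bit samples and 4:2:0 chroma subsampling (1.5 samples per pixel). The function name and parameters are illustrative and are not taken from any standard or library.

```python
def raw_bitrate_bps(width: int, height: int, fps: float,
                    bits_per_sample: int = 8,
                    chroma_factor: float = 1.5) -> float:
    """Uncompressed bitrate in bits per second.

    chroma_factor is the number of samples per pixel counting luma and
    chroma: 1.5 for 4:2:0, 2.0 for 4:2:2, 3.0 for 4:4:4.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bits_per_sample * fps


bps = raw_bitrate_bps(1920, 1080, 30)
print(f"{bps / 1e6:.1f} Mbit/s")
```

For 1080p video at 30 frames per second, this works out to about 746.5 Mbit/s before compression. That is far more than typical storage budgets or network links comfortably accommodate, which is why compression is needed.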
Over the last two decades, various video codec standards have been adopted, including the H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263 and H.264 (AVC or ISO/IEC 14496-10) standards and the MPEG-1 (ISO/IEC 11172-2), MPEG-4 Visual (ISO/IEC 14496-2) and SMPTE 421M standards. In particular, decoding according to the H.264 standard is widely used in game consoles and media players to play back encoded video. H.264 decoding is also widely used in set-top boxes, personal computers, smartphones and other mobile computing devices for playback of encoded video streamed over the Internet or other networks. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve correct results in decoding.
Several factors affect the quality of video information, including spatial resolution, frame rate and distortion. Spatial resolution generally refers to the number of samples in a video image. Images with higher spatial resolution tend to look crisper than other images and contain more discernible details. Frame rate is a common term for temporal resolution for video. Video with a higher frame rate tends to mimic the smooth motion of natural objects better than other video, and can similarly be considered to contain more detail in the temporal dimension. During encoding, an encoder can selectively introduce distortion to reduce bitrate, usually by quantizing video information. If an encoder introduces little distortion, the encoder maintains quality at the cost of higher bitrate. An encoder can introduce more distortion to reduce bitrate, but quality typically suffers. For each of these factors, higher quality comes at the cost of higher bitrate, and therefore a higher cost to store and transmit the information.
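The distortion/bitrate tradeoff can be illustrated with a toy uniform scalar quantizer. This is a simplified sketch, not the quantization scheme of any particular codec: real encoders typically quantize transform coefficients and then entropy-code the results, so the fixed-length bit count used here is only a rough proxy for bitrate. The sample values and function names are arbitrary choices for illustration.

```python
import math


def quantize(x: float, step: float) -> int:
    # Uniform scalar quantizer: index of the nearest reconstruction level
    # (Python's round() breaks exact ties toward the even index).
    return round(x / step)


def dequantize(index: int, step: float) -> float:
    # Reconstruction level for a quantization index.
    return index * step


def tradeoff(samples: list[float], step: float) -> tuple[int, float]:
    """Return (bits per sample with a fixed-length code, mean squared error)."""
    indices = [quantize(s, step) for s in samples]
    levels = max(indices) - min(indices) + 1
    bits = math.ceil(math.log2(levels))
    recon = [dequantize(q, step) for q in indices]
    mse = sum((s - r) ** 2 for s, r in zip(samples, recon)) / len(samples)
    return bits, mse


samples = [3.2, -7.9, 12.4, 0.6, -1.1, 25.0, 18.3, -14.7]
for step in (1.0, 4.0, 16.0):
    bits, mse = tradeoff(samples, step)
    print(f"step={step:5.1f}  bits/sample={bits}  MSE={mse:.3f}")
```

With these samples, the bit count per sample falls from 6 to 4 to 2 as the step size grows, while the mean squared error rises from about 0.07 to about 17.9. Coarser quantization buys a lower bitrate with more distortion.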
When encoded video is delivered over the Internet to set-top boxes, mobile computing devices or personal computers, one video source can provide encoded video to multiple receiver devices. Or, in a videoconference, one device may deliver encoded video to multiple receiver devices. Different receiver devices may have different screen sizes or computational capabilities, with some devices able to decode and play back high quality video, and other devices only able to play back lower quality video. Also, different receiver devices may use network connections having different bandwidths, with some devices able to receive higher bitrate (higher quality) encoded video, and other devices only able to receive lower bitrate (lower quality) encoded video.
In such scenarios, with simulcast encoding and delivery, video is encoded in multiple different ways to provide versions of the video at different levels of distortion, temporal quality and/or spatial resolution quality. Each version of video is represented in a bitstream that can be decoded to reconstruct that version of the video, independent of decoding other versions of the video. A video source (or given receiver device) can select an appropriate version of video for delivery to the receiver device, considering available network bandwidth, screen size, computational capabilities, or another characteristic of the receiver device.
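Selecting among simulcast versions can be sketched as a simple filter over the available versions. In this hypothetical example, each version is described by its resolution, frame rate and bitrate. The selector picks the highest-bitrate version that fits both the receiver's bandwidth and its screen size. The `Version` type, the `select_version` function and the bitrate ladder values are all assumptions for illustration. A real system could also weigh computational capability or other receiver characteristics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    # Hypothetical description of one independently decodable simulcast version.
    name: str
    width: int
    height: int
    fps: float
    bitrate_kbps: int


def select_version(versions: list[Version], bandwidth_kbps: int,
                   screen_width: int, screen_height: int) -> Optional[Version]:
    # Keep versions that fit the receiver's link and do not exceed its screen.
    candidates = [
        v for v in versions
        if v.bitrate_kbps <= bandwidth_kbps
        and v.width <= screen_width and v.height <= screen_height
    ]
    # Among those, prefer the highest bitrate as a proxy for highest quality.
    return max(candidates, key=lambda v: v.bitrate_kbps, default=None)


ladder = [  # hypothetical simulcast ladder
    Version("1080p30", 1920, 1080, 30.0, 4500),
    Version("720p30", 1280, 720, 30.0, 2500),
    Version("360p15", 640, 360, 15.0, 600),
]
for bw, w, h in [(10000, 1920, 1080), (3000, 1920, 1080), (10000, 640, 360), (400, 1920, 1080)]:
    choice = select_version(ladder, bw, w, h)
    print(bw, (w, h), choice.name if choice else None)
```

Because each version decodes independently, switching a receiver to a different version only changes which bitstream is sent. The other versions are unaffected.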
Scalable video coding (SVC) and decoding are another way to provide different versions of video at different levels of distortion, temporal quality and/or spatial resolution quality. With SVC, an encoder splits video into a base layer and one or more enhancement layers. The base layer alone provides a reconstruction of the video at a lower quality level (e.g., lower frame rate, lower spatial resolution and/or higher distortion). One or more enhancement layers can be reconstructed and added to reconstructed base layer video to increase video quality in terms of higher frame rate, higher spatial resolution and/or lower distortion. Scalability in terms of frame rate is an example of temporal scalability.
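One common way to organize temporal scalability, assumed here, is a dyadic hierarchy. With three layers, the base layer holds every fourth frame, and each enhancement layer doubles the frame rate. The sketch below only assigns frames to temporal layers and extracts a sub-stream. It does not model prediction or decoding, and it presumes (as a real encoder must arrange) that frames in a layer are never predicted from frames in higher layers. The function names and the 60 frames-per-second source rate are hypothetical.

```python
def temporal_id(frame_index: int, num_layers: int) -> int:
    """Temporal layer of a frame in a dyadic hierarchy.

    Layer 0 (base) holds every 2**(num_layers - 1)-th frame; each higher
    layer adds the frames midway between those already present.
    """
    period = 2 ** (num_layers - 1)
    pos = frame_index % period
    if pos == 0:
        return 0
    # More trailing zero bits in pos means a lower (coarser) layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def extract(frames: list[int], num_layers: int, max_layer: int) -> list[int]:
    # Keep the base layer plus enhancement layers up to max_layer.
    return [f for f in frames if temporal_id(f, num_layers) <= max_layer]


source_fps = 60.0  # hypothetical full frame rate
frames = list(range(8))
for k in range(3):
    kept = extract(frames, 3, k)
    print(k, kept, source_fps * len(kept) / len(frames), "fps")
```

Decoding only layer 0 yields frames 0 and 4 out of every 8 (15 fps here). Adding layer 1 yields 30 fps, and adding layer 2 restores the full 60 fps.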
In some respects, SVC outperforms simulcast transmission because SVC exploits redundancy between different versions of the video. Usually, for a given level of quality, the combined bitrate of the base layer and enhancement layer(s) is slightly higher than the bitrate of an independently decodable simulcast version of the video. Across all quality levels, however, the collective bitrate of the base layer and enhancement layers is much lower than the collective bitrate of the different simulcast versions of the video. For this reason, SVC reduces uplink bandwidth utilization when video is uploaded from an encoder site to a delivery server on a network, since one set of layers is uploaded instead of several independent versions. Even for real-time communication to a single receiver device, SVC offers advantages in terms of error resilience, bitrate adaptability and scalable transmission.
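The uplink comparison can be made concrete with a small calculation. The numbers are hypothetical: a three-version simulcast ladder, plus an assumption that SVC costs a fixed 15% more than simulcast at each quality level. That 15% stands in for the "slightly higher" combined bitrate mentioned above and is not a measured figure. Under this assumption, uploading the full layered stream costs only the top level's rate plus overhead, whereas simulcast uploads every version.

```python
def simulcast_total(rates_kbps: list[int]) -> int:
    # Simulcast uploads every independently decodable version.
    return sum(rates_kbps)


def svc_total(rates_kbps: list[int], overhead_pct: int) -> float:
    # Assumption: base + enhancement layers reach each quality level at
    # (100 + overhead_pct)% of that level's simulcast rate, so the complete
    # layered stream costs that much relative to the highest level only.
    return max(rates_kbps) * (100 + overhead_pct) / 100


rates = [600, 2500, 4500]  # hypothetical simulcast ladder, kbit/s
print("simulcast uplink:", simulcast_total(rates), "kbit/s")
print("SVC uplink:      ", svc_total(rates, overhead_pct=15), "kbit/s")
```

With these assumed numbers, the layered upload is 5175 kbit/s versus 7600 kbit/s for simulcast. A receiver that wants any single quality level still receives slightly more bits with SVC than with the matching simulcast version.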