Video consumption is driving rapid growth of fixed and mobile network traffic. Video is already the dominant traffic type today, and it is expected to drive overall network traffic to a multiple of today's volume and to account for more than 70% of all network traffic within a few years. The growth is primarily driven by streamed video-on-demand (VoD) content, as consumers increasingly demand access to any content on any device at any time. VoD services are commonly operated on cloud-based video platforms, wherein all processing is executed in software running on generic servers, as such platforms can provide beneficial properties related to scalability, cost efficiency, and ubiquitous availability.
VoD content is typically delivered using adaptive bit rate (ABR) streaming techniques, where each video asset is made available in several different representations coded at different bit rates and/or quality levels so that video clients can choose representations according to bandwidth availability, device capabilities, etc.
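As an illustration only, the client-side choice among representations can be sketched as follows. The `Representation` type, the bit rate values and the selection rule (highest bit rate that fits the measured bandwidth, otherwise the lowest available) are assumptions made for this sketch; actual ABR clients typically also weigh buffer level, device capabilities and other factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Representation:
    """One coded version of a video asset (hypothetical type for this sketch)."""
    name: str
    bitrate_kbps: int


def choose_representation(representations, available_kbps):
    """Return the highest-bit-rate representation that fits the bandwidth.

    Falls back to the lowest-bit-rate representation if none fits.
    """
    ordered = sorted(representations, key=lambda r: r.bitrate_kbps)
    fitting = [r for r in ordered if r.bitrate_kbps <= available_kbps]
    return fitting[-1] if fitting else ordered[0]


# Hypothetical bit rate ladder: one HQ and two LQ representations.
ladder = [
    Representation("HQ", 6000),
    Representation("LQ1", 3000),
    Representation("LQ2", 1200),
]
```

With this ladder, a client measuring 3500 kbps would be served the `LQ1` representation, and a client measuring less than 1200 kbps would fall back to `LQ2`.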
FIGS. 1 to 3 illustrate three different VoD approaches generally denoted simulcast (FIG. 1), transcoding (FIG. 2) and guided transcoding (FIG. 3). In the simulcast approach, the original video sequence is encoded at different bit rates and/or quality levels, represented as high quality (HQ) and different low quality (LQ) versions in the figures, and the resulting HQ and LQ bit streams are stored. Accordingly, a bit stream of a given bit rate and/or quality level can be retrieved from the storage and sent to the client upon request. Simulcast minimizes the coding complexity at request time, since all desired bit rates and/or quality levels of the original video sequence are encoded and stored before the actual request. Once the request arrives, the server can simply select the requested bit rate and/or quality level and transmit it without any further computations. The drawback of the simulcast approach is that it requires large storage capacity, since every version of every video sequence must be stored.
The transcoding approach shown in FIG. 2 tries to reduce the storage demands as much as possible. Accordingly, only the highest bit rate and/or quality level is encoded and stored. When a request is received from the client for an LQ version of the video sequence, the server has to decode the HQ version, downsize it to the requested bit rate and/or quality level and encode the LQ version of the video. The transcoding approach thereby allows the server to save much of the storage capacity that the simulcast approach requires, but at the cost of increased computational complexity. This high computational complexity is the main disadvantage of the transcoding approach.
Guided transcoding as shown in FIG. 3 is a compromise between the simulcast and transcoding approaches. This approach tries to reduce both the computational complexity of encoding the LQ versions of the video sequence on demand and the storage requirements of storing all HQ and LQ versions of the video sequence. The first part of guided transcoding is similar to simulcast in that the LQ versions are encoded in advance. However, in clear contrast to the simulcast approach, not all data is stored for the LQ versions of the video sequence. Instead, only so-called side information (SI) is stored for these LQ versions, while the actual picture data in terms of residual data and transform coefficients is removed from the bit streams. The SI contains inter motion information, intra mode information and details of how the pictures are divided into coding units (CUs), prediction units (PUs) and transform units (TUs), which are expensive and time-consuming to calculate. However, as the actual picture data is not retained, the required storage space is much less as compared to the simulcast approach. Furthermore, by using the SI when receiving a request for an LQ version of the video, the actual encoding process is much faster since the data that is most expensive to generate is already present in the SI.
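The trade-off between the three approaches can be summarized in a small toy model. This is a sketch only: the dictionary, the function name and the step labels are hypothetical and merely paraphrase the description above; they are not calls into any real codec.

```python
# What each approach keeps in storage for one video asset (labels only).
STORED = {
    "simulcast": ["HQ bit stream", "LQ bit streams"],
    "transcoding": ["HQ bit stream"],
    "guided": ["HQ bit stream", "LQ side information"],
}

# Hypothetical label for the expensive search that the SI makes unnecessary.
MODE_SEARCH = "search intra modes, inter motion and CU/PU/TU partitioning"


def steps_for_lq_request(approach):
    """List the server-side steps needed to serve an LQ version on request."""
    if approach == "simulcast":
        # Everything was encoded in advance; nothing is computed on request.
        return ["fetch LQ bit stream"]
    common = ["fetch HQ bit stream", "decode HQ", "downsize"]
    regenerate = ["compute residual, transform and quantize", "entropy code"]
    if approach == "transcoding":
        # Full re-encode, including the expensive mode and motion search.
        return common + [MODE_SEARCH] + regenerate
    if approach == "guided":
        # The stored SI replaces the search; only picture data is regenerated.
        return common + ["fetch LQ side information"] + regenerate
    raise ValueError(f"unknown approach: {approach}")
```

In this model, guided transcoding stores less than simulcast (side information instead of full LQ bit streams) and computes less than plain transcoding (no mode and motion search).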
A variant of guided transcoding, denoted deflation, is presented in section 2.2 Deflation on pages 18-19 in [1]. In deflation, the intra mode information and inter motion information from an LQ bit stream are used to obtain a prediction, which is used together with a downsized reconstruction of the HQ bit stream to calculate a residual. The residual is frequency transformed and quantized, and the result is then subtracted from the transform coefficients in the LQ bit stream. In this variant of guided transcoding, the SI therefore also contains the difference between the transform coefficients and the transformed and quantized residual.
The deflation variant of guided transcoding can produce LQ versions of the same quality as direct encoding without transcoding, however at the cost of storing a larger amount of data as SI for the different LQ versions of the video sequence.
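The arithmetic of the deflation variant can be sketched on a single one-dimensional block as follows. This is a minimal sketch under stated assumptions: `transform_and_quantize` is a toy stand-in (a two-point sum/difference transform followed by uniform quantization) for the frequency transform and quantizer of an actual codec, all function names are hypothetical, and the restoring step at transcoding time is inferred from the description above rather than taken from [1].

```python
def transform_and_quantize(residual, qstep):
    """Toy stand-in for the frequency transform and quantization.

    Applies a two-point sum/difference transform to consecutive sample
    pairs, then uniform quantization with step size qstep.
    """
    coeffs = []
    for a, b in zip(residual[0::2], residual[1::2]):
        coeffs += [a + b, a - b]
    return [round(c / qstep) for c in coeffs]


def estimate_coefficients(prediction, hq_downsized, qstep):
    """Residual between downsized HQ reconstruction and LQ prediction,
    transformed and quantized."""
    residual = [h - p for h, p in zip(hq_downsized, prediction)]
    return transform_and_quantize(residual, qstep)


def deflate(lq_coeffs, prediction, hq_downsized, qstep):
    """Difference to store as SI: LQ coefficients minus the estimate."""
    estimate = estimate_coefficients(prediction, hq_downsized, qstep)
    return [c - e for c, e in zip(lq_coeffs, estimate)]


def restore_coefficients(delta, prediction, hq_downsized, qstep):
    """Assumed inverse step: add the same estimate back to the stored difference."""
    estimate = estimate_coefficients(prediction, hq_downsized, qstep)
    return [d + e for d, e in zip(delta, estimate)]


# Hypothetical sample values for one block of four samples.
prediction = [100, 102, 98, 97]    # from LQ intra mode / inter motion info
hq_downsized = [104, 101, 99, 95]  # downsized reconstruction of the HQ stream
lq_coeffs = [1, 2, 0, 1]           # transform coefficients in the LQ bit stream

delta = deflate(lq_coeffs, prediction, hq_downsized, qstep=4)
restored = restore_coefficients(delta, prediction, hq_downsized, qstep=4)
```

Because the same estimate is subtracted when the SI is created and added back at transcoding time, `restored` equals `lq_coeffs` exactly in this sketch, which is consistent with the LQ version having the same quality as direct encoding. The price is that `delta` must be stored as part of the SI.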
Thus, there is still a need for improvement within guided transcoding.