The way consumers watch videos has changed dramatically over the past decades. The methods for watching videos have evolved from traditional TV systems to video streaming on desktops, laptops, and smart phones over the Internet. Video streaming currently constitutes approximately 64% of all U.S. Internet traffic. “Global Internet Phenomena Report”, Sandvine Intelligent Broadband Networks (2015). Cisco Systems, Inc. estimates that streaming traffic will increase to as much as 80% of all Internet traffic by 2019. C.V.N. Index, “Forecast and methodology, 2014-2019” (2015).
A video stream, which is shown in FIG. 1, consists of several sequences. Each sequence is divided into multiple Groups of Pictures (“GOPs”), with the sequence header information at the front. A GOP is essentially a sequence of frames beginning with an I (intra) frame, followed by a number of P (predicted) frames or B (bi-directional predicted) frames. There are two types of GOP: open-GOP and closed-GOP. In closed-GOP, there is no inter-relation among GOPs; hence, each GOP can be transcoded independently. In contrast, there is an inter-dependency between GOPs in open-GOP. Each frame of the GOP contains several slices that consist of a number of macroblocks (MB), which are the basic operation units for video encoding and decoding.
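As an illustration of the GOP structure described above, the following minimal sketch splits a string of frame types into GOPs. It assumes a closed-GOP layout in which every GOP begins with an I frame; the function name `split_into_gops` and the single-letter frame encoding are hypothetical and chosen only for this example.

```python
from typing import List


def split_into_gops(frame_types: str) -> List[str]:
    """Split a string of frame types (e.g. "IBBPIPP") into GOPs.

    Assumption: closed-GOP layout, so each I frame starts a new GOP.
    """
    gops: List[str] = []
    for frame in frame_types:
        if frame == "I":
            gops.append(frame)          # an I frame opens a new GOP
        elif not gops:
            raise ValueError("stream must start with an I frame")
        else:
            gops[-1] += frame           # P and B frames join the current GOP
    return gops


print(split_into_gops("IBBPBBPIBBPIPP"))  # ['IBBPBBP', 'IBBP', 'IPP']
```

Open-GOP streams would need additional handling, because frames near a GOP boundary may reference frames in the neighboring GOP.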
Video content, either in the form of on-demand streaming (e.g., YouTube or Netflix) or live-streaming (e.g., Livestream), needs to be converted based on the characteristics of the clients' devices. Video content is initially captured with a particular format, spatial resolution, frame rate, and bit rate. Then the video is uploaded to streaming servers. Streaming servers usually adjust the original video based on the client's network bandwidth, device resolution, frame rate, and video codec. All these conversions and adjustments are generally referred to as “video transcoding.” I. Ahmad, X. Wei, Y. Sun, and Y. Q. Zhang, “Video transcoding: an overview of various techniques and research issues,” IEEE Signal Processing Magazine, vol. 20, no. 2, pp. 18-29 (2003).
In video transcoding, video streams can be split at different levels, namely sequence level, GOP level, frame level, slice level, and macroblock level. A sequence contains several GOPs that can be transcoded independently. Video transcoding is a computationally heavy and time-consuming process. Due to the large size of each sequence, its transmission and transcoding times are long. On the other hand, frames, slices, and macroblocks have temporal and spatial dependencies, which make their processing complicated and slow. F. Lao, X. Zhang, and Z. Guo, “Parallelizing video transcoding using map-reduce-based cloud computing”, Proceedings of IEEE International Symposium on Circuits and Systems, pp. 2905-08 (2012). In order to avoid unnecessary communication delay between the different cloud servers (i.e., virtual machines), video streams are commonly split into GOPs that can be transcoded independently. F. Jokhio, T. Deneke, S. Lafond, and J. Lilus, “Analysis of video segmentation for spatial resolution reduction video transcoding,” Proceedings of IEEE International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), pp. 1-6 (2011).
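The GOP-level splitting described above can be sketched as follows. This is a minimal illustration, not a real transcoder: worker threads stand in for cloud virtual machines, and `transcode_gop` is a hypothetical placeholder that merely lowercases the frame labels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def transcode_gop(gop: str) -> str:
    """Hypothetical placeholder for transcoding one GOP."""
    return gop.lower()


def transcode_stream(gops: List[str], workers: int = 4) -> List[str]:
    """Transcode independent GOPs in parallel.

    Executor.map yields results in input order, so the transcoded GOPs
    can be reassembled in presentation order without extra bookkeeping.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcode_gop, gops))


print(transcode_stream(["IBBP", "IPP", "IBP"]))  # ['ibbp', 'ipp', 'ibp']
```

Because each GOP is processed without reference to the others, no communication between workers is needed, which is the property that makes the GOP level attractive for splitting.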
Bit rate adjustment is one commonly performed transcoding operation. To produce high-quality video content, the video has to be encoded with a high bit rate. However, a high bit rate also means the video content needs large network bandwidth for transmission. Considering the diverse network environments of various clients, streaming service providers typically transcode the video stream's bit rate to ensure smooth streaming. O. Werner, “Requantization for transcoding of mpeg-2 intraframes”, IEEE Transactions on Image Processing, vol. 8, pp. 179-191 (1999).
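The trade-off between bit rate and bandwidth can be made concrete with a short sketch. The function names and the example bit rate values below are hypothetical, and a constant bit rate is assumed for simplicity.

```python
from typing import List


def transfer_size_megabytes(bit_rate_kbps: float, duration_s: float) -> float:
    """Approximate payload size of a constant-bit-rate stream, in megabytes."""
    return bit_rate_kbps * 1000 * duration_s / 8 / 1_000_000


def pick_bit_rate(available_kbps: float, ladder_kbps: List[int]) -> int:
    """Pick the highest bit rate that fits the client's bandwidth.

    Falls back to the lowest available bit rate if none fits.
    """
    fitting = [rate for rate in ladder_kbps if rate <= available_kbps]
    return max(fitting) if fitting else min(ladder_kbps)


print(transfer_size_megabytes(4000, 60))        # 30.0
print(pick_bit_rate(3000, [500, 1500, 4000]))   # 1500
```

In this example, one minute of video at 4000 kbps amounts to roughly 30 MB, and a client with 3000 kbps of bandwidth would be served the 1500 kbps version.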
Spatial Resolution Reduction is another commonly performed transcoding operation. Spatial resolution indicates the encoded dimensional size of a video. The dimensional size does not necessarily match the screen size of the client's device. To avoid losing content, macroblocks of an original video have to be removed or combined (i.e., downscaled) to produce a lower spatial resolution video. There are several circumstances where spatial resolution algorithms can be applied to reduce the spatial resolution without sacrificing quality. FIG. 2(a) shows the challenge in mapping four motion vectors (MV) to one. J. Xin, M. T. Sun, K. Chun, and B. S. Choi, “Motion re-estimation for hdtv to sdtv transcoding,” Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), vol. 4, pp. 179-191 (1999). FIG. 2(b) shows the challenge in determining a single macroblock type from several types. N. Bjork and C. Christopoulos, “Transcoder architectures for video coding,” IEEE Transactions on Consumer Electronics, vol. 44, no. 1, pp. 88-98 (1998).
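To illustrate the four-to-one motion vector mapping, the sketch below uses plain averaging followed by scaling. This is only one simple option chosen for illustration and is not claimed to be the method of the cited references; the function name `merge_motion_vectors` and the 2:1 downscale factor are assumptions.

```python
from typing import List, Tuple

MV = Tuple[float, float]  # (horizontal, vertical) displacement in pixels


def merge_motion_vectors(mvs: List[MV], scale: float = 2.0) -> MV:
    """Map the MVs of several source macroblocks to one downscaled MV.

    Averages the vectors, then shrinks the result by the downscale factor
    so that it is expressed in the coordinates of the smaller frame.
    """
    count = len(mvs)
    avg_x = sum(x for x, _ in mvs) / count
    avg_y = sum(y for _, y in mvs) / count
    return (avg_x / scale, avg_y / scale)


print(merge_motion_vectors([(4, 0), (4, 0), (8, 4), (0, 4)]))  # (2.0, 1.0)
```

Averaging works reasonably when the four vectors point in similar directions. When they disagree strongly, the average may describe none of the source macroblocks well, which reflects the challenge the figure refers to.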
Temporal Resolution Reduction is another commonly performed transcoding operation. This operation occurs when the client's device only supports a lower frame rate, and the streaming server has to drop some frames. However, due to the dependency between frames, dropping frames may cause MVs to become invalid for the incoming frames. Temporal resolution reduction can be achieved using methods currently known in the art. See, e.g., S. Goel, Y. Ismail, and M. Bayoumi, “High-speed motion estimation architecture for real-time video transmission,” The Computer Journal, vol. 55, no. 1, pp. 35-46 (2012); Y. Ismail, J. B. McNeely, M. Shaaban, H. Mahmoud, M. Bayoumi et al, “Fast motion estimation system using dynamic models for h. 264/avc video coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 1, pp. 28-42 (2012).
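Why dropped frames invalidate MVs can be seen in a deliberately simplified model. The sketch below assumes one MV per frame, each pointing to the immediately preceding frame, and repairs the MVs of kept frames by summing the MVs of the frames dropped in between. The function name `drop_frames` is hypothetical, and this vector-sum repair is an approximation rather than the method of the cited references.

```python
from typing import List, Tuple

MV = Tuple[int, int]


def drop_frames(mvs: List[MV], keep_every: int = 2) -> List[MV]:
    """Keep every `keep_every`-th frame and recompute its MV.

    Each input MV refers to the previous frame. After dropping frames,
    a kept frame must refer to the previous kept frame, so the MVs of
    the dropped frames in between are accumulated into it.
    """
    kept: List[MV] = []
    acc_x, acc_y = 0, 0
    for index, (dx, dy) in enumerate(mvs):
        acc_x, acc_y = acc_x + dx, acc_y + dy
        if index % keep_every == 0:
            kept.append((acc_x, acc_y))
            acc_x, acc_y = 0, 0     # restart accumulation after a kept frame
    return kept


print(drop_frames([(0, 0), (1, 0), (2, 1), (1, 1), (0, 2)]))
# [(0, 0), (3, 1), (1, 3)]
```

In real codecs, MVs are stored per macroblock, and the block referenced in a dropped frame generally overlaps several macroblocks, so a plain sum is only a rough estimate.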
Video Compression Standard Conversion is another commonly performed transcoding operation. Video compression standards range from MPEG2 to H.264 to the most recent one, HEVC. Video content is encoded using various video compression standards. Therefore, video streams usually need to be transcoded to a codec supported by the client device. M. Shaaban and M. Bayoumi, “A low complexity inter mode decision for mpeg-2 to h. 264/avc video transcoding in mobile environments,” Proceedings of the 11th IEEE International Symposium on Multimedia (ISM), pp. 385-391 (2009); T. Shanableh, E. Peixoto, and E. Izquierdo, “Mpeg-2 to hevc video transcoding with content-based modeling,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, pp. 1191-1196 (2013).
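The four transcoding operations described above (bit rate, spatial resolution, temporal resolution, and compression standard) can be expressed together as a single command for a general-purpose tool. The sketch below assumes the FFmpeg command-line tool, which the text does not mention, and only builds the argument list without running it. The function name `build_ffmpeg_command` and the file names are hypothetical; the encoder name passed as `codec` (e.g. `libx264`) must be available in the local FFmpeg build.

```python
from typing import List, Optional


def build_ffmpeg_command(
    src: str,
    dst: str,
    codec: Optional[str] = None,          # compression standard conversion
    bit_rate_kbps: Optional[int] = None,  # bit rate adjustment
    width: Optional[int] = None,          # spatial resolution reduction
    height: Optional[int] = None,
    fps: Optional[int] = None,            # temporal resolution reduction
) -> List[str]:
    """Build (but do not execute) an FFmpeg argument list."""
    cmd = ["ffmpeg", "-i", src]
    if codec:
        cmd += ["-c:v", codec]
    if bit_rate_kbps:
        cmd += ["-b:v", f"{bit_rate_kbps}k"]
    if width and height:
        cmd += ["-vf", f"scale={width}:{height}"]
    if fps:
        cmd += ["-r", str(fps)]
    cmd.append(dst)
    return cmd


print(" ".join(build_ffmpeg_command(
    "in.mp4", "out.mp4", codec="libx264",
    bit_rate_kbps=1500, width=640, height=360, fps=24)))
# ffmpeg -i in.mp4 -c:v libx264 -b:v 1500k -vf scale=640:360 -r 24 out.mp4
```

The resulting list could be passed to `subprocess.run` on a machine where FFmpeg is installed.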
Due to the limitations in processing power and energy sources (e.g., in smart phones), it is not practical to transcode videos on clients' devices. X. Li, M. A. Salehi, and M. Bayoumi, “Cloud-based video streaming for energy- and compute-limited thin clients”, Stream2015 Workshop, Indiana University (2015).
One approach to address the video transcoding problem is to store numerous transcoded versions of the same video to serve different types of client devices. However, this approach requires massive storage resources in addition to powerful processors. Provisioning and upgrading these infrastructures to meet the fast-growing demands of video transcoding is cost-prohibitive, especially for small- and medium-size streaming service providers. Moreover, given the explosive growth of video streaming demands across a large diversity of client devices, this approach remains unachievable.
An alternative approach is to transcode videos on demand using cloud resources. The challenge in utilizing cloud resources for on-demand video transcoding, however, is how to employ them in a cost-efficient manner and without a major impact on the quality of service (QoS) demands of video streams. Video stream clients have unique QoS demands. In particular, they need to receive video streams without any delay. Such delay may occur either during streaming, because a transcoding task is not completed by its presentation time (“missing presentation deadline”), or at the beginning of a video stream (“startup delay”). Previous studies confirm that streaming clients mostly do not watch videos to the end. See X. Cheng, J. Liu, and C. Dale, “Understanding the characteristics of internet short video sharing: a YouTube-based measurement study,” IEEE Transactions on Multimedia, vol. 15, no. 5, pp. 1184-1194 (2013). However, they rank the quality of a stream provider based on the video's startup delay. Therefore, to maximize clients' satisfaction, we consider the video streaming QoS demand to be: minimizing the startup delay without missing the presentation deadline.
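The two QoS measures named above can be computed from timestamps, as in the following minimal sketch. The function names and the assumption that each transcoding task carries one completion time and one presentation deadline are illustrative only.

```python
from typing import List


def startup_delay(request_time: float, first_segment_ready: float) -> float:
    """Time the client waits before the first transcoded segment is ready."""
    return max(0.0, first_segment_ready - request_time)


def deadline_miss_rate(completion_times: List[float],
                       deadlines: List[float]) -> float:
    """Fraction of transcoding tasks finished after their presentation time."""
    if not deadlines:
        return 0.0
    misses = sum(1 for done, due in zip(completion_times, deadlines)
                 if done > due)
    return misses / len(deadlines)


print(startup_delay(10.0, 11.5))                                        # 1.5
print(deadline_miss_rate([1.0, 2.5, 6.0, 7.0], [2.0, 4.0, 5.0, 8.0]))   # 0.25
```

In the second call, only the third task finishes after its deadline (6.0 versus 5.0), giving a miss rate of one in four.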
A streaming service provider's goal is to spend the minimum on cloud resources while meeting the QoS requirements of video streams. Satisfying this goal becomes further complicated when we consider the variations that exist in the demand rate of video streams. Thus, to minimize the cost of utilizing cloud resources, our system should adapt its service rate (i.e., transcoding rate) based on the clients' demand rate and with respect to the video streams' QoS requirements. As such, there is a need in the market for: (1) improvement of clients' QoS satisfaction by minimizing video streams' startup delay and presentation deadline miss rate; and (2) creation of a dynamic cloud resource provisioning policy to minimize streaming service providers' incurred cost while the clients' QoS demands are respected. To meet these needs, herein disclosed is a Cloud-based Video Streaming Service architecture.
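The idea of adapting the service rate to demand can be sketched with a simple threshold rule. This is a hypothetical illustration and is not the provisioning policy of the disclosed architecture; the function name `adjust_vm_count`, the thresholds, and the VM limits are all assumed values.

```python
def adjust_vm_count(current_vms: int, miss_rate: float,
                    upper: float = 0.10, lower: float = 0.02,
                    min_vms: int = 1, max_vms: int = 20) -> int:
    """Return the VM count for the next provisioning interval.

    Adds a VM when too many presentation deadlines are missed (QoS at risk)
    and removes one when the miss rate is very low (resources likely idle).
    """
    if miss_rate > upper:
        return min(current_vms + 1, max_vms)
    if miss_rate < lower:
        return max(current_vms - 1, min_vms)
    return current_vms


print(adjust_vm_count(4, 0.20))  # 5
print(adjust_vm_count(4, 0.00))  # 3
print(adjust_vm_count(4, 0.05))  # 4
```

The gap between the two thresholds keeps the VM count from oscillating when the miss rate hovers around a single cut-off value.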