Over-the-top (OTT) delivery of live or prerecorded media content to client devices such as set-top boxes, computers, smartphones, mobile devices, tablet computers, gaming consoles, and other devices over networks such as the internet has become increasingly popular. Delivery of such media content commonly relies on adaptive bitrate streaming technologies such as HTTP Live Streaming (HLS), Smooth Streaming, and MPEG-DASH.
Adaptive bitrate streaming allows content to be encoded at different bitrates, such that different versions encoded at different bitrates can be delivered to client devices depending on factors such as network conditions and the receiving client device's processing capacity. For example, when the network is congested, a version of the content encoded with a low bitrate can be streamed to a client device until network conditions improve, at which point a higher bitrate version can be streamed to the client device.
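As a minimal sketch of the selection described above, the following picks the highest-bitrate version that fits within a measured throughput. The function name `select_bitrate`, the `safety_factor` parameter, and its default value are hypothetical and used only for illustration; actual client devices may weigh additional factors, such as buffer level or processing capacity.

```python
def select_bitrate(available_bps, measured_throughput_bps, safety_factor=0.8):
    """Pick the highest encoded bitrate that fits the measured throughput.

    safety_factor is an assumed headroom margin, not a value from any standard.
    """
    budget = measured_throughput_bps * safety_factor
    # Keep only the versions whose bitrate fits within the budget.
    candidates = [b for b in available_bps if b <= budget]
    # If nothing fits (a congested network), fall back to the lowest bitrate.
    return max(candidates) if candidates else min(available_bps)
```

When throughput later improves, calling the function again with the new measurement yields a higher bitrate version, matching the switch described above.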
Media content encoded with adaptive bitrate streaming techniques is often divided into multiple segments. This can allow client devices to request or receive different segments of the media content at different quality levels depending on network conditions or other factors. It can also allow client devices to quickly move to different points within the media content by requesting specific segments. For instance, a user can request that playback of a movie begin at twenty minutes into the movie, and a client device can accordingly request a segment of the movie's encoded media content that begins closest to the twenty minute mark.
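The seek behavior in the example above can be sketched as follows, assuming the client device already knows the start time of each segment, for instance from a playlist or manifest. The function name `closest_segment_index` and the six-second segment length in the usage lines are hypothetical.

```python
def closest_segment_index(segment_start_times, target_seconds):
    """Return the index of the segment whose start is closest to the target time."""
    return min(
        range(len(segment_start_times)),
        key=lambda i: abs(segment_start_times[i] - target_seconds),
    )


# Hypothetical movie divided into six-second segments.
starts = [i * 6 for i in range(1000)]
index = closest_segment_index(starts, 20 * 60)  # seek to the twenty minute mark
```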
As described above, existing adaptive bitrate streaming solutions allow client devices to jump to desired points within the media content by requesting specific segments of the media content. However, these solutions are not well suited for other types of playback features that many users expect or desire. For example, in analogue video or film systems, playback of frames can be physically sped up or reversed to allow users to fast-forward through content or rewind the content. In the digital environment of adaptive bitrate streaming, by contrast, individual frames would need to be delivered and decoded at a very high rate to imitate analogue fast-forwarding or rewinding. Doing so is generally not practical, as it can require significant bandwidth and/or can exceed the processing capabilities of the client device.
Instead of quickly decoding and displaying every frame to present a fast-forwarded or rewound version of media content, many adaptive bitrate streaming solutions have attempted to emulate these and other types of playback with “trick mode” or “trick-play” functions. Trick-play methods process the frames of digitally encoded media content in various ways to allow fast-forwarding, rewinding, pausing, seeking, random access, frame stepping, and other functions.
However, existing methods of fast-forwarding and rewinding digital content are choppy in comparison with the smoothness of analogue fast-forwarding and rewinding, due to the way digital media content is generally encoded and compressed. In digital encoding of media content, each frame is generally encoded either with intra prediction or inter prediction. An intra frame, also referred to as an I-frame or key frame, is encoded independently of other frames using only data within the intra frame. In contrast, an inter frame is encoded with reference to one or more other frames, for example by encoding the differences between the inter frame and the reference frame. P-frames are inter frames that are coded with reference to previous frames, while B-frames are inter frames that are coded with reference to both previous and subsequent frames. Because frames close together in media content are often very similar, and may only have minor differences such as variations in location of an object that moves between frames, data that has already been encoded or decoded for one frame can be reused or referenced when encoding or decoding another frame. The data needed to encode an inter frame, for instance data describing differences in the frame relative to another frame that has already been encoded, can often be significantly smaller than the data needed to encode an entire intra frame.
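The dependencies between frame types can be illustrated with a deliberately simplified model. The sketch below assumes that frames are listed in display order, that the sequence begins with an I-frame, that each P-frame references only the nearest preceding I-frame or P-frame, and that each B-frame references the nearest I-frame or P-frame on either side. Real codecs permit more flexible reference structures, and the function name `frames_needed` is hypothetical.

```python
def frames_needed(frame_types, target):
    """Return the indices of all frames that must be decoded to display `target`.

    frame_types is a string such as "IBBPBBP" in display order (simplified model).
    """
    needed = set()
    pending = [target]
    while pending:
        i = pending.pop()
        if i in needed:
            continue
        needed.add(i)
        kind = frame_types[i]
        if kind == "I":
            continue  # intra frames depend on no other frame
        # P-frames and B-frames both need the nearest preceding I-frame or P-frame.
        pending.append(next(j for j in range(i - 1, -1, -1) if frame_types[j] in "IP"))
        if kind == "B":
            # B-frames also need the nearest following I-frame or P-frame, if any.
            following = next(
                (j for j in range(i + 1, len(frame_types)) if frame_types[j] in "IP"),
                None,
            )
            if following is not None:
                pending.append(following)
    return sorted(needed)
```

Under this model, displaying a single I-frame requires decoding only that frame, while displaying a B-frame late in the sequence requires decoding every I-frame and P-frame in its reference chain.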
Many compression schemes, such as H.264 or MPEG-2, encode a relatively small number of frames as I-frames and encode the majority of frames as P-frames or B-frames. This approach can save significant space and/or bandwidth, because inter frames can be encoded with relatively small amounts of information compared to encoding a complete frame. These types of compression schemes generally work well for normal playback of media content. However, prior compression schemes and methods of encoding media content for adaptive bitrate streaming do not work well for implementing trick-play modes.
It is generally impractical to decode and display media content at an increased rate using normal adaptive bitrate streaming techniques to simulate fast-forwarding or rewinding. Most client devices cannot decode inter frames quickly enough to present smooth fast-forwarding or rewinding, because inter frames depend on other frames that also must be decoded. For this reason, most existing adaptive bitrate streaming implementations avoid decoding inter frames during trick-plays, and instead rely on exclusively decoding intra frames and skipping inter frames. However, intra frames often appear relatively infrequently and/or at irregular intervals within encoded media content. By exclusively decoding intra frames, these implementations often lead to choppy video with the appearance of an almost random selection of frames being presented to a viewer instead of a smoothly sped up video with frames being presented at consistent time intervals.
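The choppiness described above can be made concrete by measuring how far apart consecutive intra frames are in the original timeline. The sketch below assumes a constant frame rate and frames listed in display order; the function name `intra_frame_gaps` is hypothetical.

```python
def intra_frame_gaps(frame_types, fps):
    """Return the time in seconds between consecutive intra frames.

    frame_types is a string such as "IPPPPIPP" in display order.
    """
    intra_positions = [i for i, kind in enumerate(frame_types) if kind == "I"]
    return [
        (later - earlier) / fps
        for earlier, later in zip(intra_positions, intra_positions[1:])
    ]
```

If the returned gaps vary widely, an intra-only trick-play presents frames that jump through the content by uneven amounts of time, which a viewer perceives as an almost random selection of frames.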
Existing implementations that exclusively rely on streaming intra frames to client devices are also inefficient. Because intra frames are not compressed based on data from other frames, they require more data to store and transmit than more heavily compressed inter frames. Exclusively streaming the larger intra frames during trick-plays can result in heavy bandwidth usage compared to normal playback that also includes smaller inter frames.
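A rough estimate of this bandwidth effect is sketched below. It assumes that an intra-only trick-play at a given speed transfers every intra frame within the content it skims, and that normal playback transfers every frame at a speed of one. The function name `bytes_per_second` and the frame sizes in the usage lines are hypothetical, illustrative values rather than measurements.

```python
def bytes_per_second(frame_types, frame_sizes, fps, speed=1.0, intra_only=False):
    """Estimate average bytes transferred per second of wall-clock time."""
    content_seconds = len(frame_types) / fps
    if intra_only:
        # Only intra frames are streamed during this style of trick-play.
        total = sum(size for kind, size in zip(frame_types, frame_sizes) if kind == "I")
    else:
        total = sum(frame_sizes)
    # At N-times speed, N seconds of content are covered per wall-clock second.
    return total / content_seconds * speed


# Hypothetical sizes: 50,000-byte intra frames and 5,000-byte inter frames.
types = "IPPPP" * 2
sizes = [50_000 if kind == "I" else 5_000 for kind in types]
normal = bytes_per_second(types, sizes, fps=10)
trick = bytes_per_second(types, sizes, fps=10, speed=4.0, intra_only=True)
```

With these assumed sizes, the intra-only trick-play at four times speed transfers more data per second than normal playback. The actual ratio depends on how often intra frames occur and on how much larger they are than the inter frames.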
Additionally, although security and encryption are major concerns for many providers of media content, encryption techniques for trick-plays in adaptive bitrate streaming have yet to be defined or standardized.