Technical Field
This disclosure relates generally to distributed data processing systems and more particularly to frame rate conversion, sometimes referred to as temporal transcoding, of video content in distributed data processing environments.
Brief Description of the Related Art
Content providers (such as large-scale broadcasters, film distributors, and the like) generally want to distribute their content online in a manner that complements traditional mediums such as broadcast TV (including high definition television) and DVD. It is important to them to have the ability to distribute content to a wide variety of third-party client application/device formats, and to offer a quality viewing experience regardless of network conditions, using modern technologies like adaptive bitrate streaming. Notably, since Internet-based content delivery is no longer limited to fixed line environments such as the desktop, and more and more end users now use wireless devices to receive and view content, the ability to support new client device formats and new streaming technologies is particularly important.
A given content provider's content might be single-media content (e.g., an audio file), or it may include multiple media types, i.e., a container file with audio/video data. Generally speaking, a given container file is built on elemental data, potentially leveraging several different formats. For example, the audio and video data are each encoded using appropriate codecs, which are algorithms that encode and compress that data. Example codecs include H.264, VP6, AAC, MP3, etc. A container or package format functions as a wrapper and describes the data elements and metadata of the various media it holds, so that a client application knows how to play it. Example container formats include FLV, Silverlight, MP4, PIFF, and MPEG-TS.
A given multimedia stream may reflect a variety of settings used to create the stream, e.g., bitrate encoding, formatting, packaging and other settings. Several versions of a given stream may be necessary for technologies like adaptive bitrate streaming, in order to allow a client or a server to switch between streams to compensate for network congestion or other issues.
Hence, to support the distribution of content to a wide variety of devices, content providers typically must create many different versions of their content. This can be done by transcoding content to change an encoding parameter or container format (the latter often being referred to as transmuxing or remultiplexing). The bit rate may also be changed, a process often referred to as transrating. This allows, for example, the creation of multiple copies of a given movie title at different screen sizes, bit rates, and client player formats.
The conversion of content can be achieved using a single machine, sometimes referred to as a linear approach, in which one machine processes the entire file. Alternatively, content can be converted using a parallel approach, also referred to as a distributed approach, in which a given content file is typically broken into multiple segments or chunks, each of which is processed by a separate machine in a distributed computing architecture.
For example, U.S. Patent Publication No. 2013-0117418, titled “Hybrid Platform For Content Delivery and Transcoding”, the teachings of which are hereby incorporated by reference, discloses a system in which machines in a content delivery network (CDN) are configured to perform transcoding in parallel. The CDN described there is a distributed processing system and has many CDN machines. The CDN machines are designated as transcoding resources; a given content file is broken into segments (sometimes referred to as ‘streamlets’) which are distributed to various machines to be transcoded separately, along with instructions and parameters for the transcoding job. Once the machines return the transcoded segments, the segments can be reassembled to create the transcoded content file. U.S. Patent Publication No. 2013-0114744, titled “Segmented Parallel Encoding With Frame-Aware, Variable-Size Chunking,” the teachings of which are hereby incorporated by reference, discloses techniques for breaking a given video into segments.
In addition to the conversion functions described previously, it is desirable to have a parallel architecture perform frame-rate conversion for a video, which is sometimes referred to as “temporal transcoding.” For example, it may be desired to either up-convert or down-convert the frame rate in a given stream.
However, frame rate conversion in a distributed architecture is challenging. The transcoding resources operate in parallel, but some frame rate conversion decisions (such as when to remove or insert a frame) cross segment boundaries. Also, some videos have jitter in their timestamps, causing slight variations at segment boundaries. In addition, group-of-pictures (GoP) structures in the video stream can vary widely and be quite complex.
FIGS. 1-2 illustrate some of the challenges present in a distributed transcoding approach. FIG. 1 shows a set of four hypothetical input video segments that are each 6 frames long (for a total of 24 frames) and are being down-converted to a total of 15 frames. The frames are denoted by the numbered vertical lines. The frames are shown in presentation time stamp (PTS) order. In a single-transcoder (linear) system, there would be no boundary conditions because a single transcoder processes all input and output frames.
Therefore, in a linear transcoding approach, frame-rate conversion would simply follow the input frame sequence to create an output sequence with uniform inter-frame spacing. However, in the distributed approach, each transcoding resource generally processes the segments independently; hence the boundary parameters such as t1 and t2 in FIG. 1 must be independently calculated to accurately maintain frame-time distances across the boundaries once all of the individual segments are multiplexed back together.
FIG. 2 further highlights the challenges of the distributed approach by illustrating the frame-rate conversion situation at each transcoding resource. FIG. 2 illustrates that each transcoding resource (TR) may need to produce a different number of output frames for its given segment. Further, FIG. 2 illustrates the need for each transcoding resource to employ the proper starting and ending offset times t1, t2, since otherwise the frame timing of the output segment might be negatively impacted.
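As a hedged illustration of the per-segment bookkeeping that FIGS. 1-2 imply, the following Python sketch computes, for each segment, how many output frames fall inside it and the offsets from the segment edges to its first and last output frame. The function name plan_segments, the rates of 24 and 15 frames per second, and the reading of t1 and t2 as these start and end offsets are assumptions made for illustration only; they are not taken from the figures.

```python
from fractions import Fraction
from math import ceil


def plan_segments(num_segments, frames_per_segment, in_fps, out_fps):
    """Return (count, t1, t2) per segment for a uniform output frame grid.

    Assumption: t1 is the gap from the segment start to its first output
    frame, and t2 is the gap from its last output frame to the segment end.
    Each segment is assumed to contain at least one output frame.
    """
    seg_duration = Fraction(frames_per_segment, in_fps)  # seconds, exact
    out_period = Fraction(1, out_fps)
    plans = []
    for i in range(num_segments):
        start = i * seg_duration
        end = start + seg_duration
        first_k = ceil(start / out_period)  # first output index at or after start
        end_k = ceil(end / out_period)      # first output index at or after end
        count = end_k - first_k
        t1 = first_k * out_period - start
        t2 = end - (end_k - 1) * out_period
        plans.append((count, t1, t2))
    return plans


for index, (count, t1, t2) in enumerate(plan_segments(4, 6, 24, 15), start=1):
    print(f"segment {index}: {count} output frames, t1={t1}s, t2={t2}s")
```

Under these assumed values the four segments receive 4, 4, 4, and 3 output frames respectively, for a total of 15, and each segment has different offsets, which is the situation each transcoding resource would have to account for independently.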
FIG. 3 illustrates input frame timing jitter. The top stream is the example input sequence and the bottom is the output sequence with corrected timing. The ideal timing is indicated by the crossed lines 301 and some deviation of the frames from this ideal timing can be seen, for example, around frames 6 and 12.
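One plausible way to obtain corrected timing of the kind shown in the bottom stream of FIG. 3 is to snap each presentation time stamp to the nearest ideal frame time. The Python sketch below assumes integer timestamps, a hypothetical frame period of 100 ticks, frames numbered from 0, and made-up jitter values on frames 6 and 12. The function name snap_to_grid is illustrative, and this is not presented as the method of the disclosure.

```python
def snap_to_grid(pts_list, frame_period):
    """Move each timestamp to the nearest multiple of frame_period (integer ticks)."""
    half = frame_period // 2
    return [((pts + half) // frame_period) * frame_period for pts in pts_list]


# Hypothetical input: frames 0..12 at 100 ticks per frame, with frame 6
# arriving 4 ticks late and frame 12 arriving 3 ticks early.
input_pts = [0, 100, 200, 300, 400, 500, 604, 700, 800, 900, 1000, 1100, 1197]
corrected = snap_to_grid(input_pts, 100)

# Per-frame deviation from the ideal grid; nonzero only where jitter was added.
deviation = [pts - ideal for pts, ideal in zip(input_pts, corrected)]
print(deviation)
```

Snapping of this kind only works when the jitter is well under half a frame period; larger deviations would need a different strategy.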
The small amount of jitter in the input sequence of FIG. 3 may not be noticeable when the sequence is played back, and it may not be significant enough to impact audio-video synchronization. By the end of the sequence, all frames may again be aligned with the ideal frame times, as shown in FIG. 3. Nevertheless, even this small amount of jitter might impact frame-rate conversion in a distributed transcoding system. For example, frame 6 of the input has been pushed by jitter into segment 2 instead of being a part of segment 1. The actual duration of frame 5, however, is the difference between the timestamps of frames 6 and 5, and hence the overall length of segment 1 will be slightly longer than the ideal frame timing would give.
In a distributed approach, the transcoding resource processing segment 1 is not aware that frame 6 should actually have been a part of segment 1, and instead may proceed to perform an additional frame duplication of input frame 5 to create output frame 6, which technically fits within its time boundary. Hence, there may now be two frames representing output frame 6: one from a duplication of frame 5, and a second from the actual input frame 6 in segment 2.
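The duplicate-frame effect can be reproduced with a small Python sketch of two transcoding resources that have no knowledge of each other. Everything here is a simplifying assumption for illustration: timestamps are integer ticks with a frame period of 100, frames are numbered from 0, input and output rates are equal, segment 1 is taken to end where the first frame of segment 2 begins, and the function names naive_output_slots and duplicated_slots are hypothetical.

```python
from collections import Counter

FRAME_PERIOD = 100  # hypothetical ticks per frame; input and output rates are equal


def naive_output_slots(segment_pts, segment_end):
    """Output slot indices one transcoding resource emits, with no boundary awareness.

    Each input frame goes to its nearest ideal slot. Any later slot whose ideal
    time still falls before segment_end is filled by duplicating the last frame.
    """
    slots = [(pts + FRAME_PERIOD // 2) // FRAME_PERIOD for pts in segment_pts]
    next_slot = slots[-1] + 1
    while next_slot * FRAME_PERIOD < segment_end:
        slots.append(next_slot)  # duplicate of the segment's last input frame
        next_slot += 1
    return slots


def duplicated_slots(frame6_pts):
    """Slots claimed more than once after the two segments are multiplexed."""
    segment1 = [0, 100, 200, 300, 400, 500]             # frames 0-5
    segment2 = [frame6_pts, 700, 800, 900, 1000, 1100]  # frames 6-11
    # Assumption: segment 1 ends where segment 2's first frame begins.
    emitted = naive_output_slots(segment1, frame6_pts)
    emitted += naive_output_slots(segment2, 1200)
    return [slot for slot, n in Counter(emitted).items() if n > 1]


print(duplicated_slots(600))  # frame 6 exactly on the ideal grid
print(duplicated_slots(604))  # frame 6 jittered 4 ticks late
```

With frame 6 on time, no output slot is claimed twice. With the 4-tick delay, segment 1 grows just enough to contain the ideal time of slot 6, so the first resource fills it with a copy of frame 5 while the second resource also emits frame 6 into that slot.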
In light of these kinds of issues, there is a need to provide systems, apparatus, and methods for frame-rate conversion in distributed transcoding architectures. The teachings herein address these needs and offer other features and benefits that will become apparent in view of this disclosure.