The use of video continues to grow across consumer, enterprise, and public safety markets. Increasingly, video is used in a one-to-one video-on-demand model, where each viewer can watch the same or a different video stream at differing points in time. Moreover, in this context, viewers desire non-linear decoding of an encoded video stream, wherein non-linear decoding means starting the decoding process at an arbitrary frame within a video sequence. Non-linear decoding enables functions such as, but not limited to, rewind (RW), forward (FW), and pause (PAUSE); this class of functions is generally known in the industry as “trick play” functionality.
The method by which source video is encoded affects the implementation of non-linear decoding. Modern video codecs employ two basic techniques for encoding source video: spatial image compression and temporal motion compensation. In either case, the source video is first divided into a sequence of frames, each having a mesh of macroblocks. When all of the macroblocks within a frame are encoded using spatial image compression techniques, the frame is called an Intra or “I” frame, wherein the decoding of the frame does not depend upon the successful decoding of one or more previous frames. When some or all of the macroblocks within a frame are encoded using temporal motion compensation techniques, the frame is called a Predictive, Inter, or “P” frame, wherein the decoding of the frame depends upon the successful decoding of one or more previous frames.
Modern video codecs achieve their high compression ratios largely through predictive encoding. However, to limit error propagation and to support non-linear, random access, Intra frames are injected into the video stream at regular intervals (e.g., every 1 or 2 seconds). This sequence of one Intra frame followed by a succession of Predictive frames is called a Group of Pictures, or GOP. Because of the predictive nature of video encoding, decoding traditionally must start at an Intra frame, that is, at a GOP boundary. As such, trick play functionality (e.g., rewind 5 seconds) must quantize the requested playback time to the nearest Intra frame, or start of a GOP. To increase the granularity at which trick play commands can function (e.g., rewind 1 second), one must generally decrease the GOP length, thus injecting Intra frames more often.
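The quantization described above can be illustrated with a minimal sketch. It assumes a fixed GOP length measured in frames, with an Intra frame at frame 0 and at every multiple of the GOP length, and it snaps to the Intra frame at or before the requested frame (one possible reading of "nearest"). The function names and the 30 frames-per-second figures are hypothetical and chosen only for illustration.

```python
def gop_start_frame(target_frame: int, gop_length_frames: int) -> int:
    """Return the index of the Intra frame that begins the GOP containing
    target_frame, assuming Intra frames at every multiple of the GOP length."""
    if target_frame < 0 or gop_length_frames <= 0:
        raise ValueError("target_frame must be >= 0 and gop_length_frames > 0")
    return (target_frame // gop_length_frames) * gop_length_frames


def frames_to_decode(target_frame: int, gop_length_frames: int) -> int:
    """Number of frames that must be decoded, starting at the GOP boundary,
    before target_frame itself can be displayed."""
    return target_frame - gop_start_frame(target_frame, gop_length_frames) + 1


# Hypothetical example: 30 frames per second, playback at frame 600 (20 s),
# user requests "rewind 5 seconds", i.e. frame 450.
requested = 600 - 5 * 30

# With a 2-second GOP (60 frames) decoding must start at frame 420.
start_long_gop = gop_start_frame(requested, 60)

# With a 1-second GOP (30 frames) frame 450 is itself a GOP boundary.
start_short_gop = gop_start_frame(requested, 30)
```

Under these assumptions, the shorter GOP lets playback begin exactly at the requested frame, whereas the longer GOP forces either a half-second offset or the decoding of 31 frames to reach the requested one.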
Although helpful to trick play functionality, increasing the Intra frame insertion rate can be problematic in other ways. From an error resilience perspective, a poorly timed wireless fade (e.g., consecutive packet loss) could wipe out an entire Intra frame, causing existing prediction errors (resulting from previous packet loss) to continue to propagate forward in time. From a rate control perspective, Intra frames are “costly” relative to Predictive frames; because all of the macroblocks in an Intra frame are encoded exclusively using spatial image compression techniques, an Intra frame tends to be significantly larger than neighboring Predictive frames. This can produce an instantaneous spike in the output bit rate of the encoder, or force the encoder to significantly degrade the spatial or temporal quality of the subsequent Predictive frames in an attempt to maintain an average output bit rate. Finally, given that modern video codecs achieve their significant compression ratios by exploiting redundancy between frames across time, an increased rate of Intra frames (which do not exploit such redundancy) will decrease overall compression efficiency.
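The rate control cost can be made concrete with a small arithmetic sketch. The frame sizes below are hypothetical (an Intra frame ten times the size of a Predictive frame) and are not taken from any measurement; the sketch also assumes that every Predictive frame has the same size, which real encoders do not guarantee.

```python
def average_frame_bits(intra_bits: int, predictive_bits: int,
                       gop_length_frames: int) -> float:
    """Average bits per frame for a GOP of one Intra frame followed by
    (gop_length_frames - 1) Predictive frames of equal size."""
    if gop_length_frames <= 0:
        raise ValueError("gop_length_frames must be positive")
    total_bits = intra_bits + (gop_length_frames - 1) * predictive_bits
    return total_bits / gop_length_frames


# Hypothetical sizes, for illustration only.
INTRA_BITS = 100_000
PREDICTIVE_BITS = 10_000

avg_60 = average_frame_bits(INTRA_BITS, PREDICTIVE_BITS, 60)  # 11500.0
avg_30 = average_frame_bits(INTRA_BITS, PREDICTIVE_BITS, 30)  # 13000.0

# Ratio of the Intra frame size to the average frame size: the "spike".
spike_60 = INTRA_BITS / avg_60
```

With these illustrative numbers, halving the GOP length from 60 to 30 frames raises the average frame size from 11,500 to 13,000 bits, and each Intra frame remains several times larger than the average frame around it.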
Alternatively, an Intra macroblock refresh process can be employed to encode source video. An Intra macroblock refresh process is defined as an encoding process that avoids periodic insertion of Intra frames and, rather, forces a set of macroblocks to be encoded using spatial image compression techniques across a set of Predictive frames, such that within what is termed herein an “Intra-refresh period” all of the macroblocks within any given frame are Intra encoded at least once. In other words, some number of the macroblocks that make up a given Predictive frame are forcibly Intra encoded (i.e., with no dependencies on previous frames), regardless of whether or not their spatial content has changed radically from the previous frame. However, the problem with an Intra macroblock refresh process is that there are generally no frames in the stream in which all of the macroblocks are simultaneously refreshed. This is problematic for trick play and other forms of non-linear access, as there are no frames at which decoding can immediately commence.
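One possible refresh pattern can be sketched as follows. This is an assumption made for illustration, not a pattern specified above: macroblocks are indexed in raster order and refreshed cyclically in equal-sized groups, one group per Predictive frame. The function name and the 12-macroblock, 4-frame figures are hypothetical.

```python
import math


def forced_intra_macroblocks(frame_index: int, num_macroblocks: int,
                             refresh_period_frames: int) -> range:
    """Return the indices of the macroblocks that are forcibly Intra encoded
    in the given frame under a simple cyclic refresh pattern (an assumed
    pattern; real encoders may choose macroblocks differently)."""
    if num_macroblocks <= 0 or refresh_period_frames <= 0:
        raise ValueError("arguments must be positive")
    per_frame = math.ceil(num_macroblocks / refresh_period_frames)
    slot = frame_index % refresh_period_frames
    start = slot * per_frame
    return range(start, min(start + per_frame, num_macroblocks))


# Hypothetical frame of 12 macroblocks with a 4-frame Intra-refresh period.
NUM_MB = 12
PERIOD = 4

# Any window of PERIOD consecutive frames refreshes every macroblock once...
refreshed = set()
for frame in range(5, 5 + PERIOD):
    refreshed.update(forced_intra_macroblocks(frame, NUM_MB, PERIOD))
all_refreshed = refreshed == set(range(NUM_MB))

# ...but no single frame refreshes all of them at the same time.
any_full_frame = any(
    len(forced_intra_macroblocks(frame, NUM_MB, PERIOD)) == NUM_MB
    for frame in range(5, 5 + PERIOD)
)
```

Under this assumed pattern, every macroblock is refreshed within each Intra-refresh period, yet no individual frame is fully Intra encoded, which is why no single frame can serve as an immediate decoding entry point.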
Thus, there exists a need for a method and apparatus for performing trick play functionality (or non-linear decoding) of at least one video frame from a video stream encoded using an Intra macroblock refresh process.
Skilled artisans will appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve understanding of various embodiments. In addition, the description and drawings do not necessarily require the order illustrated. It will be further appreciated that certain actions and/or steps may be described or depicted in a particular order of occurrence while those skilled in the art will understand that such specificity with respect to sequence is not actually required. Apparatus and method components have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the various embodiments so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein. Thus, it will be appreciated that for simplicity and clarity of illustration, common and well-understood elements that are useful or necessary in a commercially feasible embodiment may not be depicted in order to facilitate a less obstructed view of these various embodiments.