The amount of video content delivered and consumed over delivery networks has increased dramatically over time. This increase is due in part to VOD (Video on Demand) services, and in part to the growing number of live services combined with the growing number of devices capable of accessing a delivery network. By way of example only, video content can be accessed from various kinds of terminals, such as smartphones, tablets, PCs, TVs, set-top boxes, game consoles, and the like, which are connected through various types of delivery networks including broadcast, satellite, cellular, ADSL, and fibre.
Due to the large size of raw video, video content is generally accessed in compressed form. Consequently, video content is generally expressed using a video compression standard. The most widely used video standards belong to the “MPEG” (Moving Picture Experts Group) family, which notably comprises the MPEG-2, AVC (Advanced Video Coding, also called H.264) and HEVC (High Efficiency Video Coding, also called H.265) standards. Generally speaking, more recent formats are considered to be more advanced, as they support more encoding features and/or provide better compression ratios. For example, the HEVC format is more recent and more advanced than AVC, which is itself more recent and more advanced than MPEG-2. Therefore, HEVC offers more encoding features and greater compression efficiency than AVC, and the same applies to AVC in relation to MPEG-2. These compression standards are block-based, as are the Google formats VP8, VP9, and VP10.
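To put the size of raw video in perspective, the following Python sketch computes the bitrate of uncompressed video from its frame dimensions, frame rate, bit depth and chroma subsampling. The function name `raw_bitrate_bps`, the 1080p, 30 frames/s, 8-bit 4:2:0 example and the 8 Mbit/s compressed bitrate used for comparison are illustrative assumptions, not values taken from any particular service or standard.

```python
def raw_bitrate_bps(width: int, height: int, fps: float,
                    bit_depth: int = 8, chroma_factor: float = 1.5) -> float:
    """Return the bitrate, in bits per second, of uncompressed video.

    chroma_factor is the number of samples per pixel, counting luma and
    both chroma planes: 1.5 for 4:2:0, 2.0 for 4:2:2, 3.0 for 4:4:4.
    """
    samples_per_frame = width * height * chroma_factor
    return samples_per_frame * bit_depth * fps


# Illustrative example: 1080p, 30 frames/s, 8-bit 4:2:0.
raw = raw_bitrate_bps(1920, 1080, 30)
print(f"raw: {raw / 1e6:.1f} Mbit/s")  # raw: 746.5 Mbit/s

# Hypothetical compressed bitrate, chosen only for comparison.
compressed = 8e6
print(f"ratio: {raw / compressed:.0f}:1")  # ratio: 93:1
```

Under these assumptions, transmitting the raw signal would need roughly two orders of magnitude more capacity than the assumed compressed stream.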
Even within the same video compression standard, video content can be encoded using very different options. For example, it can be encoded at different bitrates, and it can be encoded using only I frames (I standing for Intra), I and P frames (P standing for Predicted), or I, P and B frames (B standing for Bi-directional). Generally speaking, the number of available encoding options increases with the complexity of the video standard.
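The frame-type configurations mentioned above can be illustrated with a small generator. This is a hypothetical helper (`gop_pattern` is not part of any standard or library): it lists the frame types of one GOP in display order for an intra-only, I/P, or I/P/B configuration. It deliberately ignores which frames each P or B frame references, and real encoders may choose frame types adaptively rather than from a fixed pattern.

```python
def gop_pattern(gop_length: int, use_p: bool = True, b_frames: int = 0) -> str:
    """Return the frame types of one GOP in display order.

    gop_length: number of frames in the GOP; the first is always an I frame.
    use_p: if False, every frame is an I frame (intra-only coding).
    b_frames: number of B frames between consecutive anchor (I or P) frames.
    """
    if not use_p:
        return "I" * gop_length
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")              # GOP always starts with an I frame
        elif i % (b_frames + 1) == 0:
            types.append("P")              # anchor frame every b_frames + 1
        else:
            types.append("B")              # frames between anchors
    return "".join(types)


print(gop_pattern(6, use_p=False))   # IIIIII
print(gop_pattern(6))                # IPPPPP
print(gop_pattern(9, b_frames=2))    # IBBPBBPBB
```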
Conventional video coding methods use three types of frame: I or Intra-predicted frames, P or Predicted frames, and B or Bi-directional frames. I frames can be decoded independently. P frames reference other frames that have previously been displayed, whereas B frames reference other frames that have either already been displayed or have yet to be displayed. The use of reference frames involves predicting each image block as a combination of blocks in the reference frames, and encoding only the difference between the block in the current frame and that combination of reference blocks.
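The prediction-plus-difference principle can be sketched as follows, assuming two reference blocks that have already been located (motion estimation and compensation are not modeled) and a bi-directional prediction formed as their rounded average. In a real codec the residual would then be transformed, quantized and entropy-coded; this sketch keeps it exact, so the decoder recovers the block without loss. All function names are hypothetical.

```python
from typing import List

Block = List[List[int]]  # sample values of one image block, row by row


def predict_bidirectional(ref_past: Block, ref_future: Block) -> Block:
    """Predict a block as the rounded average of two reference blocks."""
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ref_past, ref_future)]


def residual(current: Block, prediction: Block) -> Block:
    """Encoder side: the difference that is actually coded."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, prediction)]


def reconstruct(prediction: Block, res: Block) -> Block:
    """Decoder side: add the residual back onto the same prediction."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, res)]


past = [[100, 102], [104, 106]]    # block in an already displayed frame
future = [[110, 112], [114, 116]]  # block in a frame displayed later
current = [[106, 107], [109, 111]]

pred = predict_bidirectional(past, future)
res = residual(current, pred)
print(pred)  # [[105, 107], [109, 111]]
print(res)   # [[1, 0], [0, 0]]
assert reconstruct(pred, res) == current
```

Only the small residual, mostly zeros in this example, has to be carried in the bitstream, which is where the compression gain of P and B frames comes from.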
A GOP is generally defined as the Group of Pictures between one I frame and the next I frame in encoding/decoding order. Closed GOP refers to any block-based encoding scheme in which the information needed to decode a GOP is self-contained. In other words, a closed GOP contains one I frame, P frames that only reference the I frame and other P frames within the GOP, and B frames that only reference frames within the GOP. Thus, in a closed GOP there is no need to obtain any reference frame from a prior GOP to decode the current GOP. In common decoder implementations, switching between resolutions at some point in a stream requires that a “closed GOP” encoding scheme be used, since the first GOP after a resolution change must not require any information from the previous GOP in order to be decoded correctly.
By contrast, in the coding scheme called open GOP, the first B frames in a current GOP, which are displayed before the I frame, can reference frames from prior GOPs. Open GOP coding schemes are widely used in broadcasting applications because they provide better video quality for a given bit rate.
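The distinction between closed and open GOPs can be expressed as a check on reference relationships. This sketch assumes that each frame carries a stream-wide identifier and the list of identifiers it references (the `Frame` class and `is_closed_gop` function are hypothetical). A GOP is closed exactly when none of its frames references a frame outside the GOP; a GOP that passes the check can be decoded in full without any information from the previous GOP, which is what a resolution change requires.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    frame_id: int                    # identifier unique across the stream
    frame_type: str                  # "I", "P" or "B"
    refs: List[int] = field(default_factory=list)  # ids of reference frames


def is_closed_gop(gop: List[Frame]) -> bool:
    """Return True if no frame in the GOP references a frame outside it."""
    ids_in_gop = {f.frame_id for f in gop}
    return all(ref in ids_in_gop for f in gop for ref in f.refs)


# Frame 6 is assumed to be a P frame belonging to the previous GOP.
# Open GOP: leading B frame 10, displayed before I frame 9, also uses frame 6.
open_gop = [Frame(9, "I"), Frame(10, "B", [6, 9]), Frame(11, "P", [9])]
# Closed GOP: the same B frame only references frames inside the GOP.
closed_gop = [Frame(9, "I"), Frame(10, "B", [9]), Frame(11, "P", [9])]

print(is_closed_gop(open_gop))    # False
print(is_closed_gop(closed_gop))  # True
```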
Video delivery has continued to grow in popularity over a wide range of networks. Among the different networks on which video delivery may be performed, IP networks demand particular attention as video delivery represents a growing portion of the total capacity of IP networks.
FIG. 1 is an illustration of a common video distribution scenario according to the prior art. As shown in FIG. 1, a primary video stream 110 comprising programmed content is received at a regional television studio. This programmed content might comprise, for example, feature film material or coverage of a scheduled sporting event. Primary video stream 110 is encoded according to a block-based encoding algorithm as discussed above. Meanwhile, the regional television studio generates its own secondary video stream 120, which might comprise news reporting on local topics relevant to the area in which the regional studio is situated. A splicer unit 130 combines primary video stream 110 and secondary video stream 120 to constitute a single combined video stream 140, which, when decoded, presents a continuous sequence of images reflecting the content of both of the original video streams. Similar scenarios occur in other contexts, such as television set-top boxes implementing a “channel in box” functionality, where a stream of local content (e.g. stored in memory in the set-top box) is combined with a stream received from an outside source.
FIG. 2 depicts additional details of the prior art scenario illustrated by FIG. 1. As in FIG. 1, primary video stream 110 is generally encoded in accordance with a block-based encoding scheme as described above, and is represented schematically as a series of encoded blocks, whilst secondary video stream 120 is represented schematically as a succession of individual pictures. To distinguish the content of the two streams, the schematic representations of the content of the secondary video stream are cross-hatched, whereas those of the content of the primary video stream are plain.
Before primary video stream 110 can be combined with the material of secondary video stream 120, primary video stream 110 is decoded by a decoder 211 to generate the decoded primary video stream 210. In many scenarios, secondary video stream 120 may be unencoded digital video, for example in Y′UV format such as ITU-R BT.656, and will not therefore necessarily need to be decoded, although decoding may still be necessary in other scenarios. In some cases it may be desirable to perform editing operations on the secondary video stream at editing unit 221, for example to add logos, station identifiers, or other graphical overlays that ensure a visual correspondence between images from the two streams. The decoded primary video stream 210 and the edited secondary video stream 220 can then be combined directly by switching between the two video streams at the desired instant at switcher 130, generating the combined video signal 140, which is then re-encoded by an encoder 241 to produce an encoded, combined video stream 240. As shown, the encoded, combined video stream 240 comprises a series of encoded blocks, with the subject matter of the secondary video stream stretched across a number of blocks.
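Because both inputs to switcher 130 are decoded pictures, the switch can occur at any frame. The sketch below models only that switching step, assuming the two streams are already time-aligned and share the same frame rate and picture format; frames are represented by labels rather than pixel data, and `switch_streams` is a hypothetical name. Its output corresponds to combined video signal 140, which would still have to be re-encoded by encoder 241.

```python
from typing import Dict, List, Sequence


def switch_streams(primary: Sequence[str], secondary: Sequence[str],
                   switch_points: Dict[int, str]) -> List[str]:
    """Select each output frame from one of two time-aligned decoded streams.

    switch_points maps a frame index to the source ("primary" or
    "secondary") used from that frame onward; output starts on primary.
    """
    sources = {"primary": primary, "secondary": secondary}
    current = "primary"
    combined = []
    for i in range(min(len(primary), len(secondary))):
        current = switch_points.get(i, current)  # switch if requested here
        combined.append(sources[current][i])
    return combined


primary = [f"pri{i}" for i in range(6)]    # decoded primary stream 210
secondary = [f"sec{i}" for i in range(6)]  # edited secondary stream 220
combined = switch_streams(primary, secondary, {2: "secondary", 4: "primary"})
print(combined)  # ['pri0', 'pri1', 'sec2', 'sec3', 'pri4', 'pri5']
```

Since the result is a sequence of raw pictures, every frame has to pass through the encoder again, which is the source of the processing cost and latency of this approach.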
The continuous decoding of primary video stream 110 and re-encoding of combined video signal 140 required by this approach call for significant processing and storage capacity, and entail continuous power consumption. This approach furthermore introduces additional transmission latency. It is desirable to avoid or mitigate these drawbacks.