Splicing two analogue signals is relatively straightforward: a switch between the two signals is executed during the vertical blanking interval. This is simple provided the signals are synchronous and time-aligned.
For baseband digital signals, each frame is discrete, so these signals can also be readily spliced. However, digital signals are not typically transmitted in baseband form. Instead, they are encoded into a more efficient form, for example using MPEG-2 or MPEG-4, which employ inter-frame coding.
MPEG (Moving Picture Experts Group) is a working group that sets standards for audio and video compression and transmission. Digital video compression reduces the amount of data needed to represent digital video pictures, for example by removing redundancy. This redundancy can be spatial (within a single picture) and/or temporal (between successive pictures in a video sequence). MPEG coding, and particularly the more recent standards starting with MPEG-2, exploits these redundancies through efficient coding, so the resulting representation is smaller than the original uncompressed pictures. MPEG encoding is also highly statistical in nature and lossy, as it discards content that is unlikely to be missed.
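To illustrate temporal redundancy, the following minimal Python sketch compares two tiny greyscale frames of a largely static scene. It deliberately assumes raw pixel differences. Real MPEG encoders instead predict blocks using motion compensation and then transform, quantise and entropy-code the residual. The function names `residual` and `zero_fraction` are hypothetical and are used only for illustration.

```python
def residual(prev_frame, curr_frame):
    """Pixel-wise difference between two equally sized greyscale frames."""
    return [[c - p for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev_frame, curr_frame)]


def zero_fraction(frame):
    """Fraction of samples in a frame that are exactly zero."""
    values = [v for row in frame for v in row]
    return sum(1 for v in values if v == 0) / len(values)


# Two 4x4 frames of a static scene in which only one pixel changes.
frame_1 = [[10, 10, 10, 10],
           [10, 50, 50, 10],
           [10, 50, 50, 10],
           [10, 10, 10, 10]]
frame_2 = [row[:] for row in frame_1]
frame_2[3][3] = 12

diff = residual(frame_1, frame_2)
print(zero_fraction(frame_1))  # 0.0
print(zero_fraction(diff))     # 0.9375
```

The difference picture is almost entirely zeros, and it is this property that inter-frame coding exploits.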
Digital stream insertion is essentially a process where a part of a primary digitally compressed stream is replaced by another secondary compressed stream. A particular application of this process is with programmes for transmission or broadcast, which have been compressed at a first location (e.g. by the programme maker) and then sent to a second location (e.g. a transmission facility for a local community). It may be desirable for those at the second location to insert information, such as advertisements, that are specific or relevant to their local community (i.e. locally targeted advertising or other regionally specific content). This is not a function that the programme distributor is typically willing to perform on another's behalf, particularly when they are distributing the programme around a multitude of different transmission facilities, each with their preferred local content for insertion.
Where the programme is streamed to local transmission facilities in real time, or substantially in real time, it would also be desirable for those facilities to be able to insert a secondary advertisement stream into the live network feed. Of course, this is not a simple matter when that live network feed is compressed.
It is to be appreciated that the technique of “insertion” is equivalent to “splicing”. That is, it refers to the process whereby a transition is made from a primary stream to one or more secondary streams, and then, typically, back to the primary stream.
The simplest way to splice television programmes is in the baseband signal, before compression occurs. This technique works well when the programme streams are received at the cable head-end in uncompressed form. However, when the programme is distributed as an MPEG transport stream, baseband splicing would require the stream to be fully decompressed and then recompressed with the inserted clips. This is costly in terms of picture quality (recompression introduces further losses), time and required processing power.
Where the signals or streams are compressed, splicing is complex. Frames (and hence packets) in an MPEG stream depend on other frames in the stream, and MPEG coding schemes also use variable-length encoding of digital video pictures. Both factors must be taken into account when splicing MPEG streams.
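As an example of variable-length coding, H.264 and HEVC code many syntax elements using unsigned Exp-Golomb codes, written ue(v) in those standards. MPEG-2, by contrast, uses table-based variable-length codes. The sketch below shows that codeword length depends on the value coded, so the position of a given element cannot be found without parsing everything before it. For readability it represents bits as a string of '0'/'1' characters, and the helper names `decode_ue` and `decode_all` are hypothetical.

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb code starting at bits[pos].

    bits is a string of '0'/'1' characters. Returns (value, next_pos).
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    pos += leading_zeros + 1                 # skip the zeros and the marker '1'
    suffix = bits[pos:pos + leading_zeros]   # as many info bits as leading zeros
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos + leading_zeros


def decode_all(bits):
    """Decode a concatenation of ue(v) codes into (value, length) pairs."""
    out, pos = [], 0
    while pos < len(bits):
        start = pos
        value, pos = decode_ue(bits, pos)
        out.append((value, pos - start))
    return out


# The codewords for 0, 3 and 7 are '1', '00100' and '0001000'.
print(decode_all("1" + "00100" + "0001000"))
# [(0, 1), (3, 5), (7, 7)]
```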
More specifically, MPEG compression uses three frame (picture) types, I-, P- and B-frames, which serve different purposes. Because they are coded differently, the frame types typically differ in size (number of bytes) and therefore in transmission time. More particularly:
I-frames, or Intra-frames, can be fully decoded independently of any other frame. That is, they are encoded using only information present in the picture itself;
P-frames, or Predicted-frames, improve compression by exploiting the temporal redundancy in a scene. A P-frame stores only the difference between its picture and a prediction formed from the preceding anchor frame (an I- or P-frame), which therefore serves as its point of reference; and
B-frames, or Bidirectional-frames, also improve compression, like P-frames, but they do so by predicting from both a preceding and a following anchor frame (i.e. I- and/or P-frames). Consequently, the decoder must already have decoded both anchor frames before it can decode a B-frame. The following anchor is therefore transmitted ahead of the B-frames that precede it in display order, and the decoder must buffer decoded anchor frames until they are needed, which means decoding B-frames requires large data buffers.
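The reordering that B-frames force can be sketched as follows. This minimal Python example assumes a simple MPEG-2-style pattern in which each B-frame depends on the nearest anchors on either side of it in display order, and in which the sequence ends with an anchor. The function name `decode_order` is hypothetical. The function converts frames from display order into the order in which they must be transmitted and decoded.

```python
def decode_order(display):
    """Reorder (frame_type, display_index) tuples from display to decode order.

    frame_type is 'I', 'P' or 'B'. Each B-frame is held back until the
    following anchor (I or P) has been emitted, because it predicts from it.
    """
    coded, pending_b = [], []
    for frame in display:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            coded.append(frame)        # the anchor must be decoded first...
            coded.extend(pending_b)    # ...then the B-frames shown before it
            pending_b = []
    if pending_b:
        raise ValueError("trailing B-frames have no following anchor frame")
    return coded


gop = [(t, i) for i, t in enumerate("IBBPBBP")]
print("".join(f"{t}{i}" for t, i in decode_order(gop)))
# I0P3B1B2P6B4B5
```

At any point the decoder only needs to hold the two most recent anchors in order to reconstruct the intervening B-frames.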
In MPEG coding, these frames are grouped into sequences. In MPEG-1 and MPEG-2 such a sequence is known as a “Group of Pictures” (GOP), whilst in MPEG-4/H.264 and HEVC/H.265 it is called a “Coded Video Sequence” (CVS). Henceforth, the term GOP is used to describe such a sequence of frames in any of these formats, or in similar formats. A GOP typically contains a combination of all three frame types. Because P- and B-frames depend on anchor frames, it is not possible to leave one stream on a B-frame and enter the next on a P-frame: the anchor frames those frames reference would be missing or incorrect.
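The splice-point constraint can be made concrete with a simplified dependency model. In this hypothetical Python sketch, a P-frame references the previous anchor and a B-frame references the previous and next anchors, all in display order. The GOP pattern is assumed to begin with an I-frame and end with an anchor, so there are no open-GOP references into a neighbouring GOP. A cut at position k counts as a clean exit if no kept frame references a discarded later frame. It counts as a clean entry if no frame from k onwards references an earlier frame.

```python
def references(pattern):
    """Display-order reference sets for a GOP pattern such as 'IBBPBBP'.

    Simplified model: I refers to nothing, P to the previous anchor and
    B to the previous and next anchors.
    """
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    refs = []
    for i, t in enumerate(pattern):
        prev = [a for a in anchors if a < i]
        nxt = [a for a in anchors if a > i]
        if t == "I":
            refs.append(set())
        elif t == "P":
            refs.append({prev[-1]})
        else:  # 'B'
            refs.append({prev[-1], nxt[0]})
    return refs


def clean_exit_points(pattern):
    """Cut positions k where frames [0, k) never reference a frame >= k."""
    refs = references(pattern)
    return [k for k in range(1, len(pattern) + 1)
            if all(r < k for i in range(k) for r in refs[i])]


def clean_entry_points(pattern):
    """Cut positions k where frames [k, n) never reference a frame < k."""
    refs = references(pattern)
    return [k for k in range(len(pattern))
            if all(r >= k for i in range(k, len(pattern)) for r in refs[i])]


print(clean_exit_points("IBBPBBPBBI"))   # [1, 4, 7, 10]
print(clean_entry_points("IBBPBBPBBI"))  # [0, 9]
```

Under this model, the stream can only be entered at an I-frame. It can only be left once the B-frames that precede an anchor in display order are complete. Entering at the P-frame at index 3, for example, fails because that frame references index 0.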
The prior art addresses this problem by re-coding a section of the stream after the MPEG-coded stream has been received and processed by a decoder. For example, a sequence of B-frames in an incoming stream may be re-coded as I-frames or P-frames, which do not depend on subsequent frames. The re-coded digital stream then allows another digital stream to interrupt the original encoding interrelationships while still permitting the frames around the splice point to be decoded cleanly. However, this requires a decoder with additional processing units of considerable computational power, which can be costly and complex.
A further problem in splicing two digitally encoded streams is resolving timing differences between them. Since the streams are typically independent of each other, each carries its own timing information that is specific to that stream. In an MPEG transport stream, for example, this information includes program clock references and presentation/decoding timestamps. Therefore, when the two streams are spliced, the timing information becomes inaccurate (i.e. the splice creates a discontinuity in the time base).
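One possible way to handle the time-base discontinuity is to offset the inserted clip's timestamps so that they continue from the primary stream. The sketch below assumes MPEG-2 systems-layer presentation timestamps (33-bit counters at 90 kHz) and a clip running at a fixed 25 frames/s. The helper `rebase` is hypothetical. A real splicer would also have to handle decoding timestamps, the program clock reference and audio, or else signal the discontinuity (e.g. via the transport stream's discontinuity_indicator). None of this is shown here.

```python
PTS_CLOCK_HZ = 90_000
PTS_WRAP = 1 << 33  # PTS values are 33-bit counters in MPEG-2 systems streams


def rebase(timestamps, splice_pts):
    """Shift a clip's PTS values so its first frame is presented at splice_pts.

    Arithmetic is modulo 2**33, so results wrap like the real counter.
    """
    offset = (splice_pts - timestamps[0]) % PTS_WRAP
    return [(ts + offset) % PTS_WRAP for ts in timestamps]


frame_ticks = PTS_CLOCK_HZ // 25                # 3600 ticks per frame at 25 frames/s
last_primary_pts = PTS_WRAP - frame_ticks       # primary stream is about to wrap
splice_pts = (last_primary_pts + frame_ticks) % PTS_WRAP
clip = [500_000 + n * frame_ticks for n in range(3)]  # clip's own time base

print(rebase(clip, splice_pts))
# [0, 3600, 7200]
```

Working modulo 2**33 lets the rebased clip continue correctly even when the primary stream's counter wraps at the splice point.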
There is therefore a need to overcome or ameliorate at least one problem of the prior art.
In particular there is a need for an improved system and method for enabling insertion of video and/or audio clips into an MPEG transport stream.