As is well known in the art, MPEG (i.e., MPEG-1, MPEG-2, MPEG-4, H.264) compressed video and audio streams are mapped into MPEG-2 Transport Streams as Elementary Streams (ES) packed into Packetized Elementary Stream (PES) packets, which, in turn, are packed into MPEG-2 Transport Stream (TS) packets. Each PES packet has a PES header which contains, among other things, a Presentation Time Stamp (PTS) and optionally also a Decoding Time Stamp (DTS); if the DTS is not present, it is considered equal to the PTS. The DTS tells the decoder when to decode a video/audio frame, while the PTS tells the decoder when to display (i.e., present) the video/audio frame. Both the DTS and PTS values are time events expressed relative to a time reference that is also transmitted in the MPEG-2 Transport Stream. This time reference is called the System Time Clock (STC) and is coded in the TS as samples of a 27 MHz counter, which are carried in the Program Clock Reference (PCR) fields.
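By way of illustration only, the timing fields described above can be read as in the following minimal sketch. It assumes the usual MPEG-2 Systems layout: a 33-bit PTS/DTS spread over five bytes with marker bits, and a six-byte PCR made of a 33-bit base, six reserved bits, and a 9-bit extension. The function names are hypothetical and are not taken from any particular library.

```python
PTS_MODULUS = 1 << 33  # PTS, DTS and the PCR base are 33-bit counters


def encode_timestamp(value, prefix=0b0010):
    """Pack a 33-bit PTS/DTS into the 5-byte PES header layout (marker bits set)."""
    value %= PTS_MODULUS
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 1,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 1,
    ])


def decode_timestamp(field):
    """Recover the 33-bit PTS/DTS from its 5-byte PES header field."""
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | (((field[2] >> 1) & 0x7F) << 15)
        | (field[3] << 7)
        | ((field[4] >> 1) & 0x7F)
    )


def decode_pcr(field):
    """Return the PCR as a count of 27 MHz ticks from its 6-byte field."""
    raw = int.from_bytes(field, "big")
    base = raw >> 15          # 33-bit base, counted at 90 kHz
    extension = raw & 0x1FF   # 9-bit extension, counted at 27 MHz
    return base * 300 + extension


def effective_dts(pts, dts=None):
    """A missing DTS is considered equal to the PTS."""
    return pts if dts is None else dts
```

In this sketch the PTS and DTS are counted in 90 kHz units, so a decoded value of 900000 corresponds to 10 seconds on the STC timeline.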
Traditional solutions for splicing of video and audio rely on the fact that the audio and video content is received in the clear, i.e., is not encrypted. The reason for this is threefold.
First, the splicer must find a valid exit point in the head stream and must also find a valid entry point in the tail stream. In order to do this, it must interpret the frame information which must be in the clear.
Second, all MPEG compression standards (MPEG-1, MPEG-2, MPEG-4) contain a decoder buffer model with which the bitstreams coming out of an encoder must comply. When two segments of an MPEG compressed video stream, each of which complies with the MPEG decoder buffer model, are “glued” together, the resulting MPEG stream will in general not comply with the MPEG decoder buffer model. To solve this problem, traditional MPEG splicing solutions are transrater-based, which means that they modify the size of the video frames around the splice points in order to generate a valid video stream. To do this, the splicer needs to “dig deep” into the frame information and modify it, which requires that this information be available in the clear.
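The buffer problem can be illustrated with a deliberately simplified leaky-bucket sketch. It is not the normative decoder buffer model of any MPEG standard: it assumes a constant delivery rate, one frame removed per frame interval, and made-up frame sizes. The function name and all numbers are illustrative assumptions.

```python
def simulate_buffer(frame_sizes, bits_per_interval, buffer_size, initial_fullness):
    """Simplified decoder buffer check.

    Each frame is removed from the buffer at its decode time, then the
    buffer refills at a constant rate until the next decode time.
    Returns (status, frame_index, fullness), where status is "ok",
    "underflow" or "overflow" and frame_index is None when status is "ok".
    """
    fullness = initial_fullness
    for index, size in enumerate(frame_sizes):
        if size > fullness:
            # The frame has not fully arrived when it must be decoded.
            return ("underflow", index, fullness)
        fullness -= size
        fullness += bits_per_interval
        if fullness > buffer_size:
            return ("overflow", index, fullness)
    return ("ok", None, fullness)


# Illustrative numbers only: each segment starts with a large frame
# followed by small ones, and each expects to start at a fullness of 500.
head = [400, 50, 50, 50]
tail = [400, 50, 50, 50]

alone_head = simulate_buffer(head, 100, 1000, 500)
alone_tail = simulate_buffer(tail, 100, 1000, 500)
# Splice after two head frames: the buffer has only recovered to 250
# when the tail's large first frame is due.
spliced = simulate_buffer(head[:2] + tail, 100, 1000, 500)
```

Under these assumptions each segment passes on its own, while the concatenation underflows at the first tail frame. This is the kind of violation that a transrater avoids by shrinking frames around the splice point.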
A third reason why traditional splicing solutions rely on unencrypted content has to do with the way audio is handled at a splice point. Audio frames do not have the same duration as video frames. As a consequence, splice points in video and audio do not coincide exactly. Therefore, audio is spliced at an audio frame boundary near the video splice point. After a splice, the audio is shifted slightly in time with respect to the video, because there is no audio gap in the spliced output. This shift can lead to noticeable lip sync problems, especially when the shift accumulates over a number of splices. A splicer can compensate for the previously accumulated shift by taking an alternative audio frame boundary as the splice point. Also, because of the different frame durations, the number of audio frames to be replaced by, e.g., an advertisement is not fixed, which requires flexibility in the choice of audio frame for splicing.
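The choice of audio splice point can be sketched as follows. This is a simplified illustration, not a description of any particular splicer: it assumes all times are in 90 kHz ticks, that audio frames have a constant duration, that the shift contributed by one splice equals the distance between the chosen audio boundary and the video splice point, and it ignores timestamp wrap-around. The function name and the sign convention are assumptions.

```python
def choose_audio_splice(video_splice_pts, first_audio_pts,
                        audio_frame_ticks, accumulated_shift=0):
    """Pick the audio frame boundary that minimizes the total audio shift.

    Returns (boundary_pts, new_accumulated_shift). A positive shift means
    the audio boundary lies after the video splice point.
    """
    # Aim slightly before or after the video splice point so that the new
    # contribution cancels as much of the earlier shift as possible.
    target = video_splice_pts - accumulated_shift
    half = audio_frame_ticks // 2
    frame_index = (target - first_audio_pts + half) // audio_frame_ticks
    boundary = first_audio_pts + frame_index * audio_frame_ticks
    new_shift = accumulated_shift + (boundary - video_splice_pts)
    return boundary, new_shift


# Example assumption: 1024-sample audio frames at 48 kHz last 1920 ticks,
# and the video splice point is at 90000 ticks (1 second).
first = choose_audio_splice(90000, 0, 1920)
# With 900 ticks of earlier shift, the earlier boundary is the better
# choice, even though the later one is closer to the video splice point.
compensated = choose_audio_splice(90000, 0, 1920, accumulated_shift=900)
```

With this rule the accumulated shift stays within half an audio frame, whereas always taking the nearest boundary would let it grow from splice to splice.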
Audio frames are packed in PES packets just like video. It is common practice that a number of audio frames are packed together in one PES packet because of bandwidth efficiency. Therefore, the ideal audio splice point can be in the middle of a given PES packet. If the audio content is not in the clear, it is impossible to splice at the ideal audio splice point, since this involves de-packing the audio frames and re-packing some of them in a new PES packet.
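As a small illustration of why re-packing is needed, the sketch below locates an audio frame inside its PES packet. It assumes, purely for the example, that every PES packet carries the same number of audio frames; the function name is hypothetical.

```python
def locate_audio_frame(frame_index, frames_per_pes):
    """Return (pes_index, offset_in_pes, starts_pes_packet) for an audio frame."""
    pes_index, offset = divmod(frame_index, frames_per_pes)
    return pes_index, offset, offset == 0


# Example assumption: 4 audio frames per PES packet, ideal splice at frame 47.
location = locate_audio_frame(47, 4)
```

Here the ideal splice frame is the fourth frame of PES packet 11, so splicing exactly there would require splitting that packet, which is only possible when its payload is in the clear.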
Valid splice points are traditionally signaled in the MPEG-2 Transport Stream by means of SCTE-35 cue messages. These messages contain the PTS value of the video frame that corresponds to the intended splice point. The PTS value in the SCTE-35 message tells the splicer when to splice from the head stream to the tail stream. Optionally, the SCTE-35 cue message can also contain a break_duration field that tells the splicer after how much time it must splice back to the head stream.
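The timing arithmetic implied by such a cue message can be sketched as follows. The sketch assumes that the splice PTS and the break duration are both expressed in 90 kHz ticks and that PTS values wrap at 33 bits. It does not parse an actual cue message, and the function names are hypothetical.

```python
PTS_MODULUS = 1 << 33  # PTS values are 33-bit counters that wrap around


def splice_back_pts(splice_pts, break_duration):
    """PTS at which to return to the head stream, allowing for wrap-around."""
    return (splice_pts + break_duration) % PTS_MODULUS


def has_reached(frame_pts, splice_pts):
    """True when frame_pts is at or after splice_pts on the wrapping timeline.

    Differences of less than half the PTS range are treated as "after".
    """
    return (frame_pts - splice_pts) % PTS_MODULUS < PTS_MODULUS // 2


# Example assumption: a 30-second break that starts 1 second before the wrap.
out_point = PTS_MODULUS - 90000
return_point = splice_back_pts(out_point, 30 * 90000)
```

In this example the return point falls 29 seconds after the wrap, and the modular comparison still orders frames correctly on both sides of the wrap.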
In light of the foregoing, prior art MPEG splicing techniques require PES packets that are in the clear. However, there are many instances when splicing is desired, but the PES packets have already been encrypted, and thus the video and audio streams are not available in the clear. There is thus a need to provide systems and methodologies that enable splicing of MPEG streams even when those streams are not in the clear.