The invention is related to the field of digital multimedia transmissions and especially to MPEG-2 bit streams.
One of the most common operations in TV is switching from one program to another. At the studio, cameras and microphones are switched and mixed to form a program. At the broadcaster (whether broadcast by cable or airwaves), programs are regularly switched to commercials and to other programs. Finally, the viewer is given a choice of several program channels and often switches between the channels, especially between programs.
Currently, the switching of analog signals at the studio and at the broadcaster occurs during vertical intervals. In order to form a picture on a TV screen, first the odd lines of the picture are drawn by an electron gun, from the upper left, across each line, to the lower right side. Then, during a vertical interval, the aim of the electron gun is moved from the lower right back to the upper left corner. Then, in a similar manner, the electron gun draws the even lines of the picture interlaced with the odd lines. An independent unit of video such as all the even lines (or all the odd lines) is usually referred to as a "frame".
Currently, play-to-air (PTA) switchers are used to switch analog TV signals. Such switchers include synchronizing circuits, so that when a switch command is received, the PTA switcher waits until the next vertical interval and then switches. When the program is switched during the vertical interval, there are no resulting flickers, flashes, or other anomalies in the picture display during switching. This is known as seamless switching.
In a typical implementation of a PTA switcher, there are two output channels: a program channel and a preview channel. The program channel carries the material that is being broadcast ("aired"), whereas the preview channel is used for viewing only within the studio and usually carries the program to be switched to next (i.e., the next program to be aired and transmitted over the program channel). The focus herein is on the output stream carried over the program channel, since this is the stream that is received by the viewers and has to be displayed seamlessly. Therefore, unless specified differently, output stream refers to the stream output over the program channel.
Many broadcasters are considering adding digital channels to their broadcasts. In the real world, colors, brightness, and sounds have practically infinite variations (i.e., they are analog). For digital broadcasting, analog scenes and sounds usually must be converted into digital representations in a process known as digitizing or analog-to-digital (A/D) conversion. Due to the high bandwidth required for uncompressed digital video signals, it is expected that the video signals will require compression even in the production studio. For example, a single channel of uncompressed standard definition digital video requires transmission of about 250 Mb/s (million bits per second) of information (high definition video requires 1.5 Gb/s). In digital video, pictures may not be interlaced, and the term "video frame" is used to refer to a complete picture.
The digital compression/decompression system can be conceived as: multiple encoders, each of which converts an uncompressed digital signal stream to a compressed stream; a switcher or splicer which switches between the input streams from the encoders to form an output stream; and a decoder which decompresses the output stream from the splicer.
The standard for handling digital multimedia data is known as MPEG-2. In MPEG-2, the digital representations of elements (e.g. video, 2-4 audio channels, captions) of a program are compressed (encoded) in a lossy manner (i.e. some information is lost) and the encoded information is transmitted as a continuous stream of bits. At the end of transport, the encoded information is decompressed (decoded) to approximately reproduce the original digitization of the elements, and the decoded elements are displayed to the viewer.
MPEG-2 streams are organized hierarchically. First, the digital representations for each element are encoded (compressed) into a bitstream known as an elementary stream (ES). Then headers are inserted into each ES to form a packetized elementary stream (PES). Each PES header contains a decode timestamp (DTS), which specifies when the decoding of the following ES is to be completed, and a presentation timestamp (PTS), which specifies when the decoded information for the following ES is to be presented. For example, a PES header will be inserted before each picture of a video elementary stream and before each frame of an audio elementary stream. Each PES stream is encapsulated (packaged) into a series of transport packets, each of which is 188 bytes long and includes a header and a payload such as the bits of a PES stream. A typical PES stream, such as a picture, requires a large number of packets. The header of each packet includes flags, a countdown field, and a 13-bit packet identifier (PID) field which identifies the portion of the PES that the packet is for. For example, all the packets for an MPEG group of pictures may have the same PID. All the packets with the same PID are called a PID stream.
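The packet structure described above can be illustrated with a minimal sketch that reads the 13-bit PID from a 188-byte transport packet. The bit positions used here (a 0x47 sync byte, the PID spread over the second and third bytes, and a 4-bit counter in the fourth byte) follow the transport packet layout of ISO/IEC 13818-1 as commonly documented and are not taken from this text; the names `TsHeader`, `parse_ts_header`, and `make_packet` are hypothetical.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188  # every transport packet is 188 bytes long
SYNC_BYTE = 0x47      # first byte of every transport packet (per ISO/IEC 13818-1)


class TsHeader(NamedTuple):
    payload_unit_start: bool  # set when a new PES header begins in this packet
    pid: int                  # 13-bit packet identifier
    continuity_counter: int   # 4-bit per-PID packet counter


def parse_ts_header(packet: bytes) -> TsHeader:
    """Extract the fields of the 4-byte transport packet header."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    payload_unit_start = bool(packet[1] & 0x40)
    pid = ((packet[1] & 0x1F) << 8) | packet[2]  # 5 low bits + 8 bits = 13 bits
    continuity_counter = packet[3] & 0x0F
    return TsHeader(payload_unit_start, pid, continuity_counter)


def make_packet(pid: int, counter: int = 0, start: bool = False) -> bytes:
    """Hypothetical helper: build a packet with an all-zero payload for testing."""
    byte1 = (0x40 if start else 0x00) | ((pid >> 8) & 0x1F)
    header = bytes([SYNC_BYTE, byte1, pid & 0xFF, 0x10 | (counter & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - len(header))
```

Grouping packets by the returned `pid` value yields the PID streams referred to above.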
There are several auxiliary PID streams for each program. One of these streams is the program clock reference (PCR), which contains samples of a 27 MHz clock used by the video and audio encoders and decoders. The PID that carries the PCR is called the PCR_PID. Another auxiliary PID stream for each program contains a program map table (PMT), which lists all the PIDs which belong to the program and defines which PID streams contain which elements (video, audio channels, captions, PCR_PID). All the PID streams for a program are multiplexed together (the packets are intermixed, but bits of different packets are not intermixed) so that, for example, the packets for pictures and the packets for audio frames are mixed together.
An MPEG-2 bit stream may include multiple programs. For example, the stream in a cable TV system may include hundreds of programs. The packets for different programs are also multiplexed together, so that the decoder has to select the packets of a program in order to decode a particular program. Thus, another auxiliary PID stream is provided containing a program association table (PAT), which lists the PID streams containing the PMTs for each of the programs. The packets of the PAT stream are all identified by a PID=0.
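The way a decoder selects the packets of one program through the PAT and the PMT can be sketched as follows. This is a simplified illustration only: the tables are modeled as already-parsed dictionaries rather than as the binary table sections of a real stream, packets are reduced to (PID, label) pairs, and the names `pids_for_program` and `select_program` as well as all PID values are hypothetical.

```python
from typing import Dict, List, Set, Tuple

PAT_PID = 0  # packets of the PAT stream are identified by PID 0

# Simplified packet: (pid, label). A real packet would be 188 bytes.
Packet = Tuple[int, str]


def pids_for_program(pat: Dict[int, int],
                     pmts: Dict[int, Dict[int, str]],
                     program_number: int) -> Set[int]:
    """Return the PMT PID of a program plus every PID listed in that PMT."""
    pmt_pid = pat[program_number]   # the PAT maps program -> PID of its PMT
    element_pids = pmts[pmt_pid]    # the PMT maps PID -> element carried
    return {pmt_pid, *element_pids}


def select_program(packets: List[Packet],
                   pat: Dict[int, int],
                   pmts: Dict[int, Dict[int, str]],
                   program_number: int) -> List[Packet]:
    """Keep only the packets that belong to the chosen program, in stream order."""
    wanted = pids_for_program(pat, pmts, program_number)
    return [packet for packet in packets if packet[0] in wanted]


# Two programs multiplexed into one stream (all PID values are made up).
pat = {1: 0x100, 2: 0x200}
pmts = {
    0x100: {0x101: "video", 0x102: "audio", 0x103: "PCR"},
    0x200: {0x201: "video", 0x202: "audio"},
}
stream = [(0x101, "V1"), (0x201, "V1'"), (0x102, "A1"), (PAT_PID, "PAT"),
          (0x100, "PMT"), (0x202, "A1'"), (0x103, "PCR")]
program_1 = select_program(stream, pat, pmts, 1)
```

Here `program_1` contains only the video, audio, PMT, and PCR packets of program 1, with their relative order in the multiplex preserved.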
The packets for each program in a multi-program stream may be referred to as a stream or sub-stream. Similarly, the packets for each element or component of a program may be referred to as a stream or substream. Those skilled in the art are accustomed to this terminology.
FIG. 1 schematically illustrates a stream of packets with a packet identifier in the header and video, audio, PCR or PMT data in the payloads. Each packet is actually a continuous stream of bits representing one formatted block as shown. The packets containing data for a first video picture V1 are mixed with packets containing data for a first audio frame A1 and packets containing data for a second audio frame A2 as well as with packets containing PCR times and packets containing PMT information. Note that packets for different video frames in the same program are not mixed and packets for different audio frames in the same program are not mixed. However, for multi-program streams, the packets for a picture of one program would be mixed with packets for pictures of another program. Also, note that the bits of different packets are not mixed, that is, the stream transmits all the bits for one packet sequentially together then all the bits for the next packet sequentially together.
FIG. 2 schematically illustrates the same streams as FIG. 1 in a different way, by showing a separate bar for each component (element) of the program with vertical lines between PES streams for each picture or audio frame. The separate packets are not shown. In FIG. 2, the intermixing of packets for audio frames 1 and 2 with video picture 1 is illustrated by overlapping the PES stream for picture 1 with the PES streams for audio frames 1 and 2.
In the MPEG-2 standard, switching between programs is referred to as splicing, and points where splicing may take place without causing anomalies are referred to as seamless splice points. In MPEG-2, when switching from an old program to a new program, the new program is spliced onto the old program in the output stream. The MPEG-2 standard specifies that a splice point may be indicated in the header of the packet of the same PES stream immediately before the splice point, by setting splicing_point_flag=1 and splice_countdown=0; if the splice point is a seamless splice point, that may also be indicated by setting seamless_splice_flag=1.
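A splicer could detect a signaled splice point roughly as sketched below. The byte offsets assume the adaptation field layout of ISO/IEC 13818-1 as commonly documented (a flags byte in which 0x10 marks a PCR, 0x08 an OPCR, and 0x04 the splicing_point_flag, followed by the optional 6-byte PCR and OPCR and then the signed one-byte splice_countdown); they are not taken from this text. The seamless_splice_flag sits in the adaptation field extension and is not parsed here. The function names are hypothetical.

```python
from typing import Optional

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def read_splice_countdown(packet: bytes) -> Optional[int]:
    """Return splice_countdown if splicing_point_flag is set, otherwise None."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport packet")
    adaptation_field_control = (packet[3] >> 4) & 0x03
    if adaptation_field_control not in (2, 3):  # no adaptation field present
        return None
    if packet[4] == 0:                          # adaptation_field_length of zero
        return None
    flags = packet[5]
    if not flags & 0x04:                        # splicing_point_flag not set
        return None
    offset = 6
    if flags & 0x10:                            # skip the 6-byte PCR
        offset += 6
    if flags & 0x08:                            # skip the 6-byte OPCR
        offset += 6
    raw = packet[offset]
    return raw - 256 if raw >= 128 else raw     # 8-bit two's complement value


def build_packet(pid: int, splice_countdown: Optional[int] = None,
                 with_pcr: bool = False) -> bytes:
    """Hypothetical test helper: packet with an adaptation field and zero payload."""
    flags = (0x10 if with_pcr else 0) | (0x04 if splice_countdown is not None else 0)
    field = bytes([flags])
    if with_pcr:
        field += bytes(6)                       # placeholder PCR value
    if splice_countdown is not None:
        field += bytes([splice_countdown & 0xFF])
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x30])
    packet = header + bytes([len(field)]) + field
    return packet + bytes(TS_PACKET_SIZE - len(packet))
```

A return value of 0 corresponds to the case described above, in which the packet is the last one of its PID stream before the splice point.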
In MPEG-2 video compression, each picture is first compressed in a manner similar to JPEG (quantized cosine intraframe compression), and then sequentially presented pictures are compressed together (quantized cosine interframe compression). Essentially, in interframe compression, only the differences between a picture and the pictures it depends on are included in the compressed frame. The decoding of a picture may depend on the decoding of previously viewed pictures and in some cases on the decoding of subsequently viewed pictures. In order to minimize decoding problems, especially errors that may propagate from an erroneous decoding of one picture to cause the erroneous decoding of dependent pictures, only a relatively small group of pictures (GOP) is compressed together (e.g. 9 pictures). The pictures of each GOP are encoded together independently from the pictures of any preceding GOPs and can thus be independently decoded (except for trailing B-frames), so errors cannot propagate from group to group. The first picture in a GOP (in order of presentation) is known as an I-frame; it is essentially just a JPEG encoded (independently compressed) picture and its decoding can be performed independently (i.e. its decoding does not depend on any other picture). Some of the subsequent pictures in the group may be so-called P-frames (prediction encoded frames), whose decoding depends on the previous I-frame and any previous P-frames in the GOP. That is, each P-frame contains only the differences between that picture and the previously decoded I or P-frame, and the differences are compressed. Typically in broadcast streams, most of the pictures in a GOP are so-called B-frames (bidirectionally encoded frames), whose decoding depends on both the immediately preceding I or P-frame and the immediately succeeding I or P-frame (in order of presentation). B-frames are typically much smaller than P-frames, which are typically much smaller than I-frames.
The size of particular encoded frames in MPEG-2 varies depending on the complexity of the picture and on the amount of difference between the picture and the picture or pictures on which its decoding depends.
A typical scheme proposed for broadcasting MPEG-2 is a group of 9 pictures presented sequentially on a display in the following order:
I1 B2 B3 P4 B5 B6 P7 B8 B9
The decoding of P4 depends on I1 and the decoding of P7 depends on the decoding of P4 (which depends on the decoding of I1). The decoding of B2 and B3 depends on the decoding of I1 and P4. The decoding of B5 and B6 depends on the decoding of P4 and P7. The decoding of the last two B-frames (B8 and B9) depends on the decoding of P7 and on the immediately following I-frame (I10) in the following GOP (not shown).
In the data stream, the encoded pictures are not transmitted or stored in presentation order. They are provided in the order in which they are required for decoding. That is, the B-frames follow the I and P-frames on which they are dependent. The pictures in this typical scheme are provided in stream order, as follows:
I1 B−2 B−1 P4 B2 B3 P7 B5 B6 I10 B8 B9
Note that in stream order, B−2 and B−1 of the preceding GOP and I10 of the succeeding GOP are mixed with the pictures of this typical GOP.
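The reordering rule stated above, namely that each B-frame follows both of the I or P-frames on which it depends, can be sketched as a small function. It is an illustration under the assumption that every B-frame depends only on the nearest I or P-frame on each side in presentation order; the function name `to_stream_order` and the ASCII frame labels are hypothetical.

```python
from typing import List


def to_stream_order(presentation: List[str]) -> List[str]:
    """Reorder frames so each I or P-frame precedes the B-frames that need it.

    Frames are labeled by type and presentation index, e.g. "I1", "B2", "P4".
    """
    stream: List[str] = []
    pending_b: List[str] = []            # B-frames waiting for their next anchor
    for frame in presentation:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:                            # I or P-frame: emit it, then the waiting Bs
            stream.append(frame)
            stream.extend(pending_b)
            pending_b = []
    stream.extend(pending_b)             # Bs whose following anchor was not supplied
    return stream


# Presentation order: the last two B-frames of the preceding GOP, the 9-picture
# GOP discussed in the text, and the I-frame of the succeeding GOP.
presentation = ["B-2", "B-1", "I1", "B2", "B3", "P4",
                "B5", "B6", "P7", "B8", "B9", "I10"]
stream_order = to_stream_order(presentation)
```

In the resulting `stream_order`, B-2 and B-1 come after I1, and B8 and B9 come after I10, which is the mixing of neighboring GOPs noted above.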
MPEG-2 defines a video buffer model for a decoder called the video buffering verifier (VBV). The VBV is a bitstream constraint, not a decoder specification. The actual decoder buffer will be designed so that any bitstream that does not overflow or underflow the VBV model will not overflow or underflow the actual decoder buffer. The VBV model is a first-in-first-out (FIFO) buffer from which bits exit in chunks of one picture at a time, all the bits of a picture being removed simultaneously, at regular intervals (e.g. every 33 milliseconds (ms)). The rate at which pictures exit the buffer is called the frame rate, and it is the same as the average decode rate.
When a decoder resets and starts to decode a new stream, the VBV buffer is initially empty. The VBV buffer is filled at a rate specified in the bit stream for either: a predetermined period of time for constant bit rate (CBR) mode; or until filled to a predetermined level for variable bit rate (VBR) mode. The time required to partially fill the VBV buffer prior to operation is called the startup delay. The startup delay must be carefully adhered to in order to prevent overflow or underflow of the VBV buffer during subsequent decoder operation.
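The role of the startup delay can be illustrated with a toy simulation of the buffer model in constant bit rate mode. This is a deliberately simplified sketch and not the normative VBV procedure: time is measured in whole frame periods, the stream is assumed to keep arriving at a constant rate for the whole run, and fullness is checked only at the instants just before a picture is removed. The function name and all numbers are hypothetical.

```python
from typing import List


def check_vbv(picture_sizes: List[int], bits_per_period: int,
              startup_periods: int, buffer_size: int) -> str:
    """Simulate a constant-bit-rate buffer that releases one picture per period.

    picture_sizes   -- encoded size of each picture, in bits, in stream order
    bits_per_period -- bits arriving during one frame period (bit rate / frame rate)
    startup_periods -- startup delay, in frame periods, before the first removal
    buffer_size     -- capacity of the buffer model, in bits
    """
    fullness = bits_per_period * startup_periods  # buffer fills during the startup delay
    for index, size in enumerate(picture_sizes):
        if fullness > buffer_size:
            return f"overflow before picture {index}"
        if size > fullness:                        # picture has not fully arrived yet
            return f"underflow at picture {index}"
        fullness -= size                           # whole picture leaves at once
        fullness += bits_per_period                # input continues for one period
    return "ok"
```

With picture sizes of `[300, 100, 100, 300, 100, 100]`, 200 bits per period, and a 600-bit buffer, a startup delay of two periods passes the check, whereas a delay of one period underflows at the first picture. This is the sense in which the startup delay must be adhered to.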
When a bit stream terminates, the buffer continues to deliver pictures to the decoder until the buffer is emptied. The time required to empty the buffer after the stream ends is called the ending delay.
Those skilled in the art are directed to the following publications: (1) Table 3 "Compression Format Constraints" of Annex A of Doc. A/53, ATSC Digital Television Standard; (2) ISO/IEC 13818-1, "Generic Coding of Moving Pictures and Associated Audio: Systems"; (3) Section 5.13 titled "Concatenated Sequences" in Doc. A/54, "Guide to the use of the ATSC Digital Television Standard", 4th Oct. 1995; (4) ISO/IEC 11172-3 International Standard, "Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s - Part 3: Audio, First edition", 1993-08-01; (5) ISO/IEC 13818-3 "Draft International Standard, Information Technology - Generic Coding of Moving Pictures and Associated Audio: Audio," ISO/IEC JTC1/SC29/WG11 N0703, May 10, 1994; (6) Proposed SMPTE Standard PT 20.02/10 "Splice Points for MPEG-2 Transport Streams," Second Draft, July 1997.
It is an object of the invention to provide methods and apparatus for carrying out seamless video splicing and to avoid disturbing audio anomalies due to the related audio splicing of MPEG-2 bit streams that include video and audio components.
In the method of the invention for splicing MPEG-2 multimedia programs, in the same or different multimedia data streams, first and second programs are provided. Each program includes a first media component of the same first media (e.g. video) and a second media component of the same second media (e.g. an audio channel) which is a different media than the first media. Each media component of each program has a multitude of splice-in points with respective begin-presentation times for respective first portions presented after the splice-in. Each media component also has a multitude of splice-out points with respective end-presentation times for a last portion presented before the splice-out. Such times (associated with splice-in and splice-out points) are relative to the starting time of the program. A command is received to splice the second program to the first program. Then the splicer selects a seamless splice-in point for the first component in the second program in the stream and selects a seamless splice-out point for the first component in the first program in the stream. The position of the splice-out point in the stream or streams in the splicer is approximately aligned with the position in the stream of the splice-in point of the first component of the second program. Then the splicer cuts the first component of the first program out at the selected splice-out point for the first component, and splices in the first component of the second program at the selected splice-in point for the first component. Then the presentation times in the second program are changed so that the first presented portion of the first component of the second program has a begin-presentation time which is the same as the end-presentation time of the last presented portion of the first component of the first program.
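The final step above, changing the presentation times of the second program, amounts to adding one constant offset to every timestamp of that program. The sketch below assumes that presentation times are carried as MPEG-2 PTS values, which are 33-bit counts of a 90 kHz clock and therefore wrap around; the function name `retime` and the sample values are hypothetical.

```python
from typing import List

PTS_MODULUS = 1 << 33  # PTS values are 33-bit counts of a 90 kHz clock


def retime(second_program_pts: List[int],
           splice_in_begin_pts: int,
           splice_out_end_pts: int) -> List[int]:
    """Shift the second program so its first presented portion starts exactly
    where the last presented portion of the first program ends.

    splice_in_begin_pts -- begin-presentation time of the first portion of the
                           second program presented after the splice-in point
    splice_out_end_pts  -- end-presentation time of the last portion of the
                           first program presented before the splice-out point
    """
    offset = (splice_out_end_pts - splice_in_begin_pts) % PTS_MODULUS
    return [(pts + offset) % PTS_MODULUS for pts in second_program_pts]


# 3003 ticks of the 90 kHz clock is one frame period at about 29.97 frames/s.
new_times = retime([45045, 48048, 51051],
                   splice_in_begin_pts=45045,
                   splice_out_end_pts=900900)
```

The same offset would be applied to every component of the second program, so that the relative timing of its video and audio is preserved.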
Then the splicer selects a splice-in point in the stream for the second component in the second program at which (treating the presentation times in the two programs as consistent) the begin-presentation time of the earliest presented portion of the second component of the second program (after the splice-in point for the second component in the stream) is equal to or after the end-presentation time of the latest presented portion of the first component (before the splice-out point of the first component of the first program in the stream). The splicer also selects a splice-out point in the stream for the second component of the first program, at which the end-presentation time of the latest presented portion of the second component of the first program (before the splice-out point for the second component in the stream) is equal to or before both: the begin-presentation time of the earliest presented portion of the first component (after the splice-in point of the first component in the stream); and the begin-presentation time of the earliest presented portion of the second component in the second program (after the splice-in point of the second component in the stream). The splicer then splices the second component of the first program out at the selected splice-out point of the second component and splices the second component of the second program in at the selected splice-in point of the second component.
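The two selection conditions for the second component (e.g. audio) can be sketched as follows. The sketch assumes that the presentation times of the second program have already been made consistent with those of the first, that every audio frame boundary is a permissible splice point, and that each frame is represented as a (begin, end) pair of presentation times. It picks the earliest splice-in and the latest splice-out that satisfy the conditions, which is one possible choice among those the text allows. All names and values are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = Tuple[int, int]  # (begin-presentation time, end-presentation time)


def choose_audio_splice(old_audio: List[Frame], new_audio: List[Frame],
                        video_out_end: int,
                        video_in_begin: int) -> Optional[Tuple[int, int]]:
    """Return (index of last old audio frame kept, index of first new audio
    frame kept), or None if no pair of frames satisfies the conditions.

    video_out_end  -- end time of the last video portion of the first program
    video_in_begin -- begin time of the first video portion of the second program
    """
    # Splice-in: first new frame that begins at or after the old video ends.
    in_index = next((i for i, (begin, _) in enumerate(new_audio)
                     if begin >= video_out_end), None)
    if in_index is None:
        return None
    # Splice-out: last old frame that ends at or before BOTH the new video
    # begin time and the begin time of the first new audio frame.
    limit = min(video_in_begin, new_audio[in_index][0])
    out_index = None
    for i, (_, end) in enumerate(old_audio):
        if end <= limit:
            out_index = i
    if out_index is None:
        return None
    return out_index, in_index


old_audio = [(0, 2160), (2160, 4320), (4320, 6480), (6480, 8640), (8640, 10800)]
new_audio = [(7000, 9160), (9160, 11320), (11320, 13480)]
choice = choose_audio_splice(old_audio, new_audio,
                             video_out_end=9009, video_in_begin=9009)
```

In this example the old audio is cut after the frame ending at 8640 and the new audio starts with the frame beginning at 9160, so no old audio is presented with new video and the two audio components never overlap, at the cost of a short gap in the audio.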
In one specific embodiment of the method of the invention, the begin-presentation time for the earliest presented portion of the second component of the second program (after the selected splice-in point for the second component in the stream) is equal to or after the begin-presentation time for the earliest presented portion of the first component of the second program (after the selected splice-in point of the first component in the stream). Also, the end-presentation time for the latest presented portion of the second component of the first program is equal to or before the begin-presentation time for the earliest presented portion of the second component of the second program (following the selected splice-in point for the second component in the stream).
In another specific embodiment of the method of the invention, the end-presentation time for the latest presented portion of the second component of the first program (before the splice-out point for the second component in the stream) is equal to or before the begin-presentation time for the earliest presented portion of the first component of the second program (after the selected splice-in point for the first component in the stream). Also, the begin-presentation time for the earliest presented portion of the second component of the second program (after the splice-in point for the second component in the stream) is equal to or later than the end-presentation time for the latest presented portion of the second component in the first program (before the splice-out point for the second component in the stream).
In another specific embodiment of the method of the invention, the number of audio frames that must be skipped to prevent overflowing an audio decoding buffer is determined. Then a splice-out point for the second component in the first program that is previous to the splice-in point of the second component in the second program is selected depending on the determination in order to prevent overflowing the audio decoder buffer.
In the MPEG-2 data stream of the invention, a first section of the stream consists essentially of a first media component of a first program and a second media component of the first program. A second section of the stream consists essentially of a first media component of a second program and a second media component of the second program. A third section of the stream, between the first section and the second section, consists essentially of the first media component of the second program and the second media component of the first program.
A multimedia encoder of the invention includes a processing unit; a memory communicating with the processing unit; one or more buffers in the memory; one or more network inputs communicating with the buffers in the memory, for receiving uncompressed programs; and at least one network output communicating with the buffers in the memory, for transmitting a data stream of one or more compressed programs from the encoder. The encoder also includes apparatus for receiving the uncompressed programs from the inputs into the buffers; apparatus for compressing the uncompressed portions of the programs in the buffers into compressed portions of the programs in the buffers; and apparatus for transmitting the compressed programs from the buffers onto the network output. The encoder also includes video splice-out providing apparatus for providing a multitude of seamless splice-out points in at least one of the compressed programs; and video splice-in providing apparatus for providing a multitude of seamless splice-in points in at least another one of the compressed programs. The encoder also has apparatus to prevent audio anomalies due to splicing the compressed programs.
A multimedia data stream splicer of the invention includes: a processing unit; a memory communicating with the processing unit; one or more buffers in the memory; and one or more network inputs communicating with the buffers in the memory, for one or more input data streams including at least first and second programs. Each program includes a first media component of the same first media (e.g. video) and a second media component of the same second media (e.g. audio) which is different than the first media. Each media component of each program has a multitude of splice-in points, each associated with a portion of the component having an earliest begin-presentation time after the splice-in; and a multitude of splice-out points, each associated with a portion of the component having the latest end-presentation time before the splice-out. The splicer further includes at least one network output for an output data stream with one or more programs, communicating with the buffers in the memory. The splicer also includes apparatus (programmed computer memory) for receiving the programs from the input data streams into the buffers; apparatus for transmitting the programs from the buffers onto the network output as a data stream; and apparatus for receiving a splice command to splice the second program to the first program. The splicer also includes apparatus for selecting a splice-in point of the first component in the second program depending on the splice command; apparatus for selecting a splice-out point of the first component in the first program, at an equal or previous time with respect to the splice-in point of the first component in the second program; and apparatus for splicing the first component of the first program out at the selected splice-out point of the first component and splicing the first component of the second program in at the selected splice-in point of the first component.
The splicer includes apparatus for changing the presentation times in the second program so that the first presented portion of the second program has a begin-presentation time which is the same as the end-presentation time of the last presented portion of the first program. The splicer also includes apparatus for selecting a splice-in point for the second component in the second program, at which the begin-presentation time of the earliest presented portion of the second component of the second program is equal to or after the end-presentation time of the latest presented portion of the first component of the first program before the splice-out point for the first component. The splicer also includes apparatus for selecting a splice-out point for the second component in the first program, at which the end-presentation time for the latest presented portion of the second component in the first program (before the splice-out point for the second component in the stream) is equal to or before both: the begin-presentation time for the earliest presented portion of the first component of the second program (after the splice-in point of the first component in the stream); and the begin-presentation time for the earliest presented portion of the second component of the second program (after the splice-in point of the second component in the stream). The splicer also includes apparatus for splicing the second component of the first program out at the selected splice-out point of the second component and splicing the second component of the second program in at the selected splice-in point of the second component.
A selective decoder of the invention includes a processing unit; memory communicating with the processing unit, including buffers in the memory; and one or more network inputs communicating with the buffers in the memory, for one or more input data streams including at least first and second programs. Each data stream includes a first media component of the same first media and a second media component of the same second media which is different than the first media. Each media component of each program has a multitude of splice-in points and splice-out points associated with at least a relative begin-presentation time. The decoder further includes at least one output communicating with the memory, for transmitting uncompressed data of one or more programs from the memory. The decoder also includes apparatus for selecting fewer than all the programs available in the multimedia data stream; and apparatus for receiving a selection of fewer than all the programs from the network input, including the first or the second program. The decoder includes apparatus for receiving portions of compressed programs from the input data streams into the decoder; apparatus for decoding portions of compressed data into uncompressed data; and apparatus for transmitting the uncompressed portions of programs from the decoder onto the output as an uncompressed digital data stream.
The selective decoder also includes apparatus for receiving a change channel command to splice the second program to the first program; apparatus for selecting a splice-in point of the first component in the second program depending on the change channel command; apparatus for selecting a splice-out point of the first component in the first program, at an equal or previous begin-presentation time with respect to the splice-in point of the first component in the second program; and apparatus for splicing the first component of the first program out at the selected splice-out point of the first component and splicing the first component of the second program in at the selected splice-in point of the first component. The decoder also includes apparatus for selecting a splice-in point for the second component in the second program, at which the begin-presentation time of the earliest presented portion of the second component of the second program is equal to or after the end-presentation time of the latest presented portion of the first component of the first program before the splice-out point for the first component. The decoder includes apparatus for selecting a splice-out point for the second component in the first program, at which the end-presentation time for the latest presented portion of the second component in the first program (before the splice-out point for the second component in the stream) is equal to or before both: the begin-presentation time for the earliest presented portion of the first component of the second program (after the splice-in point of the first component in the stream); and the begin-presentation time for the earliest presented portion of the second component of the second program (after the splice-in point of the second component in the stream).
Finally, the decoder includes apparatus for splicing the second component of the first program out at the selected splice-out point of the second component and splicing the second component of the second program in at the selected splice-in point of the second component.
Other alternatives and advantages of applicant's inventions will be disclosed or become obvious to those skilled in the art by studying the detailed description below with reference to the following drawings, which illustrate the elements of the appended claims of the inventions.