The present invention relates generally to systems and methods for transmitting data. More specifically, the present invention relates to systems and methods for transmitting compressed digital video data in a manner that allows downstream network devices to splice multiple compressed video bit streams using network packet level manipulation.
There are presently a variety of different communication channels for transmitting or transporting video data. For example, communication channels such as coaxial cable distribution networks, digital subscriber loop (DSL) access networks, ATM networks, satellite, or wireless digital transmission facilities are all well known. In fact, many standards have been developed for transmitting data on these communication channels. The present invention relates to data transmission on such communication channels, and for the purposes of the present application a channel is defined broadly as a connection facility to convey digital information from one point to another. A channel includes some or all of the following elements: 1) physical devices that generate and receive the signals (including a modulator/demodulator); 2) a medium that carries the actual signals; 3) mathematical schemes used to encode and decode the signals; 4) proper communication protocols used to establish, maintain and manage the connection created by the channel; and 5) storage systems used to store the signals, such as magnetic tapes and optical disks. The concept of a channel is not limited to a physical channel; it also includes logical connections established on top of different network protocols, such as XDSL, ATM, IP, wireless, HFC, coaxial cable, Ethernet, Token Ring, etc.
The channel is used to transport a bit stream, or a continuous sequence of binary bits used to digitally represent video, audio or data. A bit rate is the number of bits per second used to transport the bit stream. Compression of video data is an approach that has been used to make digital video images more transportable. Digital video compression schemes allow digitized video frames to be represented in a much more efficient manner. Compression of digital video makes it practical to transmit the compressed signal using digital channels at a fraction of the bit rate required to transmit the original signal without compression.
In addition to the video compression scheme, a channel may also implement its own network protocol. For example, two common network protocols currently used are Internet Protocol (IP) and Asynchronous Transfer Mode (ATM). Each protocol implements its own rules for transporting data or multimedia bit streams. For example, ATM specifies how data is first packetized into fixed-size data units, called cells. It also specifies how such a cell stream can be multiplexed, de-multiplexed, switched and routed between different locations to support end-to-end connections at a given bit rate.
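The packetization of data into fixed-size cells can be illustrated with a minimal sketch. It assumes the standard ATM cell layout of 53 bytes (a 5-byte header followed by a 48-byte payload). The function name `packetize`, the all-zero placeholder header, and the zero padding of the final cell are hypothetical simplifications; real ATM adaptation layers define their own header fields, framing and padding rules.

```python
from typing import List

CELL_SIZE = 53                           # bytes per ATM cell
HEADER_SIZE = 5                          # bytes of cell header
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE   # 48 bytes of payload per cell


def packetize(data: bytes, header: bytes = b"\x00" * HEADER_SIZE) -> List[bytes]:
    """Split a byte string into fixed-size cells (hypothetical helper).

    The header is a placeholder; a real cell header carries routing and
    control fields that are not modeled here.
    """
    if len(header) != HEADER_SIZE:
        raise ValueError("header must be exactly 5 bytes")
    cells = []
    for offset in range(0, len(data), PAYLOAD_SIZE):
        chunk = data[offset:offset + PAYLOAD_SIZE]
        # Pad the last chunk so that every cell has the same size.
        chunk = chunk.ljust(PAYLOAD_SIZE, b"\x00")
        cells.append(header + chunk)
    return cells


if __name__ == "__main__":
    cells = packetize(b"x" * 100)
    print(len(cells), [len(c) for c in cells])
```

Note that the cell boundaries fall wherever the byte count dictates, without regard to the structure of the data being carried.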
Existing communication channels are frequently used for delivery of video. In some multimedia delivery applications, compressed video programs are delivered simultaneously to numerous digital receiver/decoders. In such multi-cast situations, the same bit stream may spread from a source to numerous receivers simultaneously via multiple channel paths. The original bit stream is often altered on each of these paths. For example, local advertisers may intercept a video program (e.g., a CNN feed or a nationally broadcast sporting event) and insert local content comprising local advertising.
Digital stream insertion (also called digital program insertion (DPI), digital spot insertion, etc.) is a process that replaces part of a digital bit stream with another bit stream. For the above multi-cast situation, the two bit streams are typically encoded off-line, in different locations or at different times. As a result of the insertion, however, the resulting bit stream has the advertisement inserted into the network feed.
The underlying technique for DPI is bit stream splicing (also known as bit stream concatenation), where a transition is made from an old stream to a new stream. The transition is called a splice point. Creating splice points in bit streams comprising compressed data often requires decoding and re-encoding of one or both compressed bit streams in order to allow seamless output of the video data from a decoder. For example, in order to complete a splicing process for MPEG-coded video data, an anchor frame with full picture data, i.e., an I frame, should be present at splicing points so that the switch from local content back to the broadcast content maintains compressed video data syntax integrity. For switching from the broadcast content into local content, the last coded picture in the broadcast bitstream must have no reference into the future (sometimes called the closed GOP condition), to avoid referencing an unrelated local content picture. However, compressed video programs transmitted in a broadcast do not always have anchor frames placed in splicing locations for digital ad segments.
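The two splicing conditions described above can be expressed as a short check. This is only a sketch under simplifying assumptions: each coded picture is modeled by a hypothetical `Picture` record holding its type and a flag indicating whether it predicts from a later picture, and the function names are illustrative rather than part of any standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Picture:
    kind: str          # "I", "P" or "B"
    refs_future: bool  # True if this picture predicts from a later picture


def can_leave_after(pictures: List[Picture], i: int) -> bool:
    """Closed GOP condition: the last broadcast picture has no future reference."""
    return not pictures[i].refs_future


def can_return_at(pictures: List[Picture], j: int) -> bool:
    """The first broadcast picture after the insertion must be an I frame."""
    return pictures[j].kind == "I"


def insertion_is_clean(pictures: List[Picture], last_before: int, first_after: int) -> bool:
    """True if local content can replace the pictures strictly between the two indices."""
    return (
        last_before < first_after
        and can_leave_after(pictures, last_before)
        and can_return_at(pictures, first_after)
    )


if __name__ == "__main__":
    stream = [
        Picture("I", False),
        Picture("P", False),
        Picture("B", True),
        Picture("I", False),
        Picture("P", False),
    ]
    print(insertion_is_clean(stream, 1, 3))  # leave after a P frame, return at an I frame
    print(insertion_is_clean(stream, 2, 3))  # leaving after a B frame that looks ahead
    print(insertion_is_clean(stream, 1, 4))  # returning at a P frame
```

In the example stream, only the first candidate satisfies both conditions; the other two fail the closed GOP condition and the I frame condition, respectively.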
Some current mechanisms for dealing with this problem include skipping frames, inserting black frames, and forcing overflow and underflow of video buffers. These mechanisms may all diminish output video quality. An alternative way to solve this problem is to create a closed GOP and an I frame for every potential local content insertion, i.e., a closed GOP condition is formed in the broadcast bitstream at the point of local content insertion, and the first coded picture in the broadcast bitstream after the insertion is an I frame. This requirement implies that the broadcast encoder knows exactly where local content insertion will occur. This generally is not true, because originally encoded content typically is sent to many geographic locations, at different times of the day. If a subsequent insertion point is not where the encoder assumed it to be, the insertion cannot be made. Making more frequent splicing points in the bitstream would make it more likely that the subsequent insertion can be done, but it comes at the cost of decreased bandwidth efficiency, since I frames consume considerably more bits than P and B frames.
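The bandwidth cost of more frequent splicing points can be made concrete with a small calculation. The sketch below assumes a simplified stream in which one I frame is followed by pictures of a single smaller size until the next I frame. The function name and the picture sizes (400,000 bits per I frame, 100,000 bits per other picture) are hypothetical values chosen only for illustration, not measurements.

```python
def average_bits_per_picture(gop_length: int, i_bits: int, other_bits: int) -> float:
    """Average picture size when one I frame starts every group of gop_length pictures."""
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    total_bits = i_bits + (gop_length - 1) * other_bits
    return total_bits / gop_length


if __name__ == "__main__":
    I_BITS = 400_000      # hypothetical I frame size
    OTHER_BITS = 100_000  # hypothetical size of every non-I picture
    sparse = average_bits_per_picture(15, I_BITS, OTHER_BITS)
    dense = average_bits_per_picture(5, I_BITS, OTHER_BITS)
    print(sparse, dense, dense / sparse)
```

Under these assumed sizes, placing an I frame every 5 pictures instead of every 15 raises the average picture size from 120,000 to 160,000 bits, an increase of about one third at the same frame rate.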
A third approach to the above problem is not to encode splicing points at the encoder, and instead to let the downstream splicer re-encode the content to satisfy the splicing requirement. This is also a computationally expensive method, since each splicer must re-encode the network bit stream and generate I frames. This decoding and re-encoding may be redundantly performed for each content insertion point downstream in the multi-cast. In addition, the network transmission protocol must also be handled to access the compressed video data, e.g., the compressed video data must be removed from the network packets, decoded and re-encoded, and then repacketized into network packets. The layered structure of network packets and protocols typically implies no data alignment between the video data payload and the network packets that contain it. Therefore, additional packet decapsulation and encapsulation must be performed to re-format the data for network transport. Each of these steps introduces complexity and transmission inefficiency to the multi-cast and increases the difficulty in providing seamless and efficient multi-cast applications.
Further, the compressed video bit streams are usually generated by either real-time encoders or pre-compressed video server storage systems. Both are likely to be at remote sites, away from the network itself. This increases the difficulty in encoding the video signal to allow downstream digital stream insertion. As a result, prior art systems assume that the splicing and compression steps are performed within the same equipment.
Therefore, there is a need for improved methods and systems for providing video data from multiple sources.