There are known networked distribution systems for transmitting television programming whereby audio visual material is transmitted to a number of affiliated stations, each of which retransmits the programming to viewers' homes. FIG. 1 illustrates a known system 10 for distributing analog video data, e.g., NTSC television signals. Within the known system, it is common practice for the local affiliate station(s) 12, 14, 16 to electronically overlay a station logo (or other identification) or other local informational content over a portion of the network video signal, e.g., a corner portion of the broadcast images. In this manner, the video pictures that are presented to viewers, e.g., homes A-Z, include picture content that has been locally inserted. In the case of analog television (e.g., NTSC), the overlay of local content on the network signal can be easily accomplished by electronically adding or switching video signals.
There is a significant desire to perform the local insertion of picture content within emerging digital television networks. In these digital networks, highly compressed digital video will be delivered to viewers' homes. In many proposed digital television systems, MPEG video coding will be used to accomplish this high degree of compression.
In order to obtain high compression efficiency, video compression techniques (e.g. MPEG) employ motion compensated prediction, whereby a region of pixels in the picture being compressed is coded by coding the difference between the desired region and a region of pixels in a previously transmitted reference frame. The term “motion compensated” refers to the fact that the position of the region of pixels within the reference frame can be spatially translated from the position of the region being coded, in order to compensate for motion, e.g., of the camera or in the scene, that occurred between the time that the two pictures were captured.
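The prediction step described above can be sketched in a few lines of Python. This is an illustrative simplification only, not MPEG syntax: frames are plain lists of pixel rows, and the names get_block, encode_block, and decode_block are hypothetical helpers introduced here. The sketch assumes a block is coded as a motion vector plus a residual (the difference between the block and the displaced reference region).

```python
# Illustrative sketch only; real coders (e.g., MPEG) also transform and
# quantize the residual. All names here are hypothetical.

def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def encode_block(current, reference, top, left, size, motion_vector):
    """Code a block as (motion vector, residual) against the reference frame."""
    dy, dx = motion_vector
    target = get_block(current, top, left, size)
    # The prediction is taken from a spatially translated position.
    prediction = get_block(reference, top + dy, left + dx, size)
    residual = [[t - p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(target, prediction)]
    return motion_vector, residual

def decode_block(reference, top, left, size, motion_vector, residual):
    """Rebuild a block by adding the residual to the displaced prediction."""
    dy, dx = motion_vector
    prediction = get_block(reference, top + dy, left + dx, size)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]

# Reference frame: a bright 2x2 object at rows 1-2, columns 4-5.
reference = [[0] * 8 for _ in range(4)]
for r in (1, 2):
    for c in (4, 5):
        reference[r][c] = 200

# Current frame: the object has moved 3 pixels left and is slightly brighter.
current = [[0] * 8 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        current[r][c] = 203

mv, residual = encode_block(current, reference, 1, 1, 2, (0, 3))
rebuilt = decode_block(reference, 1, 1, 2, mv, residual)
```

In this sketch the residual holds only the small brightness change, which is the source of the compression gain: the decoder recovers the block from the reference frame, the motion vector, and that small difference.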
FIG. 2 illustrates the use of motion compensated prediction, in the form of motion vectors, to code a second image, Image 2, as a function of pixel values included in a reference frame, Image 1. In FIG. 2, Image 1 and Image 2 both comprise a plurality of segments including Segments A, B, and C. Using the known motion compensated prediction techniques, a segment of Image 2 may be coded using a motion vector which refers to, e.g., a block of pixel values corresponding to a segment of Image 1 that is located outside the image segment that is being coded. By transmitting motion vectors as opposed to the actual pixel values, efficient motion compensated prediction coding of Image 2 is achieved.
Note that in the FIG. 2 prior art example, Segment A of Image 2 is represented using motion vectors 23, 25 which reference Segments C and B of Image 1, respectively. Motion vector 23 is an Image 2 motion vector which defines a portion of (Image 2, Segment A) as a function of a portion of (Image 1, Segment C). Motion vector 25 is an Image 2 motion vector which defines a portion of (Image 2, Segment A) as a function of a portion of (Image 1, Segment B).
As discussed above, it is sometimes desirable to insert local image data into a previously encoded image. For example, a local broadcaster may want to insert a logo into Segment B of the image which is to be broadcast.
The use of motion compensated prediction makes it difficult for a local broadcaster to insert data into an encoded image by merely replacing encoded data blocks without running the risk of introducing errors into other frames which may rely on the image being modified as a reference frame. The difficulty of inserting new encoded image data in the place of previous encoded image data arises from the fact that the original coded blocks of subsequent coded images that are not part of the subset being replaced “assume” that the content of the replaced blocks is the original coded picture content. In such cases, any attempt to change the content of a subset of the blocks in the coded bitstream is likely to cause annoying prediction errors to propagate through the rest of the video, where blocks outside of the replaced subset were coded based on motion compensated predictions using picture content within the subset.
For example, if a logo were inserted into Segment B of Image 1 by substituting encoded data representing the logo for the encoded image data representing the original image content of Segment B of Image 1, a prediction error would result in Segment A of Image 2. Such a prediction error occurs because a portion of the logo, as opposed to the original image content of Segment B of Image 1, will now be incorporated into Segment A of Image 2 by virtue of the use of the motion vector 25.
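The prediction error described in this example can be reproduced with a small Python sketch. It is a simplified model under stated assumptions, not an MPEG decoder: each segment is a 2x2 block in a 2x6 frame, the pixel values are made up for illustration, and get_block and decode_block are hypothetical helpers. The motion vector (0, 2) stands in for motion vector 25, pointing from Segment A of Image 2 into Segment B of Image 1.

```python
# Illustrative sketch only; pixel values and helper names are hypothetical.

def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def decode_block(reference, top, left, size, motion_vector, residual):
    """Rebuild a block by adding the residual to the displaced prediction."""
    dy, dx = motion_vector
    prediction = get_block(reference, top + dy, left + dx, size)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]

SIZE = 2
# Image 1: Segment A = columns 0-1, Segment B = columns 2-3, Segment C = columns 4-5.
image1 = [[10, 10, 50, 52, 90, 90],
          [10, 10, 54, 56, 90, 90]]

# Coded data for (Image 2, Segment A): predicted from (Image 1, Segment B).
mv_25 = (0, 2)
residual = [[1, 1], [1, 1]]

# What the network encoder intended the viewer to see.
intended = decode_block(image1, 0, 0, SIZE, mv_25, residual)

# The local station overwrites Segment B of Image 1 with logo pixels.
image1_with_logo = [row[:] for row in image1]
for r in range(SIZE):
    for c in range(2, 4):
        image1_with_logo[r][c] = 200

# The unchanged Image 2 data now predicts from the logo instead.
corrupted = decode_block(image1_with_logo, 0, 0, SIZE, mv_25, residual)

# Per-pixel prediction error introduced into Segment A of Image 2.
error = [[c - i for c, i in zip(c_row, i_row)]
         for c_row, i_row in zip(corrupted, intended)]
```

Although the coded data for Image 2 is untouched, the reconstructed Segment A now contains logo pixels, and any later image predicted from Image 2 would inherit the same error.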
In the absence of techniques for selectively replacing coded blocks of pixels in video bitstreams compressed with motion compensated prediction, there are two alternative approaches:
1) Distribution of compressed video to affiliates (local stations) by forgoing the use of motion compensated prediction. This would decrease compression efficiency so greatly that this approach would be unacceptable for final transmission to viewers.
2) Encoding, or decoding and then re-encoding, of a series of complete video images at the point of local transmission. This approach removes motion vectors generated by the original encoding and then generates an entirely new set of motion vectors based on the images into which data has been inserted. This approach has the disadvantage of requiring the use of expensive video encoders at the local affiliate station capable of encoding a series of complete images. Also, this approach would generally require concatenated compression, whereby video is first compressed for distribution to the affiliates, then fully decompressed and, after local picture content is inserted into the unencoded images generated by fully decoding the originally encoded images, recompressed for final transmission. The application of concatenated compression generally results in picture degradation and, when applied to complete images, will in many cases result in final (home) picture quality which is unacceptable.
Accordingly, there is presently a need for cost effective methods and apparatus which will: 1) support the transmission of encoded digital video data; 2) allow local stations to insert sub-images and other local content into previously encoded video images; and 3) still provide an acceptable level of image quality to the final viewer of the encoded images, e.g., the home viewer of a video broadcast.