The present invention relates to image processing and, more particularly, to methods and apparatus for encoding digital images to facilitate the subsequent insertion of additional image data into previously encoded images, and to methods and apparatus for inserting said additional image data.
There are known networked distribution systems for transmitting television programming whereby audio visual material is transmitted to a number of affiliated stations, each of which retransmits the programming to viewers' homes. FIG. 1 illustrates a known system 10 for distributing analog video data, e.g., NTSC television signals. Within the known system, it is common practice for the local affiliate station(s) 12, 14, 16 to electronically overlay a station logo (or other identification) or other local informational content over the network video signal, e.g., over a corner portion of the broadcast images. In this manner, the video pictures that are presented to viewers, e.g., homes A-Z, include picture content that has been locally inserted. In the case of analog television (e.g., NTSC), the overlay of local content on the network signal can be easily accomplished by electronically adding or switching video signals.
It appears that there is a significant desire to perform the local insertion of picture content within emerging digital television networks. In these digital networks, highly compressed digital video will be delivered to viewers' homes. MPEG video coding will be used to accomplish this high degree of compression in many proposed digital television systems.
In order to obtain high compression efficiency, video compression techniques (e.g., MPEG) employ motion compensated prediction, whereby a region of pixels in the picture being compressed is coded by coding the difference between the desired region and a region of pixels in a previously transmitted reference frame. The term “motion compensated” refers to the fact that the position of the region of pixels within the reference frame can be spatially translated from the position of the region being coded, in order to compensate for motion, e.g., of the camera or in the scene, that occurred between the time that the two pictures were captured.
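The prediction step described above can be illustrated with a minimal Python sketch. It assumes integer-pixel motion vectors and frames stored as lists of rows; the function names `predict_block` and `residual`, and the sample frames, are hypothetical and are not taken from MPEG or from this document.

```python
def predict_block(reference, top, left, mv, size):
    """Fetch the size x size block of `reference` displaced by mv = (dy, dx)."""
    dy, dx = mv
    rows = reference[top + dy: top + dy + size]
    return [row[left + dx: left + dx + size] for row in rows]


def residual(current, reference, top, left, mv, size):
    """Difference between the block being coded and its motion compensated prediction."""
    pred = predict_block(reference, top, left, mv, size)
    return [[current[top + r][left + c] - pred[r][c] for c in range(size)]
            for r in range(size)]


# Hypothetical 4x4 reference frame, and a current frame shifted right by one pixel.
reference = [[4 * r + c for c in range(4)] for r in range(4)]
current = [[row[0]] + row[:-1] for row in reference]

# With a motion vector that matches the shift, the residual is all zeros.
matched = residual(current, reference, top=0, left=2, mv=(0, -1), size=2)
# With a zero motion vector, a nonzero difference must be coded instead.
unmatched = residual(current, reference, top=0, left=2, mv=(0, 0), size=2)
```

In this toy case the matching motion vector leaves nothing but zeros to code, which is the source of the compression gain that motion compensated prediction provides.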
FIG. 2 illustrates the use of motion compensated prediction, in the form of motion vectors, to code a second image, Image 2, as a function of pixel values included in a reference frame, Image 1. In FIG. 2, Image 1 and Image 2 both comprise a plurality of segments including Segments A, B, and C. Using the known motion compensated prediction techniques, a segment of Image 2 may be coded using a motion vector which refers to, e.g., a block of pixel values corresponding to a segment of Image 1 that is located outside the image segment that is being coded. By transmitting motion vectors as opposed to the actual pixel values, efficient motion compensated prediction coding of Image 2 is achieved.
Note that in the FIG. 2 prior art example, Segment A of Image 2 is represented using motion vectors 23, 25 which reference Segments B and C of Image 1. Motion vector 23 is an Image 2 motion vector which defines a portion of (Image 2, Segment A) as a function of a portion of (Image 1, Segment C). Motion vector 25 is an Image 2 motion vector which defines a portion of (Image 2, Segment A) as a function of a portion of (Image 1, Segment B).
As discussed above, it is sometimes desirable to insert local image data into a previously encoded image. For example, a local broadcaster may want to insert a logo into Segment B of the image which is to be broadcast.
The use of motion compensated prediction makes it difficult for a local broadcaster to insert data into an encoded image by merely replacing encoded data blocks without running the risk of introducing errors into other frames which may rely on the image being modified as a reference frame. The difficulty of inserting new encoded image data in the place of previous encoded image data arises from the fact that the original coded blocks of subsequent coded images that are not part of the subset being replaced “assume” that the content of the replaced blocks is the original coded picture content. In such cases, any attempt to change the content of a subset of the blocks in the coded bitstream is likely to cause annoying prediction errors to propagate through the rest of the video, where blocks outside of the replaced subset were coded based on motion compensated predictions using picture content within the subset.
For example, if a logo were inserted into Segment B of Image 1 by substituting encoded data representing the logo for the encoded image data representing the original image content of Segment B of Image 1, a prediction error would result in Segment A of Image 2. Such a prediction error occurs because a portion of the logo, as opposed to the original image content of Segment B of Image 1, will now be incorporated into Segment A of Image 2 by virtue of the use of motion vector 25.
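The error described in this example can be reproduced with a small Python sketch. It is a simplified one-dimensional model, not a decoder: each image is a flat list of samples, the three segments are four samples each, and the name `decode_block`, the sample values, and the residual are all hypothetical.

```python
def decode_block(reference, start, mv, residual):
    """Rebuild a block as (displaced reference samples + transmitted residual)."""
    src = start + mv
    return [reference[src + i] + residual[i] for i in range(len(residual))]


# Image 1 as three 4-sample segments: A, B, C (hypothetical values).
image1 = [10, 11, 12, 13, 20, 21, 22, 23, 30, 31, 32, 33]

# (Image 2, Segment A) was coded from (Image 1, Segment B): offset of +4 samples.
mv = 4
residual = [1, 1, 1, 1]
intended = decode_block(image1, 0, mv, residual)

# A logo (value 99) replaces Segment B of Image 1 after encoding.
image1_with_logo = image1[:4] + [99] * 4 + image1[8:]
corrupted = decode_block(image1_with_logo, 0, mv, residual)
```

The motion vector and residual are unchanged, yet Segment A of Image 2 now decodes to logo-derived values, because the decoder has no way to know that its reference content was replaced.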
In the absence of techniques for selectively replacing coded blocks of pixels in video bitstreams compressed with motion compensated prediction, there are two alternative approaches:
1) Distribution of compressed video to affiliates (local stations) by forgoing the use of motion compensated prediction. This would decrease compression efficiency so greatly that this approach would be unacceptable for final transmission to viewers.
2) Decoding and then re-encoding a series of complete video images at the point of local transmission. This approach removes the motion vectors generated by the original encoding and then generates an entirely new set of motion vectors based on the images into which data has been inserted. This approach has the disadvantage of requiring, at the local affiliate station, expensive video encoders capable of encoding a series of complete images. Also, this approach would generally require concatenated compression, whereby video is first compressed for distribution to the affiliates, then fully decompressed and recompressed for final transmission after local picture content is inserted into the unencoded images generated by fully decoding the originally encoded images. The application of concatenated compression generally results in picture degradation and, when applied to complete images, will in many cases result in final (home) picture quality which is unacceptable.
Accordingly, there is presently a need for cost effective methods and apparatus which will: 1) support the transmission of encoded digital video data; 2) allow local stations to insert sub-images and other local content into previously encoded video images; and 3) still provide an acceptable level of image quality to the final viewer of the encoded images, e.g., the home viewer of a video broadcast.
The present invention comprises methods which permit bandwidth efficient video compression for distribution to affiliates, and which allow the insertion of local picture content without the need for complete decompression and recompression. This is accomplished by the addition of a motion vector control module in the original video encoder which controls the selection of motion vectors during the initial data compression (encoding) process. In accordance with the present invention, an encoder operator can define one or more non-overlapping subregions of the picture where local insertion is to be enabled. Alternatively, preselected and predefined image subregions may be used at encoding time. The motion vector control module determines the minimum subset(s) of macroblocks in the picture that encompass the defined subregion(s). During encoding, the motion vector control module acts to guarantee that motion vectors associated with macroblocks outside of these subsets never result in the use of any pixels contained within the subsets for constructing predictions. Optionally, this module ensures that motion vectors associated with macroblocks within a subset never result in the use of pixels outside of the subset for constructing predictions. Additionally, the encoder contains a module which is capable of transmitting information to affiliate stations regarding the size, number and location of image subwindows or subregions into which data may be inserted. The information transmitted to a local station may include, e.g., the number of subsets of macroblocks available for local insertion, information identifying macroblocks belonging to each subset, and information informing the local station as to whether or not the optional motion vector constraint described above was enforced.
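The two tasks assigned to the motion vector control module, finding the macroblocks that encompass a subregion and vetting candidate motion vectors, can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: macroblocks are taken to be 16x16 pixels, motion vectors are integer-pixel, each subset is a single rectangle, and the names `covering_macroblocks` and `mv_allowed` are hypothetical.

```python
MB = 16  # assumed macroblock size in pixels


def covering_macroblocks(x0, y0, x1, y1, mb=MB):
    """Smallest macroblock rectangle covering pixel region [x0, x1) x [y0, y1).

    Returns (col0, row0, col1, row1) in macroblock units, end bounds exclusive.
    """
    return (x0 // mb, y0 // mb, -(-x1 // mb), -(-y1 // mb))


def _overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def _contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def mv_allowed(mb_col, mb_row, mv, subset, enforce_optional=False, mb=MB):
    """Decide whether motion vector mv = (dx, dy) may be used for a macroblock."""
    dx, dy = mv
    x, y = mb_col * mb + dx, mb_row * mb + dy
    ref = (x, y, x + mb, y + mb)              # pixels the prediction would read
    region = tuple(v * mb for v in subset)    # subset bounds in pixels
    inside = (subset[0] <= mb_col < subset[2]
              and subset[1] <= mb_row < subset[3])
    if not inside:
        # Mandatory constraint: outside macroblocks never read subset pixels.
        return not _overlaps(ref, region)
    if enforce_optional:
        # Optional constraint: subset macroblocks never read outside pixels.
        return _contains(region, ref)
    return True


subset = covering_macroblocks(20, 20, 50, 40)
```

An encoder using this kind of check would simply discard any candidate motion vector for which `mv_allowed` returns `False` during motion estimation. With fractional-pixel motion vectors, interpolation reads additional neighboring pixels, so the referenced area would need to be widened accordingly.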
The present invention also involves an inserter device, which would reside at the local affiliate station that receives the information regarding the number and placement of subsets of macroblocks that have been made available for local insertion. An operator at the affiliate station can specify the location and picture content for local insertion. The inserter circuit parses through the coded digital bitstream representing the received encoded images, removes the data corresponding to the macroblocks that are affected by the desired local insertion, and replaces them with data corresponding to the desired local picture content. If it is desired that the local picture content include pixels from the original video, then some amount of decoding of the original video would be performed. In cases where the optional motion vector constraint, described above, is enforced, only those bits corresponding to the affected macroblocks need to be decoded. When the optional motion vector constraint is not enforced, the inserter would decode some or all of the surrounding macroblocks in order to guarantee proper decoding of the pixels within the macroblocks that are to be affected by local insertion.
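The inserter's two decisions, which macroblocks to replace and which to decode, can be sketched in Python. The model is deliberately simplified and hypothetical: a coded picture is a dictionary mapping (column, row) macroblock positions to coded data, the function names are invented for illustration, and the choice to decode a one-macroblock ring when the optional constraint was not enforced is only one reading of "some or all of the surrounding macroblocks".

```python
def macroblocks_to_decode(subset, constraint_enforced, cols, rows):
    """Positions to decode in order to recover original pixels inside `subset`."""
    needed = set(subset)
    if not constraint_enforced:
        # Assumption: add the ring of immediate neighbors. A real inserter
        # might need more, depending on how far motion vectors reach.
        for col, row in subset:
            for dc in (-1, 0, 1):
                for dr in (-1, 0, 1):
                    c, r = col + dc, row + dr
                    if 0 <= c < cols and 0 <= r < rows:
                        needed.add((c, r))
    return needed


def insert_local_content(coded_frame, allowed_subset, local_blocks):
    """Return a copy of the coded picture with local macroblocks substituted."""
    for pos in local_blocks:
        if pos not in allowed_subset:
            raise ValueError(f"macroblock {pos} is not open for local insertion")
    out = dict(coded_frame)
    out.update(local_blocks)
    return out


# Hypothetical 4x3 macroblock picture with one macroblock open for insertion.
network = {(c, r): f"net-{c}-{r}" for c in range(4) for r in range(3)}
allowed = {(1, 1)}
result = insert_local_content(network, allowed, {(1, 1): "logo"})
decode_enforced = macroblocks_to_decode(allowed, True, cols=4, rows=3)
decode_relaxed = macroblocks_to_decode(allowed, False, cols=4, rows=3)
```

When the optional constraint is enforced, only the affected macroblock is decoded; otherwise the decode set grows, which is the cost trade-off described above.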
Another approach involves the use of SNR scalable coding. In this embodiment, the local picture content is added by the addition of an enhancement bitstream, such as the SNR scalable enhancement defined by MPEG-2. In addition to the local picture content, the SNR insertion device might optionally act to corrupt a small number of coded macroblocks of the network encoded bitstream, and encode the negative of the corruption signal in the SNR enhancement layer. In this way a program provider might be able to encourage the purchase of receivers that are capable of SNR scalable decoding, and discourage the inactivation of this feature to avoid viewing the local picture content.
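The arithmetic behind the corruption scheme can be shown with a short Python sketch. It treats the base and enhancement layers as lists of values that a decoder simply adds, which is a simplification of how MPEG-2 SNR scalability refines coefficients; the name `add_layers` and all sample values are hypothetical.

```python
def add_layers(base, enhancement):
    """Combine base and enhancement layers by element-wise addition."""
    return [b + e for b, e in zip(base, enhancement)]


original = [50, 52, 54, 56]      # network content (hypothetical values)
local_delta = [0, 0, 30, 30]     # local picture content, as a difference
corruption = [0, 25, 0, 0]       # deliberate corruption of the base layer

# Base layer as delivered: network content plus the corruption signal.
base = add_layers(original, corruption)

# Enhancement layer: local content plus the negative of the corruption.
enhancement = [d - c for d, c in zip(local_delta, corruption)]

# A receiver that decodes both layers sees clean video with local content.
full = add_layers(base, enhancement)
```

A receiver that ignores the enhancement layer displays `base`, which still contains the corruption, while a receiver that decodes both layers recovers the network content with the local insertion applied.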