1. Field of the Invention
This invention relates to video signal processing, for example, video signal processing in which data (possibly compressed data) representing two or more video signals are mixed or otherwise combined.
2. Description of the Prior Art
It is often desirable to mix, wipe or superimpose two or more video signals. For example, a so-called wipe effect might be used to transition between two different scenes in a television programme, or a so-called logo or other computer-generated signal such as a subtitle or a set of credits might need to be superimposed over a video image without otherwise disrupting the underlying image.
With analogue video signals, or even with uncompressed digital video signals, this operation is relatively straightforward. A key signal can be used to control the level of each of the constituent video signals (say, signals "A" and "B") at each pixel position, with the two level-controlled signals then being added together. A basic relationship between the level of the key signal K, the levels A and B of the input pixels and the level of the output pixel at each pixel position might be:
Output pixel value = A(1 − K) + BK
This process is carried out for each output pixel. So, if signal A is to be replaced in its entirety by signal B at a particular pixel position, the key signal would be 1 (otherwise expressed as 100%), and if there is to be a 50:50 mix of the two pixels the key value would be 0.5 or 50%.
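The keying relationship above can be illustrated with a minimal sketch for uncompressed pixel levels. The function names `mix_pixel` and `mix_row` are hypothetical and chosen only for this illustration, and the key is assumed to be normalised to the range 0 to 1.

```python
def mix_pixel(a, b, k):
    """Return A(1 - K) + BK for one pixel position.

    a, b: levels of the input pixels from signals A and B.
    k:    key level, assumed normalised so that 0 selects A and 1 selects B.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("key value must lie between 0 and 1")
    return a * (1.0 - k) + b * k


def mix_row(row_a, row_b, key_row):
    """Apply the key to every pixel position of one line of the picture."""
    return [mix_pixel(a, b, k) for a, b, k in zip(row_a, row_b, key_row)]


if __name__ == "__main__":
    # Key of 0 keeps A, 0.5 gives a 50:50 mix, 1 replaces A with B.
    print(mix_row([100, 100, 100], [200, 200, 200], [0.0, 0.5, 1.0]))
```

A key of 1 at a given position returns the level of signal B unchanged, and a key of 0.5 returns the mean of the two levels, matching the 100% and 50% cases described above.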
The situation is much more difficult when one or both of the inputs is a compressed video stream. In a compressed video stream such as an MPEG-2 video stream, pixels are generally compressed as blocks known as macroblocks, so that it is not possible to derive the value of a particular pixel directly from the compressed video signal.
Compressed video signals are also often subject to an overall limit on the quantity of data that can be used to transmit or store the signal. While there can be some variation from picture to picture, or even from group-of-pictures (GOP) to GOP, the time-averaged data rate is often constrained to the capacity of a transmission or storage channel. This allowable variation from picture to picture or GOP to GOP can mean that two signals to be combined can have the same nominal data rate but very different instantaneous data rates. So, when constructing a composite video signal from a group of video signals including one or more compressed signals, great care is needed to avoid a data overflow or underflow.
A third feature of compressed video signals relevant to this discussion is that they often make use of motion vectors to indicate blocks of temporally preceding or following pictures which are similar to a block of a current picture and so can cut down the amount of data needed to encode the current picture. Where two signals are being combined, however, it is possible that a motion vector for a current picture block can point to an area of a preceding or following image which has been replaced by or mixed with another input signal, so that the motion vector is no longer useful in the compression or decompression of that block.
One way of handling these problems is to decompress the entire compressed input signals, carry out the mixing or similar process in the non-compressed domain, and then re-compress the resulting composite pictures.
This process is limited by the general principle with compression systems such as the MPEG-2 system that each generation of compression tends to reduce the quality of the resulting images. It is undesirable if the simple addition of logo or similar information causes a deterioration in the overall image quality of the pictures to which the logo information is added.
In order to alleviate this drawback, it is desirable to recompress as much as possible of the composite picture using compression parameters (such as a quantisation parameter Q, motion vectors, DCT frame type and so on) which are the same as those used to compress the corresponding block of the relevant input picture. However, this general aim is made more difficult by the following considerations:
(a) how many (or, looked at another way, how few) blocks need to be re-encoded, with newly derived parameters, because their image content is not related sufficiently closely to one of the input images?
(b) what happens if this re-encoding scheme would lead to too much output data being generated for the transmission or storage channel under consideration?
(c) how can it be detected whether the motion vectors, associated with blocks whose parameters are to be re-used, point to areas of the same image in the preceding or following pictures?
(d) how can these matters be addressed with a manageable data processing overhead?
The invention aims to address or at least alleviate at least one of these problems.
This invention provides video signal processing apparatus in which at least first and second input video signals are combined to generate an output compressed video signal, the first and second video signals each having respective associated motion vectors from a data compression process applied to the video signals;
the apparatus comprising:
means for establishing a set of the motion vectors associated with the first and second video signals which can potentially be re-used in compressing the output compressed video signal; and
means for testing the set of motion vectors, the testing means being operable:
i. to define, with respect to reference images referred to by the motion vectors under test, a first border region of a predetermined width surrounding those parts of the reference images derived in part from the first video signal, and a second border region of a predetermined width surrounding those parts of the reference images derived in part from the second video signal; and
ii. to test whether motion vectors of the set which are associated with parts of each video signal refer either to image areas derived in part from the other video signal, to the first border region or to the second border region;
in which any motion vectors of the set being found by the testing means to refer to image areas derived to at least the predetermined degree from the other input video signal, the first border region or the second border region are not used in the compression of the output compressed video signal.
The invention allows for an efficient selection of which motion vectors (preferably from an initial "shortlist", although that shortlist may potentially include all of the available motion vectors) are suitable for use in the compression of an output video signal derived from two or more input video signals. A test is performed to detect whether each motion vector points either to an image area derived (to at least a predetermined degree such as 100%) from the other input video signal or to a border region established around both or all such image areas.
The border region allows for the possibility, particularly in a system where the reference map is created to a block resolution rather than to a pixel resolution, that a motion vector at an extreme position in one block could point beyond the nominal target block and into an adjacent one.
The use of one border region surrounding image areas derived from both or all input video signals reduces the storage and processing overhead of creating the border region and means that only one test needs to be performed on each potential motion vector.
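The motion vector test can be sketched as follows. This is an illustration under stated assumptions, not the apparatus itself: the reference map is assumed to hold one source label ("A" or "B") per 16x16 block, the single border region is taken to be every block within a given number of blocks of a block from a different source, and the nominal target block is taken to be the block containing the top-left pixel of the area the vector refers to. The names `BLOCK`, `build_border` and `vector_is_reusable` are hypothetical.

```python
BLOCK = 16  # assumed block size in pixels


def build_border(ref_map, width=1):
    """Mark every block lying within `width` blocks of a block from a different source.

    ref_map is a list of rows, each a list of source labels at block resolution.
    One border map serves all sources, so each vector needs only one border test.
    """
    rows, cols = len(ref_map), len(ref_map[0])
    border = [[False] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for ny in range(max(0, y - width), min(rows, y + width + 1)):
                for nx in range(max(0, x - width), min(cols, x + width + 1)):
                    if ref_map[ny][nx] != ref_map[y][x]:
                        border[y][x] = True
    return border


def vector_is_reusable(ref_map, border, source, bx, by, dx, dy):
    """Test one motion vector (dx, dy), in pixels, for the block at (bx, by).

    The vector is rejected if its nominal target block was derived from the
    other source or lies in the border region. Vectors pointing outside the
    reference picture are also rejected (an assumption made for simplicity).
    """
    rows, cols = len(ref_map), len(ref_map[0])
    px, py = bx * BLOCK + dx, by * BLOCK + dy
    if px < 0 or py < 0 or px + BLOCK > cols * BLOCK or py + BLOCK > rows * BLOCK:
        return False
    tx, ty = px // BLOCK, py // BLOCK  # nominal target block
    if border[ty][tx]:
        return False
    return ref_map[ty][tx] == source


if __name__ == "__main__":
    # Reference picture: left three block columns from A, right three from B.
    ref_map = [["A", "A", "A", "B", "B", "B"] for _ in range(4)]
    border = build_border(ref_map, width=1)
    print(vector_is_reusable(ref_map, border, "A", 0, 0, 8, 0))   # stays within A
    print(vector_is_reusable(ref_map, border, "A", 1, 1, 20, 0))  # lands in the border
```

With a border one block wide, a vector whose nominal target block lies just outside the border can spill by up to 15 pixels into the adjacent block, but that adjacent block is still derived from the same source, which is the situation the border region is intended to cover.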