The present invention relates to image segmentation with improved temporal consistency, and to image processing including steps dependent on segmentation.
An image is conventionally described by defining the attributes of each pixel of the image, in particular the pixel colour. In a monochrome image the attributes defining each pixel usually consist of the grey scale value of the pixel whereas in a colour image a plurality of colour component values need to be defined per pixel. The invention is not limited to these conventional images and attributes, however, and applies to any spatial attribute which can conveniently be represented in the form of a pixel array in two or more dimensions. Similarly, the concept of time and temporal consistency should be understood to include analogous dimensions, such as when segmentation of tomographic image "slices" is performed for a sequence of images over a third spatial dimension, not necessarily in a time sequence.
Image segmentation is a known technique which groups pixels into regions, each region containing only pixels having similar attributes. The technique has many applications, particularly in the field of image coding (compression). Image coding schemes using image segmentation are particularly suitable for low data rate transmission of image motion. Typical data rates may be as low as 64 kbits/s. Such schemes are therefore suitable for applications such as video-phones which require the transmission of real time video information down the narrow bandwidth of a telephone line. Even if the segmentation is not itself encoded, the segmentation can be useful, for example, to concentrate the available bandwidth on the 'important' parts of the image, such as the face of the person speaking. An image coding scheme which uses image segmentation explicitly is region and texture coding as described in published patent applications EP-A-0437002 (PHB 33610) and EP-A-0454234 (PHB 33626). When segmentation schemes are used for the coding of a series of frames in a motion picture, the visual artifacts which result from segmentation carried out on individual frames will change with each frame and may produce a subjectively very displeasing image sequence representation. It is therefore desirable that the segmentation be temporally consistent. That is to say that like groups of pixels should belong to like regions in succeeding frames. It should also be appreciated that image segmentation is not limited to use in image coding and can be used generally in diverse image processing applications such as image enhancement, object tracking, extraction of 3-D geometry from images, computer-aided animation and colourisation.
A known approach to the problem of temporally consistent image segmentation is to segment the image into regions of uniform motion to allow objects to be derived. It is then these objects for which motion vectors are calculated and output. Examples of such an approach are described in "Segmentation and Motion Estimation in Image Sequences" by Norbert Diehl (SPIE Volume 1260, Sensing and Reconstruction of Three-dimensional Objects and Scenes 1990) and in EP 0 579 319 (PHB 33802) in the name of the present applicant.
A problem with this approach is that it relies upon a satisfactory division of the image into its constituent objects. This division in turn either relies upon prior knowledge of the objects likely to be present in the image or is derived from complex processing of plural subsequent image frames. If there is only minimal knowledge of such objects, the modelling becomes very difficult and the splitting of the image into such objects cannot be satisfactorily achieved. Failure to segment the image satisfactorily tends to produce subjectively very displeasing results. For example, in one experimental video-phone application a nose has even been seen to grow from the forehead of the transmitted face. Moreover, this approach requires a complicated and extensive database of object models, and the matching of such models to the objects in the image may require excessive computation. Thus this approach is not presently a reliable technique for general image coding.
The present invention aims to provide an approach to temporally consistent segmentation that does not require the specific modelling of objects in the image. The temporal consistency imposed by use of the invention can also reduce the computation involved when simplistic assumptions relating to the motion in the image sequence are made, by providing a mechanism to detect and correct errors when such assumptions are invalid.
The present invention, defined in the appended claims, enables the provision of a consistent segmentation for a series of related pictures, for example to impart temporal consistency to the segmentation of a motion picture sequence containing moving objects.
In embodiments disclosed herein, a method of segmentation comprises some or all of the following steps:
(a) segmenting the initial picture of the series to produce an initial segmentation which assigns the pixels of the picture among a plurality of regions;
(b) calculating motion vectors from the initial and next picture of the series;
(c) applying the motion vectors to the initial segmentation to produce a predicted segmentation for the next picture;
(d) using the initial picture and the motion vectors to obtain predicted pixel values of the next picture;
(e) identifying pixels for which the motion vectors are invalid by comparing the predicted and actual pixel values for the next frame;
(f) segmenting the identified pixels to create further picture regions;
(g) replacing parts of the predicted segmentation with the further picture regions to produce an improved segmentation for the next picture; and
(h) repeating steps (b) to (g) using the next segmented picture as the initial picture and using the improved segmentation as the initial segmentation.
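By way of illustration only, one pass through steps (b) to (g) might be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the embodiments: pictures are small grey scale arrays held as lists of lists, the motion is a single global translation found by exhaustive search, the segmentation of steps (a) and (f) is a simple flood fill, and all function names, thresholds and tolerances are hypothetical.

```python
from collections import deque

def segment(img, mask, tol, first_label):
    """Steps (a)/(f): flood-fill the masked pixels; 4-connected neighbours whose
    grey values differ by at most tol share a region. Returns (labels, next_label)."""
    h, w = len(img), len(img[0])
    labels = [[None] * w for _ in range(h)]
    label = first_label
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or labels[y][x] is not None:
                continue
            labels[y][x] = label
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                            and labels[ny][nx] is None
                            and abs(img[ny][nx] - img[cy][cx]) <= tol):
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            label += 1
    return labels, label

def shift(grid, dy, dx):
    """Translate a 2-D grid by (dy, dx); cells with no source pixel become None."""
    h, w = len(grid), len(grid[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = grid[sy][sx]
    return out

def estimate_global_motion(prev_img, next_img, search=2):
    """Step (b), simplified (assumption): one translation for the whole picture,
    chosen to minimise the mean absolute difference over the covered pixels."""
    candidates = [(dy, dx) for dy in range(-search, search + 1)
                  for dx in range(-search, search + 1)]
    candidates.sort(key=lambda v: abs(v[0]) + abs(v[1]))  # prefer small motion on ties
    best, best_cost = (0, 0), None
    for dy, dx in candidates:
        pred = shift(prev_img, dy, dx)
        diffs = [abs(p - n) for pr, nr in zip(pred, next_img)
                 for p, n in zip(pr, nr) if p is not None]
        if not diffs:
            continue
        cost = sum(diffs) / len(diffs)
        if best_cost is None or cost < best_cost:
            best, best_cost = (dy, dx), cost
    return best

def next_segmentation(prev_img, prev_labels, next_img, next_label,
                      error_threshold=10, tol=10):
    """Steps (b) to (g) for one pair of pictures. Returns (improved_labels, next_label)."""
    dy, dx = estimate_global_motion(prev_img, next_img)       # (b)
    pred_labels = shift(prev_labels, dy, dx)                   # (c)
    pred_img = shift(prev_img, dy, dx)                         # (d)
    h, w = len(next_img), len(next_img[0])
    invalid = [[pred_img[y][x] is None
                or abs(pred_img[y][x] - next_img[y][x]) > error_threshold
                for x in range(w)] for y in range(h)]          # (e)
    new_labels, next_label = segment(next_img, invalid, tol, next_label)  # (f)
    improved = [[new_labels[y][x] if invalid[y][x] else pred_labels[y][x]
                 for x in range(w)] for y in range(h)]         # (g)
    return improved, next_label
```

Step (h) would then correspond to calling next_segmentation again with the next picture and the improved segmentation taking the place of the initial picture and the initial segmentation. Because regions that are predicted correctly keep their labels, only the pixels failing the comparison of step (e) receive new region labels.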
Prior to step (f), various heuristics may be applied to reduce the number of identified pixels, by allocating them to the same regions as neighbouring pixels, and/or to designate further pixels for consideration in step (f).
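The heuristics themselves are not specified in this passage. Purely as a hypothetical example of the first kind, an identified pixel might be allocated to a neighbouring region when most of its four immediate neighbours are not identified and agree on a single predicted region. The function name, the voting rule and the min_agree parameter below are assumptions made for illustration.

```python
def absorb_isolated_pixels(flagged, predicted_labels, min_agree=3):
    """Hypothetical heuristic: un-flag a flagged pixel and give it the region of
    its neighbours when at least min_agree unflagged 4-neighbours share one label.
    Decisions are based on the original flags only, so scan order does not matter."""
    h, w = len(flagged), len(flagged[0])
    new_flagged = [row[:] for row in flagged]
    new_labels = [row[:] for row in predicted_labels]
    for y in range(h):
        for x in range(w):
            if not flagged[y][x]:
                continue
            votes = {}
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < h and 0 <= nx < w and not flagged[ny][nx]
                        and predicted_labels[ny][nx] is not None):
                    lab = predicted_labels[ny][nx]
                    votes[lab] = votes.get(lab, 0) + 1
            if votes:
                lab, count = max(votes.items(), key=lambda kv: kv[1])
                if count >= min_agree:
                    new_labels[y][x] = lab
                    new_flagged[y][x] = False
    return new_flagged, new_labels
```

Pixels that remain flagged after such a pass would then be passed to step (f) as before.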
The segmentation performed in accordance with the present invention provides improved temporal consistency and therefore, although artifacts will still be present in the image segmentation, these will be consistent from frame to frame and hence their consequences will be less displeasing to a viewer.
For example, an embodiment of a method according to the present invention assumes that all interframe motion can be characterised as two-dimensional planar motion so that a conventional two-dimensional planar motion estimator, such as a block matching estimator, may be used to calculate the motion vectors for each pixel of the image. Not only is this assumption not generally valid for real-life sequences but, even if the interframe motion were to consist exclusively of two-dimensional planar motion, because the motion vectors are calculated only per block, it is extremely unlikely that accurate motion vectors will be calculated for all the individual pixels in each block. In order to detect inaccurate motion vectors the embodiment calculates a displaced frame difference (DFD) for each pixel. The DFD represents, per pixel, the error between a frame and the estimate of that frame provided by the motion vectors based on the neighbouring frames. An excessive DFD may thus indicate that the motion vectors for a particular pixel are invalid. The DFD is then used to identify where the segmentation predicted by the motion vectors requires correction.
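As a minimal sketch of the DFD calculation, assuming one motion vector per square block, grey scale pictures held as lists of lists, and clamping at the picture border (the clamping, the function names and the threshold are assumptions made for illustration, not details taken from the embodiment):

```python
def displaced_frame_difference(prev_img, next_img, block_vectors, block_size):
    """Per-pixel absolute error between next_img and its motion-compensated
    prediction from prev_img. block_vectors[by][bx] is the (dy, dx) vector
    shared by every pixel of block (by, bx)."""
    h, w = len(next_img), len(next_img[0])
    dfd = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = block_vectors[y // block_size][x // block_size]
            # Source position in the previous picture, clamped to the border
            # (an assumption; other border policies are possible).
            sy = min(max(y - dy, 0), h - 1)
            sx = min(max(x - dx, 0), w - 1)
            dfd[y][x] = abs(next_img[y][x] - prev_img[sy][sx])
    return dfd

def invalid_vector_mask(dfd, threshold):
    """Flag pixels whose DFD exceeds the threshold, i.e. pixels for which the
    block's motion vector is presumed invalid."""
    return [[value > threshold for value in row] for row in dfd]
```

The resulting mask marks the pixels at which the predicted segmentation would be reconsidered, while the remaining pixels keep their predicted regions.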