1. Field of the Related Art
The present invention relates to a method of decoding coded segmented pictures, or partitions, divided into homogeneous regions with which specific labels are respectively associated, and to a corresponding decoding device. This invention applies mainly to the field of the MPEG-4 standard, for instance to the implementation of MPEG-4 decoders.
2. Description of the Related Art
The multiple grid chain code approach (MGCC), described in the document “Multiple grid chain coding of binary shapes”, by P. Nunes, F. Ferreira and F. Marqués, Proceedings of International Conference on Image Processing, vol. III, pp. 114-117, Oct. 26-29, 1997, Santa Barbara, Calif., USA, allows binary shape information of video objects to be encoded efficiently in the context of object-based video coding schemes. This approach relies on a contour representation of the partition. As shown in FIG. 1, which depicts a small general partition of size N×M with, for example, three regions with which respective labels are associated (in the present case, represented by grey, black and white circles), every pixel has four associated contour elements. FIG. 2 shows the same partition as depicted in FIG. 1, but with the indication of the specific contour elements that define the changes between every pair of neighbouring pixels that do not belong to the same region. FIG. 3 shows, in a corresponding matrix of (2N+1)×(2M+1) sites, the implementation of both grids: the one associated with the pixel sites (the circles) and the one related to the contour sites (the line segments). The contour elements that are situated between pixels of different labels are considered as active.
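The two interleaved grids can be sketched as follows. This is only an illustrative sketch: the function names `contour_grid` and `active_sites`, the placement of pixel sites at odd coordinates, and the choice to leave frame and corner sites unset are assumptions, not details taken from the cited document.

```python
def contour_grid(labels):
    """Build a (2N+1) x (2M+1) site matrix from an N x M picture of labels.

    Assumed layout: pixel sites at odd (row, col) positions hold the label;
    contour sites between two pixels hold True when active, False otherwise.
    Frame and corner sites are left as None in this sketch.
    """
    n, m = len(labels), len(labels[0])
    grid = [[None] * (2 * m + 1) for _ in range(2 * n + 1)]
    for i in range(n):
        for j in range(m):
            grid[2 * i + 1][2 * j + 1] = labels[i][j]
    # Vertical contour elements, between horizontally adjacent pixels.
    for i in range(n):
        for j in range(m - 1):
            grid[2 * i + 1][2 * j + 2] = labels[i][j] != labels[i][j + 1]
    # Horizontal contour elements, between vertically adjacent pixels.
    for i in range(n - 1):
        for j in range(m):
            grid[2 * i + 2][2 * j + 1] = labels[i][j] != labels[i + 1][j]
    return grid


def active_sites(grid):
    """List the (row, col) positions of active contour sites, row by row."""
    return [(r, c)
            for r, row in enumerate(grid)
            for c, site in enumerate(row)
            if site is True]


# A made-up 2 x 3 partition with three labels (grey, black, white).
example = contour_grid([["g", "g", "w"],
                        ["g", "b", "w"]])
```

A contour site is active exactly when the two pixels it separates carry different labels, which is the criterion stated above.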
As illustrated in FIG. 4, an element of the contour grid may have up to six active neighbours: for that reason, the contour grid is usually referred to as a hexagonal grid. A conventional way to code the partition information in the contour grid is to select an initial point in the grid and to track the active sites until the contour is complete. This method performs a lossless coding of the partition information, by encoding the movement from a current contour element to the following neighbour contour element (only three movements are possible: straight, right, left).
Another contour tracking method is to move through the contour using larger steps, only the contour elements linked by such larger steps being encoded: in the above-cited document, the described MGCC technique uses basic cells of 3×3 pixels, such as the one illustrated in FIG. 5, where all contour and pixel sites in the cell are shown. For compression efficiency reasons, two types of cells are then used: counter-clockwise, as in FIG. 6, or clockwise, as in FIG. 7. The way to index the different contour elements in each type of cell is presented: the initial contour site is denoted by the symbol 0 and the other ones by the symbols 1 to 7, the symbol 1 being assigned to the site which is on the same side of the cell as the symbol 0. Consequently, to characterize a cell, three parameters are necessary: its initial contour site, its type (clockwise or counter-clockwise), and its orientation (east or west for a cell with a horizontal initial contour element, north or south for a vertical one). The coding algorithm selects between these two types of cell in order to maximize the number of contour elements coded per cell.
The MGCC technique uses said indexing, as well as its possible rotations. Starting from the input element of the cell indexed with a 0 in FIG. 6, any output element from the set (1, 2, . . . , 6, 7) can be reached, but the way to go through the cell is not uniquely defined by the output element: as shown in the example of FIGS. 8 and 9, a movement (in this case, from 0 to 4) may indeed correspond to two different contour configurations. The contour elements inside the cell (8, 9, 10, 11), which are not coded, introduce ambiguity in the coding process, two different sets of contour sites (0, 8, 9, 4) or (0, 11, 10, 4) being possible. This ambiguity introduces coding losses. Nevertheless, the only possible error is the erroneous labeling of the central pixel of the cell, i.e. only of an isolated boundary pixel.
In a contour tracking process, several cells are then linked up in order to complete the contour. To link two cells, the output contour site of the current cell becomes the input contour site of the following cell, as FIG. 10 shows. When coding with the MGCC approach the boundary of a single region, the bitstream thus generated therefore contains: (a) the position of the initial contour site, and (b) the chain of symbols representing the movements performed in order to track the contour.
In the contour tracking process, the position of the basic cell has to be changed progressively through the grid. The proposed technique is described for instance in the article “Encoding of line drawings with a multiple grid chain code”, by T. Minami and K. Shinohara, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 2, March 1986, pp. 269-276. The basic principle of said technique is explained with reference to FIGS. 11 to 15.
As indicated in FIG. 11, which gives an example of coding with a fixed grid, a contour segment has been coded by means of the symbol 4 in the first cell (a counter-clockwise one). If cells of the same grid are used, the symbol 7 (FIG. 12) is used in the second cell for tracking the contour (the symbol 0 always being the new initial, or input, site), and the symbol 2 (FIG. 13) in the third one, these second and third cells being in that case clockwise ones. Three cells are therefore needed to track the contour from the first initial site. By contrast, only two are needed if the grid (i.e., in fact, the position of the center of the cell) is changed: as shown in FIGS. 14 and 15, which illustrate a modification of the grid with respect to the example of FIGS. 11 to 13, only a second cell is now needed to reach the same output site.
This solution of FIGS. 14 and 15 leads to a more compact representation of the contour. However, three different classes of grids are then necessary to define the shift of each grid with respect to the origin of the corresponding cell (called G0) before said shift. These three classes G1, G2, G3 are defined, as indicated in the following classification table (Table 1), by the position (x, y) of the pixel which is the origin of each type of cell, with respect to the pixel which is the origin of said corresponding cell:
In the example of FIG. 10, a cell of the class G2 was used, with respect to the corresponding current cell G0. In another example illustrated in FIG. 16, a cell of the class G1 is used, with respect to the corresponding current cell G0.
However, the MGCC approach previously described can only be used to code binary partitions. In the case of general segmented pictures, the partitions contain regions that share contours. A more appropriate approach is then described in the European patent application No. 99400436.4 (PHF995 11), filed on Feb. 23rd, 1999, that relates to a method of coding segmented pictures, or partitions (divided into homogeneous regions with which specific labels are respectively associated), comprising, for each successive partition, the steps of:
(a) translating the picture of labels into a description in terms of a contour element chain in which the elements are defined by means of their movements through successive basic cells, between an input point and an output point of said cells;
(b) tracking inside each successive cell each contour from its initial contour point, previously extracted, to its end by storing chain symbols corresponding both to input, internal and output contour elements of said cell and to priorities between multiple output elements;
(c) repeating these steps up to the end of each successive contour segment of the concerned partition;
(d) coding the information corresponding in each cell to the initial point of each contour segment and to the associated chain of movements between that initial point and the initial one of the following cell.
These successive steps define a so-called intra-mode coding process of image partitions. By introducing the concept of a triple point at the intersection of two different contours, which makes it possible to locate the initial point (or starting contour site) of a new contour (not yet coded) with respect to a previous one, and by introducing for these triple points a new symbol in the coding chain, an efficient coding is obtained: while the MGCC approach tracks the contour by chaining cells, triple points will be replaced, in the present case, by the concept of cells with multiple outputs.
Two other embodiments (with respect to this basic implementation) are described in the application No. 99400436.4 already cited. They relate to the extension of the basic implementation to the cases of scalable partition coding and inter-mode partition sequence coding respectively. According to the first one, each current partition is divided into a basic layer and at least one enhancement layer, and the described intra-mode coding process is successively applied to the basic layer without any change and to the enhancement layer with the following modifications: in the extraction step, the initial points of contour segments are associated with contour points from the basic layer; in the tracking step, all the points belonging to the basic layer are withdrawn; and, in the repeating step, the closing of a contour before processing the following one is associated with contour points of the basic and enhancement layers. According to the second embodiment, each current partition is divided into a first part, including some regions that need to be coded in intra-mode, and a second part, corresponding to the other regions to be motion compensated, and the intra-mode coding process is applied to said first part playing the role of the basic layer, while an associated inter-mode coding process is applied to said second part playing the role of the enhancement layer.
The flowchart of FIG. 17 corresponds to the basic implementation. The main steps of the illustrated process are the following: generation of the contour image, extraction of the initial points from the picture, cell characterization, contour tracking, determination of the priority in the contour tracking, management of the multiple points, end of process, and extraction of the next initial point, these steps being followed by the final coding steps.
The first step 401 generates the contour pictures that will be coded later. The original partitions, described in terms of pictures of labels, are translated into a description in terms of contour elements defined with the above-described hexagonal grid. The second step 402 is provided for extracting all the initial contour points from the picture. The first contour to be considered is the contour of the frame of the picture: since the receiver will already know the shape of such a picture frame, the only information that has to be coded from this frame contour is the positions of the initial points that will define the start of new contour segments. Contour points contacting this frame (in fact, the characteristics of the cell having these specific contour points as input contour) are stored in a buffer (a First-In, First-Out, or FIFO, queue) and called “PENDING” points, in view of a specific treatment described later.
The third step 403 is a cell characterization step. Once the initial point of a contour has been selected, the cell defined by that point has to be characterized. For all initial points, the class of grid is set to G0 (see the previous classification, called Table 1). In the case of an initial contour point contacting the frame contour, the side of the frame where said point is located fixes the orientation and type of the cell as shown in Table 2 (where c and cc respectively mean clockwise and counter-clockwise):
As can be seen, in the case of an initial point of an interior cluster of regions, due to the scanning adopted, the type is set to “counter-clockwise” and the orientation to “east”. In the case of a cell whose input contour site is neither a contour point contacting the frame nor the initial point of a cluster, its characterization depends on the prediction based on the movement performed in the previous cell. In this way, the input contour of the current cell is the output contour of the previous one, and the assignment of orientation and cell type follows the rules indicated in Table 3, in which the second row (N, E, S, W, c, cc) represents the data of the current cell:
Once a cell has been characterized, the contour information in the cell has to be stored in the chain of symbols. A contour tracking step 404 is then provided. In said step, only those contour elements that are linked to the initial contour of the cell are taken into account, the other ones inside the cell (not linked to said initial contour) being withdrawn and analyzed later during the tracking process. Starting from the input contour, the tracking is carried out with the following priority: straight-right-left, so that a list of output contours linked with the input contour is created. During this tracking operation, input and output contour points are marked as “INOUT”, and the other contour points inside the cell, necessary to link the input and output contours but which are neither input nor output sites, are marked as “INTERNAL”. This step 404 of contour tracking inside the cell is illustrated in FIGS. 18 and 19. An original cell is presented in FIG. 18: since the outputs 5 and 6 are not connected to the input, they are withdrawn in FIG. 19. The contours that have been detected then have to be stored in the chain of symbols. If only one output contour is found, the movement linking the input to the output contour is stored; if multiple output points are detected, the cell will require more than one symbol to code the contour configuration.
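The selection of the contour elements linked to the input of a cell can be sketched as a reachability search. This is a simplified sketch under stated assumptions: the adjacency map is invented for illustration (it is not read from FIG. 18), the function name `track_cell` is hypothetical, and the straight-right-left visiting order is not modelled, only the connectivity and the resulting marks.

```python
def track_cell(adjacency, input_site=0, output_sites=range(1, 8)):
    """Find the outputs linked to the input and mark the visited sites.

    adjacency maps each active contour site of the cell to its active
    neighbours (hypothetical data). Sites 1..7 are taken as possible
    outputs and 8..11 as interior sites, following the cell indexing.
    Every reachable interior site is marked INTERNAL, a simplification.
    """
    seen = {input_site}
    stack = [input_site]
    while stack:
        site = stack.pop()
        for neighbour in adjacency.get(site, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    outputs = [s for s in output_sites if s in seen]
    marks = {}
    for site in seen:
        is_boundary = site == input_site or site in outputs
        marks[site] = "INOUT" if is_boundary else "INTERNAL"
    return outputs, marks


# Made-up cell: 0-8-9-4 is linked to the input, 5-6 is a separate piece.
cell = {0: [8], 8: [0, 9], 9: [8, 4], 4: [9], 5: [6], 6: [5]}
outputs, marks = track_cell(cell)
```

In this made-up cell the outputs 5 and 6 receive no mark, which mirrors the withdrawal of unlinked outputs described above.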
A priority determination step 405 is then provided. In that step, the sub-chain describing the contour information in the cell starts with one symbol M (corresponding to “multiple output”) for each additional output. Thus, if the sub-chain of a cell starts with n symbols M, the following (n+1) symbols describe its different active outputs. The set of output contours is stored in the chain in a specific order. The symbol related to the output contour with the highest priority is introduced in the chain just after the set of symbols M and will be the input contour for the next cell (if the contour is not closed). The highest priority symbol is fixed by the length of the contour segment that links the input contour with each one of the output contours: it is the symbol corresponding to the longest path, which has the advantage of maximizing the number of contour elements coded per cell. If two contour segments have the same length, the ambiguity is resolved by taking into account the tracking priority indicated previously: straight-right-left. The other output contours are not ordered based on the length criterion, but on the basis of the tracking priority (straight-right-left).
Examples are given for a better understanding. The ordering of two output contours is shown in FIGS. 20 to 22. In FIG. 20, the longest path leads to the output 1, and the symbols in the chain are therefore M15. In FIG. 21, the longest path leads to 5: the symbols are now M51. However this set of symbols does not result in a unique cell configuration, since the third example of FIG. 22 leads to the same representation M51. In this last case, the paths leading to both outputs (5 and 1) have the same length: relying then on the tracking priority, the resulting chain is therefore M51.
For the case of a sub-chain with n symbols M, two examples of cells with multiple outputs are given in FIGS. 23 and 24. The example of FIG. 23 corresponds to a cell with two triple points. The chain of symbols describing the contour information of this cell is MM 153 (although the path leading to the output 3 is longer than that leading to the output 5, said symbol 5 appears first in the sub-chain: it is due to the ordering of the non-priority symbols, which is based on the tracking priority rather than on the length criterion). In the example of FIG. 24, where a quadruple point is present, the chain of symbols describing this cell contour information is MM 356 (the path leading to 3 is the longest one, and the tracking priority gives the priority to 5 with respect to 6).
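The ordering rule of step 405 can be sketched as follows. This is a sketch under assumptions: the outputs are supplied already listed in tracking order (straight-right-left), the path lengths are invented because the figures are not reproduced here, and the function name `build_sub_chain` is hypothetical.

```python
def build_sub_chain(outputs):
    """Build the sub-chain of one cell.

    outputs: list of (symbol, path_length) pairs, assumed to be listed in
    tracking order. The longest path comes first (ties go to the earlier
    entry); the remaining outputs keep their tracking order.
    """
    best = max(range(len(outputs)), key=lambda i: (outputs[i][1], -i))
    ordered = [outputs[best]] + [o for i, o in enumerate(outputs) if i != best]
    prefix = "M" * (len(outputs) - 1)  # one M per additional output
    return prefix + "".join(str(symbol) for symbol, _ in ordered)


# Invented path lengths, chosen to reproduce the chains quoted in the text.
two_outputs = build_sub_chain([(5, 2), (1, 4)])             # longest path to 1
equal_paths = build_sub_chain([(5, 3), (1, 3)])             # tie, 5 tracked first
three_outputs = build_sub_chain([(5, 2), (3, 3), (1, 4)])   # two triple points
```

With these invented lengths the three calls yield M15, M51 and MM153, matching the chains given for FIGS. 20, 22 and 23.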
When a multiple output is observed in a cell, a specific symbol is therefore included in the sub-chain, as has just been described. This operation is followed by a multiple point management step 406. During said step, the output with lower priority (unless it closes the contour segment) is stored in a buffer of pending points (a FIFO queue), and all output points that are stored in that manner in the buffer receive the mark “PENDING”. The reason for such a marking operation is that not all the points that are stored in the buffer will be needed in the future to refer to a new contour segment. During the tracking process, a new cell may indeed contain a contour point previously marked as “PENDING”. In this case, the segment that was associated with this “PENDING” contour point has already been completed and the corresponding multiple point has to be removed from the chain (when erasing the effect of a multiple point from the chain, two symbols have to be removed: the symbol M of the multiple point is first cancelled, and the movement that linked the input of the cell with the associated multiple output is then removed, which leaves for instance only the single symbol 3 in the sub-chain if the information in a cell was previously encoded with the sub-chain M35 and a new cell covers the second output related to the movement towards 5). In addition, the concerned output contour has to be removed either from the buffer of possible initial points, if it contacts the frame, or from the buffer of “PENDING” points. Finally, the mark of the former output point (that associated with the movement towards 5) has to be updated. If the new cell has this contour point as an output, its mark moves from “PENDING” to “INOUT”. If the new cell has this contour point as an interior contour point, its mark moves from “PENDING” to “INTERNAL”.
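The removal of a multiple point from a sub-chain can be sketched as a small string operation. This is only a sketch: the sub-chain is represented as a string of single-character symbols, and the names `erase_multiple_output` and `updated_mark` are hypothetical.

```python
def erase_multiple_output(sub_chain, covered_symbol):
    """Remove one symbol M and the movement towards the covered output.

    Only non-priority outputs (those after the first movement symbol) can
    be erased here, since these are the ones stored as pending points.
    """
    n_marks = len(sub_chain) - len(sub_chain.lstrip("M"))
    movements = sub_chain[n_marks:]
    if n_marks == 0 or covered_symbol not in movements[1:]:
        raise ValueError("no pending output with this symbol in the sub-chain")
    index = movements.index(covered_symbol, 1)
    return "M" * (n_marks - 1) + movements[:index] + movements[index + 1:]


def updated_mark(role_in_new_cell):
    """New mark of a formerly PENDING point covered by a new cell."""
    return {"output": "INOUT", "interior": "INTERNAL"}[role_in_new_cell]


# The case quoted in the text: M35 with the output towards 5 covered.
remaining = erase_multiple_output("M35", "5")
```

For the sub-chain M35 with the output towards 5 covered by a new cell, the remaining sub-chain is the single symbol 3, as stated above.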
The last step of the tracking process, designated in FIG. 17 by the reference 407, is provided for ending a contour (test 471), a cluster (test 472), or the partition (test 473). The end of a contour segment relies on the contour elements previously marked as “INOUT” during the tracking process. When an output contour in a cell coincides with a contour marked as “INOUT”, this branch of the contour segment is closed. If the output contour is that of highest priority, the end of the contour segment is reached (at the decoder side, this makes it possible to close contours on known contour sites and to prevent possible splitting of regions). Moreover, if the output contour is marked as “INTERNAL” or “PENDING” but any of the following contour sites that can be reached from it is an “INOUT” one, the contour (or the branch) is ended as well. When a contour segment ends, one has to check whether the cluster of regions has been finished. The buffer of “PENDING” points is checked first, in order to see whether there are any further “PENDING” points in it or whether all of them have already been extracted. If there are no “PENDING” points left, the buffer of possible initial points (initial points contacting the frame) is checked (contours are extracted first from the buffer of “PENDING” points for coding efficiency purposes). If both buffers are empty, the cluster is finished and a new one (if any) is similarly considered. As previously said, the picture is scanned in a raster manner (from top to bottom and from left to right), and the first non-coded active contour is taken as the initial point for a new cluster of regions. If there are no more non-coded active contours, the whole partition has been coded.
This last step of the tracking process is therefore sub-divided into the three sub-steps 471 to 473. As long as a contour is not ended (test 471), a backward connection allows the steps 403, 404, 405, 406 to be repeated. As soon as a contour is ended, the test “end of cluster” is undertaken (test 472). As long as the cluster of regions is not finished, the following procedure is carried out. The buffer of “PENDING” points is checked (operation 74: recovery of a stored pending point), in order to see whether there are further “PENDING” points in it. If all of them have already been extracted, the buffer of possible initial points contacting the frame is checked (contours are extracted first from the buffer of “PENDING” points for coding efficiency purposes). If both buffers are empty, the cluster is finished, and a new cluster of regions is considered (operation 73: extraction of the next initial point). When there are no more non-coded active contours (test 473), the whole partition has been processed.
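The order in which the two buffers are consulted at the end of a contour can be sketched with two FIFO queues. The function name `next_start` and the string tags it returns are hypothetical; the raster scan that finds the initial point of a new cluster is left out of this sketch.

```python
from collections import deque


def next_start(pending, frame_points):
    """Pick the next starting contour point once a contour has ended.

    pending: FIFO of PENDING points; frame_points: FIFO of possible initial
    points contacting the frame. The pending buffer is consulted first.
    Returns a (source, point) pair; point is None when both are empty.
    """
    if pending:
        return "pending", pending.popleft()
    if frame_points:
        return "frame", frame_points.popleft()
    # Both buffers empty: the cluster is finished, a new one must be sought.
    return "new_cluster", None


# Made-up point identifiers, for illustration only.
pending = deque(["p1"])
frame_points = deque(["f1", "f2"])
order = [next_start(pending, frame_points) for _ in range(4)]
```

In this made-up run the pending point is served first, then the two frame points, and the fourth call reports that a new cluster has to be sought.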
The whole partition being processed, the last step of the partition coding method is an entropy coding step 408, including a first coding sub-step 481 for coding the information of initial points and a second coding sub-step 482 for coding the chain of movements.
For the implementation of the sub-step 481, it is necessary to distinguish between the external initial points and the internal ones. All the initial points contacting the frame (i.e. referring to the first cluster) are coded together as a header of the first chain of movements. For each initial point contacting the frame, the position of the point of the frame previous to said initial point is coded. These initial points are indexed using two different word lengths: P bits are used for the horizontal dimension of the picture (top and bottom sides) and Q bits for the vertical one (left and right sides), with for instance P=log2[dim_x] and Q=log2[dim_y]. If there are finally no remaining points to be coded on one side of the frame, a specific word is used to indicate this situation (for instance, PT or QT). When the last initial point contacting the frame has been coded, another specific word (for instance, Po or Qo) is used to indicate the start of the chain of movements. If there are no initial points on the frame, the word Po precedes the code, called “intern_ip”, of the first internal initial point. The way to code these various situations, corresponding to different numbers of initial points on the four sides of the frame, is indicated in Table 4 (bot=bottom):
Internal initial points are similarly indexed. As they have been obtained by scanning the image from top to bottom, an internal initial point always corresponds to a horizontal contour site. Only the horizontal contours are therefore indexed.
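The two word lengths can be sketched as follows. Reading the bracket in log2[dim_x] as a rounding up to the next integer is an assumption, as are the function names; the reserved words (PT, QT, Po, Qo) and any code space they may require are not modelled.

```python
def index_word_lengths(dim_x, dim_y):
    """Word lengths P and Q, assuming P = ceil(log2(dim_x)) and likewise Q.

    (n - 1).bit_length() equals ceil(log2(n)) for n >= 2 and avoids
    floating-point rounding; at least one bit is always used.
    """
    p_bits = max(1, (dim_x - 1).bit_length())
    q_bits = max(1, (dim_y - 1).bit_length())
    return p_bits, q_bits


def encode_position(position, bits):
    """Fixed-length binary word for a position along one side of the frame."""
    if not 0 <= position < (1 << bits):
        raise ValueError("position does not fit in the word length")
    return format(position, "0{}b".format(bits))


# Example picture size chosen for illustration (352 x 288 pixels).
p_bits, q_bits = index_word_lengths(352, 288)
word = encode_position(5, p_bits)
```

Under this reading, a 352 x 288 picture would use 9 bits for positions on either dimension.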
The previous description related, as indicated, to a coding method for coding segmented pictures according to a generalized MGCC approach. When the coded signals thus obtained are transmitted (and/or stored), they must finally be decoded after the transmission step. The main objective of an MGCC decoding method and device provided to that end must then be to ensure that the decoded partition has the same number of regions as the original one.
However, when carrying out the coding method, owing to the fact that this encoding process is local (i.e. done cell by cell), the possible uncertainty in the central pixel of each cell may lead, when analyzing the whole partition at once, to inconsistencies in the decoded contours: new regions may be created, or regions may be split, so that the decoded contours are no longer in accordance with the original bitstream.
In order to address this problem, it would be possible to decode the sub-chain of information from each cell taking into account all the previously decoded contours, and then to verify at each step that the complete decoded contours remain in accordance with the original bitstream. Such an approach increases the complexity of the decoder, since the whole image has to be analyzed in order to decode the information of every cell. If, on the contrary, the sub-chain of information from every cell is decoded taking into account only the previous contour information contained in the local area covered by the cell, possible inconsistencies can be solved, but, when solving them, new ones may be created in a previous cell, and a further scan will be necessary to solve this new problem: this second approach does not bound the number of scans needed to solve every possible inconsistency.
It is therefore an object of the invention to propose a partition decoding method that has a lower complexity and is nevertheless efficient in avoiding inconsistencies in the decoded contours.
To this end, the invention relates to a decoding method such as defined in the introductory paragraph of the description and which is moreover characterized in that it comprises:
(A) a forward decoding step, provided for decoding within said chain symbols all contour points that are sure in the decoded partition and detecting local inconsistencies;
(B) a backward decoding step, provided for tracking in a backwards manner the decoded contours and solving said inconsistencies on the basis of marks introduced during the detection of these inconsistencies.
The basic principle of said decoding method is to divide the decoding process into two main steps: a forward decoding step and a backward decoding step, both working at cell level (like the coding method that produced the signals to be decoded). The first step, the forward decoding one, is provided mainly for fixing all the contour points that are sure and for detecting inconsistencies, while the backward decoding step is provided for solving these inconsistencies relying only on specific marks introduced during the forward decoding step, without having to store the original coded bitstream, which is not used during this second step.