1. Field of the Invention
The present invention relates generally to a method and a system for processing a set of data on motion pictures and particularly to a method and a system for processing a set of data on a temporal sequence of two or three dimensional pictures with a time-dependent motion intermittently picked up therein, by using measures for a compensation of the motion in combination with an image segmentation of the pictures.
2. Description of the Related Art
Recent years have observed an increasing interest in a processing of a set of digital data on motion pictures, in particular in a field of art relating to a compression coding of such data.
The motion pictures appear as a temporal sequence of pictures each having an image region associated with a transient state of a motion. The transient state in an arbitrary picture has a mathematically analyzable correlation to that in a previous picture, so that the former is predictable from the latter in combination with an estimated correlation therebetween or by compensating the latter therewith, subject to the adequacy of an employed model of the motion for analyzing the correlation.
A predicted current state is comparable with a sampled current state to determine a difference therebetween as a prediction error. The prediction error is encodable together with associated parameter values of the model to obtain a compressed code, which permits a current state of the motion to be calculated in combination with a past calculated state thereof at a decoding end.
An adequate model provides an effective prediction to achieve a significant reduction in redundancy of the code.
A typical modelling is based on a motion compensation interframe prediction, in which a motion is defined in terms of a displacement of a block of a picked up image between a pair of frames each corresponding to a picture. Typically, the picture is divided into a set of square blocks each having an area on the order of 16.times.16 pixels. A total number of blocks in any picture is identical for this purpose, and respective blocks in an arbitrary picture have a one-to-one correspondence to those blocks in any other picture. With respect to each block, a past decoded picture is compensated by an estimated motion, to predict a picture to be compared with a current input picture. Thus, motion compensation is made in blocks, together with the pixel data inherent in each block.
FIG. 1 exemplarily shows an encoder of a conventional motion picture coding-decoding system using the motion compensation interframe prediction. This conventional system is well known from ITU-T (CCITT) Recommendation H.261, a video codec for audiovisual services at p.times.64 kbit/s.
In FIG. 1, designated at reference character 100 is an entirety of the conventional system, 100a is the encoder, 101 is an input terminal of the encoder 100a, and 102 and 109 are frame memories, respectively.
The input terminal 101 inputs a sequence of pixel data D101 of a current picture P101. The frame memory 102 stores therein the data D101 of the current input picture P101, besides a set of data D100 of a past input picture P100 it received from the input terminal 101 and stored therein in a last frame. The frame memory 109 has stored therein a set of data D102 of a local decoded picture P102 that is a matrix of restored pixel data of the past input picture P100.
The data D100 of the past input picture P100 is sequentially read from the frame memory 102, and the data D102 of the local decoded picture P102 from the frame memory 109. Either of them is selected by a switch 103a, as a sequence of data D103 representing a reference picture P103 (to be P100 or P102), and input to a motion estimator 103, which concurrently receives the data D101 of the current input picture P101.
Incidentally, as shown in FIG. 2, the conventional system 100 has, as a common field Fc to a variety of associated computations, an imaginary orthogonal coordinate system defined by a combination of an axis of abscissa X corresponding to a bottom side of a picture frame Fp and an axis of ordinate Y corresponding to a left lateral side of the picture frame Fp. The picture frame Fp, as well as any block B (.beta./.alpha.) therein and a geometrical gravity center G (.beta./.alpha.) thereof, is congruently mapped in the imaginary field Fc each time a set of associated data is processed in the system 100, where ".alpha." is a picture identification number and ".beta." is a block identification number.
As illustrated in FIG. 2, the motion estimator 103 estimates, by calculation for each block B(i/101) (i=an arbitrary integer) of the current input picture P101, a motion as a displacement vector Vd.sub.i in terms of a combination of a sense and a magnitude of a displacement of a gravity center G(i/101) of an i-th block B(i/101) in the current input picture P101 relative to that G(i/103) of a corresponding i-th block B(i/103) in the reference picture P103, thereby obtaining a set of data D104 of a set Vm of motion vectors of which an i-th one Vm.sub.i consists of an X-component Vx.sub.i as a projection of the displacement vector Vd.sub.i to the axis X and a Y-component Vy.sub.i as that to the axis Y.
The data D104 of the motion vector set Vm is sequentially output from the motion estimator 103 to a motion compensator 104 and an encoding multiplexer 110.
The multiplexer 110 encodes the data D104 of the vector set Vm into a sequence of corresponding codes.
At the motion compensator 104 the data D104 of the motion vector set Vm are processed together with the data D102 of the local decoded picture P102 input from the frame memory 109 so that a gravity center G(i/102) of each block B(i/102) in the local decoded picture P102 has a position thereof displaced or coordinate-component wise compensated along the axes X and Y by equivalent distances to X- and Y-components Vx.sub.i and Vy.sub.i of a corresponding motion vector Vm.sub.i, respectively, to thereby obtain a motion-compensated picture P104 as an interframe predicted one for the current input picture P101.
As a result, each pixel data in each displaced block in the motion-compensated picture P104 is updated by data that a corresponding pixel in that block in the local decoded picture P102 had been carrying.
The motion compensator 104 sequentially outputs a set of data D105 of the motion-compensated picture P104 to a subtractor 105 and an adder 108.
The subtractor 105 performs a pixel-mode subtraction of the motion-compensated picture P104 from the current input picture P101, obtaining a set of data D106 representative of a differential picture P105 therebetween, i.e. a picture having elementwise distributed thereon a matrix of prediction errors due to a motion compensation in a 16.times.16-pixel block mode by the compensator 104. Accordingly, the data D101 of the current input picture P101 is converted into a compressed set of data as the data D106 representing the prediction errors.
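The estimation, compensation and subtraction steps described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the circuitry of FIG. 1: the function names, the exhaustive search over a small window, the sum-of-absolute-differences matching cost, and the convention that a vector points from a block position in the current picture to its best match in the reference picture are all assumptions introduced here. Picture dimensions are assumed to be multiples of the block size.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced from that position by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def estimate_motion(cur, ref, n=16, search=2):
    """One vector per block: the in-bounds displacement with the smallest cost
    (ties are broken in favour of the shorter displacement)."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h, n):
        for bx in range(0, w, n):
            best_key, best_vec = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    inside = (0 <= by + dy and by + dy + n <= h and
                              0 <= bx + dx and bx + dx + n <= w)
                    if not inside:
                        continue
                    key = (sad(cur, ref, bx, by, dx, dy, n), abs(dx) + abs(dy))
                    if best_key is None or key < best_key:
                        best_key, best_vec = key, (dx, dy)
            vectors[(bx, by)] = best_vec
    return vectors


def compensate(ref, vectors, n):
    """Build the predicted picture by copying each displaced block of `ref`."""
    h, w = len(ref), len(ref[0])
    pred = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(n):
            for x in range(n):
                pred[by + y][bx + x] = ref[by + dy + y][bx + dx + x]
    return pred


def residual(cur, pred):
    """Pixel-wise prediction error: current picture minus predicted picture."""
    return [[c - p for c, p in zip(crow, prow)] for crow, prow in zip(cur, pred)]
```

When a block contains a single rigidly translated object, the residual for that block is all zeros, so only the vector needs to be coded for it.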
The prediction error data D106 of the differential picture P105 is sequentially input from the subtractor 105 to a unit 106 adapted for a discrete cosine transformation and quantization (hereafter "DCT-Q") process, i.e. for a data compression. The data is mapped in an 8.times.8-pixel block mode from a real measure space through a discrete cosine transform function into a related frequency field, to be expressed in terms of a combination of cosine coefficients of a corresponding cosine series to an associated 8.times.8-pixel block. The cosine coefficients are then quantized.
As a result, the error data D106 is further compressed into a set of combinations of data D107 each representative of a quantized coefficient, so that the set of data D107 represents the differential picture P105.
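The transform-and-quantize step, together with its inverse, might be sketched as below. This is a hedged illustration only: it uses an orthonormal two-dimensional DCT-II and a single uniform quantizer step, both of which are assumptions made here for brevity and do not reproduce the quantizer actually specified in H.261. All function names are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scale factor for frequency index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def _basis(i, k, n):
    # Cosine basis value at sample position i for frequency index k.
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct2(block):
    """Forward 2-D DCT of a square block; result is indexed [v][u]."""
    n = len(block)
    return [[_scale(u, n) * _scale(v, n) *
             sum(block[y][x] * _basis(x, u, n) * _basis(y, v, n)
                 for y in range(n) for x in range(n))
             for u in range(n)] for v in range(n)]


def idct2(coef):
    """Inverse 2-D DCT, mapping coefficients back to pixel values."""
    n = len(coef)
    return [[sum(_scale(u, n) * _scale(v, n) * coef[v][u] *
                 _basis(x, u, n) * _basis(y, v, n)
                 for v in range(n) for u in range(n))
             for x in range(n)] for y in range(n)]


def quantize(coef, step):
    """Uniform quantization (assumed): nearest integer multiple of `step`."""
    return [[int(round(c / step)) for c in row] for row in coef]


def dequantize(levels, step):
    """Inverse quantization: scale the integer levels back by `step`."""
    return [[q * step for q in row] for row in levels]
```

For a flat 8.times.8 block, all of the energy falls into the single lowest-frequency coefficient, which is why smooth prediction errors compress well after this step.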
The compressed data D107 of the differential picture P105 is sequentially output from the DCT-Q unit 106 to the encoding multiplexer 110 and a unit 107 adapted for an inverse quantization and inverse discrete cosine transformation (hereafter "IQ-IDCT") process, i.e. for a data decompression.
The multiplexer 110 encodes the compressed data D107 of the differential picture P105 into a sequence of corresponding codes, and multiplexes them together with the codes of the motion vector set Vm, into a sequence of multiplexed codes C101 to be output via an output terminal 111 of the encoder 100a.
At the IQ-IDCT unit 107, each input combination of data D107 is inverse quantized into a combination of corresponding cosine coefficients of a cosine series in a related frequency field, which coefficients are then inverse mapped from the frequency field through an inverse discrete cosine transform function into the real measure space. In this way, a corresponding 8.times.8-pixel block in a vacant picture frame Fp mapped in the X-Y coordinate system has pixel data thereof equivalent to corresponding ones in the differential picture P105.
In due course, the picture frame Fp becomes solid with a set of such pixel data D109, thus resulting in a restored differential picture P106 equivalent to the differential picture P105.
The data D109 of the restored differential picture P106 is sequentially output to the adder 108, where it is subjected to a pixel-mode addition with the data D105 of the motion-compensated picture P104 input from the motion compensator 104, to thereby obtain a set of data D110 representative of a local restored picture P107 equivalent to the current input picture P101.
The data D110 of the restored current picture P107 is sequentially input from the adder 108 to the frame memory 109, where it is stored at corresponding addressed locations, as a set of data representing a local decoded current picture to be employed, in a subsequent frame, as a subsequent local decoded picture of a subsequent past input picture.
Incidentally, the sequence of output codes C101 is transmitted through a transmission line 112 to a decoder side of the system 100.
FIG. 3 exemplarily shows a decoder of the conventional motion picture coding-decoding system 100.
In FIG. 3, designated at reference character 100b is the decoder of the system 100, 120 is a decoding demultiplexer, 121 is an input terminal of the decoder 100b, and 125 is a frame memory. The decoding demultiplexer 120 receives the multiplexed codes C110 representative of the differential picture P105 and the motion vector set Vm via the input terminal 121 connected to the transmission line 112. The frame memory 125 has stored therein a set of data D120 representative of a past decoded picture P120 equivalent to the local decoded picture P102 in the encoder 100a.
The demultiplexer 120 demultiplexes the codes C110 into a code sequence representative of the differential picture P105 and a code sequence representative of the vector set Vm, and decodes the former into a sequence of data D121 equivalent to the compressed data D107 in the encoder 100a and the latter into a sequence of data D122 equivalent to the data D104 in the encoder 100a.
The data D121 is input to an IQ-IDCT unit 122, which functions in a similar manner to the IQ-IDCT unit 107 in the encoder 100a, thus sequentially outputting a set of data D123 representative of a differential picture P121 equivalent to the restored differential picture P106.
The data D122 is input to a motion compensator 123, which concurrently receives the data D120 of the past decoded picture P120 from the frame memory 125 and compensates this data D120 by data D122 in a similar manner to the motion compensator 104 in the encoder 100a. The motion compensator 123 outputs a set of data D124 representative of a motion-compensated picture P122 as a predicted current picture equivalent to the motion-compensated picture P104.
The data D123 of the differential picture P121 and the data D124 of the motion-compensated picture P122 are input to an adder 124, where they are added to each other in a similar manner to the adder 108 in the encoder 100a, to thereby obtain a set of data D125 representative of a current decoded picture P123 equivalent to the local decoded picture P107. This data is output as a datastream via an output terminal 126 of the decoder 100b. This datastream is branched to be input to the frame memory 125, where it is stored as a set of data representative of the current decoded picture P123 to be employed as a subsequent past decoded picture in the subsequent frame.
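The decoder-side reconstruction described above can be sketched as follows. This is a minimal sketch, assuming that the vectors and the restored differential picture have already been demultiplexed and decoded. The function names and the vector convention (a displacement from a block position into the past decoded picture) are assumptions introduced here.

```python
def motion_compensate(past, vectors, n):
    """Predict the current picture from the past decoded picture, block by block."""
    h, w = len(past), len(past[0])
    pred = [[0] * w for _ in range(h)]
    for (bx, by), (dx, dy) in vectors.items():
        for y in range(n):
            for x in range(n):
                pred[by + y][bx + x] = past[by + dy + y][bx + dx + x]
    return pred


def decode_frame(past, vectors, restored_diff, n):
    """Pixel-wise sum of the motion-compensated picture and the restored
    differential picture, i.e. the role played by the adder."""
    pred = motion_compensate(past, vectors, n)
    return [[p + d for p, d in zip(prow, drow)]
            for prow, drow in zip(pred, restored_diff)]
```

The returned picture would then be written back to the frame memory so that it serves as `past` for the next frame, which keeps encoder and decoder predictions in step.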
The motion compensation interframe prediction permits an effective data compression of a motion picture even in the conventional coding-decoding system 100.
In the conventional system 100, however, a single motion is estimated to determine a single vector for each of all square blocks having a 16.times.16-pixel size. Such a restriction constitutes some drawbacks.
For example, in a block, some pixels may represent an image of an object moving in a certain direction, and other pixels, that of another object moving in a different direction. An estimated motion of such a block may provide an erroneous motion vector, causing an associated prediction error to be increased, thus resulting in a reduced coding efficiency.
Additionally, in a picture, one block may represent an image of a certain portion of a moving object, and another block, that of a neighboring portion thereof. An estimated motion of the former may be different from that of the latter, giving rise to an erroneous contour or discontinuity across a continuous image region of the object, thus resulting in a degraded picture quality.
Recent years have further observed an increasing interest in a processing of data on motion pictures, in relation to an image segmentation.
Image segmentation is a growing technique for reducing a redundancy in a stream of data on a set of pictures that may be a sequence of colored motion pictures each consisting of a matrix of picture elements or pixels.
An arbitrary pixel Px in such a picture sequence is identifiable by a location Lc thereof in an associated picture Pc identified by a frame number Nf thereof, such that Px=Px(Lc; Pc(Nf))=Px(Lc; Nf). The location Lc may be defined by a combination (x, y) of coordinates x and y in an x-y coordinate system fixed to the picture Pc or by an address defined in a pixel matrix. The frame number Nf may be defined by a measure t from an initial time t.sub.0 on a real time axis, such that Nf=Nf(t)=Gs((t-t.sub.0)/Tf), where Tf is a frame period and Gs is a Gauss step function.
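As a small worked illustration of the frame-number relation, the sketch below reads the Gauss step function Gs as the floor (greatest-integer) function. That reading is an assumption made here, as are the function and argument names.

```python
import math


def frame_number(t, t0, tf):
    """Frame number Nf for a time t, an initial time t0 and a frame period tf,
    taking the Gauss step function to be the floor function (an assumption)."""
    return math.floor((t - t0) / tf)
```

For instance, with t0 = 1.0 and a frame period of 2.0, a time of 5.0 falls at the start of frame 2.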
In general, the pixel Px(Lc; Nf) is characterized by character information or data associated therewith in terms of a set .phi. of character parameters .phi..sub.a such as a combination of variants representative of coordinates in an R-G-B color coordinate system and/or an associated luminance, where "a" is an arbitrary one of a plurality of character identification numbers.
Accordingly, an arbitrary pixel Px in an arbitrary picture Pc may be defined such that Px=Px(Lc, .phi.; Nf)=Px(x, y, {.phi..sub.a }; t). Thus, letting x and y also be character parameters .phi..sub.x and .phi..sub.y, respectively, the pixel Px may be defined by Px=Px(.phi.'; t), where .phi.' is an extended character parameter set such that .phi.'={.phi..sub.x, .phi..sub.y, {.phi..sub.a }}={.phi..sub.b }, where "b" is an extended character identification number so that "b" may be "x", "y" or "a".
At a particular time point t=t.sub.c, therefore, an associated picture Pc may be defined such that Pc=.PSI., where .PSI. is a union of extended character parameter sets .phi.' of the picture Pc so that .PSI.={U .phi.'; t=t.sub.c (i.e. parameters .phi..sub.b in each set .phi.' are valued)}.
The image segmentation dissolves the union set .PSI. of the valued parameter sets .phi.' (at t=t.sub.c) into a predetermined number of subsets .PSI..sub.d (d=subset identification number) thereof each consisting of a variable number of parameter sets .phi.', so that no parameter set .phi.' is shared between any pair of subsets .PSI..sub.d and that each parameter .phi..sub.b has an arbitrary pair of values thereof both in one subset .PSI..sub.d, if they are alike or relatively vicinal to each other, and one in either of a pair of subsets .PSI..sub.d, if they are relatively distant from each other.
The dissolving may comprise a clustering, as discussed in a paper "Combining Color and Spatial Information for Segmentation" by Nobuaki IZUMI et al., the Proceedings of the 1991 Spring Term National Conference of the Institute of Electronics, Information and Communication Engineers of Japan, D680, p. 7-392.
FIG. 4 illustrates a basic concept of the clustering, as it has an exemplarily reduced number of parameter dimensions to permit an intuitive comprehension. Like items to FIG. 2 are designated by like reference characters. For brevity, notations of parameter sets or elements thereof will be commonly applied to all spaces, permitting scales of associated coordinate axes to be varied, providing that x=X and y=Y.
As shown at the left side of FIG. 4, a common field Fc defined in an imaginary X-Y coordinate system has mapped therein from an unshown real space a matrix as a union set .PSI. of valued parameter sets .phi.' each consisting of five character parameters .phi..sub.x =X(.gamma./.alpha.), .phi..sub.y =Y(.gamma./.alpha.), .phi..sub.2 =R(.gamma./.alpha.) (data on a red color), .phi..sub.3 =G(.gamma./.alpha.) (data on a green color) and .phi..sub.4 =B(.gamma./.alpha.) (data on a blue color) and of a corresponding pixel Px(.gamma./.alpha.) in a picture Pc with an identification number .alpha. (i.e. t=t.sub.c =t.sub.0 +Tf.times..alpha.), where ".gamma." is a pixel identification number.
In the common field Fc, therefore, a set .phi. of color parameters .phi..sub.a (a=2, 3, 4) of each pixel Px(.gamma./.alpha.) is degenerated at a corresponding coordinate (X, Y). In place of the color parameters, there may be employed a set of luminance and chrominance parameters valued as Y(.gamma./.alpha.), Cb(.gamma./.alpha.) and Cr(.gamma./.alpha.).
The parameter values R(.gamma./.alpha.), G(.gamma./.alpha.) and B(.gamma./.alpha.) of any pixel Px(.gamma./.alpha.) represent in combination a chromatic color that is identical or vicinal to, for example, a first color illustrated by a white circle, a second color illustrated by a shadowed circle or a third color illustrated by a black circle.
As shown in the middle of FIG. 4, the union set .PSI.={.phi.'} is elementwise mapped into a three-dimensional parameter space defined by an R-G-B coordinate system with an R-axis, a G-axis and a B-axis representative of character parameters .phi..sub.2, .phi..sub.3 and .phi..sub.4, respectively, while the remaining parameters .phi..sub.x and .phi..sub.y are degenerated therein.
Accordingly, those pixels illustrated by the white circle are all mapped in a connected region 1, as their colors are identical or vicinal to each other. They constitute a set of vicinal points called a "cluster (in the parameter space)" corresponding to a subset .PSI..sub.d which also is called a "cluster (in the common field Fc)". Each pixel-representative point (R, G, B) in the cluster is labelled as an element thereof, with a cluster identification number equivalent to the subset identification number d (to be 1 in this case).
Likewise, those illustrated by the shadowed circle are mapped to be clustered in a connected region 2, and those illustrated by the black circle in a connected region 3.
There are thus constituted three clusters 1, 2 and 3 in the regions 1, 2 and 3 disconnected from each other in the parameter space.
Then, as shown at the right side of FIG. 4, the clusters 1, 2 and 3 are elementwise inverse mapped into the common field Fc, so that their elements are each mapped in a form of a pixel Px(.gamma./.alpha.) with a corresponding label d (1, 2 or 3) representative of a cluster (as the subset .PSI..sub.d) it belongs to in this field Fc.
As a result, an image segmentation is performed by a clustering.
In the case of FIG. 4, however, the parameter space has two degenerated parameters .phi..sub.x and .phi..sub.y for the convenience of illustration, so that the clustering is performed of three parameters .phi..sub.2, .phi..sub.3 and .phi..sub.4, thus resulting in three clusters 1, 2 and 3 separated from each other by a dashed line in the field Fc. The cluster 1 in Fc is provided as a combination of a right lower region and a left upper region distant from each other.
In a practical clustering, therefore, there is employed a five-dimensional parameter space including X and Y axes for the parameters .phi..sub.x and .phi..sub.y, so that the right lower and left upper regions of the cluster 1 may have a significant distance detected therebetween in the parameter space and may thus be clustered in the field Fc one with a label 1 and the other with a label 4.
Another practical clustering may employ a common field Fc with a set of luminance and chrominance parameters .phi..sub.a, i.e. a five-dimensional parameter space including Y, Cb and Cr axes for the parameters .phi..sub.a.
Incidentally, for a comprehensive classification of various fields and spaces, there will sometimes be employed herein three notations "S" representing a real or imaginary spatial field, "ST" representing a real or imaginary spatiotemporal field, and "[e]" representing a measure "e" of dimension, where "e" is an arbitrary integer. For example, the common field Fc is an ST[3] class, and the parameter space defined by the R-G-B coordinate system is an S[3] class.
FIG. 5 shows an exemplary motion picture processing system for describing a conventional image segmentation using a five-dimensional clustering.
In FIG. 5, designated at reference character 200 is an entirety of the processing system. Like terms will be designated by like characters between the foregoing description and the following description.
The system 200 comprises a frame memory 210, an address generator 220 and a clustering circuit 230.
The frame memory 210 receives for storing therein three sequences of parallel color data D200 of a current input picture P200 from an unshown input port, in synchronism with a sequence of write address data D221w output thereto from the address generator 220. The frame memory 210 outputs therefrom three sequences of parallel color data D210 of a certain picture P210 in concern (that may be a past input picture or the current input picture), in synchronism with a sequence of read address data D221r output thereto from the address generator 220. The picture P210 is now assumed to be equivalent to the picture Pc at t=t.sub.c in FIG. 4, for an intuitive comprehension.
The frame memory 210 is composed of three parallel memories, i.e. an R-memory 211, a G-memory 212 and a B-memory 213.
The color data D200 includes a set of R-data on a red color, a set of G-data on a green color and a set of B-data on a blue color respectively of the picture P200. The color data D210 also includes a set {.phi..sub.2 } of R-data D211=R(.gamma./.alpha.) on a red color, a set {.phi..sub.3 } of G-data D212=G(.gamma./.alpha.) on a green color and a set {.phi..sub.4 } of B-data D213=B(.gamma./.alpha.) on a blue color respectively of the picture P210.
The write address data D221w as well as the read address data D221r each consist of a pair of address data. The address data pair in the data D221w defines a write location in each of the R-, G- and B-memories 211, 212 and 213, which location corresponds to a pixel position in the picture P200. The address data pair in the data D221r defines a read location in each of the R-, G- and B-memories 211, 212 and 213, which location corresponds to a two-dimensional coordinate (x, y) of a position that a pixel Px has in the picture P210.
The R-, G- and B-data of the picture P200 as input to the frame memory 210 are written in the R-, G- and B-memories 211, 212 and 213, respectively, at write locations therein defined by the address data D221w. The R-, G- and B-data of the picture P210 to be output from the frame memory 210 are read from the R-, G- and B-memories 211, 212 and 213, respectively, and more specifically, from read locations therein defined by the address data D221r.
The address generator 220 further outputs a sequence of location data D222 in synchronism with the read address data D221r. Each location data D222 represents (or consists of) a corresponding one of the read address data D221r, and comprises a pair of data D223 and D224, one D223 representing an x-coordinate .phi..sub.x and the other D224 a y-coordinate .phi..sub.y of the afore-mentioned coordinate (x, y).
The clustering circuit 230 processes a synchronized combination {.phi.'} of the color data D210={.phi..sub.2, .phi..sub.3, .phi..sub.4 } of the picture P210 and the location data D222={.phi..sub.x, .phi..sub.y } so that, in a frame Nf (=.alpha.) of time, the picture P210 as a union set .PSI. of valued character parameter sets .phi.' is congruently mapped in a common field Fc of an ST[3] class. In this mapping the union set .PSI.={X(.gamma./.alpha.), Y(.gamma./.alpha.), R(.gamma./.alpha.), G(.gamma./.alpha.), B(.gamma./.alpha.)} is mapped to a practical five-dimensional parameter space of an S[5] class consisting of a set of spatial points each defined by a coordinate (X, Y, R, G, B) or by a coordinate (R, G, B, X, Y), while the latter is employed in this case for an intuitive consistency with FIG. 4.
In the parameter space, the union set .PSI. is elementwise clustered in a set {Ci} of n clusters Ci, where "n" is a predetermined arbitrary integer and "i" is an arbitrary integer such that 1.ltoreq.i.ltoreq.n. In this case, {Ci}={1, 2, 3, 4}={C.sub.1, C.sub.2, C.sub.3, C.sub.4 } and n=4.
Then, the n clusters Ci[5] in the S[5] field are inverse mapped to the ST[3] field Fc. As a result, in the frame .alpha., the picture Pc as a two-dimensional set of pixels Px(.gamma./.alpha.) in this field Fc is grouped or clustered into n clusters Ci[2] in terms of valued subsets .PSI..sub.d labelled 1 to 4 in this case, i.e., it is apparently segmented into n regions RS each connected therein and labelled with a number "i" (corresponding to i of Ci and d of .PSI..sub.d) such that RSi.
The clustering circuit 230 sequentially outputs a set of data D230 on the result of such image segmentation, each including clock-dependent or valued information on a combination of a pixel identification number .gamma. and an associated label i as a cluster or region identification number.
The clustering and hence the segmentation is a result of a mapping to or from an imaginary space, based on a real computation according to an algorithm.
The algorithm will be described below for an arbitrary integer n (not limited to 4), providing that a number of scales are determined as color and location parameter weighting factors k.sub.0 and k.sub.1 in the mapping between the ST[3] class field Fc and the parameter field of S[5] class, such that:
$$R[5] = k_0^{1/2} \times R[2] \qquad (1),$$
$$G[5] = k_0^{1/2} \times G[2] \qquad (2),$$
$$B[5] = k_0^{1/2} \times B[2] \qquad (3),$$
$$X[5] = k_1^{1/2} \times X[2] \qquad (4), \text{ and}$$
$$Y[5] = k_1^{1/2} \times Y[2] \qquad (5).$$
In the conventional system 200, the picture Pc is initially equi-divided into n square blocks C.sub.i0 each having a geometrically central pixel Pg.sub.i0 therein as a representative pixel Pr.sub.i0 thereof with a set .phi.' of associated parameter values componentwise representative of a five-dimensional parameter vector Vp.sub.i0 defined in the S[5] field, such that: ##EQU1##
Thus, each block C.sub.i0 is represented by the vector Vp.sub.i0 as a representative vector thereof in the S[5] field.
Likewise, every pixel Px(.gamma.(x, y)/.alpha.) in the picture Pc is represented in the S[5] field by a corresponding five-dimensional parameter vector Vp.sub.xy, such that: ##EQU2##
Then, each parameter vector Vp.sub.xy has a Euclidean relative distance D.sub.i0 thereof determined to the representative vector Vp.sub.i0 of each block C.sub.i0, such that: ##EQU3##
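The weighted distance can be illustrated with the sketch below. The exact distance expression is not reproduced in this text, so the form used here is an assumption: it is simply what a plain Euclidean distance yields once each color component is scaled by the square root of k.sub.0 and each location component by the square root of k.sub.1, as in relations (1) to (5). The function names and the (R, G, B, X, Y) tuple layout are likewise assumptions.

```python
import math


def to_s5(pixel, k0, k1):
    """Map an (R, G, B, X, Y) tuple into the scaled five-dimensional space,
    following the scalings of relations (1) to (5)."""
    r, g, b, x, y = pixel
    s0, s1 = math.sqrt(k0), math.sqrt(k1)
    return (s0 * r, s0 * g, s0 * b, s1 * x, s1 * y)


def distance(p, q, k0, k1):
    """Euclidean distance between two pixels after scaling (assumed form)."""
    a, b = to_s5(p, k0, k1), to_s5(q, k0, k1)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
```

Raising k.sub.1 relative to k.sub.0 makes spatial separation count for more, so that two equally colored but distant regions are less likely to share a label.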
Each pixel Px(.gamma.(x, y)/.alpha.) represented by the parameter vector Vp.sub.xy thus has n relative distances D.sub.i0 determined therefor, including a minimum one D.sub.min-0, and is labelled with an identification number i.sub.0 (or an incremented number i.sub.1) of a block C.sub.i0 associated with the minimum distance D.sub.min-0.
Accordingly, all pixels Px of the picture Pc in the field Fc are each labelled with a corresponding one of n identification numbers i.sub.0 (or i.sub.1) (i=1 to n), so that the picture Pc is re-segmented into n first-order clusters C.sub.i1 (i=1 to n) each consisting of N.sub.i1 pixels Px.sub.i1 labelled with an identical number i.sub.0 (or i.sub.1), while the number N.sub.i1 of pixels Px.sub.i1 is variable.
The N.sub.i1 pixels Px.sub.i1 in each first-order cluster C.sub.i1 have their parameter values R(.gamma.(x, y)/.alpha.), G(.gamma.(x, y)/.alpha.), B(.gamma.(x, y)/.alpha.), X(.gamma.(x, y)/.alpha.) and Y(.gamma.(x, y)/.alpha.) arithmetically averaged thereamong to obtain a set of representative parameter values R.sub.ci1, G.sub.ci1, B.sub.ci1, X.sub.ci1 and Y.sub.ci1 of the first-order cluster C.sub.i1, such that:
$$R_{ci1} = \frac{\sum R(\gamma(x, y)/\alpha)}{N_{i1}} \qquad (9),$$
$$G_{ci1} = \frac{\sum G(\gamma(x, y)/\alpha)}{N_{i1}} \qquad (10),$$
$$B_{ci1} = \frac{\sum B(\gamma(x, y)/\alpha)}{N_{i1}} \qquad (11),$$
$$X_{ci1} = \frac{\sum X(\gamma(x, y)/\alpha)}{N_{i1}} \qquad (12), \text{ and}$$
$$Y_{ci1} = \frac{\sum Y(\gamma(x, y)/\alpha)}{N_{i1}} \qquad (13),$$
where the sum .SIGMA. is taken of the N.sub.i1 pixels Px.sub.i1.
The cluster C.sub.i1 is represented by a representative vector Vp.sub.i1 defined in the S[5] field, such that:
$$Vp_{i1} = (R_{ci1}, G_{ci1}, B_{ci1}, X_{ci1}, Y_{ci1}) \qquad (14).$$
In any cluster C.sub.i1, a geometrically central pixel Pg.sub.ci1 thereof may have its parameter set .phi.' different from the set of representative parameter values R.sub.ci1, G.sub.ci1, B.sub.ci1, X.sub.ci1 and Y.sub.ci1.
Then, each parameter vector Vp.sub.xy has a Euclidean relative distance D.sub.i1 thereof determined to the representative vector Vp.sub.i1 of each cluster C.sub.i1, such that: ##EQU4##
Each pixel Px(.gamma.(x, y)/.alpha.) thus has n relative distances D.sub.i1 determined therefor, including a minimum one D.sub.min-1, and is labelled with an identification number i.sub.1 (or an incremented number i.sub.2) of a cluster C.sub.i1 associated with the minimum distance D.sub.min-1.
Accordingly, all pixels Px of the picture Pc in the field Fc are each labelled with a corresponding one of n identification numbers i.sub.1 (or i.sub.2) (i=1 to n), so that the picture Pc is re-segmented into n second-order clusters C.sub.i2 (i=1 to n) each consisting of N.sub.i2 pixels Px.sub.i2 labelled with an identical number i.sub.1 (or i.sub.2). Also the number N.sub.i2 of pixels Px.sub.i2 is variable.
Like operation is repeated, as necessary. In due course, the picture Pc is re-segmented from a set {C.sub.ij } of j-th order clusters C.sub.ij (i=1 to n; j=arbitrary integer) each represented by a representative vector Vp.sub.ij in the S[5] field, to a set {C.sub.i(j+1) } of j+1-th order clusters C.sub.i(j+1) each represented by a representative vector Vp.sub.i(j+1) in the S[5] field.
At j+1=k, if the representative vector Vp.sub.ik of the k-th order cluster C.sub.ik is equivalent to that Vp.sub.ij of the j-th order cluster for each i, the clustering is converged and hence, when relabelled after the set {C.sub.ik }, each pixel Px is kept labelled with a previous cluster number i.sub.j-1 (or an incremented number i.sub.j). An image segmentation is completed with the set {C.sub.ij }, so that each cluster of pixels with an identical number i.sub.j-1 (or i.sub.j) constitutes a final connected region.
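The iterative procedure described above (initial blocks, nearest-representative labelling, averaging, repetition until the representative vectors stop changing) might be sketched in software as follows. This is a hedged sketch, not the circuit of FIG. 5. Assumed here: the picture is a list of rows of (R, G, B) tuples; the block grid is given as `blocks_x` by `blocks_y`; labels are 0-based rather than 1 to n; the distance is the weighted squared Euclidean form implied by relations (1) to (5); an empty cluster keeps its previous representative; and `max_iter` bounds the repetition.

```python
def sqdist(p, q, k0, k1):
    """Weighted squared distance between two (R, G, B, X, Y) tuples (assumed form)."""
    color = sum((p[d] - q[d]) ** 2 for d in range(3))
    place = sum((p[d] - q[d]) ** 2 for d in range(3, 5))
    return k0 * color + k1 * place


def segment(image, blocks_x, blocks_y, k0=1.0, k1=1.0, max_iter=50):
    """Return (labels, reps): a label per pixel and one representative per cluster."""
    h, w = len(image), len(image[0])
    bw, bh = w // blocks_x, h // blocks_y
    # Initial representatives: the central pixel of each block.
    reps = []
    for j in range(blocks_y):
        for i in range(blocks_x):
            cx, cy = i * bw + bw // 2, j * bh + bh // 2
            r, g, b = image[cy][cx]
            reps.append((r, g, b, cx, cy))
    labels = []
    for _ in range(max_iter):
        sums = [[0.0] * 5 for _ in reps]
        counts = [0] * len(reps)
        labels = []
        for y in range(h):
            row = []
            for x in range(w):
                r, g, b = image[y][x]
                p = (r, g, b, x, y)
                # Label the pixel with the nearest representative.
                best = min(range(len(reps)),
                           key=lambda i: sqdist(p, reps[i], k0, k1))
                row.append(best)
                counts[best] += 1
                for d in range(5):
                    sums[best][d] += p[d]
            labels.append(row)
        # New representatives: arithmetic means of each cluster's members.
        new_reps = [tuple(s / counts[i] for s in sums[i]) if counts[i] else reps[i]
                    for i in range(len(reps))]
        if new_reps == reps:  # converged: representatives unchanged
            break
        reps = new_reps
    return labels, reps
```

Because location is part of each vector, the resulting clusters tend to be spatially compact, unlike a clustering performed on color alone.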
For the clustering to be repeated a necessary number of times until a convergence, the frame memory 210 and the address generator 220 of FIG. 5 are adapted to repeat sequentially outputting a synchronized parallel combination of the color data D210 and the location data D222 of the picture P210. Further, for an initial setting of blocks C.sub.i0, the address generator 220 is adapted to sequentially output a set of data on addresses of the central pixels Pg.sub.i0 (=Pr.sub.i0) to the frame memory 210 and the clustering circuit 230.
FIG. 6 is an exemplarily detailed block diagram of the clustering circuit 230 in the conventional system 200 of FIG. 5. This example employs an incremented cluster identification number.
In FIG. 6, designated at reference character 235 is a cluster number memory, 236 is a cluster number generator, and 238 is a cluster memory.
The cluster number memory 235 stores therein for each j an identification number i.sub.j (i=1 to n) of each cluster C.sub.ij, at respective addresses Ap.sub.xy corresponding to coordinates (X, Y) of all of N.sub.ij pixels Px(.gamma./.alpha.) as Px.sub.ij labelled with the cluster number i.sub.j, to thereby update a previous identification number i.sub.j-1. In place of the initial identification number i.sub.0 for any block C.sub.i0, there is employed a particular integer, such as -1, that will never be found in a value range of i.sub.j for 1.ltoreq.j.
The cluster number generator 236 generates and outputs, for each j, a sequence of cluster numbers 1.sub.j to n.sub.j, as necessary.
The cluster memory 238 stores therein at least for each j, at each cluster address Ac.sub.ij corresponding to one cluster number i.sub.j, a set of representative parameter values R.sub.cij, G.sub.cij, B.sub.cij, X.sub.cij and Y.sub.cij as components of a representative vector Vp.sub.ij of an associated cluster C.sub.ij, such that:
R.sub.cij =(.SIGMA.R(.gamma.(x, y)/.alpha.))/N.sub.ij (16),
G.sub.cij =(.SIGMA.G(.gamma.(x, y)/.alpha.))/N.sub.ij (17),
B.sub.cij =(.SIGMA.B(.gamma.(x, y)/.alpha.))/N.sub.ij (18),
X.sub.cij =(.SIGMA.X(.gamma.(x, y)/.alpha.))/N.sub.ij (19), and
Y.sub.cij =(.SIGMA.Y(.gamma.(x, y)/.alpha.))/N.sub.ij (20),
where the sum .SIGMA. is taken of the N.sub.ij pixels Px.sub.ij. Each address Ac.sub.ij is representative of an associated cluster number i.sub.j, and vice versa.
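The averaging of expressions (16) to (20) can be illustrated by a minimal sketch. It assumes that each pixel is given as a tuple (R, G, B, X, Y); the function name representative_vector and the sample values are hypothetical and serve only for illustration.

```python
def representative_vector(pixels):
    """Componentwise mean of (R, G, B, X, Y) tuples, per expressions (16)-(20).

    `pixels` holds the N_ij pixels labelled with one cluster number.
    """
    n = len(pixels)
    if n == 0:
        # Assumption: an empty cluster is treated as an error in this sketch.
        raise ValueError("cluster has no pixels")
    return tuple(sum(p[c] for p in pixels) / n for c in range(5))


# Hypothetical two-pixel cluster.
cluster = [(10, 20, 30, 0, 0), (30, 40, 50, 2, 4)]
print(representative_vector(cluster))  # (20.0, 30.0, 40.0, 1.0, 2.0)
```

Each component of the result is the sum over the cluster divided by the pixel count N.sub.ij, so the color and location components are treated alike.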
As shown in FIG. 6, the clustering circuit 230 comprises a distance calculator 231, a minimum distance estimator 232, a convergence estimator 234, the cluster number memory 235, the cluster number generator 236, an average calculator 237 and the cluster memory 238.
The distance calculator 231 sequentially calculates, for each pixel Px and each j, a set {D.sub.ij } of Euclidean relative distances D.sub.ij between a parameter vector Vp.sub.xy of the pixel Px(.gamma.(x, y)/.alpha.) and respective representative vectors Vp.sub.ij of clusters C.sub.ij, such that: ##EQU5##
The distance calculation for any pixel Px is made of all the n clusters C.sub.ij in an order in which the cluster number i.sub.j is output from the generator 236, while the order is common to each j. Calculated distances D.sub.ij are output in the same order as the calculation.
The minimum distance estimator 232 functions for each pixel Px and each j so that, upon reception of an m-th distance D.sub.ij, where "m" is an arbitrary integer such that 1.ltoreq.m.ltoreq.n, it has a minimum distance held therein as a distance criterion CD.sub.m-1 from among the m-1 distances D.sub.ij it has received until then, and compares the m-th distance D.sub.ij with the criterion CD.sub.m-1 to select the smaller of the two to be held therein as a subsequent criterion CD.sub.m. An initial criterion CD.sub.0 is predetermined to be a maximum permissible value. A final criterion CD.sub.n is thus the minimum distance D.sub.min-j in the set {D.sub.ij }, and the identification number i.sub.j of the cluster C.sub.ij associated therewith is output as a label for the pixel Px concerned.
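The running comparison against the criterion CD.sub.m can be sketched as follows. This is an illustration only: the distance is assumed to be a plain unweighted Euclidean distance over (R, G, B, X, Y), ties are assumed to keep the earlier cluster number, and the names euclidean and nearest_cluster are hypothetical.

```python
import math


def euclidean(p, q):
    # Assumption: unweighted Euclidean distance over the five components.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_cluster(pixel, representatives, max_distance=float("inf")):
    """Return (cluster number, minimum distance) for one pixel.

    Distances arrive one at a time in cluster-number order; the criterion
    starts at a maximum permissible value (CD_0) and only ever shrinks.
    """
    criterion = max_distance
    label = None
    for number, rep in enumerate(representatives, start=1):
        d = euclidean(pixel, rep)
        if d < criterion:  # assumption: a tie keeps the earlier number
            criterion = d
            label = number
    return label, criterion


# Hypothetical representative vectors of two clusters and one pixel.
reps = [(0, 0, 0, 0, 0), (10, 10, 10, 5, 5)]
label, dist = nearest_cluster((9, 9, 9, 4, 4), reps)
print(label)  # 2
```

Only one criterion value and one cluster number need to be held at any time, which matches the sequential reception of the n distances.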
The convergence estimator 234 functions for each pixel Px and each j to compare the current cluster number i.sub.j output from the estimator 232 for the pixel Px concerned with a previous cluster number i.sub.j-1 stored at a corresponding location in the memory 235, to thereby check for a difference or detect a coincidence therebetween. At j=k, if the coincidence is detected for all pixels Px, the associated clustering has converged, permitting the data D230 on a result of the clustering to be output.
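The read, update and compare cycle on the cluster number memory can be sketched as below. The list memory stands in for the cluster number memory 235, indexed by a flattened pixel address, and the sentinel -1 is the example initial value mentioned above; the function name update_labels is hypothetical.

```python
UNASSIGNED = -1  # initial value never used as a real cluster number


def update_labels(memory, new_labels):
    """Overwrite stored labels in place; return True if no label changed."""
    converged = True
    for address, new in enumerate(new_labels):
        old = memory[address]      # read the previous cluster number
        memory[address] = new      # update with the current one
        if old != new:             # compare, as the convergence estimator does
            converged = False
    return converged


memory = [UNASSIGNED] * 4
first = update_labels(memory, [1, 1, 2, 2])   # False: every label differs from -1
second = update_labels(memory, [1, 1, 2, 2])  # True: all labels coincide
print(first, second)  # False True
```

Because the sentinel lies outside the range of valid cluster numbers, the first pass can never be mistaken for a converged one.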
The average calculator 237 sequentially calculates for each j the set of representative parameter values R.sub.cij, G.sub.cij, B.sub.cij, X.sub.cij and Y.sub.cij of each cluster C.sub.ij, in accordance with the expressions (16) to (20).
For each frame Nf in which a single picture Pc is processed, the clustering circuit 230 functions as follows.
In an initial phase of the frame Nf=.alpha., the cluster number memory 235, the cluster memory 238, the minimum distance estimator 232 and the convergence estimator 234 are initialized, and the cluster number generator 236 sequentially outputs to the cluster memory 238 a set of data D236a on addresses Ac.sub.i0 of the blocks (as initial clusters) C.sub.i0 in the memory 238.
In a synchronized manner therewith, the color data D210 and the location data D222 of representative pixels Pr.sub.i0 (=Pg.sub.i0) of the blocks C.sub.i0 are sequentially input from the frame memory 210 (FIG. 5) and the address generator 220 (FIG. 5), respectively, to the cluster memory 238, where they are written at the addresses Ac.sub.i0 designated by the address data D236a, so that each address Ac.sub.i0 has stored therein a set .phi.' of data D210 and D222 on parameter values R(.gamma.(Pr.sub.i0)/.alpha.), G(.gamma.(Pr.sub.i0)/.alpha.), B(.gamma.(Pr.sub.i0)/.alpha.), X(.gamma.(Pr.sub.i0)/.alpha.) and Y(.gamma.(Pr.sub.i0)/.alpha.) as components of an associated representative vector Vp.sub.i0.
Then, the cluster number generator 236 outputs the address data D236a in the order of the cluster number i.sub.0 (i.sub.0 =i.sub.j =1 to n) to the cluster memory 238, where the written data D210 and D222 on parameter values {.phi.'} are sequentially read from their addresses Ac.sub.i0 in the order of the cluster number i.sub.0, to be output as a sequence of data D238 to the distance calculator 231.
In a synchronized manner therewith, the calculator 231 receives a combination of data D210 and D222 on parameter values R(.gamma.(x, y)/.alpha.), G(.gamma.(x, y)/.alpha.), B(.gamma.(x, y)/.alpha.), X(.gamma.(x, y)/.alpha.) and Y(.gamma.(x, y)/.alpha.) of a pixel Px(.gamma.(x, y)/.alpha.).
In the calculator 231, the data D238 is sequentially processed by using the combination of data D210 and D222 for a calculation in accordance with the expression (8), to thereby obtain the n relative distances D.sub.i0 to be output as a sequence of data D231 to the minimum distance estimator 232. In synchronism therewith, a set of data D236b each representative of one cluster number i.sub.0 is sequentially output to the estimator 232.
The estimator 232 processes the data D231 together with the data D236b to hold therein the minimum distance D.sub.min-0 in combination with a number i.sub.0 of a corresponding cluster C.sub.i0, which cluster number i.sub.0 is output as a data D232 to the cluster number memory 235.
In synchronism therewith, the data D222 on the parameter values X(.gamma.(x, y)/.alpha.) and Y(.gamma.(x, y)/.alpha.) of the pixel Px(.gamma.(x, y)/.alpha.) are input to the memory 235, where an initial cluster number (e.g. -1) set at an address Ap.sub.xy designated by the input data D222 is read and updated by writing the cluster number i.sub.0 represented by the data D232.
The read cluster number is output to the convergence estimator 234, where it is compared with the cluster number i.sub.0 represented by the data D232. A result R.sub.0 of the comparison is stored.
The foregoing process after the initial setting of representative parameter values of blocks C.sub.i0 is repeated for each pixel Px.
Then, the cluster number generator 236 sequentially outputs a set of data D236c representative of cluster numbers i.sub.0, in the order of the numbers i.sub.0, to the average calculator 237.
In a synchronized manner therewith, the calculator 237 receives a sequence of combinations of the color data D210 and the location data D222 of respective pixels Px(.gamma.(x, y)/.alpha.) in the picture Pc, in a preset scan order thereof. Concurrently therewith, the same data as the location data D222 in that sequence are input in the preset scan order to the cluster number memory 235, where the cluster numbers i.sub.0 stored therein are sequentially read in the same scan order.
Read cluster numbers i.sub.0 are input in the read order to the average calculator 237, where they are each compared with a certain cluster number i.sub.0 represented then by one of the data D236c, to thereby detect a coincidence therebetween. Each time the coincidence is detected, a corresponding combination of data D210 and D222 representative of a parameter vector Vp.sub.xy of a pixel Px is processed to be counted and vector-componentwise cumulated.
When the scan of the picture Pc is over for the cluster number i.sub.0, respective cumulated values are divided by a final count number N.sub.i1 in accordance with the expressions (9) to (13) to thereby determine the representative parameter values R.sub.ci1, G.sub.ci1, B.sub.ci1, X.sub.ci1 and Y.sub.ci1 of the first-order cluster C.sub.i1, which are output as a set of data D237 to the cluster memory 238.
In synchronism therewith, an address data D236a corresponding to the data D236c concerned is output from the cluster number generator 236 to the cluster memory 238, where a set of representative parameter values R(.gamma.(Pr.sub.i0)/.alpha.), G(.gamma.(Pr.sub.i0)/.alpha.), B(.gamma.(Pr.sub.i0)/.alpha.), X(.gamma.(Pr.sub.i0)/.alpha.) and Y(.gamma.(Pr.sub.i0)/.alpha.) of a corresponding block C.sub.i0 are updated by the representative parameter values R.sub.ci1, G.sub.ci1, B.sub.ci1, X.sub.ci1 and Y.sub.ci1 of the first-order cluster C.sub.i1, respectively.
Like update operation is repeated for each cluster number i.sub.0 designated by any of the data D236c, so that in the cluster memory 238 respective parameter values of all blocks C.sub.i0 are updated by corresponding parameter values of first-order clusters C.sub.i1, i.e. n Vp.sub.i0 are updated to n Vp.sub.i1.
Then, similar operations to the block representative vectors {Vp.sub.i0 } are repeated for the first-order cluster representative vectors {Vp.sub.i1 }, so that for each pixel Px the relative distances {D.sub.i1 } are calculated in accordance with the expression (15) to determine the minimum distance D.sub.min-1, whereby for each pixel Px a current cluster number i.sub.1 is provided and compared with a previous cluster number i.sub.0. Unless i.sub.1 =i.sub.0 for each pixel Px, {Vp.sub.i2 } is calculated in accordance with the expressions (16) to (20), to thereby update {Vp.sub.i1 }.
Likewise, unless i.sub.j =i.sub.j-1 for each pixel Px, {Vp.sub.i(j-1) } are updated by {Vp.sub.ij }.
If i.sub.k =i.sub.k-1 for each pixel Px, the clustering of the picture Pc is completed, so that the cluster numbers i.sub.k are sequentially read from the cluster number memory 235 in synchronism with the location data D222 and output as the data D230 of segmented regions RS from the clustering circuit 230.
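The overall procedure of the clustering circuit 230 for one picture may be summarized by the following minimal sketch. It is not the circuit itself: the distance is assumed to be an unweighted Euclidean distance, an empty cluster is assumed to keep its previous representative vector, a cap max_iterations is added as a safeguard, and the name segment and the sample data are hypothetical.

```python
import math


def segment(pixels, initial_reps, max_iterations=100):
    """Iteratively label (R, G, B, X, Y) pixels until the labels stop changing.

    Returns (labels, representatives); cluster numbers start at 1.
    """
    reps = [tuple(float(v) for v in r) for r in initial_reps]
    labels = [-1] * len(pixels)  # sentinel: no pixel labelled yet
    for _ in range(max_iterations):
        # Label every pixel with the number of its nearest representative.
        new_labels = []
        for p in pixels:
            dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, r)))
                     for r in reps]
            new_labels.append(dists.index(min(dists)) + 1)
        # Converged when every label coincides with the previous one.
        if new_labels == labels:
            return labels, reps
        labels = new_labels
        # Replace each representative by the mean of its member pixels.
        for i in range(len(reps)):
            members = [p for p, l in zip(pixels, labels) if l == i + 1]
            if members:  # assumption: an empty cluster keeps its old vector
                reps[i] = tuple(sum(m[c] for m in members) / len(members)
                                for c in range(5))
    return labels, reps


# Hypothetical picture: two reddish pixels near (0, 0), two bluish near (10, 10).
pixels = [(255, 0, 0, 0, 0), (250, 5, 0, 1, 0),
          (0, 0, 255, 10, 10), (5, 0, 250, 11, 10)]
labels, reps = segment(pixels, [pixels[0], pixels[2]])
print(labels)  # [1, 1, 2, 2]
```

Only color and location enter the distance in this sketch, as in the system 200; no quantity describing motion between frames is used.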
In the motion picture processing system 200, an image segmentation of each picture Pc is performed by elementwise clustering a union set .PSI. of character parameter sets .phi.' each consisting of color data D210 and location data D222 of a pixel Px therein. For a stationary picture of each frame, therefore, a competent segmentation is expectable with a favorable preciseness.
However, the system 200 employs no information on time-dependent variations between frames for the clustering, in which no consideration is taken of any motion that an object might exhibit between pictures. As such, a number of pixels differentiated by motion might be undesirably connected with each other, and an image region with a uniform motion might be unnecessarily divided.
In other words, on the one hand, a number of pixels may be inseparably connected with each other if their locations, colors and/or luminances are vicinal to each other, even when their motions are quite different from each other, thus resulting in an incompetent representation. More specifically, for example, when a picture contains a pair of neighboring objects resemblant in color but different in motion, if the resemblance of color is significant enough for the conventional image segmentation to cluster them together, they will be found in a single region and will not be distributed between a pair of separated regions.
On the other hand, an image region substantially uniform in motion but uneven in color and/or luminance may be unconnectably divided, causing an unnecessary local increase in number of regions, resulting in an undesirably biased segmentation due to a restriction from a total number of regions. For example, when a picture contains a sufficiently small set of pixels representing a single object colored with a pair of different colors and moving without revolution, if the difference between those colors is significant enough for the conventional image segmentation to cluster a subset of the pixel set with either color separately from another subset with the other color, the former subset will be found in a region disconnected from a region including the latter subset.
The present invention has been achieved with such problems of conventional data processing methods and systems in mind.