The present invention relates to a method and apparatus for detecting predetermined attributes (e.g., texture) of data signals (e.g., video and audio signals), and more specifically to a method and apparatus for detecting feature patterns of characters or graphics as the predetermined attributes of video signals (i.e., data signals).
Further, the present invention relates to a method and apparatus for an image segmentation to divide a picture into a plurality of regions to recognize and process the divided regions on the basis of video signal levels, and further to easily represent a boundary of the divided regions.
First, the prior art technique related to the method and apparatus for detecting data signal attributes will be described hereinbelow.
(1) As a novel compression coding technique for grayscale images, the fractal coding technique has been studied (for instance, as disclosed in Document 1: "Fractal Image Coding: A Review", A. E. Jacquin, Proceedings of the IEEE, Vol. 81, No. 10, October 1993). In this technique, an original square picture to be coded is divided into a plurality of blocks, as shown in FIG. 1, and a coder decides a similar region or regions for each block on the basis of the other blocks in the same picture. Here, "similar" implies a relationship between the blocks in which the picture patterns can be made roughly equal to each other by a linear reduction transform in the picture, a simple pixel arrangement transform (such as rotation in units of 90 degrees or mirror image reversal), and a linear transform of pixel values. The above-mentioned linear transform is referred to as an affine transform. Here, in the case of digital video signals, since a picture is constructed of a number of discrete pixels, the reduction transform in a picture is the same as the sampling of pixels.
Now, as shown in FIG. 2, the assumption is made that there exists a similar region 152 whose vertical and horizontal sizes are twice as large as those of a block 151, and the block 151 is composed of 4×4 pixels and the similar region 152 is composed of 8×8 pixels. Here, when the pixel arrangement is not transformed, for instance, a pixel 153 located on the upper left side of the block 151 corresponds to a white point 155 of the similar region 152. However, there exists no pixel at this position 155. In this case, therefore, the value of the pixel 153 is determined by an average value of four pixels 154 surrounding the white point 155. As described above, the reduction transform can be obtained by sampling 4×4 pixel data from the 8×8 pixel data.
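The reduction transform described above can be illustrated by a minimal sketch, given here only as an aid to understanding. The function name reduce_by_two and the sample pixel values are hypothetical and do not appear in the cited document; the sketch assumes a square region whose side is an even number of pixels.

```python
def reduce_by_two(region):
    """Shrink a 2N x 2N region to N x N by averaging each 2 x 2 group of pixels."""
    n = len(region) // 2
    return [
        [
            (region[2 * r][2 * c] + region[2 * r][2 * c + 1]
             + region[2 * r + 1][2 * c] + region[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(n)
        ]
        for r in range(n)
    ]


# Hypothetical 8 x 8 similar region: pixel value = row index * 8 + column index.
similar_region = [[r * 8 + c for c in range(8)] for r in range(8)]

# The 4 x 4 result has the same number of pixels as the block.
block_sized = reduce_by_two(similar_region)
print(len(block_sized), len(block_sized[0]))  # 4 4
print(block_sized[0][0])                      # 4.5, the mean of 0, 1, 8 and 9
```

Each output pixel is the average of the four pixels surrounding the corresponding point of the similar region, as with the pixel 153 and the four pixels 154 of FIG. 2.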
Further, the fractal coder outputs, as code data, (a) the position and the size of a similar region for each block, (b) the transform method of pixel arrangement, and (c) the data required for the pixel value transform method. The outputted code data are transmitted or stored. Since the reduction transform in a picture is decided unequivocally by the size of the similar region and the size of the previously determined blocks, it is unnecessary to transmit or store code data for the reduction transform.
FIG. 4 is a block diagram showing a prior art fractal coder. Original picture data 301 are stored in a frame memory 302. On the basis of a signal 304 for designating the linear transform applied from a control section 303, picture data 305 in a designated region are read from the frame memory 302, and then inputted to a size reduction transform section 306. The size reduction transform section 306 reduces the picture data 305 in the region to the same size as the block (i.e., the same number of pixels as the block). The reduced data 307 are transmitted to the transform section 308. The transform section 308 executes the aforementioned linear transforms other than the size reduction transform, and the transformed data 309 are inputted to a difference section 311. On the other hand, the block data 310 are inputted from the frame memory 302 to the difference section 311. The difference section 311 calculates a difference between the block data 310 and the transformed data 309, and transmits a difference 312 to the control section 303. As described above, the control section 303 designates several sorts of linear transforms, and decides the linear transform giving the minimum difference 312 as the similarity transform of the block. The decided data are outputted as codes 313 to the outside.
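The search executed by the coder of FIG. 4 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the implementation of the cited coder: the function names, the candidate positions of the similar regions (stepped by one block size), the least-squares fit of the pixel value transform, and the use of a squared error as the "difference" are all assumptions introduced here.

```python
def reduce_by_two(region):
    """Size reduction transform: average each 2 x 2 group of pixels."""
    n = len(region) // 2
    return [[(region[2 * r][2 * c] + region[2 * r][2 * c + 1]
              + region[2 * r + 1][2 * c] + region[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n)] for r in range(n)]


def isometries(data):
    """Yield the 8 pixel arrangements: 4 rotations by 90 degrees, each plain and mirrored."""
    current = [row[:] for row in data]
    for _ in range(4):
        yield current
        yield [row[::-1] for row in current]
        current = [list(row) for row in zip(*current[::-1])]  # rotate 90 degrees clockwise


def fit_pixel_transform(src, dst):
    """Least-squares scale s and offset o so that s * src + o approximates dst."""
    xs = [v for row in src for v in row]
    ys = [v for row in dst for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    o = my - s * mx
    err = sum((s * x + o - y) ** 2 for x, y in zip(xs, ys))
    return s, o, err


def encode_block(picture, top, left, size):
    """Return (difference, region row, region column, arrangement index, s, o) of the best match."""
    block = [row[left:left + size] for row in picture[top:top + size]]
    best = None
    for dt in range(0, len(picture) - 2 * size + 1, size):
        for dl in range(0, len(picture[0]) - 2 * size + 1, size):
            region = [row[dl:dl + 2 * size] for row in picture[dt:dt + 2 * size]]
            reduced = reduce_by_two(region)
            for k, arranged in enumerate(isometries(reduced)):
                s, o, err = fit_pixel_transform(arranged, block)
                if best is None or err < best[0]:
                    best = (err, dt, dl, k, s, o)
    return best


# Hypothetical 8 x 8 test picture (a linear ramp), coded with 2 x 2 blocks.
ramp = [[r * 8 + c for c in range(8)] for r in range(8)]
err, dt, dl, k, s, o = encode_block(ramp, 0, 0, 2)
print(err < 1e-9)  # the ramp is exactly self-similar, so the difference vanishes
```

The tuple returned for each block plays the role of the codes 313: the position of the similar region, the pixel arrangement transform, and the parameters of the pixel value transform.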
FIG. 3 is a block diagram showing a prior art fractal decoder for decoding an original picture on the basis of the codes transmitted from the fractal coder as described above. In the drawing, codes 501 are inputted to a transform section 502. Further, an initial picture is previously stored in a frame memory 503. Any image can be used as the initial picture. In accordance with the data included in the codes 501, the similar region data 504 for each block are read from the frame memory 503. The similar region data 504 are processed by the transform section 502 in accordance with the data included in the codes 501. The processing executed by the transform section 502 consists of the intra-picture reduction transform, the pixel arrangement transform, and the pixel value transform. The transformed data 505 are transmitted to the frame memory 503, and overwritten on the corresponding blocks of the frame memory 503. The above-mentioned rewriting of pixel values is executed for all the blocks, respectively, to obtain a first replacement picture. After that, on the basis of the first replacement picture, the same replacement as in the case of the first replacement is executed again to obtain a second replacement picture. After the above-mentioned replacements have been iterated several times, since the picture stored in the frame memory 503 converges to a picture roughly equal to the original picture, the converged picture is outputted to the outside as a reconstructed image 506. The reconstructed image 506 will not change any more even if replaced repeatedly. In other words, the following expression holds:

$$F(A) = A$$
where A denotes a reconstructed image and F denotes a replacement transform.
Reconstructing an image on the basis of the fractal coding/decoding therefore amounts to obtaining an image A which satisfies the above expression. In the case where the transform F is a reduction (contractive) transform, the conventional method utilizes the property that any image gradually approaches the image A as the replacement transform F is iterated.
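The convergence to a picture A satisfying F(A) = A can be observed with a small sketch. The codes below are hypothetical values chosen by hand for a 4×4 picture divided into 2×2 blocks, with the whole picture taken as the similar region of every block and no pixel arrangement transform; they are not taken from any cited document. For simplicity, each replacement is computed from the whole previous picture rather than by overwriting the blocks one after another.

```python
def reduce_by_two(region):
    """Intra-picture reduction transform: average each 2 x 2 group of pixels."""
    n = len(region) // 2
    return [[(region[2 * r][2 * c] + region[2 * r][2 * c + 1]
              + region[2 * r + 1][2 * c] + region[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n)] for r in range(n)]


# Hypothetical codes: block position -> (scale, offset) of the pixel value transform.
# Every scale has magnitude below 1, so the replacement transform is contractive.
CODES = {(0, 0): (0.5, 10.0), (0, 2): (0.25, 100.0),
         (2, 0): (-0.5, 200.0), (2, 2): (0.75, 20.0)}


def replace_once(picture):
    """One replacement transform F: rewrite every block from the reduced similar region."""
    reduced = reduce_by_two(picture)  # 4 x 4 picture reduced to the 2 x 2 block size
    out = [row[:] for row in picture]
    for (top, left), (scale, offset) in CODES.items():
        for r in range(2):
            for c in range(2):
                out[top + r][left + c] = scale * reduced[r][c] + offset
    return out


def decode(initial, iterations=40):
    """Iterate the replacement transform starting from an arbitrary initial picture."""
    picture = initial
    for _ in range(iterations):
        picture = replace_once(picture)
    return picture


def max_gap(p, q):
    """Largest absolute pixel difference between two pictures."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))


from_black = decode([[0.0] * 4 for _ in range(4)])
from_white = decode([[255.0] * 4 for _ in range(4)])
print(max_gap(from_black, from_white) < 0.01)                # both initial pictures give the same result
print(max_gap(replace_once(from_black), from_black) < 0.01)  # the result no longer changes: F(A) = A
```

Because each replacement shrinks the difference between any two pictures by at least the largest scale magnitude (0.75 here), the result is independent of the initial picture.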
FIG. 5A is a block diagram showing a prior art fractal decoder. First, any desired initial picture can be stored in a picture memory 401. In general, this initial picture is different from an original picture. Input codes 402 are read from a storage medium 403, for instance. Further, similarity region data 405 of the first block are read from the picture memory 401. The data 405 read from the picture memory 401 are transformed by a position transform section 404 in accordance with the transform designated by the position transform codes 406 of the first block, and then transmitted to a pixel value transform section 407. In the pixel value transform section 407, the transform executed is designated by the pixel value transform codes 408 of the first block. The transformed data 409 are returned to the picture memory 401. In the picture memory 401, the first block pixels are replaced with the transformed data. The pixel replacement by the similarity transform as described above is executed for the second and subsequent blocks in the same way, to obtain the first transformed picture. The obtained picture is different from the original picture in general.
After that, the similar replacement transforms for each block are executed by use of the first transformed picture stored in the picture memory 401, to obtain the second transformed picture. By repeating the above-mentioned replacement transforms, the picture in the picture memory 401 is converged gradually to a picture roughly equal to the original picture. The converged picture is outputted as the reproduced picture, and then displayed on a display unit 410, for instance.
On the other hand, as the fractal codes represent a geometrical structure of a picture as codes, it is possible to consider that the attribute at each portion of the video signals can be discriminated by use of data of the block and the similar region included in the codes. The discrimination as to which region the pixels in a picture belong to is considered to be effective for division of the picture. The division of the picture region can be applied to various fields. For instance, after a picture including a plurality of objects has been divided into a plurality of regions for each object, these regions can be synthesized again in any desired arrangement, or the sorts of the objects can be recognized on the basis of the shapes of the regions. Further, in the compression coding of the video signals, it is possible to allocate many bits only to the regions that are important from the visual standpoint, to improve the subjective picture quality. Therefore, the technique of the image segmentation is important as the basic technique for these applications. However, no method has yet been proposed for detecting the attributes of the data signals (e.g., original video signals) from the fractal codes, nor any method of image segmentation on the basis of the detected attributes.
Further, in the prior art method as described with reference to FIG. 3, since the frame memory is required to store pixel values for each picture in the fractal decoder, there exists a problem in that the apparatus scale and cost thereof both increase. In addition, when only a part of an image is required to be reconstructed, in the prior art method, after the entire image has been once reconstructed, the desired part is cut away from the entire image, while the other remaining portions are discarded. In this method, however, wasteful calculations are inevitably executed for the reconstruction of the unnecessary portions, which is not preferable from the standpoint of processing efficiency.
Here, consider the case where there exists a picture data base (in which a great number of pictures are stored in the form of fractal coded data) and the stored data base is to be retrieved.
When any desired picture is to be found, in general, the picture codes are read from the data base to reproduce the original pictures, and the reproduced pictures are displayed on the display unit picture by picture. In this retrieval, however, in many cases it is sufficient to see a simple picture indicative of a rough picture size and luminance value or a picture impression, without necessarily seeing the original picture itself. Therefore, since the simple picture can be formed by a smaller quantity of calculations and a smaller circuit scale, as compared with the original picture, it is possible to save the retrieval time and cost. In particular, when a simple binary picture can be formed, it is possible to display the picture without use of a costly high-gradation display.
Further, when an original picture is processed (e.g., morphing or deforming), in general an original picture is first reproduced on the basis of the compressed data; the reproduced picture is processed in accordance with the conventional method on the basis of light and dark picture levels; and the processed picture is compressed again for transmission or storage. However, when the compression and the reproduction are repeated many times as described above, there exists a problem in that the picture becomes obscure or distorted, with the result that the picture quality deteriorates gradually. Further, since both the coder and the decoder are required, there exists a problem in that the hardware scale inevitably increases.
(1) As described above, since the fractal codes include the geometrical structure of a picture, if the attributes of the respective portions of the picture and other data signals can be discriminated by use of the data of the block and the similar region related to the codes, the obtained attributes seem very useful for dividing the picture region. However, no method has so far been proposed for detecting the attributes of data signals (e.g., the original video signals) from the fractal codes, nor any method of dividing the region on the basis of the detected attributes.
Further, in the prior art technique, since the frame memory is required to store the pixel values for one picture in the fractal decoder, there exists a problem in that the system scale and cost thereof increase. Further, when only a part of a picture is required to be reproduced, since the desired part is cut away after the entire picture has been once reproduced, while the other remaining parts are discarded, there arises another problem in that wasteful calculations are executed for unnecessary parts, with the result that the processing efficiency is not high.
(2) In the prior art picture forming apparatus, even in the case where a simple picture is sufficient (as with the case of a data retrieval from a data base in which a great number of pictures are stored in the form of codes), since the original picture is always reproduced, many reproduction calculations are required to be executed, with the result that there arises a problem in that the circuit scale increases and a costly multi-gradation display unit must be prepared to display a picture.
In addition, when the picture is processed on the basis of the light and dark levels, such problems arise that the picture quality deteriorates gradually as the compression and reproduction are repeated, and further that both the coder and the decoder must be prepared.
On the other hand, there exists the following prior art technique for extracting the feature pattern.
Conventionally, the technique of recognizing characters (e.g., hand-written letters) has been important and widely used in practice in the fields of mail sorting or the hand-written letter input. In the current technique, however, since it is difficult to cut off the characters as a pattern in a unit of one character, the cutting-off of the characters is supplemented by recognizing a meaning indicated by the character (knowledge information such as a radical). Here, if the character size can be detected before the processing in which the knowledge information is used, since the succeeding recognition can be executed more easily, there exists a need of developing a technique for detecting the character size.
Further, in the pre-processing for recognition of a picture in which a plurality of textures (regions in which fine patterns are distributed uniformly) are mixed, there exists a need of detecting the size of each texture. Likewise, there exists another need such that a pitch period of an audio signal is required to be detected to facilitate the succeeding recognition processing.
It has been known that the fractal dimensions are used to detect the feature pattern size of these signals. Here, the fractal dimensions can represent a complexity of signals (e.g., video signals) by an identifier value (See "Fractal Mathematical Principle", Applied Mathematics I, by Yamaguchi, Hata, and Kigami, IWANAMI COURSE, April, 1993). Further, there are several methods of obtaining the fractal dimensions. Here, a Blanket-Covering method (one of the fractal dimension obtaining methods) will be explained hereinbelow (See T. Peli, V. Tom, B. Lee, "Multi-Scale Fractal and Correlation Signatures for Image Screening and Natural Clutter Suppression", SPIE Vol. 1199, Visual Commun. and Image Processing IV, 1989).
Now, an image curved surface (three-dimensional) composed of a set of dots (each of whose length indicates an intensity of luminance value (an integer value) at each pixel) and a series $\{\epsilon_k\}$ ($k = 0, 1, \ldots$) of scales $\epsilon_k$ ($> 0$) are considered for the respective pixels in the two-dimensional picture plane in a direction perpendicular to the picture. Further, the above-mentioned image curved surface is covered with a blanket with a width $\epsilon_k$ in a certain scale $\epsilon_k$ on both the upper and lower sides thereof. Here, if the upper surface of the blanket over the pixel $(i, j)$ is denoted by $u_{i,j}(\epsilon_k)$; if the lower surface of the blanket under the pixel $(i, j)$ is denoted by $b_{i,j}(\epsilon_k)$; and if the luminance value (an integer value) at the pixel $(i, j)$ is denoted by $g_{i,j}$, since $u_{i,j}(\epsilon_0) = b_{i,j}(\epsilon_0) = g_{i,j}$, the upper surface $u_{i,j}(\epsilon_k)$ of the blanket over the pixel $(i, j)$ and the lower surface $b_{i,j}(\epsilon_k)$ of the blanket under the pixel $(i, j)$ can be obtained gradually as follows: ##EQU1##
Here, since $\epsilon_0 = 0$, the change $B(\epsilon_k)$ of the bright surface of the blanket can be obtained as ##EQU2##
This $B(\epsilon_k)$ is referred to as a measure relative to the scale $\epsilon_k$. In other words, the scale corresponds to a unit for obtaining the measure $B(\epsilon_k)$. As shown in FIG. 5B, when the logarithm of the measure is taken on the ordinate and the logarithm of $\epsilon$ is taken on the abscissa, there exists a case where a straight line having a gradient $a$ can be obtained. Here, $D = 2 - a$ obtained on the basis of the gradient of the straight line is referred to as the fractal dimension.
In general, since the measure is a rate corresponding to a volume or area determined unequivocally relative to the scale, the linear relationship obtained when the measure and the scale are both taken in logarithmic scale is characterized by the fractal dimension.
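The blanket-covering procedure can be sketched as follows. Since the recursion and the measure themselves are not reproduced in the text above, the sketch assumes the commonly used form of the method: the upper surface at each scale is the larger of "previous upper surface plus one" and the maximum of the previous upper surface over the pixel and its four neighbours (the lower surface symmetrically with minimum and minus one), the scales are the integers 1, 2, ..., and the measure is the blanket volume divided by twice the scale. These choices, and all function names, are assumptions and may differ in detail from the cited reference.

```python
import math


def blanket_measures(g, max_scale):
    """Return [(scale, measure)] for scale = 1 .. max_scale (assumed blanket recursion)."""
    h, w = len(g), len(g[0])
    upper = [row[:] for row in g]  # u(eps_0) = g
    lower = [row[:] for row in g]  # b(eps_0) = g
    measures = []
    for eps in range(1, max_scale + 1):
        new_upper = [[0] * w for _ in range(h)]
        new_lower = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                near = [(i, j)] + [(i + di, j + dj)
                                   for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                                   if 0 <= i + di < h and 0 <= j + dj < w]
                new_upper[i][j] = max(upper[i][j] + 1, max(upper[m][n] for m, n in near))
                new_lower[i][j] = min(lower[i][j] - 1, min(lower[m][n] for m, n in near))
        upper, lower = new_upper, new_lower
        volume = sum(upper[i][j] - lower[i][j] for i in range(h) for j in range(w))
        measures.append((eps, volume / (2.0 * eps)))  # assumed definition of the measure
    return measures


def fractal_dimension(g, max_scale=4):
    """Fit a straight line to log(measure) against log(scale) and return D = 2 - gradient."""
    points = [(math.log(e), math.log(b)) for e, b in blanket_measures(g, max_scale)]
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    gradient = (sum((x - mx) * (y - my) for x, y in points)
                / sum((x - mx) ** 2 for x, _ in points))
    return 2.0 - gradient


# Hypothetical test pictures: a flat surface and a rough checkerboard.
flat = [[100] * 8 for _ in range(8)]
checker = [[50 * ((r + c) % 2) for c in range(8)] for r in range(8)]
print(fractal_dimension(flat))                                # 2 up to rounding: a smooth surface
print(fractal_dimension(checker) > fractal_dimension(flat))   # True: a rougher surface
```

For the flat picture the measure does not depend on the scale, so the gradient is zero and D equals the dimension of an ordinary surface; the rougher the luminance surface, the steeper the negative gradient and the larger D becomes.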
Conventionally, the method of extracting the feature region of a picture by use of the fractal dimension as described above has been studied. For instance, in Japanese Published Unexamined (Kokai) Patent Application No. 4-17068, blocks of an object picture are divided gradually into smaller blocks, until the fractal dimension no longer changes according to the size of the divided blocks. In the conventional method using the fractal dimension as described above, however, it has been impossible to extract the regions without block division.
Further, Japanese Published Unexamined (Kokai) Patent Application No. 3-269782 discloses the method of extracting the character region from the character picture by use of the fractal dimension. In this method, the fractal dimension is obtained at all the pixels in both vertical and horizontal directions of a picture, and the region indicative of the character region is discriminated on the basis of the fractal dimension pixel by pixel. However, since the region is divided in unit of pixel, it takes much time, and further it has been impossible to easily specify the pattern size by grasping the character roughly.
As described above, in the prior art methods, when the size of the feature pattern of the data signals is required to be detected, there exists a problem in that the processing is very troublesome and thereby complicated, with the result that it has been impossible to easily detect the size of the pattern.
Finally, the prior art technique related to the region division of data signals will be explained hereinbelow by taking video signals as a practical example of data signals.
The technique for dividing a picture into partial regions (in which the local feature of video data (e.g., luminance value, color, etc.) is uniform) is referred to as region division. Conventionally, this region division of a picture has been an important technique, and has been widely applied to various fields such as video signal coding, video signal processing, character region recognition, etc. However, the region division has been mainly used when video signals on a plane picture are processed.
Further as a technique related to this region division, there exists a technique of representing the region boundaries (referred to as region boundary representation, hereinafter). When data obtained as a result of region division are stored, transmitted through a communication path, or utilized as coding, this region boundary representation technique is required. Therefore, an important problem is how to represent the region boundary by use of the smallest possible amount of data, which has been so far studied.
First, the prior art technique of region division will be described hereinbelow.
For instance, as a simple region expansion method, the regions are divided on the basis of the luminance values between the adjacent pixels (See IMAGE ANALYSIS HANDBOOK, Editors: Takagi, Shimoda, Tokyo University Publishers' Assoc. October, 1991). With reference to FIG. 6, the processing flow is as follows: the luminance value at a non-classified pixel is compared with those of the adjacent pixels. When a difference between the two is less than a threshold value θ, the two pixels are synthesized (or integrated) and a label is attached thereto. The same operation is repeated until the region can no longer be synthesized. This method is the most basic and simple method.
Although the region division performance thereof is slightly lower than that of the other, more complicated methods, since the threshold value θ is a clear parameter, this method is easy to use.
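The simple region expansion method can be sketched as follows. The sketch is illustrative only: the function name, the use of four-neighbour adjacency, the breadth-first order of integration, and the sample luminance values are assumptions, and FIG. 6 may order the steps differently.

```python
from collections import deque


def grow_regions(picture, theta):
    """Label pixels so that adjacent pixels differing by less than theta share a label."""
    h, w = len(picture), len(picture[0])
    labels = [[0] * w for _ in range(h)]  # 0 means "non-classified"
    next_label = 0
    for r0 in range(h):
        for c0 in range(w):
            if labels[r0][c0]:
                continue
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:  # repeat until the region can no longer be integrated
                r, c = queue.popleft()
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w and not labels[nr][nc]
                            and abs(picture[nr][nc] - picture[r][c]) < theta):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


# Hypothetical luminance values forming three visibly different areas.
luminance = [[10, 11, 50, 52],
             [12, 10, 51, 53],
             [11, 12, 90, 91]]
for row in grow_regions(luminance, theta=5):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [1, 1, 3, 3]
```

The single parameter theta directly controls how large a luminance step is tolerated inside one region, which is why the method is easy to use.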
Further, the region division methods can be classified into an integration method, a separation and/or integration method, and a pixel coupling method according to the region forming process. Further, there exists an intermediate method by which the feature space is first classified and after that the region is divided. In these methods, however, when the feature quantity of a region has an ambiguity due to picture uncleanness or noise, it has been impossible to execute the region division at a sufficiently high precision. To overcome this problem, the regions have been so far divided in combination with the relaxation method for removing the ambiguity. However, it takes much time to set many parameters for some pictures, and it has been difficult to divide regions of complicated texture or regions of small luminance difference.
Further, the picture region division is also needed when any desired region is to be extracted from the picture. However, when the boundary of the region to be extracted is of complicated shape, even if the region can be grasped roughly, it has been difficult to extract the region of complicated shape accurately.
Further, in the method and apparatus for compressing and reproducing a picture by dividing the picture into regions and then coding the divided regions, since the picture regions are transmitted as the additional data, a huge data amount must be processed. Further, as shown in FIG. 7, there exists a method such that a shape $I_s$ of a region is approximated by a simple shape $I_A$ to reduce the amount of data. Further, various methods of transmitting the region shapes have been proposed such that: the regions are divided into several blocks and each block is approximated by segments (See "Image Coding by Utilization of Contour Fractal Characteristics" by Suzuki, Sumiyoshi, Miyauchi; Proceedings of TV Society, Vol. 48, No. 1, pp. 69-77, 1994) or the time-shifted picture regions already obtained are substituted for the regions of an original picture (See "Study of Method of Compensating for Block Size Movement with reference to Preceding Frame" by Kida, Kawashima, Tominaga, All-Japan Meeting of Communications Society, D-179, March 1993).
Further, when any desired region is extracted from a picture in a system (for retrieving and processing pictures after communications and storage as a data base) in accordance with the prior art picture compression method, it has been necessary to reproduce the picture from the compressed data and further to re-compress the reproduced data after processing. In addition, when this processing is repeated, there exists a problem in that the picture quality inevitably deteriorates.
Further, in the method of executing the region division on the basis of luminance, a method of executing the region division recurrently by obtaining an adaptive threshold value for division has been proposed (See "Recurrent decision method of density threshold and edge detecting threshold on the basis of match evaluation between contour and edge", by Goto, Toriu, Proceedings of Electron Information Communications D-II, Vol. J77-D-II, No. 9, pp. 1727-1734, September, 1994). In these methods, however, there exist problems in that it takes much time to set many parameters according to a picture, a region of complicated texture is divided too finely, or a region of small difference in luminance cannot be well divided.
The prior art technique related to the region boundary representation will be explained herein below.
The well-known method of representing a region boundary is the chain coding method (See IMAGE ANALYSIS HANDBOOK, Editors: Takagi, Shimoda, Tokyo University Publishers' Assoc. October, 1991). In this method, the directions in which the boundary extends from a starting point are described. This method is effective as the method of describing the contour of a region picture. However, in order to express the region boundary in detail at one-pixel precision, there exists a problem in that several bits are required for each pixel as the data for representing the extending direction of the region boundary.
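The chain coding method can be illustrated by the following sketch, which assumes the common eight-direction variant (code 0 pointing east and the codes increasing counter-clockwise, with the row index growing downward). The function names and the sample boundary are hypothetical.

```python
# Eight direction codes as (row step, column step); code 0 = east, counter-clockwise.
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def chain_encode(boundary):
    """Describe an ordered list of boundary pixels as a starting point and direction codes."""
    codes = [DIRECTIONS.index((r1 - r0, c1 - c0))
             for (r0, c0), (r1, c1) in zip(boundary, boundary[1:])]
    return boundary[0], codes


def chain_decode(start, codes):
    """Rebuild the boundary pixels from the starting point and the direction codes."""
    points = [start]
    for code in codes:
        dr, dc = DIRECTIONS[code]
        r, c = points[-1]
        points.append((r + dr, c + dc))
    return points


# Hypothetical closed boundary: the outline of a 3 x 3 pixel square, traced clockwise.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (0, 0)]
start, codes = chain_encode(square)
print(codes)           # [0, 0, 6, 6, 4, 4, 2, 2]
print(3 * len(codes))  # 24 bits: three bits for every boundary step
```

With eight possible directions, every step along the boundary costs three bits, which illustrates the data amount problem noted above for boundaries expressed at one-pixel precision.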
Further, as one of the fractal coding methods, there exists the Recurrent IFS coding method for coding line drawings after the region boundary has been extracted (See M. F. Barnsley, A. E. Jacquin, "Application of recurrent iterated function systems to images", SPIE Vol. 1001, Visual Communications and Image Processing '88, pp. 122-131). In this method, as shown in FIG. 10, a region boundary 8 is divided into several segments 9 corresponding to the afore-mentioned fractal coding blocks, and the transform parameters 11 for the similar segments 10 corresponding to the similar blocks are obtained. In this method, however, there exist problems in that the region boundary 8 must be detected by use of another method and further the division into the segments 9 and the transform parameters 11 for the similar segments 10 must both be obtained manually, that is, the coding is not automated. In addition, it is necessary to transmit all the broken points 12 (double circles in FIG. 10) of the firstly divided segments. Further, in order to express a more detailed region boundary, the segments must be divided more finely for more accurate retrieval of the similar segments, thus causing drawbacks such that the number of the broken points 12 increases and thereby the data representative of positions inevitably increase.