The present invention relates generally to an image data coding and/or decoding system capable of highly efficient coding of picture signals for transmission and storage. More specifically, the invention relates to an image data coding system which can code and transmit picture signals for displaying an image on a small-screen liquid crystal display that can be built into a wristwatch or the like.
In the coding of image data for a visual telephone (TV phone), a television conference and so forth, the image data are compressed efficiently by utilizing human visual characteristics. The human visual characteristics with respect to picture distortion that are utilized here are as follows (see “Image Information Compression”, issued by the Japanese Television Society and compiled under the supervision of Hiroshi Harashima, page 12).
(1) Frequency Characteristic in Distortion Perception
Distortion that varies with time and distortion with a high spatial frequency are difficult to see with the naked eye.
(2) Relationship with Pattern of Image
Distortion is easily perceived in flat portions of the image and is difficult to see on contour portions of the image. However, this holds only for a still picture. In a moving picture, distortion on contour portions appears as edge busyness and is, conversely, objectionable to the eye.
(3) Relationship between Image and Motion
When a picture is moving faster than a given speed and the user's eyes cannot follow its motion, the sensitivity to distortion decreases.
(4) Relationship with Switching of Scene
Immediately after a scene change, distortion is not visible to the naked eye even if the resolution is considerably lowered.
(5) Relationship with Brightness of Screen
The darker the screen, the more easily picture distortion of the same level is seen with the naked eye.
(6) Color Signal and Luminance Signal
Since distortion in color signals is more difficult to see with the naked eye than distortion in luminance signals, it is possible, for example, to thin out (subsample) the sampled points of the color signals.
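As a minimal sketch of thinning out the sampled points of a color signal as in item (6), the following Python function halves the resolution of one color-difference plane in both directions by averaging each 2×2 group of samples, while the luminance plane would be left at full resolution. The function name, the 2:1 ratio and the averaging filter are assumptions made for illustration; the cited reference does not prescribe a particular subsampling scheme.

```python
def subsample_chroma(plane):
    """Halve a chroma plane horizontally and vertically by 2x2 averaging.

    `plane` is a list of equal-length rows. Even width and height are
    assumed here only to keep the sketch short.
    """
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            total = (plane[y][x] + plane[y][x + 1]
                     + plane[y + 1][x] + plane[y + 1][x + 1])
            row.append(total / 4.0)  # one output sample per 2x2 group
        out.append(row)
    return out
```

With this ratio each color plane carries one quarter of its original number of samples.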
In addition, since visual acuity (spatial resolving power) in the peripheral portions of the visual field is worse than that in the central portion, owing to the distribution of visual receptor cells on the retina, a user must move his eyes (eye movement) in order to obtain information such as shape, structure and detailed contents (see “Image Information Compression” issued by the Television Society, published by Ohm, page 41). Therefore, the perceived definition of a picture, in view of human visual characteristics, is governed by the movement of the human eye as a subjective factor in addition to the resolution of the picture as an objective factor.
On the other hand, when a person looks at an object, if the object is small, its whole shape and so forth can be recognized by gazing at a specific range around one point. However, if the object is large, a wide range including a large number of points must be closely observed to recognize the whole shape and so forth of the object. When a person watches a television receiver with a large screen, the eyes move frequently and a large number of closely observed points are distributed over a given range; if the screen is small, the range over which the closely observed points are distributed does not extend as far.
It is disclosed in “Estimation Technique of Image Quality and Tone Quality” (edited by the Television Society and published by Shokodo, page 118) that, since the display screen of a high quality television system, which has been rapidly approaching practical use in recent years, is larger than those of current television systems, the ranges over which the closely observed points are distributed differ between these systems. FIG. 5.22 on the same page of that reference shows the measured proportion of the distribution range of the closely observed points to the area of the screen when a program of the same content is observed on a high quality television system and on a current television system under standard observation conditions. In that figure, the closely observed points are assumed to follow a normal distribution in the horizontal and vertical directions with the center of the screen as the origin, and the distribution range is approximated by an ellipse extending to three times the standard deviation. The experimental results show that the proportion of the distribution range of the closely observed points to the area of the screen is about 60% in the current television systems but reaches about 80% in the high quality television system. That is, as the size of the screen decreases, the proportion of the distribution range of the closely observed points decreases and the range concentrates on the center of the screen. Therefore, since the spatial resolving power of vision is inferior on the peripheral portion of the screen, information can be compressed efficiently by lowering the spatial resolution or by weighting the assignment of distortion in preprocessing.
As a method for efficiently compressing the amount of information using the difference between the visual characteristic at the central portion of the visual field (central vision) and that at the peripheral portion of the visual field (peripheral vision), there is, for example, the method disclosed in “Visual Pattern Image Sequence Coding” (August 1993, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 3, NO. 4, pp. 291-301). In the technique disclosed in this literature, a function of the radius r from the central point of the screen is derived, and the resolution on the peripheral portion of the screen is lowered using this function.
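The idea of lowering resolution as a function of the radius r from the screen center can be sketched as follows. The actual function used in the cited literature is not reproduced in this text, so the linear falloff, the name `resolution_scale` and the `floor` parameter below are purely hypothetical stand-ins.

```python
import math

def resolution_scale(x, y, width, height, floor=0.25):
    """Return a relative resolution factor for pixel (x, y).

    The factor is 1.0 at the screen center and falls linearly with the
    radius r to `floor` at the corners. The linear law is an assumption,
    not the function of the cited paper.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    r = math.hypot(x - cx, y - cy)   # distance from the screen center
    r_max = math.hypot(cx, cy)       # distance from center to a corner
    if r_max == 0:
        return 1.0                   # degenerate 1x1 screen
    return 1.0 - (1.0 - floor) * (r / r_max)
```

A coder could use such a factor, for example, to choose a stronger low-pass filter or a coarser quantizer for blocks far from the center.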
In addition, as methods for compressing information by changing the distribution of the assigned code amount between a visually important region and a visually unimportant region, the following two methods are known.
One of the methods has been proposed for application to a video telephone (Japanese Patent Application Laid-open No. 1-80185 (1989) “Moving Picture Coding Method”). In this method, on the assumption that the closely observed points are concentrated on the face of the other party to the telephone conversation, the face region is detected and a large code amount is assigned to the detected face region.
The other method is likewise applied to a video telephone (Japanese Patent Application Laid-open No. 5-95541 (1993)). As in the aforementioned proposal, the face region is detected, and spatial-temporal filtering is applied to the region other than the face, so that the code amount produced in the region other than the face is decreased and the code amount assigned to the face region is increased.
Both of these conventional methods take human visual characteristics into account and provide a natural picture to a person viewing the reproduced picture, by making the coded data amount in the region where the closely observed points are concentrated different from that in the region where they are not so concentrated.
As mentioned above, both of the conventional image data coding methods compress information efficiently using human visual characteristics, by restraining the code amount produced in a visually unimportant region and increasing the code amount assigned to a visually important region. However, the techniques disclosed in the aforementioned two publications only classify the regions in the screen on the basis of the degree of concentration of the closely observed points and vary the code amount assigned to each region. They do not consider the human visual characteristic that the distribution of closely observed points differs with the size (area) of the screen, as described in the aforementioned literature “Estimation Technique of Image Quality and Tone Quality”.
In addition, there is a problem in that, when the image data are transmitted via a radio transmission channel having a narrower bandwidth than that of a wired transmission channel, the resolution of the reproduced picture is generally reduced because the narrow bandwidth limits the amount of data that can be transmitted, so that the size (area) of the screen is necessarily reduced.
In conventional image data coding systems, for example the moving picture data coding systems defined by MPEG, input picture signals are divided into square blocks of 8×8 pixels as shown in FIG. 55, and the two-dimensional discrete cosine transform (DCT) is then performed for coding.
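The block division and two-dimensional DCT described above can be sketched as follows. This is a direct, unoptimized orthonormal DCT-II written for clarity; the function names are illustrative, and the sketch assumes image dimensions that are multiples of the block size. It is not the normative MPEG procedure.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(list(r)) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major

def split_into_blocks(image, size=8):
    """Cut an image (list of rows) into size x size blocks, row by row."""
    h, w = len(image), len(image[0])
    return [[row[x:x + size] for row in image[y:y + size]]
            for y in range(0, h, size)
            for x in range(0, w, size)]
```

For a flat block all of the energy ends up in the single DC coefficient, which is why flat regions code cheaply.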
On the other hand, in “Applying Mid-level Vision Techniques for Video Data Compression and Manipulation” (M.I.T. Media Lab. Tech. Report No. 263, February 1994), which will be hereinafter referred to as “Literature 1”, J. Y. Wang et al. disclose that picture signals are divided into a background and a subject (which will be hereinafter referred to as a “content”) for coding, as shown in FIG. 56. In order to code the background and the content separately, a map signal called an alpha map, which indicates the shape of the content and its position in the screen, is prepared. With this coding method, it is possible to vary the picture quality content by content and to reproduce only a specific content. However, when the interior of the screen is divided into square blocks for coding as shown in FIG. 55, the blocks containing the boundary portion of the content, i.e. the edge blocks straddling the inside and outside of the content, must be processed separately, as shown in FIG. 57.
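To illustrate how an alpha map singles out the edge blocks, the following sketch labels each square block of a binary alpha map as lying inside the content, outside it, or on its boundary. The binary (0/1) representation, the function name and the label strings are assumptions made for this example and are not taken from Literature 1.

```python
def classify_blocks(alpha, size=8):
    """Label each size x size block of a binary alpha map.

    alpha[y][x] is truthy where the pixel belongs to the content.
    Returns a dict mapping (block_row, block_col) to "inside",
    "outside" or "edge".
    """
    h, w = len(alpha), len(alpha[0])
    labels = {}
    for by in range(0, h, size):
        for bx in range(0, w, size):
            ys = range(by, min(by + size, h))
            xs = range(bx, min(bx + size, w))
            inside = sum(1 for y in ys for x in xs if alpha[y][x])
            total = len(ys) * len(xs)
            if inside == 0:
                label = "outside"
            elif inside == total:
                label = "inside"
            else:
                label = "edge"   # block straddles the content boundary
            labels[(by // size, bx // size)] = label
    return labels
```

Only the blocks labeled "edge" need the separate handling mentioned above; the others can be coded as ordinary square blocks.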
A method has also been proposed for coding picture signals after dividing the interior of a screen into blocks of arbitrary shapes so as to adapt to the statistical characteristics in the screen and to the shape of a content. Such a method for performing the orthogonal transform of an arbitrary shape is disclosed in “Examination of Variable Block Size Transform Coding of Image Using DCT” (Matsuda et al., Singaku-Shuki-Daizen D-146, 1992), which will be hereinafter referred to as “Literature 2”. In this specification, this transform method will be hereinafter referred to as “AS-DCT”. In AS-DCT, first, the one-dimensional DCT is performed in the horizontal (or vertical) direction as shown in FIG. 58(a); then, after the resulting coefficients are rearranged in order from the lowest-order DCT coefficient as shown in FIG. 58(b), the one-dimensional DCT is performed in the vertical (or horizontal) direction.
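The two-pass procedure can be sketched as follows, under one reading of the description: each row's content pixels are transformed with a DCT whose length equals the number of content pixels in that row, the coefficients are left-aligned so that coefficients of the same order share a column, and each resulting column is then transformed vertically. The function names, the mask representation and the use of an orthonormal DCT-II are assumptions; FIG. 58 and Literature 2 may differ in detail.

```python
import math

def dct_1d(v):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def as_dct(block, mask):
    """Two-pass arbitrary-shape DCT sketch.

    block[y][x] holds pixel values; mask[y][x] is truthy for content pixels.
    Returns a list of columns, where columns[k] holds the vertical DCT of
    the k-th order horizontal coefficients.
    """
    # Horizontal pass: variable-length DCT over each row's content pixels.
    # Content pixels in a row are treated as one run (gaps are ignored).
    rows = []
    for pixel_row, mask_row in zip(block, mask):
        segment = [p for p, m in zip(pixel_row, mask_row) if m]
        rows.append(dct_1d(segment) if segment else [])
    # Rearrangement and vertical pass: gather coefficients of equal order k
    # from every row that has one, then transform that column.
    width = max((len(r) for r in rows), default=0)
    columns = []
    for k in range(width):
        column = [r[k] for r in rows if len(r) > k]
        columns.append(dct_1d(column))
    return columns
```

The number of output coefficients equals the number of content pixels, so no coefficients are spent on pixels outside the content.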
Also, in “Estimation of Performance of Variable Block Shape Transform Coding of Image Using DCT” (Matsuda et al., PCSJ92, 7-10, 1992), which will be hereinafter referred to as “Literature 3”, the coding efficiency is improved by actually performing the coding with both orders of the horizontal and vertical transforms and selecting the order that gives the higher coding efficiency.
Further, “Image Data Coding Techniques—DCT and Its International standard—” written by K. R. Rao and P. Yip and translated by Hiroshi Yasuda and Hiroshi Fujiwara (7.3, pp. 164-165, Ohm), which will be hereinafter referred to as “Literature 4”, discloses a method for performing the resolution transform of picture signals using the two-dimensional DCT. That is, the resolution can be transformed by extracting a part of the DCT coefficients obtained by the two-dimensional DCT and inversely transforming them with a DCT of a different order, as shown in FIG. 59.
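A minimal sketch of this kind of DCT-domain resolution reduction is given below: an n×n block is transformed, the m×m low-frequency coefficients are kept, and an m-point inverse DCT produces the reduced block. The function names and the gain factor m/n, chosen so that a flat block keeps its level with the orthonormal transform used here, are assumptions; the sketch covers reduction (m ≤ n) only and is not taken from Literature 4.

```python
import math

def _scale(k, n):
    """Normalization factor of the orthonormal DCT for index k, length n."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

def dct_1d(v):
    """Orthonormal DCT-II."""
    n = len(v)
    return [_scale(k, n) * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def transform_2d(block, f):
    """Apply the 1-D transform f to every row, then to every column."""
    rows = [f(list(r)) for r in block]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def dct_downscale(block, m):
    """Reduce a square n x n block to m x m (m <= n) in the DCT domain."""
    n = len(block)
    coeffs = transform_2d(block, dct_1d)
    gain = m / n  # sqrt(m/n) per dimension keeps the mean level unchanged
    low = [[coeffs[u][v] * gain for v in range(m)] for u in range(m)]
    return transform_2d(low, idct_1d)
```

For example, `dct_downscale(block, 4)` applied to an 8×8 block yields a 4×4 block at half the resolution in each direction.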
In a picture system such as a graphic display, in order to realize various image effects, it is desirable to perform the resolution transform of a content in a screen so as to reduce or enlarge it. Since contents have various shapes, it is necessary to perform the resolution transform of contents of arbitrary shapes. However, with the AS-DCT disclosed in the aforementioned Literature 2, which is a method for performing the orthogonal transform of arbitrary shapes, the resolution transform cannot be realized when the block to be transformed is an edge block, i.e. a block containing the boundary portion of a content.
In addition, there is a problem in that the coding efficiency for an edge block is low in the AS-DCT and other methods for performing the orthogonal transform of arbitrary shapes.