The present invention relates to an apparatus which discriminates a moving region that contains motion from a stationary region that does not contain motion in a video signal such as a television signal, and more particularly, to such an apparatus employed in an interframe predictive coding apparatus.
The apparatus for discriminating the moving region from the stationary region is used, for example, in a predictive encoding apparatus for a video signal. That is, in the predictive encoding apparatus, the input video signal is divided into the moving region containing motion and the stationary region not containing motion. Then, different encoding methods or different quantization characteristics are applied to the moving region and the stationary region, respectively, in order to reduce as much as possible the amount of video data to be transmitted to a receiving side, subject to the condition that the decoded video signal on the receiving side must still be adequate for practical use.
In a conventional method for discriminating the moving region from the stationary region in the video signal, a frame-to-frame amplitude difference (hereinafter, frame difference) is calculated for each picture element in a picture screen, and the moving region is determined by gathering the picture elements for which the absolute value of the frame difference is greater than a threshold value.
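The per-element thresholding described above can be sketched as follows. This is only an illustrative sketch, not the implementation of any particular apparatus: the function name `significant_mask`, the representation of a frame as a list of rows of integer amplitudes, and the sample values are all assumptions made for the example.

```python
def significant_mask(prev_frame, curr_frame, threshold):
    """Return a mask that is True where |frame difference| > threshold.

    prev_frame and curr_frame are lists of rows of integer amplitudes
    (an assumed representation chosen for this sketch).
    """
    return [
        [abs(curr - prev) > threshold for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Hypothetical 2-line, 4-element frames.
prev_frame = [[10, 10, 10, 10],
              [10, 10, 10, 10]]
curr_frame = [[10, 40, 12, 10],
              [10, 10, 35, 10]]

mask = significant_mask(prev_frame, curr_frame, threshold=8)
```

In this sketch the moving region is simply the set of picture elements whose mask entry is True.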
In another conventional method, when a significant picture element, that is, a picture element whose frame difference has an absolute value greater than a threshold value, lies within a predetermined distance of another significant picture element on the same scanning line, all picture elements within the predetermined distance are regarded as significant picture elements, and the moving portion is then determined as the set of such significant picture elements. Such a method is disclosed in the Bell System Technical Journal, "Transmitting Television as Clusters of Frame-to-Frame Differences", Vol. 50, No. 6, pp. 1889-1919, July-August, 1971.
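One possible reading of this clustering rule is sketched below for a single scanning line. The sketch assumes that "within the predetermined distance" means the picture elements lying between two significant picture elements that are no more than `max_distance` positions apart; the cited article may define the rule differently. The name `bridge_line` and the sample line are hypothetical.

```python
def bridge_line(line_mask, max_distance):
    """Fill the gap between significant elements that lie close together.

    line_mask is a list of booleans for one scanning line. When two
    significant elements are at most max_distance positions apart,
    every element between them is also marked significant (an assumed
    interpretation of the rule, for illustration only).
    """
    bridged = list(line_mask)
    last_significant = None
    for index, is_significant in enumerate(line_mask):
        if not is_significant:
            continue
        if last_significant is not None and index - last_significant <= max_distance:
            for gap_index in range(last_significant + 1, index):
                bridged[gap_index] = True
        last_significant = index
    return bridged


# Significant elements at positions 1, 4 and 9.
line = [False, True, False, False, True, False, False, False, False, True]
bridged = bridge_line(line, max_distance=3)
```

Here positions 1 and 4 are joined into one cluster, while position 9 is too far from position 4 and remains isolated.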
According to these conventional methods, however, a picture element in the stationary region is frequently mis-detected (detected erroneously) as being a significant picture element when a brightness level in the video signal changes due to jitter in the sampling pulses, or when a large amount of noise is contained in the video signal. Such mis-detection makes it difficult to correctly separate the moving region from the stationary region. The noise contained in the video signal can be considered as having an amplitude that changes in a nearly random fashion. According to the conventional methods, in which the frame difference of each picture element is compared with a threshold value, a picture element having a large noise amplitude is inevitably mis-detected as a significant picture element. Conversely, if the threshold value is increased in order to prevent mis-detection, a picture element having a small amplitude change cannot be detected as a significant picture element.
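The trade-off described above can be illustrated with a small numerical sketch. The frame-difference values below are invented for illustration: the first element is assumed to be stationary but disturbed by noise, and the second is assumed to carry a small genuine change.

```python
def detect(frame_differences, threshold):
    """Per-element test: True where |frame difference| > threshold."""
    return [abs(d) > threshold for d in frame_differences]


# Hypothetical values: index 0 is a noisy stationary element,
# index 1 is an element with a small real amplitude change.
differences = [9, 6]

low_threshold_result = detect(differences, threshold=5)
high_threshold_result = detect(differences, threshold=10)
```

With the low threshold the noisy stationary element is mis-detected as significant; with the high threshold the noise is rejected, but the element with the small real change is missed as well. Under these assumed values, no threshold marks the second element without also marking the first.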
It has also been attempted to break down the picture into blocks and to detect whether each given block is a moving region or a stationary region. Each block is defined to contain a plurality of picture elements in the horizontal direction and a plurality of lines in the vertical direction. In this case, the frame differences of all picture elements in each of the blocks may be added, and the added result is compared with a predetermined threshold value. When the added result is greater than the threshold value, the block is determined to be a moving region. Alternatively, the number of significant picture elements in each block may be counted and then compared with a threshold number. In this case, when the number of significant picture elements in the block is greater than the threshold number, the block is determined to be a moving region.
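The two block-based tests described above can be sketched as follows. The sketch assumes that absolute frame differences are added (the text does not say whether signed or absolute values are summed), and the names `block_scores`, `moving_by_sum`, and `moving_by_count`, as well as the block size and sample frames, are hypothetical.

```python
def block_scores(prev_frame, curr_frame, block_h, block_w, score):
    """Apply score() to the absolute frame differences of each block."""
    rows, cols = len(curr_frame), len(curr_frame[0])
    result = []
    for top in range(0, rows, block_h):
        result_row = []
        for left in range(0, cols, block_w):
            diffs = [
                abs(curr_frame[y][x] - prev_frame[y][x])
                for y in range(top, min(top + block_h, rows))
                for x in range(left, min(left + block_w, cols))
            ]
            result_row.append(score(diffs))
        result.append(result_row)
    return result


def moving_by_sum(prev_frame, curr_frame, block_h, block_w, sum_threshold):
    """Block is moving when the summed |frame difference| exceeds the threshold."""
    return block_scores(prev_frame, curr_frame, block_h, block_w,
                        lambda diffs: sum(diffs) > sum_threshold)


def moving_by_count(prev_frame, curr_frame, block_h, block_w,
                    element_threshold, count_threshold):
    """Block is moving when it holds more than count_threshold significant elements."""
    return block_scores(
        prev_frame, curr_frame, block_h, block_w,
        lambda diffs: sum(d > element_threshold for d in diffs) > count_threshold)


# Hypothetical frames split into two blocks of 2 lines by 2 elements.
prev_frame = [[10, 10, 10, 10],
              [10, 10, 10, 10]]
curr_frame = [[10, 11, 40, 38],
              [9, 10, 10, 36]]

by_sum = moving_by_sum(prev_frame, curr_frame, 2, 2, sum_threshold=20)
by_count = moving_by_count(prev_frame, curr_frame, 2, 2,
                           element_threshold=8, count_threshold=1)
```

In this example the left block accumulates a difference of 2 and contains no significant picture elements, while the right block accumulates 84 and contains three, so both tests classify only the right block as a moving region.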
However, even when the moving region and the stationary region are detected on a block basis, particular blocks are often mis-detected due to noise or jitter. This causes problems in a predictive encoding apparatus for a video signal, since, in general, different sampling and sub-sampling methods as well as different quantization characteristics are selectively employed in the moving region and the stationary region so as to make the amount of video data as small as possible. Therefore, if the moving and stationary regions are mis-detected, switching between different sampling methods or between different quantization characteristics may occur frequently, and the visual quality of the picture will be considerably degraded wherever such switching occurs.