1. Field of the Invention
The present invention relates to a motion vector detection apparatus and to a predictive coding system that compensates for movement using such a detection apparatus.
2. Description of the Related Art
FIG. 37 shows a predictive coding system that compensates for movement between frames and uses a DCT (discrete cosine transform) to compress the data of a video signal. A digital video signal DV is supplied to a subtractor 2, which subtracts a predictive video signal PDV from the digital video signal DV and provides a predictive error signal PER. The predictive error signal PER is typically small because the video signals DV of successive frames are generally correlated. The predictive error signal PER from the subtractor 2 is supplied to a DCT circuit 3, which performs a discrete cosine transform on the data of each coding block.
Transformed coefficient data TCD provided from the DCT circuit 3 is supplied to a quantizer 4, which performs quantization. In the quantizer 4, the step width for quantization is small when the data TCD is near 0, and the farther the data moves from 0, the greater the step width becomes. This is because coarse quantization is not visually noticeable where the signal changes sharply. The data is compressed using such non-linear quantization. Quantized data from the quantizer 4 is encoded by a variable length encoder 5, which outputs code data DT.
The quantized data from the quantizer 4 is reversely quantized by a reverse quantizer 6. Transformed coefficient data TCDp (approximately equal to the transformed coefficient data TCD) from the reverse quantizer 6 is supplied to a reverse DCT circuit 7, which performs a reverse discrete cosine transform process. A predictive error signal PERp outputted from the reverse DCT circuit 7 is supplied to an adder 8, which adds it to the predictive video signal PDV. A video signal outputted from the adder 8 is delayed by one frame through the use of a frame memory 9, and a motion compensation circuit 10 provides the predictive video signal PDV, compensating for any movement present.
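The coding loop of FIG. 37 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DCT and reverse DCT stages are omitted, so the quantizer operates directly on the error samples; the square-root companding law stands in for the non-linear quantizer, whose actual characteristic is not given here; and all function names are hypothetical. The sign convention PER = DV - PDV is the one under which the adder 8 (PDV + PERp) recovers an approximation of DV.

```python
import math


def quantize(value):
    """Assumed non-linear quantizer: reconstruction levels 0, 1, 4, 9, ...
    so the step width grows as the value moves away from 0."""
    sign = -1 if value < 0 else 1
    return sign * round(math.sqrt(abs(value)))


def dequantize(index):
    """Reverse quantizer matching quantize() above."""
    sign = -1 if index < 0 else 1
    return sign * index * index


def code_block(dv, pdv):
    """One pass through the loop for a list of samples (DCT omitted)."""
    per = [d - p for d, p in zip(dv, pdv)]          # subtractor 2
    quantized = [quantize(e) for e in per]          # quantizer 4
    per_p = [dequantize(q) for q in quantized]      # reverse quantizer 6
    recon = [p + e for p, e in zip(pdv, per_p)]     # adder 8
    return quantized, recon


quantized, recon = code_block([100, 102, 98, 50], [99, 100, 98, 20])
print(quantized)  # [1, 1, 0, 5]
print(recon)      # [100, 101, 98, 45]
```

Small errors are reproduced almost exactly, while the large error of 30 is reconstructed as 25, which shows the coarser step far from 0.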
A motion vector detection circuit 11 receives the digital video signal DV and the video signal of the previous frame, which is outputted from the frame memory 9, and detects a motion vector in the typical manner known to those skilled in this art. The motion compensation circuit 10 described above determines the amount of compensation required based on the motion vector information provided by the motion vector detection circuit 11.
When the motion vector detection circuit 11 and the other apparatus in the predictive coding system for compensating for movement between frames in FIG. 37 use a matching method to detect a motion vector, a reference frame is divided into equal blocks (reference blocks), the block best matching each reference block is searched for among previous or subsequent frames (search frames), and the displacement between the matching blocks is determined as a motion vector, as shown in FIG. 38.
Typically, in order to search for the best matching block, the corresponding pixels of the two blocks are subtracted from each other, the sum of the absolute values or squares of the differences is calculated, and the position giving the smallest sum determines the motion vector. In the case of a block with 16 pixels multiplied by 16 lines, 256 subtraction and absolute-value calculations and 255 addition calculations are needed in order to search one point. If the search range is set from -8 to +7 along the vertical and horizontal directions, 256 points must be searched. This requires a huge number of calculations and an enormous amount of hardware.
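The exhaustive search described above can be sketched as follows. This is a minimal illustration, not the circuit itself: it uses the sum of absolute differences as the matching criterion, represents frames as lists of lines, and skips candidate positions that fall outside the search frame. The function names are hypothetical.

```python
def sad(ref_block, frame, top, left):
    """Sum of absolute differences between ref_block and the equally sized
    block of `frame` whose top-left pixel is at (top, left)."""
    return sum(
        abs(ref_pixel - frame[top + y][left + x])
        for y, row in enumerate(ref_block)
        for x, ref_pixel in enumerate(row)
    )


def full_search(ref_block, frame, origin_y, origin_x, lo=-8, hi=7):
    """Try every displacement (dy, dx) with lo <= dy, dx <= hi around the
    reference block position and return the one with the smallest SAD."""
    size_y, size_x = len(ref_block), len(ref_block[0])
    best_cost, best_vector = None, None
    for dy in range(lo, hi + 1):
        for dx in range(lo, hi + 1):
            top, left = origin_y + dy, origin_x + dx
            # Skip candidates that fall outside the search frame.
            if top < 0 or left < 0:
                continue
            if top + size_y > len(frame) or left + size_x > len(frame[0]):
                continue
            cost = sad(ref_block, frame, top, left)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector


def operation_count(pixels=16, lines=16, lo=-8, hi=7):
    """Search points, subtractions and additions for one reference block."""
    points = (hi - lo + 1) ** 2
    subtractions = pixels * lines      # one subtraction (and absolute value) per pixel
    additions = pixels * lines - 1     # accumulating the differences
    return points, points * subtractions, points * additions


print(operation_count())  # (256, 65536, 65280)
```

For a 16 by 16 block and a search range of -8 to +7, a single reference block therefore costs 65,536 subtractions and 65,280 additions, which is the burden the text refers to.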
Currently, one method for reducing the scale of circuitry has been proposed. A frame block having (m × n) pixels, as shown in FIG. 40A, is divided into minor blocks, each of which has (a × b) pixels as indicated in FIG. 40B, and the integration of all pixels in each minor block gives a representative value. Such conversion for reducing the number of pixels in each block is termed "feature extraction". With feature extraction, the (m × n) pixels of matching elements can be reduced to (m × n)/(a × b), thereby reducing the scale of circuitry. In FIG. 40C, the hatched area corresponds to a two-dimensional frequency component after feature extraction, and the area defined by the dashed lines corresponds to a two-dimensional frequency component prior to the feature extraction. The horizontal axis represents a horizontal frequency fh [cpw], and the vertical axis represents a vertical frequency fv [cph]. Further, fsv represents the number of lines in one frame (for example, 525 in the NTSC system), and fsh represents the number of samples in one line (for example, 910 in the NTSC system in which the sampling frequency is 4 fsc (fsc is a color sub-carrier frequency)).
While FIG. 40 shows the detection of the frame motion vector, in order to detect a field motion vector, a field block having (m × n/2) pixels in FIG. 41A is divided into minor blocks each of which has (a × b/2) pixels in each odd or even field, and the integration of all pixels in each minor block gives a representative value. According to such feature extraction, the (m × n/2) pixels of matching elements can be reduced to (m × n)/(a × b), thereby reducing the scale of circuitry. In FIG. 41C, the hatched area corresponds to a two-dimensional frequency component after feature extraction, and the area defined by the dashed lines corresponds to a two-dimensional frequency component prior to feature extraction.
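The frame and field feature extraction just described can be sketched as below. The sketch assumes that a counts pixels horizontally and b counts lines vertically, that m is divisible by a and n by b with b even, and that the first line of a frame block belongs to the odd field. These conventions and the function names are assumptions made for illustration.

```python
def extract_features(block, a, b):
    """Split `block` (a list of lines, each a list of pixels) into minor
    blocks of a pixels by b lines and return the sum of each minor block."""
    n_lines, m_pixels = len(block), len(block[0])
    return [
        [
            sum(block[y][x]
                for y in range(top, top + b)
                for x in range(left, left + a))
            for left in range(0, m_pixels, a)
        ]
        for top in range(0, n_lines, b)
    ]


def extract_field_features(block, a, b):
    """Field version: separate the lines of a frame block into its two
    fields, then use minor blocks of a pixels by b/2 lines in each field."""
    odd_field = block[0::2]    # assumption: the first line belongs to the odd field
    even_field = block[1::2]
    return (extract_features(odd_field, a, b // 2),
            extract_features(even_field, a, b // 2))


block = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
print(extract_features(block, 2, 2))        # [[10, 18], [42, 50]]
print(extract_field_features(block, 2, 4))  # ([[18, 26]], [[34, 42]])
```

In the frame case the 16 pixels become 16/(2 × 2) = 4 matching elements. In the field case each 8-pixel field block becomes 16/(2 × 4) = 2 matching elements.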
FIG. 39 illustrates a frame/field motion vector detection apparatus in which the scale of circuitry is reduced by using feature extraction. Reference numeral 20 denotes a frame memory which stores pixel signals in a reference frame. The pixel signals in the reference frame are read from the frame memory 20, controlled by a control circuit 50, and supplied to a feature extraction circuit 21. The feature extraction circuit 21 performs feature extraction (FIG. 40B), whereby the frame block is divided into minor blocks and the integration of all elements in each minor block gives a representative value.
Reference numeral 30 denotes a frame memory which stores pixel signals for the search frame. The pixel signals for the search frame blocks are read from the frame memory 30, controlled by the control circuit 50, and supplied to a feature extraction circuit 31. The feature extraction circuit 31 performs feature extraction (FIG. 40B), whereby the frame block is divided into minor blocks and the integration of all elements in each minor block gives a representative value. The representative value is supplied as a matching element to a matching circuit 40FL.
The matching circuit 40FL measures the differences between each of the feature-extracted elements of the reference frame blocks and the search frame blocks, and calculates the absolute value or square of each difference, which value is supplied as an estimated value to an estimation circuit 41FL. Although not described above, the pixel signals for the search frame blocks corresponding to the specified reference frame block are read from the frame memory 30, controlled by the control circuit 50, and are feature-extracted by the feature extraction circuit 31. Thus, the estimated values corresponding to the respective search frame blocks are supplied from the matching circuit 40FL to the estimation circuit 41FL.
The estimation circuit 41FL selects, from the estimated values corresponding to all search frame blocks, the search frame block which best matches the specified reference frame block, and outputs the position of the best matching search frame block relative to the reference frame block as a frame motion vector. Such detection of the frame motion vector is performed for a number of reference frame blocks which are read from the frame memory 20.
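The division of work between the matching circuit and the estimation circuit can be sketched as follows. The sketch assumes that the estimation stage accumulates the per-element values into one cost per search block and picks the smallest; how the accumulation is actually split between the two circuits is not stated here, and the function names and data layout are hypothetical.

```python
def matching(ref_features, search_features, squared=False):
    """Per-element absolute (or squared) difference between the
    feature-extracted elements of a reference block and a search block."""
    return [
        (r - s) ** 2 if squared else abs(r - s)
        for r, s in zip(ref_features, search_features)
    ]


def estimate(ref_features, candidates):
    """`candidates` maps a relative position (dy, dx) to the feature list of
    the search block at that position. Returns the position whose
    accumulated matching value is smallest."""
    best_position, best_cost = None, None
    for position, features in candidates.items():
        cost = sum(matching(ref_features, features))
        if best_cost is None or cost < best_cost:
            best_position, best_cost = position, cost
    return best_position


reference = [10, 18, 42, 50]
candidates = {
    (0, 0): [12, 18, 40, 50],
    (1, -1): [10, 18, 42, 50],
    (0, 1): [0, 0, 0, 0],
}
print(estimate(reference, candidates))  # (1, -1)
```

Each candidate is compared on only four representative values instead of on every pixel, which is where the reduction in circuit scale comes from.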
The pixel signals for the reference odd-field blocks are read from the frame memory 20, controlled by the control circuit 50, and supplied to a feature extraction circuit 22. The feature extraction circuit 22 performs feature extraction (FIG. 41B), whereby the odd-field block is divided into minor blocks and the integration of all elements in each minor block gives a representative value. The representative value is supplied as a matching element to a matching circuit 40FO.
The pixel signals for the search odd-field blocks are read from the frame memory 30, controlled by the control circuit 50, and supplied to a feature extraction circuit 32. The feature extraction circuit 32 performs feature extraction (FIG. 41B), whereby the odd-field block is divided into minor blocks and the integration of all elements in each minor block gives a representative value. The representative value is supplied as a matching element to a matching circuit 40FO.
The matching circuit 40FO measures the differences between each of the feature-extracted elements of the reference odd-field blocks and the search odd-field blocks, and calculates the absolute value or square of each difference, which value is supplied as an estimated value to an estimation circuit 41FO. Although not described above, the pixel signals for the search odd-field blocks corresponding to the specified reference odd-field block are read from the frame memory 30, controlled by the control circuit 50, and are feature-extracted by the feature extraction circuit 32. Thus, the estimated values corresponding to the respective search odd-field blocks are supplied from the matching circuit 40FO to the estimation circuit 41FO.
The estimation circuit 41FO selects, from the estimated values corresponding to all search odd-field blocks, the search odd-field block which best matches the specified reference odd-field block, and outputs the position of the best matching search odd-field block relative to the reference odd-field block as an odd-field motion vector. Such detection of the odd-field motion vector is performed for a number of the reference odd-field blocks which are read from the frame memory 20.
Reference numerals 23 and 33 denote feature extraction circuits for even-field blocks, reference numeral 40FE denotes a matching circuit, and reference numeral 41FE denotes an estimation circuit. These circuits correspond to the feature extraction circuits 22 and 32, the matching circuit 40FO, and the estimation circuit 41FO for the odd field blocks, respectively. The estimation circuit 41FE outputs even-field motion vectors which correspond to each of a number of reference even-field blocks read from the frame memory 20.
An example of the feature extraction circuits 21 and 31 for the frame blocks is described below. In the example, the (m × n) pixel data which constitute a frame block are converted into (m/2 × n/2) data (representative values). The (m × n) pixel data shown in FIG. 42A are passed through a two-dimensional filter (LPF) shown in FIG. 42B, and converted into the data shown in FIG. 42C. Thereafter, the black portions of the data are sub-sampled in order to obtain the (m/2 × n/2) data shown in FIG. 42D. In the two-dimensional filter in FIG. 42B, Z^-1 represents a one-pixel delay, and Z^-L represents a one-line delay for the frame structure. In FIG. 42E, the hatched area corresponds to a two-dimensional frequency component after feature extraction, and the area defined by the dashed lines corresponds to a two-dimensional frequency component prior to feature extraction.
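A filter-then-subsample extraction of this kind might look like the sketch below. The coefficients of the filter in FIG. 42B are not given in this text, so the sketch assumes the simplest filter that can be built from a one-pixel delay and a one-line delay, namely a 2 by 2 average. It also assumes edge replication outside the block and retention of every second pixel on every second line. These choices and the function names are illustrative only.

```python
def lowpass_2x2(block):
    """Assumed LPF: average each sample with its left, upper and upper-left
    neighbours. Samples outside the block repeat the nearest edge sample."""
    def at(line, pixel):
        return block[max(line, 0)][max(pixel, 0)]

    return [
        [
            (at(line, pixel) + at(line, pixel - 1)
             + at(line - 1, pixel) + at(line - 1, pixel - 1)) / 4
            for pixel in range(len(block[0]))
        ]
        for line in range(len(block))
    ]


def subsample_2to1(block):
    """Keep every second pixel of every second line."""
    return [line[1::2] for line in block[1::2]]


def extract_features_lpf(block):
    """(m x n) pixel data -> (m/2 x n/2) representative values."""
    return subsample_2to1(lowpass_2x2(block))


block = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
print(extract_features_lpf(block))  # [[2.5, 4.5], [10.5, 12.5]]
```

With this particular filter and sampling phase, each retained value is the mean of one 2 by 2 group of input pixels, so the 16 pixels are reduced to 4 representative values.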
The simplified motion vector detection shown in FIG. 39 reduces the scale of circuitry by using feature extraction. However, when it is necessary to detect both the frame motion vector and the field motion vector, the scale of circuitry becomes greater than that required for detecting either the frame motion vector or the field motion vector alone.
Further, a low-pass filter used for feature extraction is insufficient to extract the features of a block, for the following reason.
FIG. 43A and FIG. 43B show vertical waveforms cut in one dimension for odd and even fields when there is no change in shape and no movement of an object between the fields. FIG. 43C and FIG. 43D show data which are sampled at the vertical sampling intervals. When the video signals are handled as frame structures in a fashion similar to frame blocks, the sampling data for odd and even fields are merged together as shown in FIG. 43E.
In this case, because there is no object movement between the odd and even fields, the merged signal forms a continuous waveform as shown in FIG. 43E. The frequency component for such a condition is shown in FIG. 44C. In other words, when the sampling frequency in the vertical direction for the frame structure is fs, there is no frequency component at fs/2. This is because the frequency components of the sampling data for the odd and even fields in FIG. 43C and FIG. 43D are as shown in FIG. 44A and FIG. 44B respectively, and the frequency components at fs/2 differ by 180° and therefore cancel.
FIG. 45A and FIG. 45B show vertical waveforms cut in one dimension for the odd and even fields when there is no change in shape but there is movement of an object between the fields. FIG. 45C and FIG. 45D show data which are sampled at the vertical sampling intervals. FIG. 45E shows a situation in which the sampling data for each field are merged together.
Because there is movement of an object between the odd and even fields, the merged signal forms a discontinuous waveform as shown in FIG. 45E. The frequency component for such a condition is shown in FIG. 46C. The frequency components at fs/2 are not cancelled because the frequency components at fs/2 shown in FIG. 46A and FIG. 46B for the sampling data of the odd and even fields (FIG. 45C and FIG. 45D) differ by 180° + α.
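The cancellation at fs/2 and its failure under movement can be checked numerically. In the sketch below the component at fs/2 of the merged (frame-structure) signal is the alternating sum of its lines, so the odd-field lines enter with a plus sign and the even-field lines with a minus sign, that is, 180° apart. The step-edge waveform, the 32-line length, and the 4-line shift are hypothetical values chosen only to make the effect visible.

```python
def merge_fields(waveform, n_lines, shift):
    """Build the frame-structure signal: odd-field lines (even index k)
    sample waveform(k); even-field lines sample waveform(k - shift), where
    `shift` models object movement between the two fields."""
    return [waveform(k) if k % 2 == 0 else waveform(k - shift)
            for k in range(n_lines)]


def component_at_half_fs(frame_lines):
    """Magnitude of the DFT bin at fs/2, i.e. |sum of x[k] * (-1)**k|."""
    return abs(sum(v if k % 2 == 0 else -v
                   for k, v in enumerate(frame_lines)))


def edge(k):
    """Hypothetical vertical waveform: a step edge at line 16."""
    return 100 if k >= 16 else 0


still = merge_fields(edge, 32, shift=0)
moving = merge_fields(edge, 32, shift=4)
print(component_at_half_fs(still))   # 0
print(component_at_half_fs(moving))  # 200
```

Without movement the contributions of the two fields cancel exactly. With movement the even-field lines no longer mirror the odd-field lines, and a component remains at fs/2.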
FIG. 47A and FIG. 47B show vertical waveforms cut in one dimension for the odd and even fields when there are changes in shape and movement of an object between the fields. FIG. 47C and FIG. 47D show data which are sampled at the vertical sampling intervals. FIG. 47E shows a situation in which the sampling data for each field are merged together.
Because there are changes in the shape and movement of an object between the odd and even fields, the merged signal forms a discontinuous waveform as shown in FIG. 47E. The frequency component for such a case is shown in FIG. 48C. It may include a large high-frequency component at fs/2, and the high-frequency component may occasionally be larger than the low-frequency component. The large frequency component at fs/2 is included because the frequency components at fs/2 shown in FIG. 48A and FIG. 48B for the sampling data for the odd and even fields (FIG. 47C and FIG. 47D, respectively) are offset in phase.
FIG. 49 shows frequency components in accordance with movement of an object and vertical resolution. It is obvious from the figure that the high-frequency component of the frame structure signal is small regardless of the vertical resolution. As the level of movement and the change in shape increase, the high-frequency component of the frame structure signal increases. When a large amount of the high-frequency component is included, the accuracy of detection of the motion vector may be lost if only the low-frequency component is extracted by the low-pass filter and feature extraction is performed for the blocks.
An object of the present invention is to provide an improved motion vector detection apparatus which can reduce the scale of circuitry when simultaneously detecting the frame motion vector and field motion vector. Another object of the present invention is to provide an improved motion vector detection apparatus which can improve the accuracy of the simplified detection of the motion vector while reducing the scale of circuitry for feature extraction. A further object of the present invention is to provide an improved predictive coding system for compensating for movement with the motion vector detection apparatus.