1. Field of the Invention
The present invention relates to a motion vector detection apparatus for use in motion interpolation encoding of successive images formed of arrays of picture elements, such as successive frames of a video signal, and in particular to a motion vector detection apparatus which utilizes the block matching technique.
2. Related Technology
At present, the most widely used algorithm for encoding a stream of images each expressed as an array of picture elements, such as successive frames of a digitized video signal, is the motion interpolation method. With that method, the amount of spatial displacement between images (assumed in the following to be video signal frames) which are displaced in time but should have a high correlation is obtained, as a motion vector. The motion vector is then used in encoding the data expressing successive frames, to minimize redundancy within the resultant encoded data. In general, the motion vector is detected by using the block matching technique. The basic features of that technique will be described in the following referring first to FIG. 1.
With the block matching technique, a block of M×N pixels (where M and N are fixed integers) is first defined in the frame which is currently being processed to derive a motion vector, and which will be referred to as the object frame. That M×N pixel block will be referred to as the object block. It is sequentially compared with each of a set of blocks, each equal in size to the object block (referred to in the following as candidate blocks), that are defined within a specific region (referred to in the following as the search range) of a reference frame. The reference frame is close in time to the object frame within the succession of frames, and so should have high correlation with the object frame. In FIG. 1, the arrow 1504 represents a motion vector which expresses the degree of motion of the object block 1501 within the object frame 1500, in relation to the corresponding block 1502 within the reference frame 1503. Specifically, the pixel values of the object block 1501 are compared with the corresponding pixel values of each of a plurality of candidate blocks within a fixed search range 1505. Comparison is executed for each candidate block, for example, by calculating the absolute differences between the pixel values of the object block and the corresponding pixel values of the candidate block, and obtaining the sum of these absolute difference values as an inter-block error value. The inter-block error values thus obtained for all of the candidate blocks within the search range are then compared, and the candidate block for which maximum correlation with the object block is obtained (i.e. the smallest inter-block error value) is assumed to correspond to the object block 1501. The amount of spatial displacement between that candidate block and the position of the object block 1501 within the object frame 1500 (i.e. the position if the object frame were to be superimposed on the reference frame) is the required motion vector.
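As an illustrative sketch only, and not a description of any of the circuits discussed here, the block matching computation described above can be expressed in Python as follows. The function names sad and block_match, the frame layout frame[y][x], the symmetric search window of +/- search pixels, and the sample frames are all assumptions made for this example.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, m, n):
    """Sum of absolute differences between two blocks, M pixels wide and N high."""
    total = 0
    for dy in range(n):
        for dx in range(m):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def block_match(obj_frame, ref_frame, ox, oy, m, n, search):
    """Return ((dx, dy), error) for the candidate block with the smallest error.

    (ox, oy) is the top-left corner of the object block; candidate blocks are
    displaced by up to +/- search pixels (an assumed search range shape).
    """
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cx, cy = ox + dx, oy + dy
            if cx < 0 or cy < 0 or cx + m > width or cy + n > height:
                continue  # candidate block would fall outside the reference frame
            err = sad(obj_frame, ox, oy, ref_frame, cx, cy, m, n)
            if best is None or err < best[1]:
                best = ((dx, dy), err)
    return best


# Hypothetical 6x6 frames: the same 2x2 patch sits one pixel further right
# in the reference frame than in the object frame.
obj = [[0] * 6 for _ in range(6)]
ref = [[0] * 6 for _ in range(6)]
obj[2][2], obj[2][3], obj[3][2], obj[3][3] = 10, 20, 30, 40
ref[2][3], ref[2][4], ref[3][3], ref[3][4] = 10, 20, 30, 40

vector, error = block_match(obj, ref, ox=2, oy=2, m=2, n=2, search=1)
```

In this sketch the returned displacement is measured from the object block position to the best candidate block; the opposite sign convention would serve equally well provided it is applied consistently.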
A prior art motion vector detection apparatus for implementing the block matching technique for motion vector detection has been described in Japanese Patent Laid-open Publication No. HEI 2-213291. The principles of that prior art circuit will be described in the following, referring first to FIG. 2. To simplify the description, it is assumed that the circuit operates on an object block consisting of 2×2 pixels, using a search range which contains 3×3 candidate blocks. In FIG. 2, the nine error amount calculation processors of an array are respectively designated as PR(1,1), PR(2,1), . . . PR(3,3), while a set of six data registers which temporarily hold pixel values that are within the search range, referred to as side registers, are respectively designated by numerals 10 to 15. A set of five input registers 16 to 20 serve to temporarily hold input pixel values that are within the search range, supplied from an external circuit. A minimum value detection circuit 21 serves to detect the smallest one of the inter-block error values that are produced from the processors PR(1,1) to PR(3,3), and to detect the one of the processors that has produced that minimum value. The pixel values within the object block are supplied from an external circuit to an input terminal 24, referred to as input terminal S, while pixel values within the search range are supplied to R_A and R_B input terminals 22, 23.
A control circuit 25 supplies control signals CS and CT to the processors. The signal CS acts on each processor to select the pixel value that is to be operated on by that processor during the current calculation clock period, with three possible input values being selectable for each processor. The control signal CT indicates the start of an inter-block error value calculation operation.
FIG. 3 is a circuit diagram showing the internal configuration of each of the processors PR(1,1) to PR(3,3) of the motion vector detection apparatus of FIG. 2. In FIG. 3, 31 denotes a data register, referred to as an A register, which temporarily holds pixel values within the search range, 32 is a subtractor, 33 denotes an exclusive-OR gate, 34 denotes an adder, 35 denotes a register referred to as a B register, for holding intermediate results of cumulative additions, and 36 denotes a register referred to as a C register, for holding inter-block error values. S(x,y) denotes a pixel value of the object block, supplied during a processing clock period to each of the processors via a common bus. R(i,j) denotes a pixel value that is within a search range that is specific to that processor, which is held in the A register 31 and is outputted from that register during a processing clock period.
The operation of this prior art motion vector detection apparatus is as follows, referring to FIG. 4 which shows the pixel values R11 to R65 within the search range of the reference frame and the pixel values S11 to S22 of the object block within the object frame. The pixel values within the search range are supplied from the R_A, R_B input terminals 22, 23 in the sequence shown by the arrows in FIG. 4. Specifically, these pixel values are supplied from the input terminal 22 to the input register 18 in the sequence R(1,1), R(1,2), R(2,2), R(2,1) . . . R(4,1), and supplied from input terminal 23 to the input register 20 in the sequence R(1,3), R(1,4), R(2,4), R(2,3), . . . R(4,3). Each of the processors PR(1,1) to PR(3,3) and side registers 10 to 15 receives, during a processing clock period, pixel values that were stored during the preceding clock period. Each of these pixel values is supplied from another processor, from a side register, or from an input register, with one of these being selected in accordance with the control signal CS. The input directions that are successively selected by the control signal CS can be expressed as follows, with respect to the positions shown in FIG. 2:
(1) Lower adjacent
(2) Right adjacent
(3) Upper adjacent
(4) Right adjacent
(5) Return to (1) above
As a result, pixel values of the respective candidate blocks within the search range of the reference frame are sequentially supplied to the processors as:
For processor PR(1,1): in the sequence R(1,1), R(1,2), R(2,2), R(2,1);
For processor PR(1,2): in the sequence R(1,2), R(1,3), R(2,3), R(2,2);
For processor PR(3,3): in the sequence R(3,3), R(3,4), R(4,4), R(4,3).
M×N processors, each of said processors comprising a first register, and an absolute value subtractor circuit for deriving the absolute value of difference between first and second input values supplied thereto, said first input value being supplied from said first register;
memory means for supplying said M×N object block picture element values and said search range picture element values in respective predetermined sequences of picture element values;
object block picture element value input circuit means coupled to receive said M×N object block picture element values, for selectively transferring each of said object block picture element values to be set into said first register of a predetermined one of said processors;
search range picture element value input circuit means coupled to receive said search range picture element values, for selectively transferring each of said search range picture element values to said absolute value subtractor circuit of a predetermined one of said processors, as said second input value to said absolute value subtractor circuit, to thereby obtain an absolute value difference value between said search range picture element value and said object block picture element value from said first register of said predetermined processor;
cumulative addition circuit means coupled to receive respective absolute value difference values produced from said M×N processors, to thereby calculate respective cumulative sum values obtained for said candidate blocks; and
a minimum value detection circuit for detecting a smallest one of said cumulative sum values, to thereby detect one of said candidate blocks which corresponds to said smallest cumulative sum value.
means for successively supplying picture element values within a restricted search range, said restricted search range consisting of picture element values from said reference image including the values of said first candidate block having maximum correlation, and for successively supplying the picture element values of said object block;
interpolation circuit means coupled to receive said restricted search range picture element values, responsive to each of said restricted search range picture element values for outputting a set of picture element values comprising said each restricted search range picture element value together with a plurality of associated interpolated picture element values;
delay circuit means coupled to receive said object block picture element values, responsive to each of said object block picture element values for outputting a set of picture element values comprising said each object block picture element value together with a plurality of associated delayed picture element values;
a plurality of processor circuits equal in number to plurality of second candidate blocks which are within said restricted search range, each of said candidate blocks being of identical size to said object block, each of said processor circuits being coupled to said interpolation circuit means and delay circuit means and including means for calculating a value of absolute difference between one of said set of picture element values from said interpolation circuit means and one of said set of picture element values from said delay circuit means and means for calculating a cumulative sum value of said absolute difference values for a corresponding one of said second candidate blocks;
minimum value detection circuit means for detecting a smallest one of respective cumulative sum values produced from said processor circuits, and for thereby detecting one of said second candidate blocks having maximum correlation with said object block; and
combining circuit means for combining motion vector information expressed by respective positions of said first candidate block having maximum correlation and said second candidate block having maximum correlation, to obtain fractional precision motion vector information for said object block with respect to said main search range;
wherein said interpolation circuit comprises means for generating said interpolated picture element values such that said interpolated picture element values include, in relation to each of said object block picture element values, interpolated picture element values having positive and negative positions in relation to said each object block picture element value, in both row and column directions of said object block.
In addition, the respective pixel values S(x,y) within the object block are supplied via the input terminal 24 to the processors PR(1,1) to PR(3,3) in the sequence shown by the arrows in FIG. 4, i.e. the sequence S(1,1), S(1,2), S(2,2), S(2,1).
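The relationship between the object block scan order and the sequence of search range values seen by each processor can be illustrated by the following short Python sketch. It reproduces only the index sequences listed above, not the register-to-register transfers of FIG. 2; the names OBJECT_ORDER and processor_sequence are assumptions introduced for this example.

```python
# Visiting order of the 2x2 object block as listed in the text:
# S(1,1), S(1,2), S(2,2), S(2,1).
OBJECT_ORDER = [(1, 1), (1, 2), (2, 2), (2, 1)]


def processor_sequence(p, q):
    """Indices of the search range values R(.,.) received by processor PR(p,q).

    Each processor sees the object block scan order shifted by its own
    position, so PR(1,1) sees the unshifted order.
    """
    return [(a + p - 1, b + q - 1) for (a, b) in OBJECT_ORDER]


for p, q in [(1, 1), (1, 2), (3, 3)]:
    indices = ", ".join(f"R({a},{b})" for a, b in processor_sequence(p, q))
    print(f"PR({p},{q}): {indices}")
```

The three printed lines correspond to the sequences given above for processors PR(1,1), PR(1,2) and PR(3,3).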
With the configuration shown in FIG. 3, for each of the processors PR(1,1) to PR(3,3), the subtractor 32 and the exclusive-OR circuit 33 function to calculate during each clock period the absolute value of difference |S(x,y)-R(i,j)| between a pixel value S(x,y) in the object block and a pixel value R(i,j) within the search range. In addition, the adder 34 and register 35 constitute a cumulative adder whereby the cumulative sum is obtained of the values |S(x,y)-R(i,j)| that are calculated by the subtractor 32 and exclusive-OR circuit 33 in each clock period. Hence, since the processor PR(1,1) successively receives as inputs the pixel values within the search range in the sequence R(1,1), R(1,2), R(2,2), R(2,1) together with the corresponding pixel values within the object block, i.e. S(1,1), S(1,2), S(2,2), S(2,1), the processor PR(1,1) derives the cumulative sum:
|S(1,1)-R(1,1)| + |S(1,2)-R(1,2)| + |S(2,2)-R(2,2)| + |S(2,1)-R(2,1)|,
the processor PR(1,2) derives the cumulative sum:
|S(1,1)-R(1,2)| + |S(1,2)-R(1,3)| + |S(2,2)-R(2,3)| + |S(2,1)-R(2,2)|,
and the processor PR(3,3) derives the cumulative sum:
|S(1,1)-R(3,3)| + |S(1,2)-R(3,4)| + |S(2,2)-R(4,4)| + |S(2,1)-R(4,3)|.
As a result, each of the processors PR(1,1) to PR(3,3) calculates the inter-block error value for a corresponding one of the candidate blocks within the search range of the reference frame. These error values are then set into the C registers 36 of the processors in response to the signal CT, which indicates the start of the succeeding inter-block error value calculation operation. The inter-block error values are subsequently read in sequence from the C registers 36, and transferred via buses (indicated by broken-line portions in FIG. 2) which interconnect the processors PR(1,1) to PR(3,3), to the minimum value detection circuit 21.
The minimum value detection circuit 21 detects the smallest of these inter-block error values, and in addition detects the position of the processor which has calculated that smallest inter-block error value. As described above, the processors PR(1,1) to PR(3,3) respectively calculate the inter-block error values for respective ones of the candidate blocks within the search range. Hence, the position of the processor which calculates the minimum inter-block error value indicates the position of the candidate block for which that minimum value has been obtained. The minimum value detection circuit 21 thereby obtains the amount of displacement between the object block and the candidate block for which the minimum inter-block error value was calculated, and so obtains the desired motion vector information.
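The overall behaviour of the processor array and the minimum value detection circuit 21 can be modelled by the following Python sketch. It is a behavioural illustration under stated assumptions and not a model of the actual register transfers: the function name detect_best_processor, the use of dictionaries keyed by index pairs, and the sample pixel values are all hypothetical.

```python
OBJECT_ORDER = [(1, 1), (1, 2), (2, 2), (2, 1)]  # scan order of S(x,y) from the text


def detect_best_processor(S, R):
    """Return (processor position, error value) for the smallest error.

    S and R are dicts keyed by 1-based index pairs, e.g. S[(1, 1)], R[(4, 4)].
    """
    # One accumulator per processor, standing in for the B register 35.
    b_reg = {(p, q): 0 for p in range(1, 4) for q in range(1, 4)}
    for (x, y) in OBJECT_ORDER:            # one clock period per object pixel
        s = S[(x, y)]                      # supplied to every processor in common
        for (p, q) in b_reg:
            r = R[(x + p - 1, y + q - 1)]  # value held in that processor's A register
            b_reg[(p, q)] += abs(s - r)    # absolute difference, cumulatively added
    c_reg = dict(b_reg)                    # copied to the C registers on signal CT
    best = min(c_reg, key=c_reg.get)       # role of the minimum value detection circuit
    return best, c_reg[best]


# Hypothetical data: a 4x4 search range with distinct values, and an object
# block copied from the candidate block handled by processor PR(2,3).
R = {(a, b): 10 * a + b for a in range(1, 5) for b in range(1, 5)}
S = {(x, y): R[(x + 1, y + 2)] for (x, y) in OBJECT_ORDER}

position, error = detect_best_processor(S, R)
```

The sketch returns only the position of the winning processor. Converting that position into a displacement requires knowing which processor corresponds to zero displacement, which is treated here as outside the scope of the example.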
With such a prior art motion vector detection apparatus, assuming that a motion vector is to be detected for an object block consisting of M×N pixels and a search range of H×V pixels, then in order to minimize the search range, i.e. in order to reduce the number of pixel values which must be supplied to the circuit in each clock period, it is necessary to use a total of H×2N side registers, and V+2N input registers, for holding pixel values which are within the search range but will not be used for calculation during the current clock period. As a result, the problem arises that the circuit becomes large in scale.
The type of apparatus described above provides motion vector detection with integer precision, i.e. to unit pixel accuracy. To attain greater accuracy, there have been proposals in the prior art for a motion vector detection apparatus which provides fractional precision for motion vector detection. With such a method, using the block matching technique described above, an optimum correlation candidate block is first found with integer precision with respect to the object block, within a search range of the reference frame which can be referred to as the main search range, thereby obtaining a corresponding motion vector. To obtain fractional accuracy of motion vector detection, a restricted search range is then defined, using the pixel values of the aforementioned optimum correlation candidate block together with interpolated values which are calculated using the pixel values of that candidate block and specific pixel values which are adjacent to that candidate block within the main search range. Block matching is then executed for the object block within that restricted search range, to obtain an optimum candidate block within the restricted search range. A motion vector having fractional precision is thereby obtained with respect to the restricted search range, which can be combined with the integer precision motion vector information, to obtain a fractional precision motion vector with respect to the main search range.
An example of a prior art motion vector detection apparatus for providing such fractional-pixel motion vector detection accuracy is described in U.S. Pat. No. 4,937,666. As shown in FIG. 6 of that disclosure, the apparatus basically consists of a set of memory circuits for storing the pixel values of the object frame, the reference frame, the search range, and the object block; a circuit referred to as an integer precision subcircuit, which operates on pixel values of the object block and the search range to obtain an optimum correlation candidate block with respect to a main search range, i.e. to obtain integer precision motion vector information; and a circuit referred to as a fractional precision subcircuit, which operates on pixel values of the object block and of a restricted search range to obtain a fractional precision motion vector with respect to the restricted search range. That fractional precision motion vector is then combined with the integer precision motion vector to obtain a fractional precision motion vector with respect to the main search range.
Although the concept "restricted search range" is not clearly described in that patent disclosure, it is clear that such a set of pixels is defined, since as described in that disclosure, values representing the integer precision motion vector which are generated by the integer precision subcircuit are used to control a memory address generator such that appropriate pixel values from the main search range are utilized by the fractional precision subcircuit.
The essential features of the fractional precision subcircuit of that prior art patent will be described referring to the block circuit diagram of FIG. 5. It will be assumed that the circuit of FIG. 5 operates on an object block which is a 3×3 array of pixels, whose values S(i,j) are designated S(0,0) to S(2,2) respectively. A first sequence of pixel values R(i,j) of the restricted search range is supplied to an input terminal 601 in successive clock periods, while a second sequence of remaining pixel values R(i,j+1) of that region is supplied to a second input terminal 602. (It is assumed that values of "i" increase positively in the horizontal rightward direction of a pixel array, and that values of "j" increase positively in the vertical downward direction of a frame, with left-to-right and top-to-bottom scanning directions). The pixel values R(i,j) are thereby supplied to one input of an adder 603 and one input of a selector 604, while the pixel values R(i,j+1) are supplied to the other input of adder 603. The adder 603 includes an output value halving function, whereby each sum value produced is divided by two. The resultant values are successively set into a latch 606. The selector 604 is controlled such as to alternately select the values of the sequence R(i,j) and the output values from adder 603, in successive clock periods. The combination of selector 604 and adder 603 is designated as the V interpolation circuit 605, for executing interpolation of pixel values in the vertical array direction. Each output value from the V interpolation circuit 605 is sequentially set into latches 606, 607 and 608 in successive clock periods. The delayed output values from latches 606 and 608 are supplied to an adder 609, which constitutes an H interpolation circuit 610. The adder 609 also executes the aforementioned output halving function.
The output values from latch 608 are also supplied to a bus, designated as the R1 bus 611, and the output values from adder 609 are supplied to a bus which is designated as the R2 bus 612. The object block pixel values S(i,j) are sequentially supplied via an input terminal 613 to a bus which is designated as the S bus 614. A subtractor 619 calculates the difference between each object block pixel value S(i,j) and a search range pixel value from the R1 bus 611. The result is delayed by one clock period in a latch 615, then the absolute value is derived by an absolute value circuit 620 and set in a latch 616. Each delayed output value from latch 616 is supplied to one input of an adder 621, and the resultant sum values are delayed in two cascade-connected latches 617, 618 before being supplied to the other input of the adder 621. The subtractor 619, latch 615, absolute value circuit 620, adder 621 and latches 616 to 618 constitute a calculation circuit which will be referred to as the No. 1 processor 622. The object block pixel values S(i,j) and the output values from the H interpolation circuit 610 are supplied to a No. 2 processor 623, which is of identical configuration to the processor 622. The processors 622 and 623 derive respective inter-block error values for candidate blocks within the search range (i.e. the restricted search range, which is formed of the aforementioned interpolation pixel values from the reference frame). These error values are each temporarily held in the latches 618 of the processors 622, 623, and are supplied to a minimum value detection circuit 624 which detects the candidate block for which a minimum inter-block error value has been obtained. The i, j component values of the motion vector within the restricted search range are thereby derived, and sent to output terminals 625.
A more specific description of the circuit of FIG. 5 will be given, assuming that the circuit operates on an object block which is a 3×3 array of pixels, whose values S(i,j) are designated S(0,0) to S(2,2) respectively. The operation will be described referring to the appended Table 1, in which the column "Clock" defines successive processing clock periods. The reference frame pixel values which are supplied to the input terminals 601 and 602 will be referred to as the search range pixel values. As shown, successive ones of these search range pixel values are each supplied to the input terminals 601, 602 during two consecutive clock periods. In the case of input terminal 601, the sequence is R(0,0), R(1,0), R(2,0), R(3,0), R(0,1), R(1,1), R(2,1), R(3,1), R(0,2), R(1,2), R(2,2), R(3,2). In the case of input terminal 602, the sequence is R(0,1), R(1,1), R(2,1), R(3,1), R(0,2), R(1,2), R(2,2), R(3,2), R(0,3), R(1,3), R(2,3), R(3,3). Thus each pixel value supplied to input terminal 602 is delayed with respect to that supplied to input terminal 601 in the same clock period by an amount representing one pixel displacement in the vertical direction of the array. For that reason, the pixel values supplied to input terminal 602 are designated as R(i,j+1).
In the V interpolation circuit 605, the adder 603 obtains, in each clock period, a sum R(i,j)+R(i,j+1), and divides that sum by two, to obtain an interpolation value which is designated R(i,j+0.5). The selector 604 alternately selects the pixel values R(i,j) from input terminal 601 and the interpolation pixel values R(i,j+0.5) from the adder 603, so that the values shown in the column for latch 606 in Table 1 are successively set therein, then in latches 607 and 608. In the H interpolation circuit 610, the adder 609 obtains the sum of the values that are currently held in latches 606, 608, and halves that sum. Each value produced from latch 608 is delayed by two clock periods with respect to the value being produced from latch 606. Designating the value held in latch 608 as R(i,j), the value held in latch 606 is R(i+1,j), so that the adder 609 produces the interpolation value R(i+0.5,j). Similarly, assuming that the interpolation pixel value R(i,j+0.5) is held in the latch 608, then the interpolation pixel value R(i+1,j+0.5) is held in the latch 606, so that the interpolation pixel value R(i+0.5,j+0.5) is supplied to the bus 612. As a result, the sequences of values supplied to the R1 bus 611 and R2 bus 612 are as indicated by the corresponding columns in Table 1.
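The arithmetic performed by the V and H interpolation circuits can be illustrated by the following Python sketch. It computes the same four values for one pixel position but does not model the latch timing of FIG. 5. The function name half_pel_values, the array layout R[j][i], and the use of exact (non-truncating) division by two are assumptions; a hardware adder with a halving function may round or truncate.

```python
def half_pel_values(R, i, j):
    """Return (R(i,j), R(i,j+0.5), R(i+0.5,j), R(i+0.5,j+0.5)).

    R is indexed as R[j][i], with i horizontal and j vertical.
    """
    full = R[j][i]
    # V interpolation: average of vertically adjacent values.
    v_half = (R[j][i] + R[j + 1][i]) / 2
    v_half_next = (R[j][i + 1] + R[j + 1][i + 1]) / 2   # R(i+1, j+0.5)
    # H interpolation: average of values two clock periods apart,
    # i.e. horizontally adjacent values of the same kind.
    h_half = (R[j][i] + R[j][i + 1]) / 2
    hv_half = (v_half + v_half_next) / 2
    return full, v_half, h_half, hv_half


# Hypothetical 2x2 patch: R(0,0)=10, R(1,0)=20, R(0,1)=30, R(1,1)=50.
patch = [[10, 20],
         [30, 50]]
values = half_pel_values(patch, 0, 0)
```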
The object block pixel values S(i,j) are sequentially supplied to the bus 614, each being supplied for two consecutive clock periods, as shown in the corresponding column of Table 1. For each pixel value of S(i,j), the first period in which the value is supplied is synchronized with that in which the correspondingly positioned pixel value within the search range is being supplied to bus 611. The processor 622 executes the following pipeline operations. The difference between the values which are supplied to buses 611 and 614 is obtained by the subtractor 619, and the result set in the latch 615. The absolute value of that difference value is then obtained by the absolute value circuit 620, and the result set in the latch 616. In addition, the cumulative sum of the difference values is derived once in every two clock periods, by the cumulative addition circuit that is formed of the cascaded latches 617, 618 and the adder 621, operating on the values that are set into the latch 616.
As shown by the timings in Table 1, during two clock periods from the third clock period, a pixel value R(i,j) is being outputted to bus 611 and the pixel value S(i,j) outputted to bus 614. During two clock periods from the fourth clock period, the interpolated pixel value R(i,j+0.5) is outputted to the bus 611 and the pixel value S(i,j) is supplied to bus 614. As a result, the No. 1 processor 622 derives the cumulative sums of |S(i,j)-R(i,j)| and of |S(i,j)-R(i,j+0.5)|, i.e. derives the inter-block error values D_{0,0} and D_{0,0.5}. The processor 623 has the same configuration as processor 622, but differs in being connected to the bus 612 rather than to bus 611. Hence, the processor 623 calculates the cumulative sums of |S(i,j)-R(i+0.5,j)| and of |S(i,j)-R(i+0.5,j+0.5)|, i.e. derives the inter-block error values D_{0.5,0} and D_{0.5,0.5}. The inter-block error values D_{v,w} that are calculated by processors 622, 623 (where v, w each take values 0 or 0.5) are transferred to the minimum value detection circuit 624, to obtain the smallest inter-block error value, D_{x,y}. The displacement values x, y between the object block and the candidate block for which the minimum inter-block error value D_{x,y} has been calculated (i.e. the motion vector components) are outputted to the terminal 625, then combined with integer-accuracy displacement values, to obtain the required complete fractional-accuracy motion vector information.
However, with such a prior art fractional-accuracy motion vector detection apparatus, there is the disadvantage that only the pixel values S(i,j) and the pixel values or interpolated pixel values R(i+v,j+w) exist. Thus it is only possible for the processors to obtain the inter-block error values D_{v,w}. That is to say, only positive component values for a fractional-accuracy motion vector with respect to the restricted search range can be obtained, for example within the range 0 to 0.5, in each of the horizontal and vertical directions. With the apparatus described, it would not be possible to obtain negative component values of the fractional-accuracy motion vector, in the horizontal and vertical directions.
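The four inter-block error values and their combination with the integer-accuracy result can be illustrated by the following Python sketch. It is a functional illustration only and ignores the pipeline timing of Table 1. The function names interp, fractional_search and combine, the array layout [j][i], exact division, the sample pixel values, and the integer vector (3, -2) are all assumptions made for this example.

```python
def interp(R, i, j, v, w):
    """R(i+v, j+w) for v, w in {0, 0.5}; R is indexed as R[j][i]."""
    di = 1 if v else 0
    dj = 1 if w else 0
    # With di or dj equal to 0 the same sample is counted twice, so this
    # reduces to the plain value or to a two-point average as appropriate.
    return (R[j][i] + R[j][i + di] + R[j + dj][i] + R[j + dj][i + di]) / 4


def fractional_search(S, R):
    """Return ((v, w) of the smallest error, dict of all four error values).

    S is a 3x3 object block and R a 4x4 restricted search range.
    """
    errors = {}
    for v in (0, 0.5):
        for w in (0, 0.5):
            errors[(v, w)] = sum(
                abs(S[j][i] - interp(R, i, j, v, w))
                for j in range(3) for i in range(3)
            )
    best = min(errors, key=errors.get)
    return best, errors


def combine(integer_mv, fractional_mv):
    """Add the fractional components to the integer-accuracy components."""
    return (integer_mv[0] + fractional_mv[0], integer_mv[1] + fractional_mv[1])


# Hypothetical data: a smooth ramp, with the object block lying half a
# pixel to the right of the integer-accuracy position.
R = [[8 * i + 2 * j for i in range(4)] for j in range(4)]
S = [[8 * i + 2 * j + 4 for i in range(3)] for j in range(3)]

best, errors = fractional_search(S, R)
motion_vector = combine((3, -2), best)
```

Only the offsets 0 and 0.5 are evaluated in each direction, so the fractional components returned by this sketch are never negative.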