1. Field of the Invention
The present invention relates to devices for detecting motion vectors which are employed for motion compensation of a moving picture image in predictive coding.
2. Description of the Background Art
A data compression technique of reducing an amount of data is indispensable for transmitting or storing picture signals having a large amount of data. Picture data have considerable redundancy resulting from correlation between adjacent pixels and human perceptional characteristics. A data compression technique of suppressing the data redundancy for reducing the volume of data for transmission of data is called high efficiency coding. One of such high efficiency coding systems is a frame-to-frame predictive coding system, which is adapted to carry out the following processing:
A predictive error, which is the difference between each pixel data of a current screen (frame or field) to be currently coded and the pixel data at the same position of a precedent screen to be referred to, is calculated, and the predictive error as calculated is thereafter employed for coding. According to this method, it is possible to code picture images having small movements with high efficiency, due to the high correlation between the screens. As to picture images having large movements, however, the errors are disadvantageously increased due to the small correlation between screens, leading to an increase in the volume of data to be transmitted.
In order to solve the aforementioned problem, a frame-to-frame (field-to-field) predictive coding system with motion compensation is adapted to carry out the following processing: First, motion vectors are previously calculated through pixel data of a current screen (frame or field) and a precedent screen before calculating predictive errors. A predictive picture image of the precedent screen is moved in accordance with the motion vector as calculated. Picture data in a position which is displaced from that of the precedent screen by the motion vector are regarded as reference pixels, which in turn are employed as predicted values. Then, predictive errors between respective pixels of the precedent screen as moved and the current screen are calculated so that the predictive errors and the motion vectors are transmitted.
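The effect of motion compensation on the predictive errors can be illustrated with a minimal sketch. The function name `prediction_error`, the (vertical, horizontal) ordering of the vector, and the tiny 4 by 4 screens are assumptions made only for this illustration and are not taken from the described encoder.

```python
def prediction_error(current, previous, block_top, block_left, size, vector):
    """Residual between a current-screen block and the precedent-screen
    block displaced by `vector` (hypothetical helper for illustration)."""
    di, dj = vector  # assumed convention: (vertical, horizontal) displacement
    residual = []
    for r in range(size):
        row = []
        for c in range(size):
            cur = current[block_top + r][block_left + c]
            ref = previous[block_top + di + r][block_left + dj + c]
            row.append(cur - ref)
        residual.append(row)
    return residual


# A 2 by 2 bright patch moves one pixel to the right between screens.
previous = [
    [0, 0, 0, 0],
    [9, 9, 0, 0],
    [9, 9, 0, 0],
    [0, 0, 0, 0],
]
current = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]

# Without motion compensation (zero vector) the residual is large.
plain = prediction_error(current, previous, 1, 1, 2, (0, 0))
# With the vector pointing back to where the patch came from, it vanishes.
compensated = prediction_error(current, previous, 1, 1, 2, (0, -1))
print(plain)        # [[0, 9], [0, 9]]
print(compensated)  # [[0, 0], [0, 0]]
```

In this toy case only the motion vector and an all-zero residual would need to be transmitted for the block, which is the saving the motion-compensated system aims at.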
FIG. 151 is a block diagram schematically showing an overall structure of a conventional encoder for coding picture data in accordance with the predictive coding system with motion compensation. Referring to FIG. 151, the encoder includes a preprocessing circuit 910 for carrying out prescribed preprocessing on picture signals as received, a source coding circuit 912 for eliminating redundancy from the signals preprocessed by the preprocessing circuit 910 and quantizing input signals, and a video multiplex coding circuit 914 for coding signals received from the source coding circuit 912 in accordance with a prescribed format and multiplexing the coded signals to a code train of a predetermined data structure.
The preprocessing circuit 910 converts input picture signals to those of a common intermediate format (CIF) through time and space filters, and performs filter processing for noise removal.
The source coding circuit 912 performs orthogonal transformation processing such as discrete cosine transformation (DCT) on received signals as well as motion compensation for input signals, while quantizing picture data subject to the orthogonal transformation.
The video multiplex coding circuit 914 performs two-dimensional variable-length coding on received picture signals with variable-length coding of various attributes, such as motion vectors, of blocks which are units of data processing, and thereafter multiplexes the signals to a code train of a predetermined data structure.
The encoder further includes a transmission buffer 916 for buffering picture data from the video multiplex coding circuit 914, and a transmission coding circuit 918 for adapting the picture data from the transmission buffer 916 to a transmission channel.
The transmission buffer 916 smooths information generating speeds to a constant speed. The transmission coding circuit 918 executes addition of error checking bits and sound signal data.
FIG. 152 illustrates an exemplary structure of the source coding circuit 912 shown in FIG. 151. Referring to FIG. 152, the source coding circuit 912 includes a motion compensation predictor 920 for detecting motion vectors with respect to input picture signals received from the preprocessing circuit 910 and forming reference picture images motion-compensated in accordance with the motion vectors, a loop filter 922 for performing filter processing on reference picture image pixel data received from the motion compensation predictor 920, a subtracter 924 for obtaining differences between outputs of the loop filter 922 and input picture signals, an orthogonal transformer 926 for orthogonally transforming outputs of the subtracter 924, and a quantizer 928 for quantizing data orthogonally transformed by the orthogonal transformer 926.
The motion compensation predictor 920, the structure of which is described later in detail, includes a frame memory for storing pixel data of a precedent frame, and detects motion vectors and forms motion-compensated reference picture image pixel data in accordance with input picture signal data and the pixel data in the frame memory. The loop filter 922 is provided to improve the picture quality.
The orthogonal transformer 926 carries out orthogonal transformation such as DCT transformation on data received from the subtracter 924 in units of blocks of a prescribed size (8 by 8 pixels in general). The quantizer 928 quantizes the orthogonally transformed pixel data.
The motion compensation predictor 920 and the subtracter 924 execute frame-to-frame prediction with motion compensation, for eliminating time redundancy in a motion image. Further, spatial redundancy in motion image signals is eliminated by orthogonal transformation through the orthogonal transformer 926.
The source coding circuit 912 further includes an inverse quantizer 930 for transforming the data quantized in the quantizer 928 to the original signal states, an inverse orthogonal transformer 932 for performing inverse orthogonal transformation on outputs of the inverse quantizer 930, and an adder 934 for adding up outputs of the loop filter 922 and the inverse orthogonal transformer 932. The inverse quantizer 930 and the inverse orthogonal transformer 932 form a picture image which is employed in frame-to-frame prediction for a subsequent frame. The picture data as generated are written in the frame memory which is included in the motion compensation predictor 920. The adder 934 adds picture signals (frame-to-frame difference data) to the outputs of the loop filter 922, whereby the picture data of the current frame are reproduced. In general, such inverse quantization, inverse orthogonal transformation and addition are called local decoding processes. Calculation of the motion vectors is now described more specifically. In general, a block matching method is employed for calculating the motion vectors.
As shown in FIG. 153A, consider that a picture image A in an (m−1)-th frame is moved to A′ in an m-th frame. In the block matching method, the screen (one frame in this case) is divided into blocks each including P by Q pixels (P=Q in general). A precedent frame is searched for the block which most closely approximates the block of interest in the current frame. The displacement from the block of interest to the most closely approximating block in the precedent frame is called a motion vector. Description is now made in more detail.
As shown in FIG. 153B, it is assumed that the m-th frame is to be coded. The frame is divided into blocks each having N by N pixels (P=Q=N). It is assumed that pixel data in the upper leftmost pixel position (Nk, N1) in the block of the N by N pixels in the m-th frame has a value Xm(Nk, N1). The sum of absolute values of differences between corresponding pixels in the block of the precedent frame having pixel positions displaced by a vector (i, j) and the block in the current frame is obtained. Then, the displacement vector (i, j) is changed to various values, to obtain the sum of respective absolute differential values. The absolute differential value sum is generally called an evaluation function value. The position (i, j) providing the minimum absolute differential value sum is defined as the motion vector.
It is necessary to transmit one motion vector for every pixel block. If the block size is reduced, the volume of information to be transmitted is increased, which prevents effective data compression. If the block size is increased, on the other hand, it is difficult to perform effective movement detection. In general, therefore, the block size is set at 16 by 16 pixels, and a motion vector search range (maximum change width of i, j) is set at −15 to +15. Motion vector calculation by the block matching method is now described more specifically.
FIG. 154 illustrates a specific method of calculating a motion vector by the block matching method. Consider a picture image 950 which is formed by 352 dots (pixels) by 288 lines, as shown in FIG. 154. The picture image 950 is divided into a plurality of blocks in units of 16 by 16 pixel groups. The motion vector is detected in units of these blocks. It is assumed that a search block (hereinafter referred to as a search area) is formed by a block 956 which is larger by ±16 pixels in the horizontal and vertical directions on the screen with reference to a block 954 in a precedent frame, which is in the same position as a target block (hereinafter referred to as a template block) 952 for motion vector detection. Motion vector search for the template block 952 is executed in this search area 956. The motion vector searching method in accordance with block matching comprises the following processing steps:
A block (shown by the vector (i, j) in FIG. 154) having displacement corresponding to a motion vector candidate is obtained. An evaluation function value, such as a sum of absolute differential values (or a square differential sum), between respective pixels in the block as obtained and those in corresponding positions of the template block 952 is obtained.
The aforementioned operation is executed on all displacements in a range of (−16, −16) to (+16, +16) as the vector (i, j). Evaluation function values (evaluation values) are obtained with respect to all predictive picture image blocks (all picture image blocks in the search area 956), and thereafter the predictive picture image block having the minimum evaluation function value is detected. A vector which is directed from a block (the block 954 shown by a vector (0, 0) in FIG. 154) provided on the same position (hereinafter referred to as the right back) as the template block 952 toward the predictive picture image block having the minimum evaluation function value is decided as the motion vector for this template block 952.
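The search procedure described above can be sketched in a few lines. This is a minimal illustration only: the function names `sad` and `full_search`, the (vertical, horizontal) ordering of (i, j), and the small 2 by 2 template with a ±1 search range are assumptions chosen to keep the example short, whereas the text uses 16 by 16 blocks and a ±16 range.

```python
def sad(template, search, i, j, origin):
    """Sum of absolute differential values for displacement (i, j).

    `origin` is the row/column offset of the zero-displacement block
    inside `search` (assumed equal to the search range).
    """
    total = 0
    for m, template_row in enumerate(template):
        for n, x in enumerate(template_row):
            total += abs(x - search[origin + i + m][origin + j + n])
    return total


def full_search(template, search, search_range):
    """Evaluate every displacement and return (vector, minimum value)."""
    best_vector, best_value = None, None
    for i in range(-search_range, search_range + 1):
        for j in range(-search_range, search_range + 1):
            value = sad(template, search, i, j, search_range)
            if best_value is None or value < best_value:
                best_vector, best_value = (i, j), value
    return best_vector, best_value


template = [[1, 2],
            [3, 4]]
# Search area: the template-sized block enlarged by 1 pixel on every side.
search = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
    [0, 0, 0, 0],
]
vector, value = full_search(template, search, 1)
print(vector, value)  # (0, 1) 0
```

The block one column to the right of the zero-displacement position matches the template exactly, so the vector (0, 1) is selected with an evaluation function value of 0.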
FIG. 155 illustrates an overall structure of a conventional hardware-implemented motion vector detecting device, such as that described in "Proceeding" by A. Artieri et al., IEEE ICASSP '89 (1989), pp. 2453-2456, for example. Referring to FIG. 155, the motion vector detecting device includes a search area input register 962 for inputting pixel data of a search area by one column of the search area, a processor array 966 including a plurality of processors which are arranged in a matrix of rows and columns of the same size as evaluation points (motion vector candidates) of a template block, search area side registers 964a and 964b for storing data of the same column in the search area with respect to the processor array 966, and a motion vector detecting part 968 for detecting a motion vector in accordance with an operation result of the processor array 966.
In the processor array 966, the processors are arranged in correspondence to respective evaluation points, i.e., respective displacement vectors (i, j). Namely, a processor Pij which is arranged on an i-th row and a j-th column calculates the distortion D(i, j) for the displacement vector (i, j).
FIG. 156 illustrates the structure of each processor 970 which is included in the processor array 966 shown in FIG. 155. Referring to FIG. 156, the processor 970 includes a three-input register 972 receiving search area pixel data transmitted from three processors in the horizontal and vertical directions of the processor array 966 for passing one of the inputs in response to a selection signal SEL, a distortion calculating part 974 for calculating distortion (absolute differential value sum) on the basis of search area pixel data Y received from the three-input register 972 and externally supplied template block pixel data X, and a two-input register 976 receiving distortion D from the distortion calculating part 974 and that from a horizontally adjacent processor provided in the processor array 966 for selecting and passing either one in accordance with a selection signal To.
Such processors 970 shown in FIG. 156 are two-dimensionally arranged in the processor array 966 shown in FIG. 155 in correspondence to the evaluation points in the search area, i.e., all displacement vectors regarded as candidates for motion vectors. The template pixel data X are supplied to the respective processors 970 of the processor array 966 (see FIG. 155) in common. The processors 970 are also supplied with corresponding picture data in a search area block. When the template block pixel data X is X(m, n), for example, the processor Pij is supplied with search area block pixel data Y(i+m, j+n). Search window data are transferred through the search area side registers 964a and 964b shown in FIG. 155 and the processors 970 provided in the processor array 966. In order to correctly supply the search area block pixel data Y(i+m, j+n) to each processor with respect to the externally supplied template block pixel data X(m, n), it is necessary to scan the template block and the search area block with certain regularity.
FIG. 157 illustrates a template block data scanning mode in the aforementioned motion vector detecting device. Referring to FIG. 157, a template block 999 is first scanned downward from the top along a column, and then pixel data of the adjacent column are scanned upward from the bottom, so that template block pixel data are formed and successively supplied to the motion vector detecting device. This scanning method is called "snake scanning". The search area block pixel data which are supplied to the processor array 966 are also scanned in accordance with the "snake scanning" of the template pixel data. Each processor 970 must vertically or leftwardly transfer the search area pixel data in FIG. 156 depending on the position in the processor array 966. The three-input register 972 is adapted to implement such three-directional data transfer. The two-input register 976 (see FIG. 156) is adapted to transmit distortion data which is calculated by the processor 970 to the motion vector detecting part 968 (see FIG. 155), in order to obtain a displacement vector providing the minimum distortion (evaluation function value) in the motion vector detecting part 968 after the evaluation function value of each displacement vector is calculated. The motion vector detecting part 968 detects the minimum distortion among those received from the respective processors 970 of the processor array 966 and obtains the position of the processor providing the minimum distortion, thereby deciding this position as the motion vector. The operation of the motion vector detecting device shown in FIG. 155 is now briefly described.
The processor Pij which is arranged on the i-th row and the j-th column in the processor array 966 calculates distortion D(i, j) which is expressed as follows:
D(i, j) = Σ|X(m, n) − Y(m+i, n+j)|
The sum Σ is obtained with respect to m and n. Ranges of change of m and n are decided by the size of the template block.
Consider that pixels are arranged in M rows and N columns in a template block 980, as shown in FIG. 158. In a first cycle, each processor 970 of the processor array 966 stores search area block pixel data 982. A pixel X(1, 1) in the first row and first column of the template block 980 is externally supplied to all processors 970 of the processor array 966 in common. Each processor 970 of the processor array 966 obtains the absolute differential value between the search area block (search window) pixel data Y stored therein and the template block pixel data X as received, and accumulates the same.
In a next cycle, the search area block is downwardly shifted in the processor array 966 by one row in FIG. 158. The processor array 966 stores search area block (search window) pixel data 983. In this state, next pixel data X(2, 1) is supplied from the template block 980. The processor Pij ensures search window pixel data Y(m+i, n+j+1). Absolute differential values are again obtained and accumulated through these pixel data. This operation is repeated M times.
When the aforementioned operation is repeated M times, search area pixel data in a column of the search area are externally written through the search area input register 962 shown in FIG. 155. Unnecessary picture data of one column of the search area are discarded. Thus, new search area pixel data are stored in the search area side registers 964a and 964b and the processor array 966. This operation is repeatedly executed every column.
Namely, calculation of absolute differential value sums is executed first through the search window (block including all rows in the search area). Upon completion of M cycles, similar calculation is executed through pixel data of a next search window (block which is rightwardly adjacent by one column in the search area). Thereafter similar operations are executed for a search window 994, . . . When calculation is finally executed on all pixel data of a search area 996, the processor Pij obtains and stores the distortion D(i, j). The distortion D(i, j) obtained in the processor Pij is transmitted to the motion vector detecting part 968, so that the displacement vector providing the minimum distortion is detected as the motion vector.
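The arithmetic carried out by the processor array can be modeled with a short sketch. It reproduces only the accumulation of D(i, j) while template pixels are broadcast in snake-scan order; the shifting of search window data through the three-input registers and side registers is not modeled. The function names, the zero-based indices, and the small example data are assumptions made for illustration.

```python
def snake_scan(rows, cols):
    """Positions visited down the first column, up the next, and so on."""
    order = []
    for col in range(cols):
        row_indices = range(rows) if col % 2 == 0 else range(rows - 1, -1, -1)
        for row in row_indices:
            order.append((row, col))
    return order


def processor_array_sad(template, search, search_range):
    """Accumulate one distortion value per displacement, cycle by cycle."""
    size = 2 * search_range + 1
    # One accumulator per processor; acc[i][j] stands for displacement
    # (i - search_range, j - search_range) in this zero-based model.
    acc = [[0] * size for _ in range(size)]
    for m, n in snake_scan(len(template), len(template[0])):
        x = template[m][n]  # pixel supplied to all processors in common
        for i in range(size):
            for j in range(size):
                y = search[m + i][n + j]  # pixel held by this processor
                acc[i][j] += abs(x - y)
    return acc


def detect_vector(acc, search_range):
    """Position of the minimum distortion, expressed as a displacement."""
    _, i, j = min((acc[i][j], i, j)
                  for i in range(len(acc)) for j in range(len(acc)))
    return (i - search_range, j - search_range)


template = [[1, 2],
            [3, 4]]
search = [
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
    [0, 0, 0, 0],
]
acc = processor_array_sad(template, search, 1)
print(snake_scan(3, 2))       # [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
print(detect_vector(acc, 1))  # (0, 1)
```

Because every accumulator receives the same template pixel in each cycle, the order in which the pixels are broadcast does not change the final sums; in the hardware the snake order matters for how the search window data are shifted between processors.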
Some systems are proposed in relation to a predictive picture image detecting method in a frame-to-frame (or field-to-field) predictive coding system with motion compensation. In order to attain better coding efficiency, it is necessary to select the optimum predictive picture image detecting system after performing motion detection in accordance with a plurality of predictive picture image detecting systems for detecting motion vectors in accordance with the optimum predictive picture image detecting method. A screen may be formed in a unit of a field or a frame. One frame is formed by two fields (even and odd fields). The following predictive picture image detecting systems are provided for the respective cases, for example:
(A) For coding pixel data in units of fields:
(a) A field picture image is divided into a plurality of blocks in a unit of P by Q pixels, for detecting a single motion vector (forming a single predicted picture image) every block.
(b) Each divided block is further vertically divided into two parts on the screen, for detecting a single motion vector every part. Thus, motion vectors for upper and lower halves are detected (two predicted picture images are formed) with respect to each block of P by Q pixels.
(B) For coding pixels in units of frames:
(a) A frame picture image is divided into a plurality of blocks in a unit of block of P by Q pixels, for detecting a single motion vector (forming a single predicted picture image) every block.
(b) Each block of P by Q pixels is divided into two pixel groups of pixels existing in the same fields, i.e., one and the other respectively, belonging to odd and even fields, for detecting a single motion vector every pixel group. Thus, motion vectors for pixel groups belonging to the even and odd fields are detected (two predicted picture images are formed) for every block of P by Q pixels.
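The block partitions used by systems (A) and (B) can be illustrated with a short sketch that evaluates one candidate reference block under each partition. The function names, the dictionary keys, and the assumption that rows 0, 2, ... of a frame block belong to one field are hypothetical choices for this example. For simplicity a single reference block is used for all partitions, whereas in practice each partition may select its own displacement.

```python
def abs_diff_rows(template, reference):
    """Per-row sums of absolute differences between two equal-sized blocks."""
    return [sum(abs(x - y) for x, y in zip(t_row, r_row))
            for t_row, r_row in zip(template, reference)]


def mode_evaluations(template, reference):
    """Evaluation function values for each block partition."""
    rows = abs_diff_rows(template, reference)
    half = len(rows) // 2
    return {
        "block": sum(rows),              # (A)(a) or (B)(a): whole block
        "upper_half": sum(rows[:half]),  # (A)(b): upper half of the block
        "lower_half": sum(rows[half:]),  # (A)(b): lower half of the block
        "even_rows": sum(rows[0::2]),    # (B)(b): rows assumed to be one field
        "odd_rows": sum(rows[1::2]),     # (B)(b): rows assumed to be the other
    }


template = [[1, 1],
            [2, 2],
            [3, 3],
            [4, 4]]
reference = [[1, 1],
             [0, 0],
             [3, 3],
             [0, 0]]
values = mode_evaluations(template, reference)
print(values["block"], values["even_rows"], values["odd_rows"])  # 12 0 12
```

In this example the even rows match perfectly while the odd rows do not, so a per-field vector as in (B)(b) could predict the block better than a single vector for the whole block.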
In the aforementioned structure of the motion vector detecting device, the respective processors in the processor array are arranged in correspondence to displacement vectors (evaluation points) which are candidates for the motion vector. Further, the processors receive the same template block pixel data. Therefore, the processors can only obtain distortion (evaluation function values) in accordance with the same predictive pixel system, and it is impossible to detect a plurality of motion vectors in parallel in accordance with a plurality of predictive picture image detecting systems. When the template block pixel data X(m, n) to be supplied is decided, the search window pixel data Y(m+i, n+j) which is supplied correspondingly is also uniquely decided as clearly understood from the above equation of the distortion D(i, j) calculated by the processor Pij. In order to execute a plurality of predictive picture image detecting systems for improving coding efficiency, therefore, it is necessary to provide motion vector detecting devices at least in correspondence to a plurality of predictive picture image detecting systems and to drive these detecting devices in parallel with each other, disadvantageously leading to increase in device scale as well as in power consumption.
In the aforementioned motion vector detecting device, further, the same template block pixel data are supplied in common to all processors in the processor array. Therefore, strong driving power is required for a circuit for writing the pixel data of the template block, disadvantageously leading to increase in current consumption in the template block pixel data write circuit as well as in power consumption for the overall device.
In the aforementioned motion vector detecting device, in addition, the respective processors are arranged in correspondence to the displacement vectors (evaluation points) forming candidates for the motion vectors. When a search area is in a range of −16 to +16 in the vertical direction and −16 to +16 in the horizontal direction, the number of the displacement vectors forming the candidates for the motion vectors is 33×33=1089. Thus, the number of the processors is extremely increased to increase the occupied area of the device.
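The count of 1089 follows from the search range: a range of −r to +r in each direction gives 2r+1 positions per direction. A minimal sketch of this arithmetic, with the hypothetical helper name `evaluation_points`, shows how quickly the number of required processors grows with the range.

```python
def evaluation_points(search_range):
    """Number of displacement vectors for a range of -r..+r in both directions."""
    return (2 * search_range + 1) ** 2


for r in (16, 64, 128):
    print(r, evaluation_points(r))
# 16 1089
# 64 16641
# 128 66049
```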
In each cycle of the arithmetic operation, data are transferred in the processor array through the processors. In this case, it is necessary to drive one input of each of three-input registers which are provided for deciding the data transfer direction between the processors. Thus, power consumption in data transfer is disadvantageously increased.
Motion vector search systems include a full search system and a hierarchical search system. The full search system is adapted to obtain evaluation function values (absolute differential value sums or square differential sums) for all displacement vectors (evaluation points) in a search area, for obtaining a motion vector from the evaluation function values of all evaluation points. The hierarchical search system, an example of which is described in Japanese Patent Publication No. 3-68597 (1991), is adapted to decide representative evaluation points of those in a search area (to cull or thin out the evaluation points), for obtaining evaluation function values as to the representative evaluation points. A representative evaluation point which provides the minimum evaluation function value is obtained from the evaluation function values of the representative evaluation points, and regarded as the optimum representative evaluation point. Then, evaluation function values are obtained as to all evaluation points in a region of a prescribed size around the optimum representative evaluation point, and the optimum evaluation point is obtained from the evaluation function values, to decide a motion vector.
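The two-stage procedure of the hierarchical search system can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited publication: representative evaluation points are taken on a regular grid with spacing `step`, the second stage examines all points within `refine` of the optimum representative point, and the function names and example data are hypothetical.

```python
def sad(template, search, i, j, origin):
    """Sum of absolute differential values for displacement (i, j)."""
    total = 0
    for m, template_row in enumerate(template):
        for n, x in enumerate(template_row):
            total += abs(x - search[origin + i + m][origin + j + n])
    return total


def best_of(template, search, origin, candidates):
    """Candidate displacement with the minimum evaluation function value."""
    return min(candidates,
               key=lambda v: sad(template, search, v[0], v[1], origin))


def hierarchical_search(template, search, search_range, step, refine):
    # Stage 1: evaluate only the culled (representative) evaluation points.
    coarse = [(i, j)
              for i in range(-search_range, search_range + 1, step)
              for j in range(-search_range, search_range + 1, step)]
    ci, cj = best_of(template, search, search_range, coarse)
    # Stage 2: evaluate every point in a small region around the optimum
    # representative point, clipped to the search area.
    lo_i, hi_i = max(-search_range, ci - refine), min(search_range, ci + refine)
    lo_j, hi_j = max(-search_range, cj - refine), min(search_range, cj + refine)
    fine = [(i, j)
            for i in range(lo_i, hi_i + 1)
            for j in range(lo_j, hi_j + 1)]
    return best_of(template, search, search_range, fine)


template = [[5, 6],
            [7, 8]]
search = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 5, 6, 0],
    [0, 0, 0, 7, 8, 0],
    [0, 0, 0, 0, 0, 0],
]
print(hierarchical_search(template, search, 2, 2, 1))  # (1, 1)
```

Here the first stage evaluates 9 representative points and selects (0, 2), and the second stage evaluates 6 points around it, so 15 evaluations replace the 25 of a full search over the same ±2 range. The result is correct in this example only because the true displacement lies within the refined region; when the first stage selects a representative point far from the true displacement, the second stage cannot recover it.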
In the hierarchical search system, it is not necessary to obtain the evaluation function values as to all evaluation points in the search area, whereby the device scale can be reduced and the motion vector detecting time can be reduced. However, this system is inferior in accuracy to the full search system since the evaluation points are culled for detecting the motion vector. On the other hand, the full search system is disadvantageously increased in device scale, although the same is superior in accuracy to the hierarchical search system. Assuming that the number of operations per unit time is constant and the devices are formed by the same volumes of hardware, accuracy is deteriorated in the hierarchical search system although predictive picture images can be obtained for a wide search area, while accuracy of the motion vectors is excellent in the full search system although the search area thereof is narrowed.
Particularly in a motion picture coding system which is now being standardized in ISO (International Organization for Standardization), it is necessary to detect movements through a screen which is separated by several frames or fields in time, since prediction is made through screens which are precedent and subsequent in time to a current screen picture image. An example of such an operation is B picture motion vector detection in a storage medium coding technique. In the storage medium coding technique, picture data are stored in a storage medium with no restriction in time base. With respect to picture data called B pictures, therefore, there are forward prediction for detecting predictive picture images through past picture images, inverse prediction for detecting predictive picture images through future picture images, and interpolative prediction for making prediction through precedent and subsequent screens. When prediction is made through such precedent and subsequent screens, it is necessary to widen the search area in order to follow quick movements, since if the search area is narrow, the picture images move out of the search area and correct motion vector detection becomes impossible. It is regarded as being preferable to perform searching in a search area which is in a range of ±64 to ±128 pixels in both of the horizontal and vertical directions about a template block. In order to implement such a wide search area without increasing the device scale, it is necessary to utilize the aforementioned hierarchical search system.
As to a motion predictive system which is premised on the conventional hierarchical search system, however, it is merely possible to detect predictive picture images in accordance with a motion detection system along a single predictive mode, and there has not yet been proposed a hierarchical search system to be employed in the aforementioned motion detecting system with a plurality of predictive modes.
An object of the present invention is to provide a motion vector detecting device which can detect predictive picture images in correspondence to a plurality of predictive modes without increasing the device scale.
Another object of the present invention is to provide a motion vector detecting device which can detect predictive picture images (motion vectors) at a high speed in accordance with a plurality of predictive modes without increasing the consumed current and complicating the device structure.
Still another object of the present invention is to provide a motion vector detecting device which can detect motion vectors in high accuracy in accordance with a hierarchical search system without increasing the device scale.
A further object of the present invention is to provide a motion vector detecting device which can detect motion vectors for respective ones of a plurality of predictive modes (predictive picture image detection systems) in accordance with a hierarchical search system at a high speed in a simple circuit structure.
A motion vector detecting device according to a first aspect of the present invention includes evaluation circuitry for obtaining evaluation function values indicating similarity levels between a current picture image block of a prescribed size, being the target of motion vector detection in a current screen picture image, and each of a plurality of reference picture image blocks in a region related to the current picture image block in a reference screen in accordance with a plurality of predictive modes at a time, and motion vector decision circuitry receiving the evaluation function values for the respective reference blocks and with respect to the plurality of predictive modes from the evaluation circuitry for deciding motion vectors as to the respective predictive modes for the current picture image block in a parallel manner.
A motion vector detecting device according to a second aspect of the present invention includes first calculation circuitry executing block matching between a current picture image block, being the target of motion vector detection, and each of reference picture image blocks corresponding to representative evaluation points among all evaluation points in a search region having a prescribed size in a reference screen which is related to the current picture image block in accordance with a plurality of predictive modes in a parallel manner for obtaining an optimum representative evaluation point exhibiting the best similarity for each of the plurality of predictive modes, second calculation circuitry which is provided in correspondence to each of the plurality of predictive modes for performing block matching between the current picture image block and the reference picture image block in accordance with the optimum representative evaluation points for respective predictive modes from the first calculation circuitry on all evaluation points included in a region of a prescribed size in a search region including the corresponding optimum representative evaluation point in accordance with each of the corresponding predictive modes for calculating optimum vectors and optimum evaluation function values for the corresponding predictive modes, and motion vector decision circuitry for deciding motion vectors for the current picture image block from the optimum vectors in accordance with outputs of the second calculation circuitry.
In the motion vector detecting device according to the first aspect of the present invention, the evaluation circuitry calculates evaluation function values corresponding to the plurality of predictive modes (predictive picture image detection systems) respectively in a parallel manner and the motion vector decision circuitry decides motion vectors for the respective predictive modes in accordance with outputs of the evaluation circuitry, whereby motion vectors can be detected in correspondence to a plurality of predictive modes at a high speed without increasing the device scale.
In the motion vector detecting device according to the second aspect of the present invention, the first calculation circuitry culls evaluation points for deciding optimum evaluation point candidates and then the second calculation circuitry calculates optimum vectors for the plurality of predictive modes respectively thereby finally deciding the motion vectors, whereby the motion vectors can be detected at a high speed without increasing the device scale.
The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.