1. Field of the Invention
The present invention relates to a unit for detecting a motion vector which is employed for motion compensation in predictive coding of motion pictures.
2. Description of the Background Art
In order to transmit or store image signals having an enormous data quantity, a data compression technique for reducing the data quantity is indispensable. Image data have considerable redundancy resulting from correlation between adjacent pixels, human perception and the like. A data compression technique of suppressing the data redundancy for reducing the transmission data quantity is called high efficiency coding. Such high efficiency coding includes inter-frame (-field) predictive coding. In the inter-frame (-field) predictive coding, the following processing is executed:
A predictive error which is the difference between each pixel data of a current screen image (frame or field) to be currently coded and each pixel data of the same position of a referred preceding screen image is calculated. The calculated predictive error is employed for subsequent coding. According to this method, an image having a small motion can be coded in high efficiency, due to high correlation between the screen images. As to an image having a large motion, however, the error is increased due to small correlation between the screen images, and hence the transmission data quantity is disadvantageously increased.
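The plain inter-frame prediction described above can be illustrated with a minimal Python sketch. The function names and the tiny sample frames are assumptions made only for illustration; they do not appear in the cited prior art.

```python
def frame_difference(current, previous):
    """Per-pixel predictive error: current pixel minus the co-located previous pixel."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def total_abs_error(error):
    """Sum of absolute predictive errors, a rough proxy for the data to be coded."""
    return sum(abs(e) for row in error for e in row)


# Hypothetical 3 x 4 frames: a bright object on a flat background.
previous = [[10, 10, 10, 10],
            [10, 50, 50, 10],
            [10, 10, 10, 10]]
still = [row[:] for row in previous]   # no motion between frames
moved = [[10, 10, 10, 10],             # object shifted right by one pixel
         [10, 10, 50, 50],
         [10, 10, 10, 10]]

static_error = total_abs_error(frame_difference(still, previous))
moving_error = total_abs_error(frame_difference(moved, previous))
```

In this sketch the still frame yields no predictive error, while the one-pixel shift already produces a nonzero error, which is the increase in transmission data quantity noted above.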
Inter-frame (-field) predictive coding with motion compensation is adapted to solve the aforementioned problem. In this method, the following processing is executed: Before calculation of a predictive error, a motion vector is calculated through pixel data of a current screen (frame or field) and a preceding screen. A predictive image of the preceding screen is moved in accordance with the calculated motion vector. Image data of the preceding screen in a position displaced by the motion vector is regarded as a reference pixel, which in turn is employed as a predictive value. Then, a predictive error between respective pixels of the preceding and current screen images following this motion is calculated, so that the predictive error and the motion vector are transmitted.
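As a rough sketch of the motion-compensated prediction described above, the following Python fragment takes the reference pixel from the preceding screen at a position displaced by a given vector. The function names, the (vertical, horizontal) ordering of the vector, and the clamping of coordinates at the screen border are assumptions for illustration only.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def motion_compensated_error(current, previous, vector):
    """Sum of absolute predictive errors when each current pixel is predicted
    from the previous-screen pixel displaced by vector = (dy, dx).
    Border handling by clamping is an assumption of this sketch."""
    dy, dx = vector
    rows, cols = len(current), len(current[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            ref_r = clamp(r + dy, 0, rows - 1)
            ref_c = clamp(c + dx, 0, cols - 1)
            total += abs(current[r][c] - previous[ref_r][ref_c])
    return total


previous = [[10, 10, 10, 10],
            [10, 50, 50, 10],
            [10, 10, 10, 10]]
current = [[10, 10, 10, 10],    # same object, shifted right by one pixel
           [10, 10, 50, 50],
           [10, 10, 10, 10]]

error_without_compensation = motion_compensated_error(current, previous, (0, 0))
error_with_compensation = motion_compensated_error(current, previous, (0, -1))
```

With the vector (0, -1), each current pixel is predicted from the pixel one position to its left in the preceding screen, so the predictive error for the shifted object vanishes in this example.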
FIG. 58 is a block diagram schematically illustrating the overall structure of an encoder for coding image data in accordance with a conventional predictive coding system with motion compensation. Referring to FIG. 58, the encoder includes a preprocessing circuit 910 for executing prescribed preprocessing on an inputted image signal, a source coding circuit 912 for executing elimination of redundancy on the signal preprocessed by the preprocessing circuit 910 and quantization of the input signal, and a video multiplex coding circuit 914 for coding the signal from the source coding circuit 912 in accordance with a prescribed format for multiplex into a code train of a predetermined data structure.
The preprocessing circuit 910 transforms the input image signal to a common intermediate format (CIF) through time and space filters, and executes filter processing for noise elimination.
The source coding circuit 912 performs orthogonal transform processing such as discrete cosine transformation (DCT) on the supplied signal as well as motion compensation on the input signal, and quantizes the orthogonally transformed image data.
The video multiplex coding circuit 914 performs two-dimensional variable length coding on the supplied image signal while executing variable-length coding on various attributes (such as a motion vector) of a block which is a unit of data processing, and thereafter multiplexes the same to a code train of the predetermined data structure.
The encoder further includes a transmission buffer 916 for buffering the image data from the video multiplex coding circuit 914, and a transmission coding circuit 918 for adapting the image data from the transmission buffer 916 to a transmission channel.
The transmission buffer 916 smoothes an information generating rate to a constant level. The transmission coding circuit 918 executes addition of an error correction bit, voice signal data and the like.
FIG. 59 illustrates an exemplary structure of the source coding circuit 912 shown in FIG. 58. Referring to FIG. 59, the source coding circuit 912 includes a motion compensation predictor 920 for detecting a motion vector for the input image signal supplied from the preprocessing circuit 910 and forming a reference image motion-compensated in accordance with the motion vector, a loop filter 922 for performing filter processing on reference image pixel data from the motion compensation predictor 920, a subtracter 924 for obtaining the difference between an output of the loop filter 922 and the input image signal, an orthogonal transformer 926 for orthogonally transforming an output of the subtracter 924, and a quantizer 928 for quantizing the data orthogonally transformed by the orthogonal transformer 926.
The motion compensation predictor 920, the structure of which is described later, includes a frame memory for storing pixel data preceding by one frame (in case of inter-frame predictive coding), and detects the motion vector and forms motion-compensated reference image pixel data in accordance with input image signal data (pixel data) and pixel data in this frame memory. The loop filter 922 is provided to improve the picture quality.
The orthogonal transformer 926 performs orthogonal transformation such as DCT on data from the subtracter 924, in a block of a prescribed size (8 by 8 pixels, in general) as a unit. The quantizer 928 quantizes the orthogonally transformed pixel data.
The motion compensation predictor 920 and the subtracter 924 execute inter-frame prediction (or inter-field prediction) with motion compensation, for eliminating temporal redundancy in a motion image. On the other hand, the orthogonal transformer 926 performs orthogonal transformation, for eliminating spatial redundancy in a motion image signal.
The source coding circuit 912 further includes an inverse quantizer 930 for transforming the data quantized by the quantizer 928 to a signal state before the quantization, an inverse orthogonal transformer 932 for performing inverse orthogonal transformation on an output of the inverse quantizer 930, and an adder 934 for adding up outputs of the loop filter 922 and the inverse orthogonal transformer 932. The inverse quantizer 930 and the inverse orthogonal transformer 932 form an image employed for inter-frame prediction (or inter-field prediction) for a next frame (or field). The formed pixel data is written in the frame memory included in the motion compensation predictor 920. The adder 934 adds the image signal (inter-frame (-field) difference data) to the output of the loop filter 922, whereby image pixel data of a current frame (field) is reproduced. In general, the inverse quantization processing, the inverse orthogonal transform processing and the addition processing are called local decoding processes. Calculation of a motion vector is now specifically described as to a frame image. Block matching is generally employed for calculation of a motion vector.
Consider that an image A in an (m-1)-th frame moves to A' in an m-th frame, as shown in FIG. 60A. In the block matching, a screen (one frame in this case) is divided into blocks of P by Q pixels (P=Q in general). The block in the preceding frame which most closely approximates the block of interest in the current frame is searched for. The displacement from the block of interest to the most closely approximating block in the preceding screen is called the "motion vector". This method is now described in more detail.
As shown in FIG. 60B, the m-th frame is regarded as a target frame to be coded. The frame is divided into blocks of N by N pixels (P=Q=N). It is assumed here that Xm(Nk, Nl) represents the value of pixel data in the left upper pixel position (Nk, Nl) in each block of N by N pixels in the m-th frame. The sum of the absolute values of differences with respect to data of corresponding pixels in a block of the preceding frame whose pixel position is displaced by a vector (i, j) and in a block of interest of the current frame is obtained. Then, the displacement vector (i, j) is changed to various values, for obtaining respective absolute difference value sums. The absolute difference value sums are generally called evaluation function values or evaluation values. The position (i, j) providing the minimum absolute difference value sum is defined as the motion vector.
A single motion vector must be transmitted per pixel block. If the block size is reduced, transmission information is so increased that efficient data compression cannot be performed. If the block size is increased, on the other hand, it is difficult to perform effective motion detection. Therefore, the block size is set as 16 by 16 pixels, and the motion vector search range (the maximum change width of i, j) is set as -15 to +15, in general. Motion vector calculation by the block matching is now described specifically.
FIG. 61 illustrates a specific method of calculating a motion vector by the block matching. Consider an image 950 consisting of 352 dots (pixels) by 288 lines, as shown in FIG. 61. This image 950, which may be either a field image or a frame image, is assumed to be a frame image, in order to simplify the illustration. The image 950 is divided into a plurality of blocks each consisting of 16 by 16 pixels. Detection of motion vectors is executed in units of the blocks. With reference to a block 954 in a preceding frame which is on the same position as a block (hereinafter referred to as a template block) 952 to be subjected to motion vector detection processing in the image 950, a block 956 which is larger by "16 pixels" in the horizontal and vertical directions on the screen is assumed to be a search block (hereinafter referred to as a search area). Motion vector search with respect to the template block 952 is executed in the search area 956. A motion vector search method in accordance with the block matching includes the following processing steps:
A block (shown by a vector (i, j) in FIG. 61) having displacement corresponding to a motion vector candidate is obtained. An evaluation function value such as an absolute difference value sum (or sum of squares of difference values) of each pixel of the obtained block and a pixel on a corresponding position of the template block 952 is obtained.
The aforementioned operation is executed for all displacements in the range of (-16, -16) to (+16, +16) as the vector (i, j). After evaluation function values (evaluation values) are obtained for all predictive image blocks (all image blocks in the search area 956), a predictive image block having the minimum evaluation function value is detected. A vector going from the block (the block 954 shown by a vector (0, 0) in FIG. 61) on the same position (hereinafter referred to as right behind block) as the template block 952 to the predictive image block having the minimum evaluation function value is decided as the motion vector for the template block 952.
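The block matching steps above can be sketched as an exhaustive (full) search in Python. This is only an illustrative software model under stated assumptions: the function names are hypothetical, the displacement (i, j) is taken as (vertical, horizontal), candidates falling outside the reference frame are simply skipped, and a tiny frame with a small search range is used instead of 16 by 16 blocks.

```python
def sad(template, reference, top, left):
    """Absolute difference value sum between the template block and the
    reference block whose upper left pixel is at (top, left)."""
    return sum(abs(template[m][n] - reference[top + m][left + n])
               for m in range(len(template))
               for n in range(len(template[0])))


def full_search(template, reference, block_top, block_left, search_range):
    """Evaluate every displacement (i, j) in [-search_range, +search_range]
    and return (motion_vector, minimum_evaluation_value)."""
    rows, cols = len(reference), len(reference[0])
    height, width = len(template), len(template[0])
    best_value, best_vector = None, None
    for i in range(-search_range, search_range + 1):
        for j in range(-search_range, search_range + 1):
            top, left = block_top + i, block_left + j
            # Assumption: candidates outside the reference frame are skipped.
            if top < 0 or left < 0 or top + height > rows or left + width > cols:
                continue
            value = sad(template, reference, top, left)
            if best_value is None or value < best_value:
                best_value, best_vector = value, (i, j)
    return best_vector, best_value


# Hypothetical 8 x 8 preceding frame with distinct pixel values.
reference = [[r * 8 + c for c in range(8)] for r in range(8)]
# Template block located at (3, 3) in the current frame; its content equals
# the preceding-frame block displaced by (1, -2), i.e. at (4, 1).
template = [row[1:3] for row in reference[4:6]]

vector, value = full_search(template, reference, 3, 3, 2)
```

Here the block at displacement (0, 0) plays the role of the right behind block, and the returned vector is the displacement giving the minimum evaluation function value.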
FIG. 62 illustrates the overall structure of a conventional motion vector detection unit implemented by hardware, described in Proceeding of 1989 IEEE, ICASSP '89, pp. 2453 to 2456, by A. Artieri et al. for example. Referring to FIG. 62, the conventional motion vector detection unit includes a search area input register 962 for inputting pixel data of a search area by one column thereof, a processor array 966 including a plurality of processors which are arranged in a matrix of rows and columns in correspondence to evaluation points (candidates for a motion vector in correspondence to displacement vectors) of a template block respectively, search area side registers 964a and 964b for storing data of the same column in the search area with respect to the processor array 966, and a motion vector detection part 968 for detecting the motion vector in accordance with operation results of the processor array 966.
In the processor array 966, the processors are arranged in correspondence to the evaluation points, i.e., displacement vectors (i, j) respectively. Namely, a processor Pij which is arranged on an i-th row and a j-th column calculates a distortion D(i, j) for the displacement vector (i, j).
FIG. 63 illustrates the structure of each processor 970 included in the processor array 966 shown in FIG. 62. Referring to FIG. 63, the processor 970 includes a three-input register 972 for receiving search area pixel data transmitted from horizontal and vertical three-directional processors in the processor array 966 (see FIG. 62) and passing one input in response to a selection signal SEL, a distortion calculation part 974 for calculating a distortion D (absolute difference value sum) on the basis of search area pixel data Y from the three-input register 972 and template block pixel data X supplied from the exterior, and a two-input register 976 for receiving the distortion D from the distortion calculation part 974 and that from a horizontally adjacent processor in the array 966 and selecting and passing either distortion in accordance with a selection signal To.
The processors each having the structure shown in FIG. 63 are two-dimensionally arranged in correspondence to the evaluation points in the search area, i.e., all displacement vectors serving as candidates for the motion vector. The template block pixel data X is supplied in common to the respective processors of the processor array 966 (see FIG. 62). At this time, each processor 970 is supplied with corresponding pixel data in a search area block. In case of template block pixel data X(m, n), for example, the processor Pij is supplied with search area block pixel data Y(i+m, j+n). Search window pixel data is transferred through the search area side registers 964a and 964b shown in FIG. 62 and each processor 970 in the processor array 966. In order to correctly supply the search area block pixel data Y(i+m, j+n) to each processor 970 with respect to the externally supplied template block pixel data X(m, n), the template block and the search area block are scanned with certain regularity.
FIG. 64 illustrates a data scan mode for a template block 999 in the aforementioned motion vector detection unit. Referring to FIG. 64, the template block 999 is first scanned downward from the top along a single column, then pixel data of an adjacent column is scanned upward from the bottom, and then pixel data of a next column is scanned downward from the top, as shown by the arrows, for forming template block pixel data, which is successively supplied to the motion vector detection unit. This scan method is called "snake scan". In accordance with the "snake scan" of the template block pixel data, search area block pixel data supplied to the processor array 966 is also scanned in a similar manner.
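The scan order can be illustrated with a short Python sketch. The function name and the 3 by 3 sample block are assumptions for illustration; the block is given as a list of rows.

```python
def snake_scan(block):
    """Return the pixels of a block in snake-scan order: the first column
    top to bottom, the next column bottom to top, and so on."""
    rows, cols = len(block), len(block[0])
    order = []
    for col in range(cols):
        if col % 2 == 0:
            row_indices = range(rows)              # downward scan
        else:
            row_indices = range(rows - 1, -1, -1)  # upward scan
        for row in row_indices:
            order.append(block[row][col])
    return order


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
scanned = snake_scan(block)   # [1, 4, 7, 8, 5, 2, 3, 6, 9]
```

Consecutive pixels in this order are always vertically or horizontally adjacent, which is consistent with the search area data being shifted by one position per cycle.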
Each processor 970 must vertically or leftwardly transfer the search area pixel data in FIG. 63 depending on its position in the processor array 966. The three-input register 972 is provided to implement such three-directional transfer. The two-input register 976 (see FIG. 63) is provided to transmit the distortion calculated in the processor 970 to the motion vector detection part 968, in order to obtain a displacement vector providing the minimum distortion (evaluation function value) in the motion vector detection part 968 (see FIG. 62) after the evaluation function value of each displacement vector is calculated.
The motion vector detection part 968 detects the minimum distortion among those from the respective processors in the processor array 966, obtains the position of the processor providing the minimum distortion, and decides the position of the processor as the motion vector. The operation of the motion vector detection unit shown in FIGS. 62 and 63 is now briefly described.
In the processor array 966, the processor Pij arranged on the i-th row and the j-th column calculates a distortion D(i, j) which is expressed as follows:

$$D(i, j) = \sum_{m, n} \left| X(m, n) - Y(m+i, n+j) \right|$$

The summation is taken over m and n. The variation range of m and n is decided by the horizontal and vertical sizes of the template block.
Consider pixels which are arranged in M rows and N columns as a template block 980, as shown in FIG. 65. In a first cycle, each processor 970 in the processor array 966 stores search area block pixel data 982. A pixel X(1, 1) on the first row and the first column in the template block 980 is supplied in common to all processors in the processor array 966. Each processor 970 in the processor array 966 obtains and accumulates absolute difference values of search area block (search window) pixel data Y and supplied template block pixel data X.
In a next cycle, the search area block is downwardly shifted by one row in FIG. 65 in the processor array 966, which in turn stores search area block pixel data 983. In this state, next pixel data X(2, 1) of the template block 980 is supplied. The processor Pij stores search area block pixel data Y(m+i, n+j+1). Absolute difference values are obtained and accumulated again for such pixel data. This operation is repeated M times.
Due to such repetition of the aforementioned operation by M times, all pixel data (X(1, 1) to X(M, 1)) in the first column of the template block 980 are calculated. Then, search area pixel data of the next single column of the search area are written from the exterior through the search area input register 962 shown in FIG. 62. Pixel data of an unnecessary column of the search area are discarded. Thus, new search area pixel data are stored in the search area side registers 964a and 964b and the processor array 966. This operation is repetitively executed every column.
As shown in FIG. 66, calculation of absolute difference value sums is first executed through a search window (a block including all rows in the search area). After completion of M cycles, similar calculation is executed again through pixel data of a next search window (a block adjacent by one column in the search area). Thereafter a similar operation is executed through search windows 994, . . . . When calculation with respect to all pixel data in a search area 996 is finally executed, the processor Pij obtains and holds the distortion D(i, j). The distortion D(i, j) obtained in this processor Pij is transmitted to the motion vector detection part 968, which in turn detects the displacement vector providing the minimum distortion as the motion vector.
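The accumulation performed by the processor array can be modeled in software as follows. This is a behavioral sketch only, under several assumptions: it does not model the shifting of search area data through the side registers or the column loading, the function name is hypothetical, indices are zero-based, and the search area is stored as a list of rows whose origin corresponds to displacement (-r, -r).

```python
def simulate_processor_array(template, search_area, r):
    """One accumulator per displacement (i, j) in [-r, +r]. In each cycle a
    single template pixel X(m, n) is broadcast to all accumulators, and each
    adds |X(m, n) - Y(m+i, n+j)| to its distortion D(i, j)."""
    M, N = len(template), len(template[0])
    distortion = {(i, j): 0
                  for i in range(-r, r + 1)
                  for j in range(-r, r + 1)}
    cycles = 0
    for n in range(N):                      # template columns, snake scan
        rows = range(M) if n % 2 == 0 else range(M - 1, -1, -1)
        for m in rows:
            x = template[m][n]              # pixel supplied in common
            for (i, j) in distortion:
                y = search_area[m + i + r][n + j + r]
                distortion[(i, j)] += abs(x - y)
            cycles += 1
    # Role of the motion vector detection part: pick the minimum distortion.
    motion_vector = min(distortion, key=distortion.get)
    return motion_vector, distortion, cycles


# Hypothetical 2 x 2 template and 4 x 4 search area (r = 1).
search_area = [[row * 4 + col for col in range(4)] for row in range(4)]
template = [[8, 9],
            [12, 13]]   # equals the search area block at displacement (1, -1)

motion_vector, distortion, cycles = simulate_processor_array(template, search_area, 1)
```

The model needs one cycle per template pixel and one accumulator per evaluation point, which mirrors the one-processor-per-displacement arrangement of the array.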
When pixel data of a column in the search area has been scanned, pixel data of the adjacent column must be newly loaded, as shown in FIG. 66. The processors cannot perform operations while the pixel data is being loaded for the change of the search window, leading to such a problem that the motion vector cannot be detected at a high speed.
Further, the template block pixel data is supplied in common to all processors in the processor array, and hence a circuit for writing the pixel data of the template block requires high driving capability. Thus, current consumption is increased in this template block pixel data write circuit, disadvantageously leading to increased power consumption in the overall unit.
In case of the aforementioned motion vector detection unit, the respective processors are arranged in correspondence to the displacement vectors (evaluation points) serving as motion vector candidates. When the search area is in the range of -16 to +16 in the vertical direction and -16 to +16 in the horizontal direction, the number of the displacement vectors serving as motion vector candidates is 33 × 33 = 1089. Thus, the number of the processors is extremely increased to disadvantageously increase the occupied area of the unit.
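The processor count above follows directly from the search range, as the small Python sketch below shows. The function name is hypothetical, and the figure for absolute difference operations per template block assumes a 16 by 16 template evaluated at every point.

```python
def evaluation_point_count(vertical_range, horizontal_range):
    """Number of displacement vectors (i, j) with i in [-vertical_range,
    +vertical_range] and j in [-horizontal_range, +horizontal_range]."""
    return (2 * vertical_range + 1) * (2 * horizontal_range + 1)


processors = evaluation_point_count(16, 16)        # one processor per point
# Assumption: a 16 x 16 template block evaluated at every point.
abs_diff_operations = processors * 16 * 16
```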
Some systems have been proposed in relation to predictive image detection in inter-frame (or inter-field) predictive coding with motion compensation. In order to attain better coding efficiency, it is necessary to select the optimum predictive image detection system after performing motion detection processing in accordance with a plurality of predictive image detection systems, for detecting the motion vector in accordance with the optimum predictive image detection system. Screens are formed in units of fields or frames. A single frame is formed by two fields (odd and even fields). For example, the following predictive image detection system is conceivable for each case:
(A) In case of coding pixel data in units of fields:
(a) A field image is divided into a plurality of blocks in a unit of P by Q pixels, for detecting a single motion vector every block (a single predictive image is formed in units of blocks).
(b) Each divided block is further divided into two blocks in relation to the vertical direction of the screen, for detecting a single motion vector for each of the vertically divided blocks. With respect to each block of P by Q pixels, therefore, motion vectors are detected for the upper and lower divided blocks (two predictive images are formed for a single block).
(B) In case of coding pixel data in units of frames:
(a) A frame image is divided into a plurality of blocks in a unit of block of P by Q pixels, for detecting a single motion vector for each block (a single predictive image is formed for a single block).
(b) Each block of P by Q pixels is divided into two pixel groups of pixels present in common fields, i.e., those belonging to odd and even fields respectively, for detecting a single motion vector for each pixel group. With respect to each block of P by Q pixels, therefore, motion vectors are detected for the pixel groups belonging to the odd and even fields respectively (two predictive images are formed for a single block).
In case of the aforementioned motion vector detection unit shown in FIGS. 62 and 63, the processors in the processor array are arranged in correspondence to the displacement vectors (evaluation points) which are the motion vector candidates. Further, the respective processors receive common template block pixel data. Therefore, the processors can merely obtain distortions (evaluation function values) in accordance with the common predictive image system, and cannot detect a plurality of motion vectors in a parallel mode in accordance with a plurality of predictive image detection systems. When the supplied template block pixel data X(m, n) is decided, the supplied search window pixel data Y(m+i, n+j) is also uniquely decided in response thereto, as clearly understood from the expression of the distortion D(i, j) calculated by the processor Pij. In order to execute a plurality of predictive image detection systems for improving coding efficiency, therefore, it is necessary to provide motion vector detection units at least in correspondence to the plurality of predictive image detection systems respectively for driving the motion vector detection units in parallel with each other. Thus, the unit scale as well as the power consumption are disadvantageously increased.
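The block partitions used by the predictive image detection systems listed above can be sketched in Python as follows. The function names are hypothetical, and the assignment of the first, third, ... rows of a frame block to the odd field is an assumption of this sketch. For one candidate block, it computes the evaluation value for the whole block, for the upper and lower halves, and for the odd and even field pixel groups.

```python
def sad(block_a, block_b):
    """Absolute difference value sum of two equally sized pixel groups."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def upper_lower_halves(block):
    """Divide a block into two in the vertical direction of the screen."""
    half = len(block) // 2
    return block[:half], block[half:]


def odd_even_fields(block):
    """Divide a frame block into its two fields. Assumption: rows 1, 3, ...
    (list indices 0, 2, ...) belong to the odd field."""
    return block[0::2], block[1::2]


def evaluation_values(template, candidate):
    """Evaluation values of one candidate block under each partition."""
    t_up, t_low = upper_lower_halves(template)
    c_up, c_low = upper_lower_halves(candidate)
    t_odd, t_even = odd_even_fields(template)
    c_odd, c_even = odd_even_fields(candidate)
    return {
        "block": sad(template, candidate),
        "upper": sad(t_up, c_up),
        "lower": sad(t_low, c_low),
        "odd": sad(t_odd, c_odd),
        "even": sad(t_even, c_even),
    }


# Hypothetical 4 x 2 template block and candidate block.
template = [[1, 2], [3, 4], [5, 6], [7, 8]]
candidate = [[1, 1], [1, 1], [9, 9], [9, 9]]
values = evaluation_values(template, candidate)
```

Each partition can select a different minimum over the displacement vectors, which is why a unit that computes only one kind of evaluation value per pass must be replicated to search under several systems at once.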