1. Field of the Invention
The present invention relates to a dynamic image encoding method and an apparatus therefor, and in particular to a dynamic image encoding method and an apparatus therefor which apply a hierarchical motion vector detection method.
With recent advances in semiconductor technology, the main functions of a real-time dynamic image encoder, which formerly required a plurality of semiconductor chips, have come to be implemented on a single chip. In the case where the encoder is realized in the form of hardware, how the motion vector detection process, which requires a huge amount of operation, is realized is an important factor in determining cost performance.
2. Description of the Related Art
A variety of motion vector detection methods have been proposed to achieve both the expansion of the motion vector detection area (range) and the reduction of the operation amount at the same time. One of these is the hierarchical motion vector detection method. In this method, as the first stage (process) of the motion vector detection, reduced screens are generated from each of a current picture and a reference picture, and motion vectors are detected between the reduced screens. Then, as the second stage, detection is executed between the screens with normal resolution, centered on the area which results from the first stage detection.
This method will be described with reference to FIG. 6. For instance, assume that, with respect to a first stage detection area (a) of an input dynamic image of a normal screen size, the process unit (b) of the first stage vector detection is a reduced screen of 16×16 pixels obtained by thinning out at a ratio of 1:4 in both length and breadth (1/16 overall). In the first stage vector detection, an offset vector (c) common to 4×4=16 macro blocks (occasionally abbreviated as MB) is then detected.
Then, with each macro block (d) as a second stage detection unit, a vector search over a second stage detection area (e) is executed to obtain a second vector (f). The result vectors (c) and (f) of the first and the second stage are added to obtain a final result vector (g).
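The two-stage procedure above can be illustrated by the following minimal sketch. It is not the arrangement of FIG. 6 itself: the function names, the use of the sum of absolute differences as the matching cost, and the search radii `r1` and `r2` are all assumptions introduced for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks (assumed cost)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))

def search(cur, ref, bx, by, size, center, radius):
    """Best (dx, dy) within +/-radius of center; candidates outside ref are skipped."""
    h, w = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best

def thin_out(img, step):
    """Keep every step-th pixel in both directions (1:step thinning, no filtering)."""
    return [row[::step] for row in img[::step]]

def hierarchical_search(cur, ref, qx, qy, mb=16, step=4, r1=2, r2=2):
    """Return {(mb_x, mb_y): (c, f, g)} for the 4x4 macro blocks of one big block."""
    q = mb * step  # side of the first stage area on the normal screen (64)
    cur_s, ref_s = thin_out(cur, step), thin_out(ref, step)
    # First stage: one offset vector (c), found on the reduced screens.
    cdx, cdy = search(cur_s, ref_s, qx // step, qy // step, q // step, (0, 0), r1)
    c = (cdx * step, cdy * step)  # scale back to normal resolution
    results = {}
    for my in range(qy, qy + q, mb):
        for mx in range(qx, qx + q, mb):
            # Second stage: search around c at normal resolution.
            g = search(cur, ref, mx, my, mb, c, r2)   # final vector (g)
            f = (g[0] - c[0], g[1] - c[1])            # second vector (f), g = c + f
            results[(mx, my)] = (c, f, g)
    return results
```

In this sketch the second stage searches directly around the offset (c), so the returned (g) always equals (c) plus (f), as in the description above.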
In this method, the detection accuracy is somewhat lower than that of a full search, while the operation amount required to search the same area is remarkably reduced.
One example of such a motion vector detection process realized in the form of hardware is shown in FIG. 7, in which a preprocessor 1 which executes a wave shaping of inputted picture data, a frame memory 3 utilizing an SDRAM (which can also be a DRAM) capable of high-speed access with a large capacity, a motion vector detection processor 4, and an encoder 5 are mutually connected by a bus 10. The motion vector detection processor 4 is composed of a current picture RAM 41 which stores current picture data, a reference picture RAM 42 which stores reference picture data, an arithmetic portion 43 which performs predetermined operations on the data stored in the RAM's 41, 42 serving as local memories (caches), and a determination portion (judging unit) 44 which determines or judges the vector resulting from the operations, the vector then being sent to the encoder 5.
Namely, for each macro block, pixel data in rectangular areas of the current picture and the reference picture are read from the frame memory 3 into the local memories, i.e. the internal current picture RAM 41 and the reference picture RAM 42. Then, the arithmetic portion 43 executes the motion vector detection process, and the motion vector thus obtained is determined at the determination portion 44 and sent to the encoder 5 for the encoding operation.
The vector search area serves as a parameter which controls the encoding characteristics of the encoder. To search the widest possible area, a correspondingly wide reference picture area must be read from the frame memory 3 into the RAM 42.
However, the bandwidth of the frame memory 3 (its data transfer performance) becomes a bottleneck, which limits the search area. Therefore, how efficiently the screen data in the frame memory 3 can be accessed is important in making the vector search area wider and products more competitive.
In an arrangement for a hierarchical motion vector detection process such as shown in FIG. 7, a reduced screen has to be read from the frame memory into a local memory for the first stage process. Conventionally, pixels of a normal-sized picture are thinned out at the frame memory, and only the remaining pixels are read into the local memory.
However, if the frame memory is an SDRAM, data at four or eight sequential word addresses are read out in a burst by a single readout command, so that reading out only the thinned-out pixels severely degrades the memory access efficiency.
Even with a normal DRAM, if the bus width of the frame memory (e.g. 32 bits) is larger than the 8 bits of pixel data which are required to be read in, pixel data which are not required are read in together with each single required pixel, which also degrades the efficiency. Moreover, there is a problem that merely thinning out the normal-sized image decreases the vector detection accuracy at the first stage due to the occurrence of folded distortion (aliasing).
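The loss of efficiency described above can be estimated with the following minimal sketch. It assumes a simplified model that is not taken from the arrangement itself: bursts are aligned to the thinning grid, pixels are 8 bits wide, and skipping whole rows costs nothing. The function name and its parameters are hypothetical.

```python
def thinned_read_efficiency(burst_words, bus_bytes, step, pixel_bytes=1):
    """Fraction of the pixels fetched by one burst that survive 1:step
    horizontal thinning (assumed model: aligned bursts, rows skipped for free)."""
    pixels_per_burst = burst_words * bus_bytes // pixel_bytes
    kept = len(range(0, pixels_per_burst, step))  # pixels at 0, step, 2*step, ...
    return kept / pixels_per_burst
```

Under this model, a four-word burst on a 32-bit bus fetches 16 pixels, of which only 4 are kept at a 1:4 thinning ratio, so three quarters of the transferred data are discarded. A single 32-bit word read from a normal DRAM gives the same fraction.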
To solve such a problem, an arrangement has been proposed, as shown in FIG. 8, which provides a filtering/thinning-out processor 2 for executing a space filtering and a pixel thinning-out process between the RAM's 41, 42 and the frame memory 3.
In this arrangement, when the reduced screen area (b) for the first stage detection in FIG. 6 is read into the RAM's 41, 42, sequential areas are first read from the frame memory 3 into the filtering/thinning-out processor 2, and then the pixels are thinned out while the filtering process is executed.
At the following second stage detection, the normal-sized screen (a) in FIG. 6 is read into the RAM 42, bypassing the filtering/thinning-out processor 2.
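The combined filtering and thinning-out step can be sketched as follows. The kind of space filter is not specified here, so a simple box filter (the average of each 4×4 cell) is assumed purely for illustration, and the function name is hypothetical.

```python
def filter_and_thin(img, step=4):
    """Average each step x step cell (assumed box filter) and keep one value
    per cell, giving a screen reduced by 1:step in both directions."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - step + 1, step):
        row = []
        for x in range(0, w - step + 1, step):
            total = sum(img[y + j][x + i]
                        for j in range(step) for i in range(step))
            row.append(total // (step * step))  # integer mean of the cell
        out.append(row)
    return out
```

Unlike plain thinning out, every source pixel contributes to the reduced screen, which is why the whole sequential area must first be read from the frame memory.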
However, in the arrangement of FIG. 8, in order to read in the reduced screens at the first stage detection, it is necessary to read in pixel data of an area much wider than the size of the original picture and the reference picture which are actually required. For instance, in order to produce the reduced screen thinned out at the ratio of 1:4 in length to breadth, 16 times the data amount is required because the surrounding pixels are necessary for the filtering process.
With this method, the problem of memory access efficiency is not only left unsolved but may even be worsened.
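The data amount mentioned above can be checked with a short calculation. The following is a minimal sketch; the function name and the `margin` parameter (extra border pixels that a filter wider than one cell would need) are assumptions made for illustration.

```python
def source_pixels(reduced_w, reduced_h, step=4, margin=0):
    """Normal-resolution pixels that must be read to produce a reduced
    screen of reduced_w x reduced_h at a 1:step ratio in both directions.
    margin is a hypothetical filter border added on every side."""
    w = reduced_w * step + 2 * margin
    h = reduced_h * step + 2 * margin
    return w * h
```

For the 16×16 reduced process unit, `source_pixels(16, 16)` gives 4096 pixels, i.e. 16 times the 256 pixels actually stored. Any non-zero filter border increases the figure further.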
Also, assume that the first stage detection process is executed on the reduced screen thinned out at the ratio of 1:4 in length to breadth, that the process unit at the first stage (a big macro block, abbreviated as QMB) is 16×16 pixels on the reduced screen (i.e. a 4 MB×4 MB=16 MB area in the normal-sized screen), and that the search area is 16×16 pixels on the reduced screen. The screen size to be read in as the reference screen will then be three times the process unit in each direction, i.e. 48×48, as shown in FIG. 9.
Namely, the reference picture areas QMB3 and QMB4, which are read in for the current picture areas QMB1 and QMB2 of adjacent big macro blocks, overlap each other by 2/3. Therefore, if the overlapped areas are kept stored in the RAM's 41, 42, only the remaining 1/3 area RR has to be read in separately.
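The window size and the overlap ratio stated above follow from simple arithmetic, as the sketch below shows. It assumes that the search extends 16 reduced pixels on each side of the process unit, which is one reading of the 48×48 figure; the function name is hypothetical.

```python
from fractions import Fraction

def reference_window(unit=16, search=16):
    """Side of the reference window on the reduced screen, the fraction of it
    shared with the horizontally adjacent window, and the reduced pixels
    that are new when moving one process unit to the right."""
    side = unit + 2 * search                 # 16 + 2 * 16 = 48
    overlap = Fraction(side - unit, side)    # shared columns / all columns
    new_pixels = unit * side                 # the remaining strip (area RR)
    return side, overlap, new_pixels
```

With the default values this returns a side of 48, an overlap of 2/3, and a new strip of 16×48 = 768 reduced pixels.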
A process schedule of the hierarchical motion vector detection will be described in the following. The encoding process by the encoder 5 is executed for each macro block. In the encoding order as shown in FIG. 10, the top rank of the screen is processed first and the following ranks are then processed to complete the bottom rank. In each rank the process is executed from left to right.
Accordingly, as shown in FIG. 11, vector values which are obtained in the process units QMB1, QMB2, QMB3, and QMB4 at the first stage are used for the second stage process of four ranked macro blocks in the first process unit QMB, and are then saved in the RAM's 41, 42. Subsequently, the first stage process of the next area to the right is executed in the same manner.
Specifically, as shown in FIG. 12, after the first stage process for the big macro block QMB1, the second stage process is executed within this big macro block QMB1 for the four macro blocks MB1-MB4, one macro block MB at a time. The subsequent motion vector detection processes are then executed in the order of the big macro block QMB2 followed by the macro blocks MB5-MB8, and so on.
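The alternating order described above can be written out as a short sketch for one rank of macro blocks. The function name and the tuple labels are hypothetical and serve only to make the order explicit.

```python
def rank_schedule(qmbs_per_row, mbs_per_qmb=4):
    """Process order for one rank of macro blocks: a first stage run for each
    big macro block, followed by the second stage for its macro blocks."""
    steps = []
    mb = 1
    for q in range(1, qmbs_per_row + 1):
        steps.append(("stage1", f"QMB{q}"))
        for _ in range(mbs_per_qmb):
            steps.append(("stage2", f"MB{mb}"))
            mb += 1
    return steps
```

For two big macro blocks this yields QMB1, MB1 to MB4, QMB2, MB5 to MB8, so that first stage and second stage accesses to the RAM's are interleaved.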
However, the problem is that the first stage reference picture data which should be retained in the RAM's 41, 42 are overwritten and lost by the data of the second stage process, because the first and the second stage processes must be executed alternately. Namely, when the macro blocks in the next rank shown in FIG. 11 are processed after the second stage process for one rank of macro blocks MB1-MB18 has been executed, the first stage process must be executed again, even though repeating it is essentially unnecessary because the big macro blocks QMB1, QMB2, QMB3, and QMB4 have the same data.
Accordingly, every time the first stage process is executed, all of the reference area data must be read in again, resulting in inefficient memory access. If separate reference picture memories were provided for the first and the second stage process, this problem would be solved, but at the cost of an increased hardware scale and electric power consumption.
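The cost of this rereading can be estimated with the following minimal sketch. It counts reduced-screen reference pixels read for one row of big macro blocks under two assumed models: the alternating schedule, in which the full window is reread at every first stage run and the first stage is repeated for each of the four macro block ranks, and an idealized case in which the first stage runs once and the overlapped 2/3 is kept. The function name and parameters are hypothetical.

```python
def first_stage_reference_reads(qmbs_per_row, mb_ranks=4, window=48,
                                unit=16, alternate=True):
    """Reduced-screen reference pixels read for one row of big macro blocks."""
    full = window * window                    # one complete 48 x 48 window
    if alternate:
        # Full window reread at every first stage run, once per macro block rank.
        return mb_ranks * qmbs_per_row * full
    # Idealized: one run per big macro block, overlapped area retained,
    # so only a unit-wide strip is new after the first window.
    return full + (qmbs_per_row - 1) * unit * window
```

For four big macro blocks, the alternating model reads 36864 reduced pixels against 4608 in the idealized case, a factor of eight under these assumptions.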
Thus, in the prior art, if the hierarchical motion vector detection method is realized in the form of hardware which uses an SDRAM as the frame memory, there is a problem that the access bandwidth of the SDRAM cannot be used efficiently in reading out the reduced screen. Also, because of the process schedule in which the first and the second stage process are executed alternately, the overlapped reference picture area data must be reread at every first stage process, resulting in inefficient memory access.