The present invention relates to a method and apparatus for determining a target amount of code which may be used in a digital video disc (DVD) or the like or in a system for transmitting digital video broadcasting (DVB) digital data, and to a method and apparatus for compressing and coding noncompressed video data.
Techniques for reducing the amount of data during recording by converting video and audio signals to digital data and applying conversion and coding processing to the digital data have been standardized by MPEG (Moving Picture Experts Group). Such processing may involve inter- or intra-coding of the video data for every macroblock. In MPEG, a group of pictures (GOP) may include one frame or picture subjected to intra-coding (an I-picture), and a plurality of frames or pictures subjected to inter-coding, which includes predictive coding (P-pictures) and bidirectional predictive coding (B-pictures).
In inter-coding processing, coding processing is applied to the difference between the video data of a current frame and the video data of a past frame. As is to be appreciated, the video data of the current frame may be easily obtained or restored if the video data of the past frame has been already sent.
In determining the above-described difference between frames of video data, motion detection and motion compensation processing may be performed.
Motion detection processing involves detecting or determining the position or macroblock in a past reference frame in which the sum of absolute values of differences of pixel values between such macroblock and the respective macroblock of the current frame is the smallest. Such detection may be performed by determining the number of pixels the macroblock needs to be moved from the position of the respective current frame macroblock in an X- and Y-direction such that the sum of absolute values of differences of the pixel values between the respective current frame macroblock and the reference frame macroblock is the smallest. Data representative of such amount of movement in units of pixels in the X- and Y-directions may be referred to as motion vector data.
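The motion detection described above can be sketched as a full-search block match. The following Python is a minimal illustration only, not the implementation of any particular encoder: the function names, the 4x4 block size, the search range of 2 pixels, and the synthetic frames are all assumptions made for the example. The motion vector is expressed here as the displacement from the current macroblock position to the best-matching reference position.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the current-frame block at (cx, cy)
    and the reference-frame block at (rx, ry)."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def find_motion_vector(cur, ref, cx, cy, size=4, search=2):
    """Full search: return ((mvx, mvy), best_sad), where (mvx, mvy) is the
    displacement in pixels that minimizes the SAD within +/- search pixels."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), sad(cur, ref, cx, cy, cx, cy, size)
    for mvy in range(-search, search + 1):
        for mvx in range(-search, search + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_sad:
                best_mv, best_sad = (mvx, mvy), cost
    return best_mv, best_sad


# Synthetic 8x8 reference frame (hypothetical data) and a current frame whose
# content is the reference content displaced by 2 pixels in X and 1 in Y.
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
        for x in range(8)] for y in range(8)]

mv, cost = find_motion_vector(cur, ref, cx=2, cy=2)
```

A full search is the simplest strategy; practical encoders typically restrict or accelerate the search, which this sketch does not attempt.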
Motion compensation processing involves determining a position based upon the motion vector data and the position of the past reference frame macroblock and extracting macroblock data pertaining to such position. Thereafter, the difference between the extracted past reference macroblock and the current respective macroblock, or the motion predictive error, is determined. The past reference frame may be obtained from a local decoder.
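As a rough sketch of the motion compensation step, the following Python extracts the reference macroblock indicated by a motion vector and subtracts it from the current macroblock to obtain the motion predictive error. The function names, the 2x2 block size, and the sample pixel values are hypothetical and chosen only to keep the example small.

```python
def extract_block(frame, x, y, size):
    """Return the size-by-size block whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def prediction_error(cur, ref, cx, cy, mv, size):
    """Motion predictive error: current block minus the reference block
    displaced by the motion vector mv = (mvx, mvy)."""
    mvx, mvy = mv
    predicted = extract_block(ref, cx + mvx, cy + mvy, size)
    current = extract_block(cur, cx, cy, size)
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, predicted)]


# Hypothetical frames: the current block at (0, 0) closely resembles the
# reference block at (1, 1), apart from small differences.
ref = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
cur = [[61, 70, 0, 0],
       [100, 112, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]]

good = prediction_error(cur, ref, 0, 0, (1, 1), 2)   # small residual
poor = prediction_error(cur, ref, 0, 0, (0, 0), 2)   # large residual
```

With the matching vector the residual values are small, which is what makes the subsequent transform and quantization effective; with a poor vector the residual is nearly as large as the block itself.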
The motion predictive error signal may be subjected to orthogonal transformation, such as discrete cosine transformation (DCT), in units of predetermined blocks so as to remove the correlation in the space direction. The DCT coefficient(s) obtained from such processing are subjected to quantization processing (that is, division by a quantization step) to eliminate fine signals, whereafter the whole number value of the quotient and a quantization index are outputted. The quantized DCT coefficient(s), the quantization index, and the motion vector data may be variable length coded (VLC) and outputted.
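The transform and quantization steps can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a one-dimensional, orthonormally scaled DCT-II over 8 samples is used for brevity (a two-dimensional block DCT can be formed by applying it to rows and then to columns), a single quantization step is applied to every coefficient, and the function names are hypothetical. Quantization keeps the whole number part of the quotient, as described above.

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, s in enumerate(samples))
        coeffs.append(scale * acc)
    return coeffs


def quantize(coeffs, step):
    """Divide each coefficient by the quantization step and keep the whole
    number part of the quotient (truncation toward zero)."""
    return [int(c / step) for c in coeffs]


def dequantize(levels, step):
    """Approximate reconstruction of the coefficients from quantized levels."""
    return [q * step for q in levels]


# A flat (highly correlated) row of samples concentrates its energy in the
# lowest-frequency (DC) coefficient.
flat = [8] * 8
levels = quantize(dct_1d(flat), step=4)
```

A larger step discards more of the small coefficients and shortens the coded output, at the cost of a larger quantization error after `dequantize`.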
In intra-coding processing, on the other hand, the above-described motion detection and motion compensation processing are not performed. Instead, pixel values are orthogonal transformed in block units, quantized, variable length coded, and outputted.
The above-described processing removes the redundancy in the time and space directions of the video data, thereby compressing the information data. Such compression enables image and audio data to be recorded on a single optical disc (such as a digital video disc, or DVD), or a moving picture and audio data to be transmitted within the transmission line capacity of a telecommunications line or a satellite line.
Consider a situation wherein a DVD, a ROM disc, or the like is used. In such a situation, the compressed and coded video data must fit within the capacity of the disc. To meet this requirement, so-called variable rate coding may be utilized, wherein code amounts are allocated within the total available amount of code so as to obtain the highest image quality. Such a coding method takes advantage of the fact that the difficulty of image compression may fluctuate with time due to constant changes of the correlation strength of the video data in the time and space directions.
An example of the above-described fluctuation or change of correlation strength in the time direction will now be presented. A pattern of movement of a moving body in a moving picture may include not only simple parallel movement, but also complex movement patterns such as changes in the movement speed, the movement direction, and the shape of the moving body. In such a situation, the motion predictive error data obtained by the motion compensation described above may increase. Alternatively, in moving picture data having little or no correlation in the time direction (such as random noise), the effect of compression by motion compensation may be negligible, and the motion predictive error data generated may differ little in amount and/or content from the data of the current frame. Such fluctuation in the difficulty of the image compression may exist not only in frame units, but also in other types of units such as GOP units, macroblock units, and so forth.
Accordingly, when compressing and coding data which fluctuates in difficulty (such as video data having a low correlation in the time and space directions) by a uniform bit rate (hereinafter, a "target code rate") per unit time, the DCT coefficient(s) may be quantized by a relatively large quantization step so that the generated code rate approaches the target code rate. As such, a relatively large quantization error may result, and noticeable or conspicuous coding distortion may occur when expanding and decoding the data. On the other hand, when compressing and coding data having a low fluctuation in difficulty (such as video data having a high correlation in the time and space directions), the DCT coefficient(s) may be concentrated at the low frequency component side, so that they can be finely quantized by a relatively small quantization step while the generated code rate still approaches the target code rate or target code amount. As a result, distortion after expansion and decoding of the data may be reduced.
Therefore, in recording on DVDs or the like, to avoid coding distortion and deterioration of the image quality due to fluctuation of the image over time, variable rate coding using a so-called 2-path or 2-pass encoding technique may be utilized. In the first pass of such a technique, the video data is compressed and coded by a fixed quantization step, and the amount of code generated is defined as the difficulty data and stored. In the second pass, compression and coding are performed based on the difficulty data such that a relatively large amount of code is allocated to difficult video data and a relatively small amount of code is allocated to non-difficult video data. (The term "difficulty data" refers to the amount of data necessary to obtain a certain constant image quality.)
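The second-pass allocation can be sketched as follows. This Python example assumes a strictly proportional rule (each unit receives a share of the total code amount equal to its share of the total difficulty); the text only requires that difficult data receive relatively more code, so the exact rule, the function name, and the sample figures are assumptions made for illustration.

```python
def allocate_bits(difficulty, total_bits):
    """Distribute total_bits over the units in proportion to the difficulty
    data measured for each unit in the first pass."""
    total_difficulty = sum(difficulty)
    if total_difficulty == 0:
        # No unit is harder than another: share the budget equally.
        return [total_bits / len(difficulty)] * len(difficulty)
    return [total_bits * d / total_difficulty for d in difficulty]


# Hypothetical first-pass results: code amounts generated per unit when
# encoding with a fixed quantization step.
difficulty = [100, 300, 600]
targets = allocate_bits(difficulty, total_bits=10000)
```

Because the rule needs the sum of the difficulty data over the whole material, the second pass cannot start until the first pass has finished.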
In the above-described 2-pass encoding technique, the actual encoding (second pass) commences after encoding all of the video data in the first pass. As a result, it is very difficult, if not impossible, to perform such technique on supplied video and audio data with no interruption(s). Due to such limitation, such 2-pass encoding technique may not be suitable for encoding at the time of communication and broadcasts in which relatively long data streams should be encoded without interruption and in real time.
Alternatively, a 2-path or 2-pass technique may be utilized which uses two encoders, that is, a first encoder for encoding data passing through a memory having a FIFO (first-in first-out) configuration and a second encoder for encoding the data not passing through such memory. Hereinafter, this 2-pass encoding technique will be referred to as a "simplified 2-pass encoding" technique to distinguish it from the above 2-pass encoding technique. In the simplified 2-pass encoding technique, difficulty information of the received data is obtained in the first path or pass and the data is encoded using the obtained difficulty information in the next path or pass. Two types of simplified 2-pass encoding techniques may be utilized. In one type, a target amount of bits of each GOP is determined based on difficulty information of several GOPs. This type may be considered rate control in units of GOPs. The other type controls the target amount of bits in each picture unit in the GOP and may be considered rate control in units of pictures.
In the simplified 2-pass encoding, difficulty information may be obtained from a number (K) of GOPs and utilized for such GOPs. As such, information pertaining to patterns before and after the K number of GOPs may not be utilized in obtaining the difficulty information for any of such K GOPs. However, utilizing such obtained difficulty information, without using the information of the patterns before and after the K GOPs, may present a problem upon encoding the data. Such problem will be explained hereinbelow with reference to FIGS. 5A to 5D.
Assume that the difficulty of input materials or data is as shown in FIG. 5A. (In FIG. 5A, the ordinate axis represents the difficulty, and the abscissa axis represents time in GOP units. Further, in FIG. 5, the amount or length of the difficulty data which may be obtained in advance, that is, the capacity of the FIFO memory, is 2 GOPs.) As shown in FIG. 5A, the difficulty of the input data gradually rises to a high value in the period of the first two GOPs, falls to a low value at the start of the period of the next two GOPs and remains at such low value for the remainder of this period, rises to a high value after entering the period of the next two GOPs and gradually falls to a low value thereafter. (In FIG. 5, the difference of difficulty according to picture type is ignored for simplification.) Ideally, input data having a difficulty distribution as shown in FIG. 5A should have an amount of bits allocated in proportion to such difficulty distribution as shown in FIG. 5B. Such an ideal bit amount distribution provides an equitable arrangement in that it enables a large amount of bits to be used for a difficult pattern and a small amount of bits to be used for a simple pattern. Further, such distribution may enable encoding to be performed without deteriorating (or without significant deterioration of) the image quality. (As a reference, the average amount or rate is shown in FIG. 5B by a broken line.)
FIG. 5C illustrates an allocation of the amount of bits in GOP units obtained by the simplified 2-pass encoding. As shown therein, such bit allocation is flat or constant within each GOP unit and is not proportional to the difficulty distribution shown in FIG. 5A. Such flat or constant allocation in each GOP is the result of allocating the amount of bits in units of GOPs.
FIG. 5D illustrates an allocation of the amount of bits in picture units obtained by the simplified 2-pass encoding. Since the allocation of the amount of bits is performed in units of pictures, such bit allocation is proportional to the difficulty distribution shown in FIG. 5A. However, such bit allocation of FIG. 5D is smaller in several portions thereof (that is, the first and last third) than that shown in FIG. 5B. Such difference in bit allocation is a result of allocating the amount of bits in units of two GOPs. That is, if the allocation of the amount of bits is determined in units of two GOPs, the amount of allocated bits may be insufficient within two GOPs having a difficult pattern or wherein a difficult pattern continues, and the amount of allocated bits may be excessive within two GOPs having a simple pattern or wherein a simple pattern continues.
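The shortfall described for FIG. 5D can be reproduced with a toy model in Python. The model is an assumption made for illustration and is not taken from the figures: difficulty is given as one value per GOP, each window of two GOPs receives the same fixed budget, and bits are shared proportionally within a window. The function names and the sample values are hypothetical.

```python
def proportional(difficulty, budget):
    """Share a budget among units in proportion to their difficulty."""
    total = sum(difficulty)
    return [budget * d / total for d in difficulty]


def windowed_allocation(difficulty, total_bits, window):
    """Allocate bits using only the difficulty data inside each window.
    Every window receives the same average budget per unit."""
    per_unit = total_bits / len(difficulty)
    allocated = []
    for start in range(0, len(difficulty), window):
        chunk = difficulty[start:start + window]
        allocated.extend(proportional(chunk, per_unit * len(chunk)))
    return allocated


# Hypothetical difficulty per GOP: hard, easy, hard, loosely after FIG. 5A.
difficulty = [3, 5, 1, 1, 5, 3]
ideal = proportional(difficulty, 180)                      # whole material known
limited = windowed_allocation(difficulty, 180, window=2)   # 2-GOP look-ahead
```

In this model the two difficult windows (the first and last thirds) receive fewer bits than the ideal allocation, while the simple middle window receives more, because no budget can move between windows.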
Therefore, the above-described simplified 2-pass encoding technique may not provide a proper allocation of bits.
Further, when compressing and coding noncompressed digital video data by the method of the MPEG (Moving Picture Experts Group) or the like and recording the same on a recording medium such as a magneto-optical disc (MO disc), it is necessary to reduce the amount of data (bit amount) of the compressed video data after the compression and coding to less than the recording capacity of the recording medium while enhancing the quality of the video after expansion and decoding as much as possible.
In order to satisfy this need, a method has been adopted of first preliminarily compressing and coding the noncompressed video data and estimating the amount of data after the compression and coding (first path), then adjusting the compression rate based on the estimated amount of data and carrying out the compression and coding so that the amount of data after the compression and coding becomes less than the recording capacity of the recording medium (second path) (hereinafter, such a compressing and coding method will also be referred to as "two-path encoding").
When carrying out the compression and coding by the two-path encoding, however, it is necessary to apply similar compressing and coding processing two times with respect to the same noncompressed video data, so a long time is taken. Further, since the final compressed video data cannot be generated by one compressing and coding processing, it is not possible to compress and code and record the captured video data in real time as it is.
Furthermore, when compressing and coding noncompressed digital video data in units of GOPs (groups of pictures) composed of I-pictures (intra-coded pictures), B-pictures (bi-directionally coded pictures), and P-pictures (predictive coded pictures) by the method of the MPEG or the like and recording the same on a recording medium such as a magneto-optical disc (MO disc), it is necessary to reduce the amount of data (amount of bits) of the compressed video data after the compression and coding to less than the recording capacity of the recording medium or less than the transmission capacity of the communication line while maintaining a high quality of the video after expansion and decoding.
For this purpose, there is adopted a method in which the noncompressed video data is first preliminarily compressed and coded and the amount of data after the compression and coding is estimated (first path), then a compression rate is adjusted based on the estimated amount of data and the compression and coding are carried out so that the amount of data after the compression and coding becomes less than the recording capacity of the recording medium (second path) (hereinafter, such a compressing and coding method will be also referred to as "two-path encoding").
Further, when a plurality of series of noncompressed video data not correlated in the time direction (hereinafter also referred to as scenes) are connected one after another by edit processing to obtain one series of noncompressed video data (edited video data), and this edited video data is compressed and coded by, for example, a picture type sequence I, B, P, B, P, B, P, B, P, B, P, B, the first picture of a scene after the compression and coding sometimes becomes a P-picture. In order to expand and decode this first P-picture, it is necessary to refer to the immediately preceding picture, which belongs to the compressed video data generated from the other scene. When a picture generated from another scene having no correlation is used for the expansion and decoding of the first P-picture, however, the motion prediction error is considerably increased and therefore an enormous amount of data becomes necessary. Where only a limited amount of data can be used, the video after the expansion and decoding is deteriorated.
Japanese Unexamined Patent Publication No. 7-193818 discloses an image processing method and an image processing apparatus intended to solve this problem. According to that publication, when noncompressed edited video data containing, for example, two scenes (a first scene and a second scene) is compressed and coded by, for example, the above picture type sequence I, B, P, B, P, B, P, B, P, B, P, B, the picture types are changed in two places so as to suppress an increase in the amount of data generated. First, the leading P-picture of the second compressed video data (I2, B2, P2 in the following picture type sequences), obtained by compressing and coding the second scene, is changed to an I-picture, which does not refer to the last picture of the first compressed video data (I1, B1, P1 in the following picture type sequences), obtained by compressing and coding the first scene. Second, the last I-picture of the first compressed video data is changed to a P-picture.
Specifically, if the compression and coding were carried out without a change of the picture type sequence, the first compressed video data and the second compressed video data would be obtained with the picture type sequence B1, I1, B1, P1, B1, P1, B1, P2, B2, P2, B2, P2, B2. In the method and apparatus of the publication, the last I-picture of the first compressed video data is changed to a P-picture and the first P-picture of the second compressed video data is changed to an I-picture, so that the first compressed video data and the second compressed video data are obtained with the picture type sequence B1, P1, B1, P1, B1, P1, B1, I2, B2, P2, B2, P2, B2.
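The picture type change described above can be sketched for the two-scene case as follows. This Python example is an illustrative reading of the description, not code from the cited publication; the function name, the single-letter representation of picture types, and the `boundary` parameter (the index of the first picture of the second scene) are assumptions.

```python
def retype_two_scenes(types, boundary):
    """Return a copy of the picture type list in which the leading P-picture
    of the second scene becomes an I-picture and, if that change was made,
    the last I-picture of the first scene becomes a P-picture."""
    out = list(types)
    promoted = False
    for i in range(boundary, len(out)):
        if out[i] == "I":
            break                      # second scene already starts on an I
        if out[i] == "P":
            out[i] = "I"               # no reference to the first scene
            promoted = True
            break
    if promoted:
        for i in range(boundary - 1, -1, -1):
            if out[i] == "I":
                out[i] = "P"           # offsets the cost of the added I-picture
                break
    return out


# Picture type sequence from the description: seven pictures of the first
# scene followed by six pictures of the second scene.
original = list("BIBPBPB" + "PBPBPB")
changed = retype_two_scenes(original, boundary=7)
```

Joining `changed` gives B, P, B, P, B, P, B for the first scene and I, B, P, B, P, B for the second, matching the modified sequence described above.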