In an encoding system that applies compression encoding to video information when storing (accumulating) it on package media such as DVD (Digital Versatile Disk, Digital Video Disk) or Video CD, a generally employed encoding method first measures the encoding difficulty of the pictures of the video material and thereafter carries out bit assignment (Bit Assign) processing for each frame of the video information, on the basis of that encoding difficulty, so that the result falls within the number of bytes given within the recording capacity of the package media. This encoding method will be referred to as the 2-pass encoding method.
FIG. 1 shows an example of the configuration of conventional video encoding system used for the purpose of compression-encoding video information to carry out authoring of DVD, etc.
A supervisor 103 manages the entirety of the video encoding system, gives encoding conditions to the respective encoding systems such as video, audio and menu, and receives notification of the encode results. In this example, the video encoder side notifies the supervisor 103 of the address "v.adr" on the RAID (Redundant Arrays of Inexpensive Disks) 104 at which the bit stream of the encode result is written, and of the data "vxxx.aui" necessary for multiplexing the bit stream.
A main controller 111 serves to control the operation of the entirety of this video encoding system by data communication between the main controller 111 and the supervisor 103 connected through a network 102.
More specifically, the main controller 111 accepts control from the supervisor 103 and operation by the operator under the management of a Graphical User Interface (GUI) section 114, and controls the operation of an encoder 112 and a Video Tape Recorder (VTR) 110 by means of a bit assign section 115, an encoder control section 116 and a VTR control section 117, which are managed by the GUI section 114. Thus, the main controller 111 carries out encoding processing of the material to be processed in accordance with the encode condition notified from the supervisor 103, and notifies the processing result to the supervisor 103. Further, the main controller 111 can accept settings made by the operator through the GUI section 114 to change the detailed conditions of the encoding.
The GUI section 114 of the main controller 111 carries out management of three programs of bit assign program "BIT_ASSIGN" of the bit assign section 115, encoder control program "CTRL_ENC" of the encoder control section 116 and VTR control program of the VTR control section 117.
Moreover, the bit assign section 115 determines, in frame units, the condition of encoding processing in accordance with the file "v.enc" of the encoding condition notified from the supervisor 103, and notifies control data based on that condition to the encoder control section 116 in the form of a file, the "CTL file".
At this time, the bit assign section 115 sets bit assign (assignment) in the encoding processing to further change (alter) the set condition in accordance with operation by operator. When data-compressed video data D2 is recorded onto the RAID 104, the bit assign section 115 notifies, to the supervisor 103, data "v.adr" of address on the RAID 104 where that video data D2 is written along with information "vxxx.aui" such as data quantity, etc. necessary for multiplexing processing at the succeeding stage.
The encoder control section 116 controls the operation of the encoder 112 in accordance with the control file "CTL file" notified from the bit assign section 115. Further, the encoder control section 116 notifies, to the bit assign section 115, in frame units, the data of encoding difficulty "difficulty" required for encoding processing. When the video data D2 is recorded onto the RAID 104, the encoder control section 116 notifies, to the bit assign section 115, the data "v.adr" of that recording address and the data "vxxx.aui" necessary for later multiplexing processing.
The VTR control section 117 controls the operation of the Video Tape Recorder (VTR) 110 in accordance with editing list notified from the supervisor 103 to reproduce desired material to be edited.
The Video Tape Recorder (VTR) 110 reproduces the video data D1 recorded on the magnetic tape in accordance with the editing list notified from the supervisor 103 through the main controller 111, and outputs it to the encoder 112.
The encoder 112 switches its operation in accordance with the condition notified from the supervisor 103 through the main controller 111, and compression-encodes the video data D1 outputted from the VTR 110 by the MPEG (Moving Picture Experts Group) technique.
At this time, the encoder 112 notifies result of encoding processing to the main controller 111, and the main controller 111 controls the condition of encoding in that data compression to control quantity of bits generated. Thus, the main controller 111 can grasp, in frame units, quantity of bits generated by data compression.
Moreover, at the time of processing of encode condition setting in advance in the 2-pass encode operation (at the time of provisional encoding or pre-encoding), the encoder 112 merely data-compresses video data to only notify processing result to the main controller 111. On the other hand, at the time of final data compression processing (at the time of main encoding), the encoder 112 records the compression-processed video data D2 onto the RAID 104 to further notify, to the main controller 111, address where that data is recorded, data quantity, etc.
A monitor unit 113 is configured so that the video data D2 which has been data-compressed by the encoder 112 can be monitored. By means of this monitor unit 113, the operator can carry out a so-called preview in this video encoding system to confirm the result of data compression processing as occasion demands. In addition, the operator can operate the main controller 111 on the basis of this preview result to change the condition of encoding in detail.
As described above, in the DVD, the MPEG (Moving Picture Experts Group) system is employed as the compression system for video data.
MPEG is a system that carries out data compression by removing redundancy in the time direction through motion compensation prediction. Three kinds of encoded pictures are used: the I (Intra) picture, encoded only within the frame; the P (Predictive) picture, encoded by predicting the current picture (frame) from a past picture (frame); and the B (Bidirectionally Predictive) picture, encoded by predicting the current picture (frame) from a past picture (frame) and a future picture (frame).
Moreover, these pictures are organized into GOPs (Groups of Pictures), each of which is a set (group) necessarily including one I picture.
FIG. 2 shows an example of GOP structure.
In this example, the number N of pictures (frames) constituting one GOP is 15. The order where respective pictures of GOP are displayed is different from the order where they are encoded. The leading picture of GOP in the display order is B picture which is before I picture and next to P picture or I picture. In addition, the last picture of the GOP in the display order is the first P picture which exists before next I picture.
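The difference between display order and encoding order can be illustrated with a minimal sketch. The value N = 15 is taken from the example above; the spacing M = 3 between anchor pictures (I or P) and the function names are assumptions made only for illustration and are not stated in the text.

```python
def gop_display_types(n=15, m=3):
    """Picture types of one GOP in display order.

    Assumes (hypothetically) an anchor picture every m pictures, so the
    GOP opens with B pictures that precede its I picture.
    """
    types = []
    for i in range(n):
        if i % m != m - 1:
            types.append("B")
        elif i == m - 1:
            types.append("I")
        else:
            types.append("P")
    return types


def display_to_coding_order(types):
    """Return (display_index, type) pairs in encoding order.

    Each anchor picture (I or P) is moved ahead of the B pictures that
    precede it in display order, since those B pictures depend on it.
    """
    coded, pending_b = [], []
    for idx, t in enumerate(types):
        if t == "B":
            pending_b.append((idx, t))
        else:
            coded.append((idx, t))
            coded.extend(pending_b)
            pending_b = []
    coded.extend(pending_b)  # trailing B pictures, if any
    return coded
```

Under these assumptions the GOP starts with B pictures in display order and ends with a P picture, while the I picture is the first picture to be encoded.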
The 2-pass encoding operation will now be described with reference to the configuration of the video encoding system illustrated in FIG. 1.
FIG. 3 shows the fundamental processing procedure of 2-pass encoding operation in the video encoding system.
Initially, at step S51, the encode condition "v.enc", such as the total quantity of bits or the maximum rate assigned to the video information, is given from the supervisor 103 via the network 102. Thus, the encoder control section 116 is set in accordance with this encode condition.
Subsequently, at step S52, the encoder control section 116 measures encoding difficulty of encode material (material to be encoded) by using the encoder 112. In this case, DC value of each pixel of material and/or motion vector quantity ME thereof are measured together. Thus, file is prepared on the basis of these measured results.
More specifically, the measurement of encoding difficulty is carried out as follows.
Video information which is encode material is reproduced by the VTR 110 from digital video cassette which is master tape.
The encoder control section 116 measures the encoding difficulty of the video information D1 reproduced by the VTR 110 through the encoder 112. In this instance, the number of quantization steps in encoding is set to a fixed value, and the quantity of bits generated is measured. In pictures in which there is much motion (movement) and there are thus many high frequency components, the quantity of bits generated is large. In still pictures or pictures in which there are many flat portions, the quantity of bits generated is small. The magnitude of the quantity of bits generated is taken as the encoding difficulty.
Subsequently, at step S53, the encoder control section 116 executes bit assignment calculation program "BIT_ASSIGN" within bit assign section 115 in accordance with degree of encoding difficulties of respective pictures measured at the step S52 by the encoding condition set at the step S51 to carry out assignment calculation of quantity of bits to be assigned (target quantity).
Then, at step S54, result of the bit assignment calculation is used to carry out provisional encode (pre-encode) to allow operator to judge by picture quality of output of local decoder included within the encoder 112 whether or not main encoding operation is executed.
In more practical sense, picture quality is confirmed in the Preview mode which is the mode where operator can designate arbitrary processing range without outputting bit stream by the bit assignment to the RAID 104.
Then, at step S55, the picture quality is evaluated. In the case where the picture quality is unsatisfactory (NG), the processing operation proceeds to step S56, at which customizing work for picture quality adjustment is carried out, such as increasing the bit rate of the portion in question or adjusting the filter level. Thereafter, at step S57, bit assignment re-calculation is executed.
Thereafter, the processing operation returns to the step S54. At this step, the customized portion is previewed. Thus, picture quality is confirmed at the step S55. In this case, if picture qualities of all portions are satisfactory, the processing operation proceeds to step S58. By the encoder 112, there is executed encoding operation with respect to the entirety of material by bit assignment re-calculated at the step S57.
On the other hand, in the case where it is judged at the step S55 that picture quality is of no question, the processing operation proceeds to step S58 as it is. By the encoder 112, encode processing by bit assignment calculated at the step S53 is executed.
Then, at step S59, post-processing is carried out, such as writing the bit stream which is the encode result onto the RAID 104 through SCSI (Small Computer System Interface) or the like. The 2-pass encoding processing is thus completed.
After execution of the encoding processing at the step S58, the encoder control section 116 notifies the information of the encode result as described above to the supervisor 103 via the network 102.
In this example, of the respective steps of FIG. 3, all steps other than the step S52, the step S54 and the step S58 are carried out off-line.
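The flow of steps S52 to S58 can be sketched as a small control loop. This is only an outline under stated assumptions: the five callables stand in for the encoder, the bit assign section and the operator's judgment, and their names, like the `max_rounds` guard, are hypothetical and do not appear in the system described.

```python
def two_pass_encode(measure, assign, preview_ok, customize, encode, max_rounds=10):
    """Outline of the 2-pass procedure of FIG. 3 (all callables are placeholders)."""
    difficulty = measure()            # step S52: measure encoding difficulty
    targets = assign(difficulty)      # step S53: bit assignment calculation
    for _ in range(max_rounds):
        if preview_ok(targets):       # steps S54, S55: preview and evaluation
            return encode(targets)    # step S58: main encoding
        targets = customize(targets)  # steps S56, S57: customize and re-calculate
    # max_rounds is an assumption added so that the sketch always terminates
    raise RuntimeError("picture quality not accepted within max_rounds")
```

The post-processing of step S59 (writing the bit stream and reporting the result) is left to the caller in this sketch.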
Explanation will be further given in connection with bit assignment calculation executed at the bit assign section 115 in the above-described 2-pass encoding processing.
FIGS. 4A to 4G show an example of processing of remainder bits in the bit assignment calculation.
Initially, the total quantity of bits "QTY_BYTES" (FIG. 4A) and the maximum bit rate "MISREAD", assigned to the video information out of the recording capacity of a package medium such as Digital Video Disk (DVD), are designated from the supervisor 103.
On the other hand, the encoder control section 116 executes the bit assignment calculation program "BIT_ASSIGN" within the bit assign section 115. This program first determines the total number of bits "USB_BYTES" (FIG. 4B), limited so that the bit rate is less than the maximum bit rate "MISREAD" (FIG. 4B). It then calculates "SUPPLY_BYTES" (FIG. 4C), which is the target value of the sum total of the target quantities, from the value obtained by subtracting the number of bits "TOTAL_HEADER" necessary for the headers of the GOPs from "USB_BYTES" and from the total number of frames of the entirety of the material.
Then, assignment bit quantities (target quantities) to respective pictures are assigned so as to fall within size of the "SUPPLY_BYTES". Assuming that sum total of assignment bit quantities to all pictures is "TARGET_BYTES", value obtained by subtracting the above-mentioned "TARGET_BYTES" from target value "SUPPLY_BYTES" of sum total of the number of targets is quantity "REMAIN_BYTES" indicating remainder (Remain) in bit assignment as shown in FIGS. 4D to 4F.
In addition, as shown in FIG. 4G, sum of "TARGET_BYTES" and Header is caused to be "TARGET_OUT_BYTES".
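The relationship among the quantities of FIGS. 4C to 4G can be written out as a short sketch. The function name and the check that the targets fit within "SUPPLY_BYTES" are assumptions for illustration; the quantity names follow the text.

```python
def remainder_accounting(supply_bytes, picture_targets, total_header):
    """Return (TARGET_BYTES, REMAIN_BYTES, TARGET_OUT_BYTES) for given picture targets."""
    target_bytes = sum(picture_targets)            # sum of assignments to all pictures
    if target_bytes > supply_bytes:
        # the assignments are required to fall within SUPPLY_BYTES
        raise ValueError("picture targets exceed SUPPLY_BYTES")
    remain_bytes = supply_bytes - target_bytes     # remainder in bit assignment
    target_out_bytes = target_bytes + total_header # targets plus GOP headers
    return target_bytes, remain_bytes, target_out_bytes
```

For example, with SUPPLY_BYTES = 1000, picture targets of 300, 300 and 350, and TOTAL_HEADER = 50, the remainder is 50 and TARGET_OUT_BYTES is 1000.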
FIG. 5 shows a more practical example of procedure of bit assignment calculation processing at the step S53 of FIG. 3.
Initially, at step S61, as previously described, the total quantity of bits "QTY_BYTES" and the maximum bit rate "MISREAD" which are sent from the supervisor 103 are inputted.
Subsequently, at step S62, file of measured result of encoding difficulty (Difficulty) prepared at the step S52 of FIG. 3 is read in as it is.
Then, at step S63, point where scene changes is detected from change quantity of parameter of magnitude of DC value or motion vector quantity ME of respective pictures measured along with encoding difficulty.
It is to be noted that processing for detecting scene change point in the "Video Signal Processing Apparatus" that the applicant of this application has already disclosed in the specification and the drawings of the Japanese Patent Application No. 274094/1996 can be applied to detection/processing of scene change at the step S63. This "Video Signal Processing Apparatus" is adapted for detecting D.C. levels of respective frames of video signal to detect frame of scene change of that video signal from error value obtained by carrying out curve approximation of these D.C. levels to make clear scene change point. Namely, at the detected point where scene has changed, P picture is changed into I picture. Thus, improvement in the picture quality is realized.
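As a rough illustration of detecting a scene change from the change quantity of a per-picture parameter, the following sketch flags frames whose DC value jumps by more than a threshold. This simple difference test is a swapped-in stand-in, not the curve-approximation method of the cited application, and the threshold value and function name are assumptions.

```python
def detect_scene_changes(dc_values, threshold):
    """Indices k where the DC value changes by more than threshold from frame k-1."""
    return [
        k
        for k in range(1, len(dc_values))
        if abs(dc_values[k] - dc_values[k - 1]) > threshold
    ]
```

A picture at a detected index would then be a candidate for the change from P picture to I picture mentioned above.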
Then, at step S64, chapter (CHAPTER) boundary processing is carried out. At the time of chapter search at the DVD reproducing apparatus, the reproduced picture jumps from a non-specified picture. In order that the reproduced picture is not disturbed even in such a case, this chapter boundary processing changes the picture type or limits the GOP length so that the position of a chapter is necessarily located at the leading position (portion) of a GOP.
At step S65, interpolation/correction is carried out with respect to value of encoding difficulty (Difficulty) caused to be in correspondence with picture types such as I picture, P picture, B picture, etc. which have been changed as the result of a series of works (operations).
The reason why such an approach is employed is as follows. Since the maximum number of fields displayed at the time of decoding of 1 GOP is limited, there are instances where length of 1 GOP may be above this limit as the result of the fact that GOP structure is changed followed by change of picture type. In such a case, there is carried out GOP restriction (limiting) processing in which the P picture is changed into the I picture so that GOP length is shortened.
At step S66, the number of target bits for each picture is calculated in accordance with the encoding difficulty obtained by the interpolation/correction processing at the step S65 and the number of bits "SUPPLY_BYTES" given with respect to the entirety of the material to be encoded.
Then, at step S67, the address (ADDRESS) on the RAID 104 at which the bit stream of the encode result is to be written is calculated. Thereafter, the processing operation proceeds to step S68, at which the control file for the encoder is prepared. Thus, the processing is completed.
By the procedure as stated above, the numbers of target bits every respective pictures are calculated in accordance with encoding difficulty (Difficulty) of material and the number of bits "SUPPLY_BYTES" given with respect to the entirety of material. Thus, control file for encoder is prepared.
This series of bit assignments will now be explained in more detail. Here, as an example of the calculation of bit assignment, it is assumed that the bit quantity is first assigned with the GOP as the unit, and that bit assignments corresponding to the encoding difficulties (Difficulty) of the respective pictures within the respective GOPs are then carried out. In this case, the bit assignment quantity of the GOP unit "gop_target" at the time of encoding is assigned in accordance with the sum of encoding difficulties of each GOP "gop_diff".
FIG. 6 shows an example of the simplest function for converting between the sum of encoding difficulties of each GOP "gop_diff" and the bit assignment quantity of the GOP unit "gop_target" at the time of encoding.
In this example, there is used an evaluation function of the form indicated below, where "gop_target" is assumed to be Y and "gop_diff" is assumed to be X.
$$Y = AX + B$$
The total number of bits "USB_BYTES", limited so that the bit rate is less than the allowed maximum bit rate, is given by the following relational expression.
$$\mathrm{USB\_BYTES} = \min(\mathrm{QTY\_BYTES},\ \mathrm{MISREAD} \times KT \times \mathrm{total\_frame\_number}) \tag{1}$$
In the above relational expression,
KT = 1/8 (bits)/30 (Hz) in the case of the NTSC system, and
KT = 1/8 (bits)/25 (Hz) in the case of the PAL system.
The index k used in expression (6) satisfies 1 ≤ k ≤ number of pictures within GOP.
The picture modes referred to in connection with 2-3 pull-down are as follows:
0: bottom_field_first
1: bottom_field_first, repeat_first_field
2: top_field_first
3: top_field_first, repeat_first_field
Expression (8) is limited to the case where p_mode[k] is 0 or 2.
Moreover, "total_frame_number" is the total number of frames of the material to be encoded, and min (s, t) is a function for selecting the smaller one of s and t. In addition, "DIFFICULTY_SUM" is the sum total of the encoding difficulties of all pictures.
$$\mathrm{SUPPLY\_BYTES} = \mathrm{USB\_BYTES} - \mathrm{TOTAL\_HEADER} \tag{2}$$
$$\mathrm{DIFFICULTY\_SUM} = \sum \mathrm{difficulty} \tag{3}$$
$$B = \mathrm{GOP\_MINBYTES} \tag{4}$$
$$\sum y = A \times \sum x + B \times n$$
In the above-mentioned relational expression, $\sum y$ = SUPPLY_BYTES, $\sum x$ = DIFFICULTY_SUM and n is the total number of GOPs. Thus,
$$A = (\mathrm{SUPPLY\_BYTES} - B \times n) / \mathrm{DIFFICULTY\_SUM}$$
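The derivation of the coefficients can be checked numerically with a minimal sketch. It follows expressions (1) to (4) as given; the parameter name `max_rate_bps` stands for the maximum bit rate designated by the supervisor, and the function names and sample figures are assumptions for illustration.

```python
def usable_bytes(qty_bytes, max_rate_bps, total_frame_number, frame_rate=30):
    """Expression (1): cap the byte budget by what the maximum bit rate allows.

    KT = (1/8)/frame_rate converts bits per second into bytes per frame
    (frame_rate is 30 for NTSC and 25 for PAL).
    """
    rate_limited = max_rate_bps * total_frame_number / (8 * frame_rate)
    return min(qty_bytes, rate_limited)


def linear_coefficients(supply_bytes, gop_diffs, gop_minbytes):
    """Solve sum(y) = A * sum(x) + B * n for A, with B fixed by expression (4)."""
    b = gop_minbytes                   # expression (4)
    n = len(gop_diffs)                 # total number of GOPs
    difficulty_sum = sum(gop_diffs)    # expression (3)
    a = (supply_bytes - b * n) / difficulty_sum
    return a, b
```

With a budget of 1000 bytes, three GOPs of difficulty 10, 30 and 60, and a minimum of 100 bytes per GOP, this gives A = 7 and B = 100, so that A × 100 + B × 3 returns the full 1000 bytes.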
Accordingly, the target quantity of each GOP is given as below.
$$\mathrm{gop\_target} = A \times \mathrm{gop\_diff} + B \tag{5}$$
Thereafter, bit assignments corresponding to the encoding difficulties of the respective pictures are carried out within the respective GOPs. In the case where the assignment of each picture within a GOP is made proportional to the magnitude of its encoding difficulty, the target quantity of each picture is determined by the following relational expression.
$$\mathrm{target}(k) = \mathrm{gop\_target} \times \mathrm{difficulty}(k) / \mathrm{gop\_diff} \tag{6}$$
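The two-stage assignment, first per GOP and then per picture, can be sketched as follows. The function names and sample values are assumptions; the arithmetic follows the linear GOP function and the proportional rule of expression (6).

```python
def gop_targets(a, b, gop_diffs):
    """GOP-level targets: gop_target = A * gop_diff + B for each GOP."""
    return [a * d + b for d in gop_diffs]


def picture_targets(gop_target, difficulties):
    """Expression (6): split a GOP target in proportion to picture difficulty."""
    gop_diff = sum(difficulties)  # sum of difficulties within this GOP
    return [gop_target * d / gop_diff for d in difficulties]
```

For instance, with A = 7 and B = 100, GOPs of difficulty 10, 30 and 60 receive 170, 310 and 520 bytes, and a 170-byte GOP whose pictures have difficulties 5, 3 and 2 is split into 85, 51 and 34 bytes.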
Further, after such bit assignment calculation, the address on the RAID 104 onto which the bit stream of the encode result is written is set, and the control file for the encoder is outputted. By carrying out encode processing in accordance with the control file prepared in this way, variable bit rate encoding corresponding to the difficulty of the pictures of the material is executed.
The outline of the 2-pass variable bit rate encoding has been described above.
Explanation will now be given in connection with pull-down (Pulldown) of cinema (film) material.
In order to convert cinema film constituted at 24 frames/sec. into television/video signal of the NTSC system constituted at 30 frames/sec., processing to periodically repeat the same field picture is carried out. Hereinafter, this processing is referred to as 2-3 pull-down conversion.
FIG. 7 shows the principle of this 2-3 pull-down conversion.
Phase of pattern of pull-down is determined at the time of conversion from film material to video material of the NTSC system. In many cases, pattern is regularly converted.
1 frame of video material consists of two fields, wherein the first field (1st field) thereof is assumed to be top field (top_field) and the second field (2nd field) is assumed to be bottom field (bottom_field). In addition, location where the same field picture is repeated is called repeat first field (repeat_first_field).
In such cinema material, if the position where the same field is repeated is known, processing is carried out at the time of encoding so that corresponding field is not encoded to enhance compression efficiency.
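The 2-3 pull-down conversion can be sketched by expanding each film frame alternately into two and three fields and pairing the fields into video frames. The starting phase (two fields first) and the use of labels in place of pictures are assumptions for illustration.

```python
def pulldown_2_3(film_frames):
    """Convert film frames to video frames of (first field, second field).

    Film frames are alternately held for 2 and 3 fields, so 4 film frames
    become 10 fields, that is, 5 video frames (24 -> 30 frames/sec.).
    """
    fields = []
    for i, frame in enumerate(film_frames):
        repeats = 2 if i % 2 == 0 else 3  # assumed phase: 2 fields, then 3
        fields.extend([frame] * repeats)
    # pair consecutive fields; an unpaired trailing field is dropped
    return [(fields[j], fields[j + 1]) for j in range(0, len(fields) - 1, 2)]
```

With film frames A, B, C and D, the third and fourth video frames mix two film frames, and the third field of B and of D is a repeated field that need not be encoded again.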
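As combinations of 2-3 pull-down patterns in encoding, there are four patterns, numbered 0 to 3, each combining a field order (bottom_field_first or top_field_first) with the presence or absence of repeat_first_field.
Here, the combination of these patterns is defined as the "picture mode".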
In the case where encoding is carried out simultaneously with pull-down conversion, or in the case where information indicating the pull-down pattern, etc. is given, optimum encoding in which pull-down processing as described above is taken into consideration, i.e., encoding such that encoding of picture is not carried out with respect to fields repeated to output only information indicating field to be repeated can be carried out. However, in the case where such information is not given, there are instances where encoding in consideration of pull-down processing cannot be suitably carried out.
Explanation will now be given in connection with the relationship between 2-3 pull-down pattern and video frame number k from the leading portion of roll of material in accordance with the NTSC system.
Assuming that the above-described picture mode is p_mode[k] with respect to the relationship between position including top field which is not repeated and frame No. k, there exist frame numbers which do not belong to the above-mentioned picture modes of 0 to 3 also as seen from FIG. 7. Value of p_mode[k] in this case is assumed to be 4.
FIG. 8 is state transition diagram of p_mode [k] in the pull-down.
In the case where pull-down patterns are regularly successive, the value of p_mode[k] increases one by one, modulo 5 (mod 5), with increase of the frame number k.
$$\mathrm{p\_mode}[k+1] = (\mathrm{p\_mode}[k] + 1) \bmod 5 \tag{7}$$
In the case where the pull-down pattern is disturbed, the value can be repeated only when p_mode[k] is 0 or 2.
$$\mathrm{p\_mode}[k+1] = \mathrm{p\_mode}[k] \tag{8}$$
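The state transitions of p_mode can be checked with a short sketch. It takes the regular step of expression (7) and, following the statement that the value can be repeated only when p_mode is 0 or 2, treats such a repetition as the only other allowed transition. The function names are hypothetical.

```python
def is_valid_transition(prev, nxt):
    """True if nxt may follow prev in a sequence of p_mode values (0 to 4)."""
    if nxt == (prev + 1) % 5:
        return True                       # regular pattern, expression (7)
    return nxt == prev and prev in (0, 2) # disturbed pattern: repeat of 0 or 2


def first_irregularity(p_modes):
    """Index of the first value that cannot follow its predecessor, or None."""
    for k in range(1, len(p_modes)):
        if not is_valid_transition(p_modes[k - 1], p_modes[k]):
            return k
    return None
```

A sequence such as 0, 1, 2, 2, 3, 4, 0 is accepted, whereas 0, 1, 1 is rejected at its third element because mode 1 includes a repeated field and cannot be repeated.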
In the 2-pass encoding, in carrying out measurement of encoding difficulty, pull-down pattern is automatically detected. At this time, bit assignment is carried out on the basis of the measured pull-down pattern. Thus, control file is prepared.
Further, at the time of final (last) encoding, encoding is carried out in accordance with pull-down pattern described in the control file.
However, the automatic detection technique for the pull-down pattern described above makes its detection, in principle, on the basis of the differences between the top field and the bottom field of the current frame and those of the frame one frame earlier. Consequently, there are instances where the pull-down pattern cannot be correctly detected in the case where the material is similar to a still picture.
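Why still-like material defeats difference-based detection can be seen in a simplified sketch. Fields are represented here as short lists of pixel values in temporal order, and a field is flagged as repeated when it nearly equals the field of the same parity one frame earlier. The difference measure, the threshold and the function names are assumptions and do not reproduce the actual detector.

```python
def field_difference(a, b):
    """Mean absolute difference between two fields of equal size."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_repeated_fields(fields, threshold):
    """Indices of fields that nearly equal the same-parity field one frame earlier."""
    return [
        i
        for i in range(2, len(fields))
        if field_difference(fields[i], fields[i - 2]) < threshold
    ]
```

For moving material with the field sequence A A B B B C C D D D, only the fifth and tenth fields are flagged, which reveals the pull-down phase. For a still picture every field is flagged, so the phase cannot be determined from the differences.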
For example, when consideration is made in connection with the first (initial) portion of cinema (movie), image beginning from the first title scene fades in from black, logo of cinema company appears, and such image fades out to black. In such case, with respect to the portion where motion (movement) is small like fade in/fade out to black, precise detection of pull-down phase is difficult. As a result, there are many instances where detection is erroneously made.
Further, in material for which an old unit (apparatus) using an image pick-up tube was used as the converter from film material to video material, there are instances where the pull-down phase cannot be precisely detected because of the after-image between frames.
For the reason similar to the above, also in the material using noise reducer such that picture of early frame and picture of current frame are added to reduce random noise, there are instances where pull-down phase cannot be correctly detected.
In addition, in material such that a large number of film materials different in pull-down phase are edited, there is the possibility that pull-down phase may be detected as erroneous pull-down phase from delay of detection of pull-down pattern at the editing point.
As stated above, when encoding is carried out at an erroneous phase, processing is made by an erroneous pull-down pattern for the time period until the phase is locked to the correct phase, so that there is the problem that the motion (movement) of the picture of the encoded result becomes awkward, or the like.
Whether encoding with pull-down processing causes such a problem depends greatly upon the stability of the pull-down pattern of the material.
However, since it is difficult to make such a judgment in advance, the operator can only judge the suitability while observing the motion (movement) of the picture after encoding. In the case where it is judged that the encoding condition is not suitable, there has been the problem that the condition must be changed and encoding carried out again from the beginning.