The present invention relates to the processing of image signals. In particular, it relates to the coding of both interlace video and progressive video signals.
A new video source format referred to as progressive video has been proposed in industries such as those for packaged video content, video devices, etc., in order to improve picture quality. However, it has become apparent that when progressive video is compressed using the present MPEG digital video compression standard, there is a problem of a mismatch between the quality that is obtained and the bit amount that is required. The present invention proposes a solution that is very effective with regard to extending the MPEG-2 video compression standard.
Television signals are essentially a standardized representation of optical images sampled over a certain period of time and in the vertical direction, with complete digital video signals being obtained by digitizing the video along each scan line.
FIG. 1(a) shows the ordinary interlace video format and FIG. 1(b) shows the progressive video format, which provides video of high quality without interlace artifacts such as flicker. The progressive video format follows the basic approach for displaying optical images: first the optical image is captured, and the captured signal (a two-dimensional signal) is then scanned into a set of discrete scan lines. The time period for capturing one image is 1/60 sec in countries (including Japan and the U.S.) that apply the NTSC television standard. Each captured image is referred to as a frame.
Analog television signals require a wide frequency bandwidth for broadcasting, so a means of halving the analog bandwidth by interlacing the scan lines as shown in FIG. 1(a) was devised. In this technology, each captured image is scanned with every other scan line, and every other image is scanned complementarily. Each such complementary image is referred to as a field. When interlace video and progressive video are compared, 1 frame of progressive video corresponds to 2 fields of interlace video.
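The relation described above (every other scan line per field, two complementary fields per frame) can be sketched in code. This is a minimal illustration, not part of the invention; the function name and the list-of-scan-lines data model are assumptions chosen for clarity.

```python
# Hypothetical illustration: splitting one progressive frame into the two
# complementary interlace fields described above. A frame is modeled as a
# list of scan lines, each scan line being a list of picture-element values.

def frame_to_fields(frame):
    """Return (top_field, bottom_field): every other scan line of the frame."""
    top_field = frame[0::2]      # scan lines 0, 2, 4, ... (scanned first)
    bottom_field = frame[1::2]   # scan lines 1, 3, 5, ... (complementary scan)
    return top_field, bottom_field

# 1 progressive frame corresponds to 2 interlace fields of half the lines:
frame = [[y * 10 + x for x in range(4)] for y in range(6)]  # 6 scan lines
top, bottom = frame_to_fields(frame)
assert len(top) == len(bottom) == len(frame) // 2
```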
When using interlace, some degree of quality must be forfeited in exchange for the bandwidth efficiency. This is best explained with the frequency-space sampling spectra shown in FIG. 2. FIG. 2 shows the frequency space of the progressive and interlace video spectra, with the horizontal axis being the time frequency.
In (a), "x" indicates the repetition frequencies of the baseband spectrum in the case of progressive video. In a television of the NTSC system, 480 scan lines are shown on the monitor. The shaded block indicates the frequency "width" necessary for restricting the baseband spectrum in order to avoid the aliasing phenomenon that would occur with respect to the repetition spectra generated at the "x" points.
(b) shows the baseband spectrum without aliasing with respect to interlace video. It is necessary to restrict the baseband spectrum to 1/2 that of progressive video because of the additional repetition frequencies.
(c) shows another method of restricting the bandwidth to 1/2 with respect to interlace video. This baseband spectrum is preferable to (b) when it is necessary to maintain a higher resolution during fast motion.
In the case of interlace video, the bandwidth has to be restricted to 1/2 in comparison with progressive video in order to avoid the aliasing phenomenon. Of the two methods shown in FIGS. 2(b) and (c), the method in (c) is preferable in many cases since the resolution is higher during fast motion. In the case of (b), the fine parts of the image are lost as the motion becomes greater, giving the viewer the impression that the image transiently becomes blurry due to the motion. In the case of (c), a higher resolution is maintained even in fast motion, but resolution during standstill must accordingly be forfeited. When (a) and (c) are compared, it is readily apparent that progressive video can maintain a vertical frequency bandwidth twice that of interlace video.
It was explained above that progressive video transmits video signals with a (vertical) frequency bandwidth double that of interlace video. However, the interlace video format has been used in ordinary televisions since the start of public broadcasting, and the household television presently receives and displays said interlace television signals. On the other hand, most PC monitors display images in the progressive video format under conditions of very high resolution and quality. Whether future televisions should use the progressive video format in order to gain high quality has been a major point of debate since digital television technology appeared.
MPEG-1 and MPEG-2 are presently the global standards for broadcast and storage-use digital video compression. Below, the discussion is restricted to basic P-frame coding for simplicity. Extension to B-frames is straightforward, so an explanation is omitted.
In MPEG, interlace video is compressed through the combination of motion estimation/compensation and frequency-space coding that uses the DCT (discrete cosine transform). In order to use this method, the field sequence is paired as shown in FIG. 4. Each field is alternately scanned as shown in FIG. 1, so when two fields are paired (one supplying the top scan lines and the other the bottom scan lines), the pair has the same number of scan lines as an entire frame, as shown in FIG. 4. This is what is actually referred to as a frame of interlace video.
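The pairing of two complementary fields into one frame can be sketched as follows. This is an illustrative fragment only; the function name and the list-of-scan-lines representation are assumptions.

```python
# Hypothetical sketch of the field pairing in FIG. 4: the top field and the
# bottom field are interleaved line by line, yielding a frame with the full
# number of scan lines.

def fields_to_frame(top_field, bottom_field):
    """Interleave a top field and a bottom field into one interlace frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)     # line from the field scanned at the top
        frame.append(bottom_line)  # complementary line from the other field
    return frame
```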
Next, this frame is divided into blocks of equal dimensions of 16 lines (scan lines) × 16 picture elements. These are defined as the basic block dimensions with respect to the DCT and motion estimation/compensation. The image signal of this unit is referred to as a macroblock. Referred back to the original 2 fields, each macroblock corresponds to a pair of field blocks with dimensions of 8 lines × 16 picture elements, as shown in FIG. 3. FIG. 3 shows a macroblock with dimensions of 16 lines × 16 picture elements held within a frame and the two corresponding field blocks: a top field block with dimensions of 8 lines × 16 picture elements and a bottom field block with the same dimensions.
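The correspondence between a macroblock and its pair of field blocks can be sketched as follows; the function name is hypothetical and the macroblock is again modeled as a list of scan lines.

```python
# Hypothetical sketch of FIG. 3: extracting the two 8-line x 16-picture-element
# field blocks from one 16 x 16 macroblock held within a frame.

def macroblock_to_field_blocks(macroblock):
    """Split a 16x16 frame macroblock into top/bottom 8x16 field blocks."""
    assert len(macroblock) == 16 and all(len(line) == 16 for line in macroblock)
    top_field_block = macroblock[0::2]     # the 8 lines from the top field
    bottom_field_block = macroblock[1::2]  # the 8 lines from the bottom field
    return top_field_block, bottom_field_block
```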
In MPEG, motion estimation/compensation is executed both between fields and between frames, as shown in FIG. 4. First, for each field block, the best-matching block within a field that was coded and then decoded beforehand is searched for, as shown at the top of the figure. Each of the two field blocks thus has one motion vector. Motion estimation is also executed on the frame block (macroblock), as indicated in the bottom half of FIG. 4. In this case, there is only one vector for each macroblock.
After these two types of motion vectors are found, one of the two types of motion estimation is selected on the condition that the difference between the motion estimation block signal and the signal to be coded is smaller. If the frame motion estimation shown in the bottom half of FIG. 4 proves more effective, the frame motion estimation is selected for the macroblock. Thereafter, the motion estimation difference signal (the residual) is transformed into frequency-space coefficients using the DCT.
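The mode decision described above can be sketched in code. This is a simplified illustration under stated assumptions: the prediction difference is measured here with a sum of absolute differences (a common choice, though the text does not specify the measure), and all function names and block layouts are hypothetical.

```python
# Hypothetical sketch of selecting between field and frame motion estimation:
# field estimation yields one predictor per field block, frame estimation
# yields one predictor for the whole macroblock, and the mode with the
# smaller prediction difference is selected.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for line_a, line_b in zip(block_a, block_b)
               for a, b in zip(line_a, line_b))

def select_motion_mode(macroblock, frame_pred, top_pred, bottom_pred):
    """Return 'frame' or 'field', whichever prediction is closer."""
    frame_cost = sad(macroblock, frame_pred)
    field_cost = (sad(macroblock[0::2], top_pred)       # top field lines
                  + sad(macroblock[1::2], bottom_pred))  # bottom field lines
    return 'frame' if frame_cost <= field_cost else 'field'
```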
When using the DCT, a selection must be made between the DCT applied to frame blocks and the DCT applied to field blocks. The DCT is applied to blocks with dimensions of 8 lines × 8 picture elements. Therefore, there are 4 such blocks (with respect to the luminance component) in a macroblock with dimensions of 16 lines × 16 picture elements. There are two methods of defining a block of 8 lines × 8 picture elements, as shown in FIG. 5. It is generally known that the frame mode DCT shown in the left half of FIG. 5 is more effective for still video and the field mode DCT shown in the right half of FIG. 5 is more effective for video with fast motion. The criterion normally used to select between these two DCTs compares the high-frequency energy produced by the two methods.
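The high-frequency-energy criterion can be sketched as follows. This is a simplified, hypothetical illustration: for brevity the DCT is taken over the block as a whole rather than over the four 8×8 sub-blocks, the DCT is unnormalized, and the cutoff separating "high" vertical frequencies is an assumption.

```python
import math

# Hypothetical sketch of the frame-mode vs. field-mode DCT decision: the
# mode producing less high vertical-frequency energy is selected.

def dct_1d(v):
    """Unnormalized 1-D DCT-II of a sequence."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d(block):
    """2-D DCT: 1-D DCT along each scan line, then along each column."""
    rows = [dct_1d(line) for line in block]
    width = len(rows[0])
    cols = [dct_1d([row[c] for row in rows]) for c in range(width)]
    # cols[c][r] holds the coefficient at vertical frequency r, column c
    return [[cols[c][r] for c in range(width)] for r in range(len(rows))]

def high_freq_energy(coeffs):
    """Energy in the upper half of the vertical frequencies."""
    cutoff = len(coeffs) // 2
    return sum(v * v for row in coeffs[cutoff:] for v in row)

def select_dct_mode(block):
    """Pick 'frame' or 'field' DCT: less high-frequency energy wins."""
    frame_energy = high_freq_energy(dct_2d(block))       # lines in display order
    field_lines = block[0::2] + block[1::2]              # regrouped by field
    field_energy = high_freq_energy(dct_2d(field_lines))
    return 'frame' if frame_energy <= field_energy else 'field'
```

On still content the scan lines change smoothly, so the frame mode keeps the vertical energy low; on interlaced fast motion, adjacent lines come from different time instants and alternate strongly, so regrouping by field removes that artificial high vertical frequency.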
There are two methods for using MPEG in progressive video.
(1) The first method is to resolve the progressive video into two affiliated interlace videos. Here, the top field is extracted from progressive frame #N and the bottom field is extracted from progressive frame #N+1. Next, this interlace video is coded using MPEG. Thereafter, the affiliated interlace video (the bottom field from progressive frame #N and the top field from progressive frame #N+1) is coded. The method is shown in FIG. 6. MPEG is used for coding the interlace video sequence shown with shading, and the other interlace video sequence can then be coded by applying the DCT and using motion estimation in relation to said coded interlace video. This method is referred to as the scalable option of MPEG. The coded video information is composed of a basic interlace video and an affiliate thereof, which is spatially added in order to expand said basic interlace video into complete progressive video. Therefore, this option can provide video to both viewers of interlace video and viewers of progressive video. Viewers of interlace video reproduce the video from the basic coded part and discard the added affiliated information.
There are two problems with this method. First, the viewer of interlace video must pay for the added affiliated information that said viewer does not watch. In MPEG, when the scalable option is used, a common bit stream is distributed to each household regardless of whether the final user watches the video in the interlace format or the progressive format. Another problem is that the viewer of progressive video must accept a decrease in the compression efficiency. It is best to compress progressive video in the progressive format. Namely, instead of resolving the progressive video into two interlace videos and introducing the scalable option as shown in FIG. 6, a higher compression efficiency is achieved by applying motion estimation/compensation directly to the progressive frames as shown at the bottom half of FIG. 4. This method, which codes the progressive video directly, is referred to as nonscalable progressive video coding, in contrast with coding that uses the scalable option. On the other hand, if the source video is provided in the interlace video format and the final video is displayed as interlace video, transmitting the interlace video in the progressive video format wastes coding information. In actuality, viewers of interlace video receive improved interlace video quality when the original source is coded as interlace video.
(2) The second method for progressive video stays closer to interlace video. This method must forego the transmission of complete progressive video; instead, it attempts to interpolate the omitted scan lines of the transmitted interlace video at the final display. One method for this is to average the values of the picture elements on the closest scan lines, as shown in FIG. 7. This method does not influence the quality of the coded interlace video, so viewers of interlace video receive the merit of high-quality video. The problem is that the quality of the progressive video achieved on the receiving side by this method does not reach a satisfactory level when compared with the progressive video achieved by the method in (1).
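The averaging interpolation can be sketched as follows. This is a minimal illustration of the FIG. 7 idea under assumptions: the function name is hypothetical, and the boundary handling (repeating the last line when no line exists below) is a choice the text does not specify.

```python
# Hypothetical sketch of FIG. 7: each omitted scan line is filled in by
# averaging the picture elements of the transmitted scan lines immediately
# above and below it. A field is a list of scan lines.

def interpolate_field(field):
    """Rebuild a full-height frame from one field by line averaging."""
    frame = []
    for i, line in enumerate(field):
        frame.append(line)  # keep the transmitted scan line
        if i + 1 < len(field):
            below = field[i + 1]
            # omitted line = average of the neighbors above and below
            frame.append([(a + b) / 2 for a, b in zip(line, below)])
        else:
            frame.append(line[:])  # no line below: repeat the last line
    return frame
```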
FIG. 8 shows a typical simulation result for said two cases. The horizontal axis indicates the coding bit amount and the vertical axis indicates the signal-to-(coding-)noise ratio (SNR). Here, a larger numeric value indicates higher quality. The curved line in the middle indicates that the SNR of nonscalable progressive video improves as the coding bit amount increases. The SNR of the scalable option coding is lower than this result, so it is not shown in the figure. This curved line indicates the highest SNR when MPEG is applied to progressive source video.
The uppermost curved line indicates the SNR when the source is interlace video and the coding is also executed as interlace video. The curved line in the middle corresponds to progressive video having a broad source signal spectrum like that shown in FIG. 2(a), but note that the uppermost curved line corresponds to interlace video whose bandwidth needs to be restricted in order to avoid the aliasing effect, as shown in FIGS. 2(b) and (c). The interpolation of the omitted scan lines as shown in FIG. 7 acts as a low-pass filter on the interlace video, and the spectrum shown in FIG. 2(a) approaches the spectrum shown in FIG. 2(c). The curved line at the bottom indicates the SNR of the corresponding progressive video, namely, the SNR of the progressive video formed by interpolation from the interlace video coded with the same bit amount.
Consequently, if source video is provided in the progressive format and is coded in the interlace video format, the SNR becomes higher as interlace video (although accompanied by a noticeable aliasing phenomenon), but the SNR of the corresponding progressive video, obtained by interpolating the interlace video, is very unfavorable compared to the original progressive video coded with nonscalable progressive video MPEG. Stated more simply, coding with nonscalable progressive MPEG is necessary in order to code a progressive source and reproduce it with satisfactory quality on a final progressive display device. Viewers of interlace video then receive the merit of higher-quality interlace video obtained from the reproduced high-quality progressive video. The coding of progressive video can thus guarantee a high-quality video service for viewers of both interlace video and progressive video.
On the other hand, when the source video is provided in the interlace format with a correct band restriction as shown in FIG. 2(b) or (c), transmitting it in this way requires first interpolating the omitted scan lines and then spending extra information on coding them. Viewers of interlace video discard the added scan lines at display time, resulting in a great waste of information. As is apparent when the uppermost curved line and the middle curved line in FIG. 8 are compared, interlace video has slightly better quality, so the conclusion can be drawn that if the original source video is in the interlace format (having suitable band restrictions) and is observed on an interlace video display device, it is preferable to code it in the interlace video format.
The arguments above can be summarized as follows:
(1) When the source video is in the progressive format with the complete bandwidth, it is preferable to code it in the progressive format regardless of whether it is displayed on a progressive video monitor or on an interlace video monitor.
(2) When the source video is interlace video in which the bandwidth is restricted, it is preferable to code it in the interlace format.
(1) notes a positive merit of introducing progressive video. However, (2) cites a problem that must be solved for the progressive technology to be widely accepted by the industry. Namely, the solution involves improving the coding performance of nonscalable progressive MPEG with respect to interlace source video. Present television is based on interlace video technology, so the development of a solution that can code interlace source video with greater performance is needed.
The present invention relates to a progressive video coding technology that aims to solve the problem in (2) while maintaining the merit of progressive video coding noted in (1).
The present invention provides a coding method for image signals, comprising the steps of: dividing the picture elements within 1 macroblock of the input image signals into multiple groups; transforming each group as interlace video by executing a DCT; comparing the result with the result of a DCT executed with respect to all of the picture elements within said 1 macroblock; and selecting one of the DCT results.