Since a very large amount of data is required to represent a motion picture digitally, the digital video signal representing the motion picture is conventionally compressed using a high-efficiency compression process to enable the digital video signal to be transmitted, distributed, or stored using significantly less data. FIGS. 1 and 2 respectively show the construction of a known recording apparatus, in which a digital video signal representing a motion picture is compressed before recording, and of a known reproducing apparatus, in which the compressed digital video signal is expanded after reproduction.
Specifically, in the recording apparatus 1 shown in FIG. 1 for recording a digital video signal, the analog video signal S1 from a video signal source, such as the video camera (VID CAM) 2, is converted into a digital video signal by the analog-to-digital converter (A/D) 3. The resulting digital video signal D1 is fed into the encoder (ENCODE) 4, where it is compressed. The error correction circuit (ECC) 5 adds error correcting codes to the compressed digital video signal D2 from the encoder 4, and the modulation circuit (MOD) 6 modulates the resulting signal using a predetermined modulating method. The recording signal S2 from the modulation circuit is recorded on the recording medium 7, which is, for example, an optical disk.
In the reproducing apparatus 8 shown in FIG. 2 for reproducing a compressed digital video signal, the signal S3 reproduced from the recording medium 7 is demodulated by the demodulation circuit (DEMOD) 9. The error correction circuit (ECC) 10 subjects the demodulated signal to error detection and correction to produce the compressed digital video signal D3. The decoder (DECODE) 11 expands the compressed digital video signal D3 from the error correction circuit 10 to produce the digital video output signal D4. The digital-to-analog converter (D/A) 12 converts the digital video output signal D4 to an analog signal for delivery as the analog video signal S4 to the monitor (TV MONI) 13 or the like for display. Alternatively, the digital video output signal D4 can be delivered to the monitor 13 directly.
FIG. 3 shows the construction of the encoder 4 of the recording apparatus 1 in detail. The encoder 4 receives the digital video signal D1 and stores it in the frame memory (FRM MEM) 20, which consists of a random-access memory (RAM). The digital video signal is read out from the frame memory 20 at a predetermined timing and is fed via the block dividing circuit 21 to the subtractor 22 and one pole of the switch 32. The other pole of the switch 32 is connected to the output of the subtractor 22. The wiper of the switch 32 is connected to the orthogonal transform circuit 23, which is, for example, a discrete cosine transform (DCT) circuit. Depending on the state of the switch 32, the orthogonal transform circuit 23 orthogonally transforms a block of the digital video signal D1 or a block of differences between a block of the digital video signal and a corresponding reference block. The resulting transform coefficients are quantized by the quantizing circuit (Q) 24. The variable length coding circuit (VLC) 25 codes the quantized transform coefficients using variable length coding such as Huffman coding. The resulting digital video data DO are fed to the video output buffer 26, where they are temporarily stored.
Each picture (i.e., each frame or each field) of the digital video signal may be coded using intra-picture coding or inter-picture coding. A picture coded using intra-picture coding (called an I-picture) is coded by itself, without reference to any other picture. When a picture is coded as an I-picture, the switch 32 feeds each picture block of the picture directly to the orthogonal transform circuit 23.
A picture coded using inter-picture coding (called a P-picture or a B-picture) is coded with reference to a reference picture, which is derived from one or more previous or following pictures. When a picture is coded using inter-picture coding, the subtractor 22 generates blocks of differences between blocks of the picture and corresponding blocks of the reference picture, and passes each block of differences via the switch 32 to the orthogonal transform circuit 23 for coding.
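The signal path through the switch 32, the subtractor 22, the orthogonal transform circuit 23, and the quantizing circuit 24 can be illustrated with a minimal Python sketch. The function names, the 4x4 block size used in the usage note, and the uniform quantizer with a single step size are assumptions made for illustration only; the description above does not specify them.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

def quantize(coeffs, step):
    """Uniform quantizer (an assumption): divide by the step size and round."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def encode_block(picture_block, reference_block, step):
    """Code one block. reference_block=None models the intra position of
    switch 32; otherwise the subtractor 22 forms a block of differences."""
    if reference_block is None:
        to_transform = picture_block
    else:
        to_transform = [[p - r for p, r in zip(p_row, r_row)]
                        for p_row, r_row in zip(picture_block, reference_block)]
    return quantize(dct_2d(to_transform), step)
```

For a flat 4x4 block, intra coding leaves only the DC coefficient non-zero, whereas inter coding against an identical reference block yields a block of zeros, which is why a well-matched reference block reduces the amount of data to be coded.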
The reference picture with respect to which the picture is coded is derived from reconstructed I-pictures and P-pictures stored in the frame memory 20 as follows: a P-picture is coded with forward prediction using as its reference picture a temporally-preceding I-picture or P-picture. A B-picture is coded with bi-directional prediction using as its reference picture one of the following three types of pictures: a temporally-preceding I-picture or P-picture; a temporally-following I-picture or P-picture; or a picture formed by interpolation between a temporally-preceding I-picture or P-picture and a temporally-following I-picture or P-picture.
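The three reference choices available to a B-picture can be sketched as follows. The mode names are hypothetical, and the interpolation is assumed here to be a rounded average of the two blocks; the description above says only that the picture is formed by interpolation.

```python
def select_reference_block(preceding, following, mode):
    """Return the reference block for one block of an inter-coded picture.

    preceding, following: blocks (lists of rows) taken from the reconstructed
    temporally-preceding and temporally-following I- or P-pictures.
    mode: "forward", "backward", or "interpolated" (hypothetical names).
    A P-picture would use only the "forward" mode.
    """
    if mode == "forward":
        return [row[:] for row in preceding]
    if mode == "backward":
        return [row[:] for row in following]
    if mode == "interpolated":
        # Assumed interpolation: average of the two blocks, rounded half up.
        return [[(a + b + 1) // 2 for a, b in zip(p_row, f_row)]
                for p_row, f_row in zip(preceding, following)]
    raise ValueError("unknown prediction mode: " + mode)
```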
The reconstruction of the reconstructed I-pictures and P-pictures stored in the frame memory 20 will now be described. The block of quantized transform coefficients derived from each block of each I-picture or each P-picture is fed from the quantizing circuit 24 to the local decoder 33. The local decoder is constituted by the inverse quantizer 27, the inverse orthogonal transform circuit 28, and the adder 29. The local decoder 33 decodes each block of quantized transform coefficients to provide a block of a reconstructed picture. The block of the reconstructed picture is then stored in the frame memory 20.
In the local decoder 33, each block of quantized transform coefficients passes from the quantizing circuit 24 to the inverse quantizing circuit (IQ) 27, where it is inversely quantized. Each resulting block of transform coefficients is fed into the inverse orthogonal transform circuit (IDCT) 28, where it is subjected to an inverse orthogonal transform, such as an inverse DCT. Each resulting locally-decoded block from the inverse orthogonal transform circuit 28 is supplied to the adder 29, where it is added to its corresponding reference block from the motion compensator 31. The resulting reconstructed picture block is fed into the frame memory 20, where it is stored as a block of a reconstructed picture. When the picture being coded is an I-picture, the motion compensator 31 supplies no reference block to the adder 29, and the reconstructed picture block is derived solely from the locally-decoded block from the inverse orthogonal transform circuit 28.
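The operation of the local decoder 33 can be sketched in the same spirit. The sketch assumes a uniform inverse quantizer (multiplication by the step size) and an orthonormal inverse DCT; the function names are hypothetical.

```python
import math

def idct_2d(coeffs):
    """Orthonormal 2-D inverse DCT (DCT-III) of a square block of coefficients."""
    n = len(coeffs)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    block = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            block[x][y] = total
    return block

def local_decode(quantized, step, reference_block=None):
    """Inverse quantize (27), inverse transform (28), and add the reference (29).

    reference_block=None models an I-picture, for which the motion
    compensator 31 supplies no reference block to the adder 29.
    """
    coeffs = [[q * step for q in row] for row in quantized]
    decoded = idct_2d(coeffs)
    reconstructed = []
    for i, row in enumerate(decoded):
        out_row = []
        for j, value in enumerate(row):
            ref = 0 if reference_block is None else reference_block[i][j]
            out_row.append(int(round(value)) + ref)
        reconstructed.append(out_row)
    return reconstructed
```

Because the local decoder starts from the quantized coefficients rather than from the original picture, the reconstructed block includes the quantizing error, just as the block reconstructed in the reproducing apparatus will.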
By the process just described, a reconstructed picture is derived from each I-picture and each P-picture by decoding compressed digital data that are identical to the compressed digital data supplied via the VLC circuit 25 to the video output buffer 26. The reconstructed picture is written into the frame memory 20. The resulting reconstructed pictures stored in the frame memory 20 are then used in coding P-pictures and B-pictures.
When the current picture is coded using inter-picture coding (i.e., is a P-picture or a B-picture), the reference block for coding each block of the picture is generated by the motion compensator 31 in response to the motion detector 30. The motion detector 30 performs block matching between each block of the current picture and the reference picture derived from the reconstructed pictures stored in the frame memory 20. This detects the motion of each block of the current picture relative to the reference picture. The motion detector 30 generates a motion vector representing this motion, and feeds the motion vector to the VLC circuit 25 and to the motion compensator 31. The VLC circuit 25 applies variable-length coding to the motion vector and combines the result with the variable-length coded quantized transform coefficients, which it derives from the quantized transform coefficients received from the quantizing circuit 24. The VLC circuit 25 feeds the resulting digital video data to the video output buffer 26.
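The block matching performed by the motion detector 30 can be sketched as an exhaustive search. The matching criterion (sum of absolute differences), the search range, and the sign convention of the vector are assumptions; the description above states only that block matching is used.

```python
def find_motion_vector(current, reference, bx, by, n, search):
    """Full-search block matching for the n x n block at column bx, row by.

    Returns ((dx, dy), cost), where (dx, dy) is the displacement from the
    block's position in the current picture to the best-matching block in
    the reference picture, and cost is the sum of absolute differences.
    """
    height, width = len(reference), len(reference[0])
    best_cost, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference picture.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sum(abs(current[by + j][bx + i] - reference[y0 + j][x0 + i])
                       for j in range(n) for i in range(n))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

In the usage below, the current picture is the reference picture shifted by two samples horizontally and one line vertically, so the search is expected to recover that displacement with zero matching cost:

```python
reference = [[x + 10 * y for x in range(8)] for y in range(8)]
current = [[reference[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
            for x in range(8)] for y in range(8)]
mv, cost = find_motion_vector(current, reference, 2, 2, 2, 2)
```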
In response to the motion vector received from the motion detector 30, the motion compensator 31 carries out motion compensation on the reference picture derived from the reconstructed pictures stored in the frame memory 20, and provides the resulting reference block corresponding to the picture block of the current picture to the subtractor 22 and to the adder 29. As described above, the subtractor 22 subtracts the reference block received from the motion compensator 31 from the picture block of the current picture to derive a block of differences for coding, and the adder 29 adds the reference block received from the motion compensator 31 to the locally-decoded block received from the inverse orthogonal transform circuit 28 to generate a block of the reconstructed picture, which it supplies to the frame memory 20 for storage.
The video output buffer 26 monitors the number of bytes of compressed digital video data accumulated therein and adjusts quantizing step size in the quantizing circuit 24 so that the accumulated number of bytes of compressed digital data does not cause the video output buffer to overflow or to underflow. The compressed digital video data stored in the video output buffer 26 are read out at a constant rate, and are delivered to the error correction circuit 5 as the compressed digital video signal D2.
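The feedback from the video output buffer 26 to the quantizing circuit 24 can be sketched as a mapping from buffer occupancy to quantizing step size. The linear mapping and the step size range of 1 to 31 are assumptions chosen for illustration; the description above does not state the control law.

```python
def next_step_size(fullness, capacity, min_step=1, max_step=31):
    """Choose a quantizing step size from the current buffer occupancy.

    A fuller buffer yields a coarser step, so fewer bytes are produced and
    overflow is avoided; an emptier buffer yields a finer step, so more
    bytes are produced and underflow is avoided.
    """
    ratio = min(max(fullness / capacity, 0.0), 1.0)
    return min_step + round(ratio * (max_step - min_step))
```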
The decoder 11 of the motion picture reproducing apparatus 8 (FIG. 2) is constructed as shown in FIG. 4. The compressed digital video signal D3 is transferred at a constant transfer rate from the error correction circuit (ECC) 10 to the video input buffer 40, where it is stored. The compressed digital video data for each picture are read out from the video input buffer 40, and are supplied to the inverse VLC circuit 41. The inverse variable length coding circuit (inverse VLC circuit) applies inverse VLC coding to the compressed digital data for each picture, and supplies the resulting blocks of quantized transform coefficients to the inverse quantizing circuit (IQ) 42.
After it has finished applying inverse VLC coding to the compressed digital data for each picture, the inverse VLC circuit 41 feeds a request code RQ to the video input buffer 40 to cause the video input buffer to provide the compressed digital data for the next picture. In response to the request code, the video input buffer 40 transfers the compressed digital video data of the next picture to the inverse VLC circuit 41. The transfer rate of this process is the same as the transfer rate from the VLC circuit 25 to the video output buffer 26 in the encoder 4 (FIG. 3), so the video input buffer 40 will neither overflow nor underflow when it receives compressed video data at a constant transfer rate from the recording medium 7. In fact, in the encoder 4, the video output buffer 26 controls the number of bytes of compressed video data accumulated therein by emulating the video input buffer 40 in the decoder 11 such that the video input buffer will neither overflow nor underflow.
In addition to applying inverse VLC coding to the compressed digital data for each picture, the inverse VLC circuit 41 extracts from the compressed digital data the motion vector MV for each block and quantizing step size data SS. The quantizing step size data is generated by the encoder 4 (FIG. 1) and is included in the recording signal recorded on the recording medium 7 for use in dequantizing the quantized transform coefficients in the dequantizer 42 in the decoder 11. The motion vector MV is generated by the motion detector 30 (FIG. 3), and is included in the recording signal recorded on the recording medium 7 for use in the motion compensator 46 in the decoder 11.
The dequantizer 42 dequantizes each block of quantized transform coefficients supplied by the inverse VLC circuit 41 in accordance with quantizing step size data SS extracted from the compressed digital video data by the inverse VLC circuit 41, and supplies each resulting block of transform coefficients to the inverse orthogonal transform (IDCT) circuit 43.
The inverse orthogonal transform circuit 43 applies an inverse orthogonal transform, such as an inverse discrete cosine transform, to each block of transform coefficients supplied by the dequantizing circuit 42 to provide a decoded block. The decoded block is supplied to the adder 44, which also receives the corresponding reference block of the corresponding reference picture derived by the motion compensator 46 from one or more of the reconstructed pictures stored in the frame memory 45. The resulting reconstructed picture block received from the adder 44 is stored in the frame memory 45 as a block of a new reconstructed picture.
If the current picture is an I-picture, the motion compensator 46 provides no reference block to the adder 44, and the reconstructed block is generated using the decoded block alone. If the current picture is a P-picture, having an I-picture or another P-picture as its reference picture, the I-picture or P-picture is copied from the frame memory 45 to the motion compensator 46 as the reference picture for the current picture. The motion compensator 46 applies motion compensation to the reference picture copied from the frame memory 45 in accordance with the motion vector for the current block of the current picture. The motion compensator 46 then provides the resulting block of the reference picture to the adder 44 as the reference block for the current block of the current picture.
The adder 44 adds the decoded block received from the inverse orthogonal transform circuit 43 to the reference block received from the motion compensator 46 to reconstruct the current block of the current P-picture, which is stored in the frame memory 45. This process is then repeated for the remaining blocks of the current P-picture until all of the blocks of the current picture have been reconstructed.
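The reconstruction of one block in the decoder 11 can be sketched as follows. The function names are hypothetical, 8-bit samples clamped to the range 0 to 255 are assumed, and the motion vector is assumed to point to a block lying wholly inside the reference picture.

```python
def motion_compensate(reference_picture, bx, by, motion_vector, n):
    """Fetch the n x n reference block displaced by the motion vector
    (the role of the motion compensator 46)."""
    dx, dy = motion_vector
    rows = reference_picture[by + dy: by + dy + n]
    return [row[bx + dx: bx + dx + n] for row in rows]

def reconstruct_block(decoded_block, reference_block=None):
    """Add the decoded block to the reference block (the role of the adder 44).

    reference_block=None models an I-picture, for which the reconstructed
    block is generated from the decoded block alone.
    """
    def clamp(value):
        return max(0, min(255, value))  # assumed 8-bit sample range

    if reference_block is None:
        return [[clamp(d) for d in row] for row in decoded_block]
    return [[clamp(d + r) for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(decoded_block, reference_block)]
```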
If the current picture is a B-picture, one or more I-pictures and/or P-pictures are copied from the frame memory 45 to the motion compensator 46, which generates from these pictures, in response to the motion vector for the current block, the reference block for reconstructing the current block. The motion compensator 46 supplies the reference block to the adder 44.
The adder 44 adds the decoded block received from the inverse orthogonal transform circuit 43 to the reference block received from the motion compensator 46 to reconstruct the current block of the current B-picture, which is stored in the frame memory 45. This process is then repeated for the remaining blocks of the current B-picture until all of the blocks of the current picture have been reconstructed.
The current picture stored in the frame memory 45 as just described is read out in line scan order by the scanning address generating circuit (FOSL) 47 addressing the frame memory 45. The resulting digital video output signal D4 is then fed to the monitor 13 (FIG. 2), either directly, or via the digital-to-analog converter 12. After it has been read out, the current picture, if an I-picture or P-picture, remains briefly stored in the frame memory 45 for use in decoding other P- and B-pictures.
In the manner just described, the recording apparatus and the reproducing apparatus reduce the redundancy within each picture by orthogonally transforming square blocks of the picture, and reduce the redundancy between pictures by means of the motion vector and block matching. These two techniques are combined to compress the digital video signal representing the motion picture so that the motion picture may be recorded, transmitted, or distributed using a relatively small amount of data.
A picture rate conversion method known as 2-3 pull-down conversion is used when an interlaced video signal having a field rate of 60 Hz is derived from a motion picture film source, such as a motion picture film, or a 24-frame video signal, by means of a telecine or other apparatus. This method must be used because the interlaced video signal has a picture rate of 60 Hz, i.e., a field rate of 60 Hz, whereas the motion picture film source has a picture rate of 24 Hz, i.e., a frame rate of 24 Hz. In this method, for example as shown in FIGS. 5A and 5B, two fields of the video signal are derived from the first of each two consecutive frames of the motion picture film source, and three fields of the video signal are derived from the second of the two frames of the motion picture film source.
In FIGS. 5A and 5B, FIG. 5A shows four consecutive frames, including the frames 50 and 51, of a motion picture film source having a frame frequency of 24 Hz. Each frame of the motion picture film source is scanned twice to provide an odd field, indicated by solid lines, and an even field, indicated by broken lines, offset from the odd field by one line.
Accordingly, the first two fields of the interlaced video signal are derived from the zero-th motion picture film source frame 50. The odd field produced by scanning the motion picture film source frame 50 provides the zero-th field 52, and the even field produced by scanning the motion picture film source frame 50 provides the first field 53 of the interlaced video signal.
The next three fields of the interlaced video signal are derived from the first motion picture film source frame 51. The odd field produced by scanning the motion picture film source frame 51 provides the second field 54, and the even field produced by scanning the motion picture film source frame 51 provides the third field 55 of the interlaced video signal. Then, the motion picture film source frame 51 is scanned a second time to provide an odd field as the fourth field 56 of the interlaced video signal. The process is repeated with the third frame 57 and the fourth frame 58 of the motion picture film source, except that the repeated field is the even field 59, as shown. Note that the interlaced video signal frame consisting of the fourth and fifth fields, and the interlaced video signal frame consisting of the sixth and seventh fields are each derived from two different frames of the motion picture film source.
Thus, although the frame frequency of the motion picture film source is different from the field frequency of the interlaced video signal, the frequencies are made to match by scanning every other frame a third time to generate an additional field. This is the basic principle of the 2-3 pull-down conversion method. The 2-3 pull-down conversion method generates an interlaced video signal in which certain fields, such as the second field 54 and the fourth field 56, are completely identical to one another.
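The field sequence produced by the 2-3 pull-down conversion just described can be sketched as follows. Frames and fields are numbered from zero, as in the description; the function name and the tuple representation are hypothetical.

```python
def pulldown_2_3(num_frames):
    """List the (source_frame, parity) of each output field.

    Even-numbered source frames contribute two fields and odd-numbered
    source frames contribute three. Field parity alternates continuously,
    so the repeated field is alternately an odd field and an even field.
    """
    fields = []
    for frame in range(num_frames):
        count = 2 if frame % 2 == 0 else 3
        for _ in range(count):
            parity = "odd" if len(fields) % 2 == 0 else "even"
            fields.append((frame, parity))
    return fields
```

Four source frames yield ten fields, so 24 frames yield 60 fields. In the list returned by `pulldown_2_3(4)`, the entries at positions 2 and 4 are both `(1, "odd")`, corresponding to the identical second field 54 and fourth field 56.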
A 2-3 pull-down conversion technique similar to that just described is used when an interlaced video signal having a field rate of 50 Hz is derived from a motion picture film source having a frame rate of 24 Hz. PAL-system and SECAM-system video signals are examples of interlaced video signals with a field rate of 50 Hz. When an interlaced video signal with a field rate of 50 Hz is generated from a motion picture film source with a frame rate of 24 Hz, three fields of the interlaced video signal are derived from every twelfth frame of the motion picture film source, and two fields of the interlaced video signal are derived from all other frames.
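The arithmetic of the 50 Hz case can be checked with a short sketch. Which frame in each group of twelve supplies the third field is an assumption (the last frame of the group is used here); the description above says only that it is every twelfth frame.

```python
def fields_per_frame_24_to_50(frame_index):
    """Number of fields derived from each frame of a 24 Hz source when a
    50 Hz interlaced signal is generated: three from every twelfth frame
    (assumed here to be the last of each group of twelve), two otherwise."""
    return 3 if frame_index % 12 == 11 else 2

# One second of film: 22 frames give 2 fields each and 2 frames give 3 each.
fields_in_one_second = sum(fields_per_frame_24_to_50(i) for i in range(24))
```

Twenty-two frames at two fields each and two frames at three fields each give 44 + 6 = 50 fields for every 24 frames.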
In the following description, it will be understood that references to video signals with a picture rate (i.e., field rate or frame rate) of 60 Hz also refer to video signals having a picture rate of 50 Hz, and that references to 2-3 pull down conversion in which a video signal having a picture rate of 60 Hz is derived from a motion picture film source or a compressed video signal with a frame rate of 24 Hz also refer to 2-3 pull down conversion in which a video signal with a picture rate of 50 Hz is derived from a motion picture film source or a compressed video signal with a frame rate of 24 Hz. It is also to be understood that references to picture rates of 24 Hz, 50 Hz, and 60 Hz also encompass corresponding non-integer picture rates.
Because an interlaced video signal generated by 2-3 pull-down conversion includes duplicate fields, some types of apparatus for compressing a digital video signal representing a motion picture detect the duplicate fields in the interlaced video signal having a field rate of 60 Hz. Such types of apparatus perform field rate conversion by removing one of each pair of duplicate fields, and compress the resulting digital video signal in interlaced frames having a frame rate of 24 Hz. This improves the overall efficiency of the compression process. Moreover, to further increase the efficiency of the compression process, the interlaced frames may be compressed either in field mode or in frame mode.
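The removal of duplicate fields can be sketched as follows. The detection rule, which compares each field with the field of the same parity two positions earlier and tests for exact equality, is an assumption; the description above does not state how the duplicate fields are detected, and a practical apparatus would presumably tolerate noise and would need to distinguish a true duplicate from a static scene.

```python
def remove_duplicate_fields(fields):
    """Drop each field that is identical to the field two positions earlier
    in the input sequence, i.e. the previous field of the same parity."""
    kept = []
    for k, field in enumerate(fields):
        if k >= 2 and field == fields[k - 2]:
            continue  # duplicate introduced by 2-3 pull-down conversion
        kept.append(field)
    return kept
```

Applied to the ten fields derived from four film frames, the sketch removes the two repeated fields and leaves eight fields, that is, four interlaced frames.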
To expand a digital video signal compressed in the way just described, the decoder expands the compressed digital video signal to provide an interlaced digital video signal with a frame rate of 24 Hz. The decoder then performs 2-3 pull down conversion to obtain an interlaced video signal with a field rate of 60 Hz.
If such a decoder is adapted to expand the compressed digital video signal in the manner described to provide a non-interlaced output signal for display on a non-interlaced monitor, such as on a non-interlaced computer monitor, the output signal will be displayed with a high picture quality, close to that of the original motion picture film source with the frame rate of 24 Hz. However, to adapt the decoder to convert the interlaced pictures obtained by expanding the compressed digital video signal into a non-interlaced video signal requires a field rate conversion circuit or the like, which increases the complexity of the decoder.