1. Field of the Invention
The present invention relates to a technique of playing back moving image data of an MPEG format using inter-frame encoding.
2. Description of the Related Art
Recently, demand has grown for techniques that handle moving image information as digital data and encode it at a high compression ratio with high quality for storage and transmission. For compressing image information, methods such as MPEG have been proposed and have become popular. These methods compression-encode image information by orthogonal transform (e.g., discrete cosine transform), motion prediction, and motion compensation, exploiting the redundancy unique to moving image information.
Manufacturers have developed and commercialized image capturing apparatuses (e.g., digital cameras and digital video cameras), DVD recorders, and similar devices that record images using these encoding methods. Users can easily view the recorded images on these apparatuses, personal computers, DVD players, and the like.
More recently, H.264 (MPEG-4 Part 10 AVC) has become available as an encoding method aiming at higher compression ratios and higher image quality. H.264 is known to require more computation for encoding and decoding than conventional encoding methods such as MPEG-2 and MPEG-4, but it achieves higher encoding efficiency (see ISO/IEC 14496-10, “Advanced Video Coding”).
FIG. 1 is a block diagram showing the arrangement of an image processing apparatus which compresses image data by H.264. In FIG. 1, input image data is divided into macroblocks, which are sent to a subtracter 1101. The subtracter 1101 calculates the difference between the image data and a predicted value and outputs it to an integer DCT (Discrete Cosine Transform) transform unit 1102, which applies an integer DCT to the input data and outputs the transformed data to a quantization unit 1103. The quantization unit 1103 quantizes the input data. The quantized data is sent as difference image data to an entropy encoder 1115; it is also inversely quantized by an inverse quantization unit 1104 and inversely transformed by an inverse integer DCT transform unit 1105. An adder 1106 adds the predicted value to the inversely transformed data, reconstructing the image.
The reconstructed image is sent to a frame memory 1107 for intra (intra-frame) prediction; it also undergoes deblocking filter processing by a deblocking filter 1109 and is then sent to a frame memory 1110 for inter (inter-frame) prediction. An intra prediction unit 1108 performs intra prediction using the image in the intra prediction frame memory 1107. Intra prediction uses the values of already-encoded pixels adjacent to the block being encoded as predicted values.
The image in the inter prediction frame memory 1110 consists of a plurality of pictures, as will be described later. These pictures are classified into two lists, “List0” and “List1”, and are used for inter prediction by an inter prediction unit 1111. After inter prediction, a memory controller 1113 updates the stored images. In inter prediction, the inter prediction unit 1111 determines a predicted image using an optimal motion vector based on the result of motion detection between image data of different frames by a motion detection unit 1112.
Of the intra prediction and inter prediction results, a selector 1114 selects the optimal one. The motion vector is sent to the entropy encoder 1115 and encoded together with the difference image data, forming an output bit stream.
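The subtract, transform, quantize, and reconstruct path of FIG. 1 can be sketched for a single 4x4 block as follows. The matrix `CF` is the H.264 4x4 forward core transform; the uniform scalar quantizer with integer step `qstep` and the exact rational inverse are simplifying assumptions (H.264 itself folds position-dependent scaling into quantization and uses its own integer inverse transform), and all function names are hypothetical.

```python
from fractions import Fraction

# H.264 4x4 forward core transform matrix (integer approximation of the DCT).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]
# Rows of CF are orthogonal with squared norms 4, 10, 4, 10, so
# CF^-1 = CF^T * diag(1/4, 1/10, 1/4, 1/10) (exact, using Fractions).
NORM_INV = [Fraction(1, 4), Fraction(1, 10), Fraction(1, 4), Fraction(1, 10)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(x):
    # Y = CF * X * CF^T
    return matmul(matmul(CF, x), transpose(CF))


def inverse_transform(y):
    # X = CF^T * N^-1 * Y * N^-1 * CF, with N = diag(4, 10, 4, 10)
    scaled = [[y[i][j] * NORM_INV[i] * NORM_INV[j] for j in range(4)] for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)


def encode_block(block, pred, qstep):
    """One pass of the FIG. 1 loop for a 4x4 block (hypothetical integer qstep)."""
    residual = [[block[i][j] - pred[i][j] for j in range(4)] for i in range(4)]  # subtracter 1101
    coeffs = forward_transform(residual)                                          # unit 1102
    levels = [[round(Fraction(c, qstep)) for c in row] for row in coeffs]         # unit 1103 -> entropy encoder
    dequant = [[lv * qstep for lv in row] for row in levels]                      # unit 1104
    recon_res = inverse_transform(dequant)                                        # unit 1105
    recon = [[pred[i][j] + round(recon_res[i][j]) for j in range(4)]             # adder 1106
             for i in range(4)]
    return levels, recon
```

With `qstep=1` the loop is lossless and `recon` equals the input block; larger steps discard precision, which is why the encoder predicts from `recon` (what the decoder will see) rather than from the original block.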
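As one concrete case of predicting from adjacent pixels, the sketch below implements the DC mode of H.264 4x4 intra prediction, one of its nine 4x4 modes. The function name and argument layout are illustrative assumptions; `top` and `left` are the already-reconstructed neighboring pixels.

```python
def intra4x4_dc(top=None, left=None):
    """Return a 4x4 DC prediction from the reconstructed neighbors of a block.

    top:  the 4 reconstructed pixels directly above the block, or None if unavailable
    left: the 4 reconstructed pixels directly to its left, or None if unavailable
    """
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 4) >> 3   # rounded mean of 8 neighbors
    elif top is not None:
        dc = (sum(top) + 2) >> 2               # rounded mean of the 4 top neighbors
    elif left is not None:
        dc = (sum(left) + 2) >> 2              # rounded mean of the 4 left neighbors
    else:
        dc = 128                               # mid-gray for 8-bit video
    return [[dc] * 4 for _ in range(4)]
```

For example, `intra4x4_dc([10] * 4, [20] * 4)` predicts every pixel as 15; only the difference from this prediction is then transformed and quantized.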
H.264 inter prediction will be explained in detail with reference to FIG. 2 to FIG. 5.
H.264 inter prediction can use a plurality of pictures for prediction. Hence, two lists (“List0” and “List1”) are prepared to specify reference pictures. In the examples described here, a maximum of five reference pictures is assigned to each list.
P-pictures use only “List0”, mainly for forward prediction. B-pictures use “List0” and “List1” for bidirectional prediction (or for forward-only or backward-only prediction). That is, “List0” mainly holds pictures for forward prediction, and “List1” mainly holds pictures for backward prediction.
FIG. 2 shows an example of a reference list used in encoding. This example assumes a standard ratio of I-, P-, and B-pictures: I-pictures are placed every 15 frames, P-pictures every three frames, and two B-pictures between each pair of consecutive I- or P-pictures. In FIG. 2, image data 1201 is obtained by arranging pictures in display order. Each square in the image data 1201 shows the picture type and a number representing the display order. For example, a picture I15 is an I-picture whose display order is 15 and is encoded using intra prediction only. A picture P18 is a P-picture whose display order is 18 and is encoded using forward prediction only. A picture B16 is a B-picture whose display order is 16 and is encoded using bidirectional prediction.
The encoding order differs from the display order; pictures are encoded in prediction order. In FIG. 2, pictures are encoded in the order “I15, P18, B16, B17, P21, B19, B20, . . . ”.
In FIG. 2, a reference list (List0) 1202 holds pictures that have already been encoded and decoded. For example, inter prediction for a picture P21 (the P-picture whose display order is 21) refers to the encoded and decoded pictures held in the reference list (List0) 1202. In the example shown in FIG. 2, the reference list 1202 holds pictures P06, P09, P12, I15, and P18.
In inter prediction, a motion vector giving an optimal predicted value is obtained for each macroblock from the reference pictures in the reference list (List0) 1202 and is encoded. The pictures in the reference list (List0) 1202 are assigned sequential reference picture numbers, which identify them independently of the display-order numbers shown in FIG. 2.
After encoding of the picture P21 ends, P21 is decoded and added to the reference list (List0) 1202, and the oldest reference picture (in this case, P06) is deleted from it. Encoding then proceeds with pictures B19, B20, and P24. FIG. 3 shows the state of the reference list (List0) 1202 at this point.
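The reordering described above can be sketched as a small function. Picture labels such as `"I15"` follow FIG. 2; the function name is hypothetical, and the sketch assumes each anchor (I- or P-picture) is encoded before the B-pictures that precede it in display order, as in the example.

```python
def encoding_order(display_order):
    """Reorder picture labels such as "I15" or "B16" from display to encoding order.

    Each anchor (I or P) is emitted first, followed by the B-pictures that precede it
    in display order, because those B-pictures use it as a backward reference.
    """
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)      # wait for the next anchor
        else:
            out.append(pic)            # anchor is encoded first
            out.extend(pending_b)      # then the B-pictures it closes
            pending_b = []
    return out + pending_b             # trailing B-pictures (no later anchor in the input)
```

For the FIG. 2 sequence, `encoding_order(["I15", "B16", "B17", "P18", "B19", "B20", "P21"])` returns `["I15", "P18", "B16", "B17", "P21", "B19", "B20"]`.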
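The sliding-window behavior of List0 in FIG. 2 and FIG. 3 can be simulated with a bounded queue. This sketch tracks List0 only, assumes a window of five pictures and that only I- and P-pictures are added, and uses a hypothetical function name; H.264 itself allows other configurations.

```python
from collections import deque


def simulate_list0(encoding_order, initial_refs, size=5):
    """Return (picture, List0 contents while encoding it) for each picture."""
    list0 = deque(initial_refs, maxlen=size)  # maxlen drops the oldest entry on overflow
    history = []
    for pic in encoding_order:
        history.append((pic, list(list0)))    # snapshot of the references for this picture
        if pic[0] in "IP":                    # only I/P pictures become references here
            list0.append(pic)
    return history


refs = dict(simulate_list0(["I15", "P18", "B16", "B17", "P21", "B19", "B20", "P24"],
                           ["P06", "P09", "P12"]))
# refs["P21"] -> ['P06', 'P09', 'P12', 'I15', 'P18']   (FIG. 2)
# refs["P24"] -> ['P09', 'P12', 'I15', 'P18', 'P21']   (FIG. 3: P06 has been dropped)
```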
FIG. 4 shows a change of the reference list for each picture.
In FIG. 4, pictures are encoded sequentially from the top; for each picture being encoded, FIG. 4 shows the contents of the reference lists (List0 and List1). As shown in FIG. 4, each time a P-picture (or I-picture) is encoded, the reference lists (List0 and List1) are updated and the oldest pictures are deleted from them. In this example, the reference list (List1) holds only one picture. This is because increasing the number of pictures referred to for backward prediction increases the amount of buffering required before decoding. In other words, backward pictures too far from the picture being encoded are not referred to.
In this example, only I- and P-pictures are used as references, and all I- and P-pictures are added in sequence to the reference lists (List0 and List1). Only I-pictures are used in the reference list (List1) for backward prediction because this picture arrangement is considered the most popular one. However, this reference list arrangement is merely the most popular example; H.264 itself allows a high degree of freedom in configuring reference lists.
For example, not all I- and P-pictures need to be added to the reference list, and B-pictures can also be added. H.264 also defines long-term reference pictures, which stay in the reference list until an explicit instruction is received. FIG. 5 shows how the reference list changes when B-pictures are added to it. When B-pictures are added, encoded pictures may be added to the reference list each time all the B-pictures are encoded.
A file format for recording moving image data compressed in this way will be explained.
The MP4 (MPEG-4) file format is used as a general-purpose format for recording MPEG (MPEG-2 or MPEG-4) image data obtained by a digital video camera, digital still camera, or the like. Using the MP4 file format ensures compatibility with other digital devices, so that, for example, image data recorded as an MP4 file can be played back on them.
As shown in (a) of FIG. 6, an MP4 file basically consists of an mdat box, which holds the encoded stream image data, and a moov box, which holds information related to the stream image data. The mdat box consists of a plurality of chunks (chunk cN), as shown in (b) of FIG. 6, and each chunk consists of a plurality of samples (sample sM), as shown in (d) of FIG. 6. For example, samples s1, s2, s3, s4, . . . correspond to encoded MPEG image data I0, B−2, B−1, P3, . . . , respectively, as shown in (e) of FIG. 6.
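Every box in an MP4 file starts with a 4-byte big-endian size followed by a 4-byte type, so the top-level mdat and moov boxes can be found by walking these headers. The sketch below (hypothetical function name) handles the 64-bit "largesize" and size-0 cases of the ISO base media file format; in real files the stco, stsc, and stsz tables sit deeper, under moov/trak/mdia/minf/stbl, and are reached by calling the same function on each container's payload range.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, payload_offset, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:      # 64-bit "largesize" follows the type field
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:    # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("ascii"), pos + header, pos + size
        pos += size
```

Usage: `for kind, payload, box_end in iter_boxes(data): ...` lists the top-level boxes; `iter_boxes(data, payload, box_end)` then descends into a container such as moov.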
I0, I1, I2, . . . , In represent intra-encoded (intra-frame-encoded) frame image data. B0, B1, B2, . . . , Bn represent frame image data encoded (inter-frame-encoded) by referring to reference image data bidirectionally. P0, P1, P2, . . . , Pn represent frame image data encoded (inter-frame-encoded) by referring to reference image data unidirectionally (forward direction). These frame image data are variable-length encoded data.
As shown in (c) of FIG. 6, the moov box consists of an mvhd box, which holds header information such as the creation date and time, and a trak box, which holds information on the stream image data stored in the mdat box. The information stored in the trak box includes an stco box, which stores the offset of each chunk in the mdat box, as shown in (h) of FIG. 6; an stsc box, which stores the number of samples in each chunk, as shown in (g) of FIG. 6; and an stsz box, which stores the size of each sample, as shown in (f) of FIG. 6.
The amount of data stored in the stco, stsc, and stsz boxes grows with the amount of recorded image data, that is, with the recording time. For example, when a 30-frame-per-second image is recorded as an MP4 file with every 15 frames stored in one chunk, this data reaches about 1 Mbyte for 2 hours of recording, so the moov box requires a capacity of about 1 Mbyte.
When this MP4 file is played back, the moov box is read from the recording medium and the stco, stsc, and stsz boxes in it are analyzed; only then can each chunk in the mdat box be accessed.
When an image is recorded in the MP4 file format, the stream data grows over time. Because the stream data is very large, it must be written to the file during recording. However, as described above, the size of the moov box also grows with the recording time. Since the size of the MP4 header is not determined until recording ends, the offset at which stream data is written in the file cannot be determined in advance. For this reason, a general moving image processing apparatus takes one of the following measures when recording, exploiting the flexibility of the MP4 file format.
(1) The mdat box is placed at the start of the file, and after recording ends, the moov box is placed after the mdat box (FIG. 7A).
(2) As proposed in Japanese Patent Laid-Open No. 2003-289495, the size of the moov box is fixed in advance to determine the offset position of the mdat box, and recording is then performed (FIG. 7B). If the recording time is short and the header area is not filled, the remaining area is left as a free box. If the data to be recorded would exceed the header size, the frame number information of I-pictures is decimated as appropriate so that the header stays at the predetermined size.
(3) The file is divided into a plurality of pairs of header and mdat boxes (FIG. 7C). The second and subsequent header areas are called moof boxes.
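Combining the three tables gives the file offset of every sample, which is what playback needs to locate each frame in the mdat box. This sketch assumes the tables have already been parsed into Python lists; stsc is modeled as in the ISO base media file format, as run-length entries `(first_chunk, samples_per_chunk, sample_description_index)` with 1-based chunk numbers. The function name is hypothetical.

```python
def sample_offsets(chunk_offsets, stsc_entries, sample_sizes):
    """Return the file offset of every sample.

    chunk_offsets: stco entries, one file offset per chunk
    stsc_entries:  (first_chunk, samples_per_chunk, sample_description_index) runs
    sample_sizes:  stsz entries, one byte size per sample
    """
    offsets = []
    sample = 0
    entry = 0
    for chunk_no, chunk_offset in enumerate(chunk_offsets, start=1):
        # Advance to the stsc run that covers this chunk.
        while entry + 1 < len(stsc_entries) and stsc_entries[entry + 1][0] <= chunk_no:
            entry += 1
        pos = chunk_offset
        for _ in range(stsc_entries[entry][1]):
            if sample >= len(sample_sizes):
                raise ValueError("stsc describes more samples than stsz")
            offsets.append(pos)            # samples are stored back to back in a chunk
            pos += sample_sizes[sample]
            sample += 1
    if sample != len(sample_sizes):
        raise ValueError("stsz describes samples not covered by stsc")
    return offsets
```

For example, two chunks at offsets 100 and 1000 holding two samples each, with sizes 10, 20, 30, and 40 bytes, give sample offsets 100, 110, 1000, and 1030.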
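The 1-Mbyte figure can be checked with a rough calculation. The entry sizes are assumptions: 4 bytes per stsz entry (one per sample), 4 bytes per stco entry (one 32-bit offset per chunk), and, as a worst case, one 12-byte stsc entry per chunk; box headers are ignored. The function name is hypothetical.

```python
def moov_table_bytes(fps, seconds, frames_per_chunk):
    """Rough byte count of the stsz, stco, and stsc tables (box headers ignored).

    Assumptions: 4-byte stsz entries (one per sample), 4-byte stco entries
    (one per chunk), and, as a worst case, one 12-byte stsc entry per chunk
    (a constant chunk size would need only a single stsc entry).
    """
    samples = fps * seconds
    chunks = -(-samples // frames_per_chunk)  # ceiling division
    return {"stsz": 4 * samples, "stco": 4 * chunks, "stsc": 12 * chunks}


tables = moov_table_bytes(fps=30, seconds=2 * 60 * 60, frames_per_chunk=15)
print(tables, sum(tables.values()))
# {'stsz': 864000, 'stco': 57600, 'stsc': 172800} 1094400
```

Two hours at 30 frames per second is 216,000 samples in 14,400 chunks, giving roughly 1.09 Mbytes, consistent with the figure above; the per-sample stsz table dominates.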
These are the structures of general MP4 files.
A general playback method for the MP4 file will be described below.
FIG. 8 is a block diagram showing an example of the basic arrangement of a moving image playback apparatus which plays back a moving image compression-encoded by H.264.
In FIG. 8, the moving image playback apparatus includes a recording medium 801, a playback circuit 802 which plays back data from a recording medium, a buffer circuit 803, a variable-length decoding circuit 804, an inverse quantization circuit 805, an inverse DCT circuit 806, an addition circuit 807, a memory 808, a motion compensation circuit 809, a switching circuit 810, a rearrangement circuit 811, an output terminal 812, a header information analysis circuit 813, a playback control circuit 814, and a control signal input terminal 815.
The sequence of playback processing in the moving image playback apparatus in FIG. 8 will be explained.
Upon receiving an instruction from the playback control circuit 814, the playback circuit 802 starts playing back an MP4 file recorded on the recording medium 801 and supplying it to the buffer circuit 803. At the same time, the playback control circuit 814 controls the header information analysis circuit 813 to analyze the offsets, chunk information, and sample information in the stco, stsc, and stsz boxes of the moov box, which describe how data is stored in the mdat box. The playback control circuit 814 then controls the playback circuit 802 to start playing back the stream image data in the mdat box from the recording medium 801.
The playback circuit 802 plays back the stream image data in the mdat box of the file recorded on the recording medium 801, starting from its start address, and supplies it to the buffer circuit 803. Reading of the stream image data stored in the buffer circuit 803 starts according to the occupancy of the buffer circuit 803 and other factors, and the data is supplied to the variable-length decoding circuit 804. The variable-length decoding circuit 804 executes variable-length decoding on the played-back stream image data supplied from the buffer circuit 803 and supplies the decoded stream image data to the inverse quantization circuit 805.
The inverse quantization circuit 805 inversely quantizes the stream image data which has undergone variable-length decoding and has been supplied from the variable-length decoding circuit 804. The inverse quantization circuit 805 supplies the inversely quantized stream image data to the inverse DCT circuit 806. The inverse DCT circuit 806 executes inverse DCT for the inversely quantized data supplied from the inverse quantization circuit 805, and supplies the inverse DCT data to the addition circuit 807. The addition circuit 807 adds the inverse DCT data supplied from the inverse DCT circuit 806, and data supplied from the switching circuit 810.
Of the stream image data played back from the recording medium 801, the intra-frame-encoded data I0 of GOP0 (Group of Pictures) is played back first, as shown in FIG. 9. The playback control circuit 814 selects terminal a of the switching circuit 810, so that the switching circuit 810 supplies data “0” to the addition circuit 807. The addition circuit 807 adds data “0” supplied from the switching circuit 810 and the inverse DCT data supplied from the inverse DCT circuit 806, and supplies the sum as a played-back frame F0 to the memory 808 and the rearrangement circuit 811. The memory 808 stores the data supplied from the addition circuit 807.
Bidirectionally predictive-encoded picture data B−2 and B−1 are played back next to the intra-frame-encoded data I0 of GOP0. The playback sequence up to the inverse DCT circuit 806 is the same as that described for the intra-frame-encoded data I0, and a description thereof will not be repeated.
The inverse DCT circuit 806 supplies bidirectionally predictive-encoded inverse DCT image data to the addition circuit 807. At this time, the playback control circuit 814 controls the switching circuit 810 so that the movable terminal c of the switching circuit 810 selects the fixed terminal b. Data from the motion compensation circuit 809 is supplied to the addition circuit 807.
The motion compensation circuit 809 extracts, from the played-back stream image data, the motion vector that was generated during encoding and recorded in the stream. The motion compensation circuit 809 reads the data of the reference block from the memory 808 (in this case, only from the played-back intra-frame-encoded frame F0, because recording has just started) and supplies it to the movable terminal c of the switching circuit 810.
The addition circuit 807 adds inverse DCT data supplied from the inverse DCT circuit 806, and motion-compensated data supplied from the switching circuit 810, supplying the added data as played-back frames F−2 and F−1 to the rearrangement circuit 811.
Then, unidirectionally predictive-encoded picture data P3 is played back. The playback sequence up to the inverse DCT circuit 806 is the same as that described for the intra-frame-encoded data I0, and a description thereof will not be repeated.
The inverse DCT circuit 806 supplies inverse DCT picture data to the addition circuit 807. At this time, the playback control circuit 814 controls the switching circuit 810 so that the movable terminal c of the switching circuit 810 selects the fixed terminal b. Data from the motion compensation circuit 809 is supplied to the addition circuit 807.
The motion compensation circuit 809 extracts, from the played-back stream image data, the motion vector that was generated during encoding and recorded in the stream. The motion compensation circuit 809 reads the data of the reference block from the memory 808 (in this case, from the played-back intra-frame-encoded frame F0) and supplies it to the movable terminal c of the switching circuit 810.
The addition circuit 807 adds inverse DCT data supplied from the inverse DCT circuit 806, and motion-compensated data supplied from the switching circuit 810, supplying the added data as a played-back frame F3 to the memory 808 and the rearrangement circuit 811. The memory 808 stores the added data supplied from the addition circuit 807.
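The decoder path traced above (switching circuit 810, motion compensation circuit 809, addition circuit 807) can be modeled per block as follows. This is a sketch under stated assumptions: motion vectors are whole-pixel only (H.264 actually uses quarter-sample vectors with interpolation), blocks lie inside the frame, bidirectional prediction is a rounded average of the two references, and all function names are hypothetical.

```python
def fetch_block(ref, x, y, mv, size=4):
    """Copy the size x size block at (x + mv_x, y + mv_y) from a reference frame.

    Integer-pel vectors only; the block is assumed to lie inside the frame.
    """
    mvx, mvy = mv
    return [row[x + mvx:x + mvx + size] for row in ref[y + mvy:y + mvy + size]]


def predict(mode, refs, x, y, mvs, size=4):
    """Model the switching circuit 810: terminal a gives 0, terminal b gives MC data."""
    if mode == "intra":                      # terminal a: data "0" (I0 in FIG. 9)
        return [[0] * size for _ in range(size)]
    fwd = fetch_block(refs[0], x, y, mvs[0], size)
    if mode == "forward":                    # P-picture: one reference, e.g. F0 for P3
        return fwd
    if mode != "bi":
        raise ValueError(f"unknown mode {mode!r}")
    bwd = fetch_block(refs[1], x, y, mvs[1], size)
    # B-picture: rounded average of forward and backward predictions, e.g. F0 and F3
    return [[(a + b + 1) >> 1 for a, b in zip(ra, rb)] for ra, rb in zip(fwd, bwd)]


def reconstruct(residual, prediction):
    """Addition circuit 807: residual plus prediction, clipped to the 8-bit range."""
    return [[min(255, max(0, r + p)) for r, p in zip(rr, pr)]
            for rr, pr in zip(residual, prediction)]
```

Usage: `reconstruct(residual, predict("bi", [f0, f3], x, y, [mv0, mv1]))` reconstructs one block of B1 or B2 from frames F0 and F3.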
Then, pictures B1 and B2 are played back. Since these pictures are not at the start of recording, they are played back by the same sequence as described above for pictures B−2 and B−1, except that they are predicted bidirectionally from frames F0 and F3. In this way, P6, B4, B5, . . . are played back in sequence.
The rearrangement circuit 811 rearranges the sequentially played-back frames F0, F−2, F−1, F3, F1, F2, F6, F4, F5, . . . into F−2, F−1, F0, F1, F2, F3, F4, F5, F6, . . . , and outputs the rearranged frames to the output terminal 812.
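The rearrangement performed by the rearrangement circuit 811 can be sketched as a one-frame reorder buffer. Frames are modeled as hypothetical `(picture_type, frame_number)` pairs, and the sketch assumes the I/P/B structure of FIG. 9, where each anchor frame is displayed after the B-frames decoded immediately after it.

```python
def display_order(decoded):
    """Reorder (picture_type, frame_number) pairs from decoding to display order.

    An I/P frame is held back until the next I/P frame arrives, because the
    B-frames decoded in between are displayed before it.
    """
    out, held = [], None
    for frame in decoded:
        if frame[0] == "B":
            out.append(frame)          # B-frames are displayed as soon as decoded
        else:
            if held is not None:
                out.append(held)       # release the previous anchor
            held = frame               # hold the new anchor
    if held is not None:
        out.append(held)               # flush the last anchor at end of stream
    return out


decoded = [("I", 0), ("B", -2), ("B", -1), ("P", 3), ("B", 1), ("B", 2),
           ("P", 6), ("B", 4), ("B", 5)]
print([n for _, n in display_order(decoded)])
# [-2, -1, 0, 1, 2, 3, 4, 5, 6]
```

Only one anchor frame is buffered at a time, which matches the single backward reference per B-frame in this picture arrangement.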
At the start of playing back the file, the header information analysis circuit 813 analyzes the offsets, chunk information, and sample information in the stco, stsc, and stsz boxes of the moov box of the MP4 file, which describe how data is stored in the mdat box. The playback control circuit 814 then skips the data up to GOP1 and starts playback from GOP1.
In a lens-interchangeable digital camera, dust floating in the air may enter the camera body when the lens is detached from it. The camera also incorporates various mechanical units, such as a shutter mechanism, that operate mechanically, and their operation may generate dust such as metal powder inside the camera body.
When a foreign substance such as dust adheres to the surface of the image sensor that forms the image capturing unit of a digital camera, the shadow of the foreign substance appears in the sensed image, degrading its quality.
To solve this problem, there is proposed a method of correcting a pixel capturing the shadow of a foreign substance by using the signals of neighboring pixels or the like.
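A simple form of neighboring-pixel correction is sketched below: each pixel flagged as lying in a dust shadow is replaced by the rounded mean of its unflagged 8-neighbors. The dust mask is assumed to come from separately obtained dust information, the function name is hypothetical, and the cited methods may use different interpolation.

```python
def correct_dust(image, dust_mask):
    """Replace each dust pixel by the rounded mean of its clean 8-neighbors.

    image:     2-D list of pixel values
    dust_mask: 2-D list of bools, True where a dust shadow was detected
    Pixels with no clean neighbor are left unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]              # do not modify the input image
    for y in range(h):
        for x in range(w):
            if not dust_mask[y][x]:
                continue
            vals = [image[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                    if not dust_mask[ny][nx]]    # the dust pixel itself is masked, so skipped
            if vals:
                out[y][x] = round(sum(vals) / len(vals))
    return out
```

Because the correction visits every flagged pixel of every image, its cost scales with image size and, for moving images, with the frame rate.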
As a technique of correcting the shadow of a foreign substance, for example, Japanese Patent Laid-Open No. 2003-289495 proposes an image defect correction method of correcting the pixel defect of an image sensor.
Japanese Patent Laid-Open No. 6-105241 proposes a method that simplifies setting the position information of pixel defects. More specifically, an image file recorded in a dust acquisition mode is given a file extension different from that of a normal image, so that a PC automatically identifies it as a dust information image and uses this information to correct a target image.
Some products record the dust information as photographing information in a recorded image file, and correct a target image using the information.
Japanese Patent Laid-Open No. 2004-242158 discloses a related technique.
However, when a moving image file such as the MP4 file described above is played back while target images are corrected on the basis of the dust information, memory usage increases, and the resulting drop in operating speed degrades the quality of moving image playback.
In still image playback, it suffices to perform dust correction once per image, since the dust-corrected still image is then displayed. Even if dust correction takes a long time because of memory limitations or the like, display of a still image can wait until dust correction is complete.
In moving image playback, however, motion is expressed by playing back a succession of still images, for example 15 or 30 frames per second. In addition to normal playback processing, dust correction must therefore be executed 15 times per second at 15 frames per second, or 30 times per second at 30 frames per second. Natural moving image playback cannot be achieved unless this processing finishes within the limited time. If natural playback cannot be achieved, the moving image may be played back without dust correction. As a result, an uncorrected, poor-quality image may be displayed during still display on pause or frame advance, when the user views an image carefully for a long time during moving image playback.
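The time constraint can be made concrete with a per-frame budget check. The function names and the decoding and correction times passed to them are hypothetical; the point is only that decoding plus dust correction of each frame must fit within one frame period.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, for sustained playback at fps."""
    return 1000.0 / fps


def fits_budget(fps, decode_ms, dust_ms):
    """True if decoding plus dust correction of one frame fits in the frame period."""
    return decode_ms + dust_ms <= frame_budget_ms(fps)
```

At 15 frames per second each frame has about 66.7 ms and at 30 frames per second about 33.3 ms; with hypothetical timings, `fits_budget(30, decode_ms=20, dust_ms=10)` is `True` while `fits_budget(30, decode_ms=20, dust_ms=20)` is `False`. A still image, by contrast, has no such deadline.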