1. Field of the Invention
The present invention relates to a technique of suppressing deterioration of image quality caused by a foreign substance adhering to the surface of an optical element such as an optical low-pass filter arranged in front of an image sensor such as a CCD or CMOS sensor in an image capturing apparatus.
2. Description of the Related Art
When the lens is detached from the camera body of a lens-interchangeable digital camera, motes floating in the air may enter the camera body. The camera also incorporates various mechanically operating units such as a shutter mechanism. When these mechanical units operate, dust such as metal powder may be generated in the camera body.
When a foreign substance such as dust or a mote adheres to the surface of an optical low-pass filter, which is an optical element arranged in front of an image sensor and forms part of the image capturing unit of the digital camera, the shadow of the foreign substance is contained in a captured image, deteriorating the image quality.
A camera using a silver-halide film feeds the film for every shot. Hence, images never contain the same foreign substance at the same position continuously. However, the digital camera requires no operation of feeding a film frame for every shot, and therefore, captured images continuously contain the same foreign substance at the same position.
To solve this problem, there is proposed a method of correcting a pixel capturing the shadow of a foreign substance by using the signals of neighboring pixels or the like. As a technique of correcting such a pixel, for example, Japanese Patent Laid-Open No. 6-105241 proposes a pixel defect correction method of correcting the pixel defect of an image sensor. Japanese Patent Laid-Open No. 2004-242158 proposes a method which simplifies setting of position information of a pixel defect. According to this method, the extension of an image file recorded in the dust obtaining mode is changed from that of a normal image. The PC (Personal Computer) automatically discriminates a dust information image, and corrects a target image based on this information.
Recently, there is proposed a technique of handling moving image information as digital data and encoding it at high compression ratio with high image quality to accumulate and transmit it. This technique is becoming popular.
Motion JPEG (Joint Photographic Experts Group) encodes a moving image by applying still image encoding (e.g., JPEG encoding) to each frame. Although JPEG encoding basically targets still images, products which apply JPEG encoding to even moving images by high-speed processing have come into practical use.
An outline of JPEG encoding will be explained briefly. Image data is divided into blocks of a predetermined size (e.g., blocks each having 8×8 pixels), and each block undergoes a 2D discrete cosine transform. The transform coefficients are quantized linearly or non-linearly. The quantized transform coefficients undergo Huffman coding (variable length coding). More specifically, the difference value between the DC component of the transform coefficients and that of a neighboring block is Huffman-coded. The AC components of the transform coefficients are serialized in order from low-frequency components to high-frequency components by zig-zag scanning. Each set of a run of invalid components “0” and the subsequent valid component is Huffman-coded.
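The zig-zag serialization and zero-run grouping outlined above can be illustrated with a minimal Python sketch. It assumes an 8×8 block that has already been transformed and quantized. The function names zigzag_order, dc_difference, and run_length_pairs are hypothetical, and the Huffman tables and end-of-block signalling of actual JPEG are omitted.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zig-zag order."""
    # Positions on one anti-diagonal share row + col. Odd diagonals are walked
    # with the row increasing, even diagonals with the row decreasing.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def dc_difference(block, previous_dc):
    """Difference between this block's DC component and the neighbor's."""
    return block[0][0] - previous_dc


def run_length_pairs(block):
    """Group the AC components into (zero run, nonzero value) pairs."""
    order = zigzag_order(len(block))
    pairs, run = [], 0
    for r, c in order[1:]:          # skip the DC component at (0, 0)
        value = block[r][c]
        if value == 0:
            run += 1                # extend the run of invalid "0" components
        else:
            pairs.append((run, value))
            run = 0
    return pairs                    # trailing zeros are dropped in this sketch


# Hypothetical quantized block: DC = 12, two nonzero AC components.
example = [[0] * 8 for _ in range(8)]
example[0][0], example[0][1], example[2][0] = 12, 5, -3
print(dc_difference(example, 10))   # 2
print(run_length_pairs(example))    # [(0, 5), (1, -3)]
```

Each resulting pair, together with the DC difference, is what the variable length coder would then receive.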
In contrast, H.264 (MPEG4-Part10 AVC) has been proposed as an encoding method aiming at higher compression ratios and higher image quality. It is known that H.264 requires larger amounts of calculation for encoding and decoding than conventional encoding methods such as MPEG2 and MPEG4, but can achieve higher coding efficiency (see ISO/IEC 14496-10, “Advanced Video Coding”).
FIG. 14 is a diagram showing the arrangement of an image processing apparatus which compresses image data by H.264. In FIG. 14, input image data is divided into macroblocks, which are sent to a subtracter 401. The subtracter 401 calculates the difference between image data and a predicted value, and outputs it to an integer DCT (Discrete Cosine Transform) transform unit 402. The integer DCT transform unit 402 executes integer DCT transform for the input data, and outputs the transformed data to a quantization unit 403. The quantization unit 403 quantizes the input data. The quantized data is sent as difference image data to an entropy encoder 415, while it is inversely quantized by an inverse quantization unit 404, and undergoes inverse integer DCT transform by an inverse integer DCT transform unit 405. An adder 406 adds a predicted value to the inversely transformed data, reconstructing an image.
The reconstructed image is sent to a frame memory 407 for intra (intra-frame) prediction, while it undergoes deblocking filter processing by a deblocking filter 409, and then is sent to a frame memory 410 for inter (inter-frame) prediction. The image in the intra prediction frame memory 407 is used for intra prediction by an intra prediction unit 408. The intra prediction uses the value of a pixel adjacent to an encoded block as a predicted value within a single picture.
The image in the inter prediction frame memory 410 is formed from a plurality of pictures, as will be described later. A plurality of pictures are classified into two lists “List0” and “List1”. An inter prediction unit 411 uses a plurality of pictures classified into the two lists for inter prediction. After the inter prediction, a memory controller 413 updates internal images. The inter prediction unit 411 performs inter prediction to determine a predicted image using an optimal motion vector based on the result of motion detection between image data of different frames by a motion detection unit 412.
As a result of the intra prediction and inter prediction, a selector 414 selects an optimal prediction result. The motion vector is sent to the entropy encoder 415, where it is encoded together with the difference image data, forming an output bit stream.
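The data flow from the subtracter 401 to the adder 406 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the integer DCT and the entropy encoder are omitted, a block is a flat list of pixel values, and a uniform quantizer with a hypothetical step size qstep stands in for the H.264 quantizer. The name encode_block is also hypothetical.

```python
def encode_block(block, prediction, qstep):
    """Return (quantized difference data, reconstructed block)."""
    # Subtracter 401: difference between the image data and the predicted value.
    residual = [x - p for x, p in zip(block, prediction)]
    # Quantization unit 403 (the transform of unit 402 is omitted here).
    levels = [round(r / qstep) for r in residual]
    # Inverse quantization unit 404.
    dequantized = [level * qstep for level in levels]
    # Adder 406: add the predicted value back to reconstruct the image.
    reconstructed = [p + d for p, d in zip(prediction, dequantized)]
    return levels, reconstructed


levels, recon = encode_block([100, 105, 97, 89], [96, 96, 96, 96], qstep=4)
print(levels)  # [1, 2, 0, -2]
print(recon)   # [100, 104, 96, 88]
```

The reconstructed block differs slightly from the input because of quantization. Predicting from this reconstructed data rather than from the original input keeps the encoder's reference identical to what a decoder can rebuild.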
H.264 inter prediction will be explained in detail with reference to FIGS. 15 to 18.
The H.264 inter prediction can use a plurality of pictures for prediction. To specify a reference picture, two lists “List0” and “List1” are prepared. Each list can hold a maximum of five reference pictures.
Only “List0” is used for P-pictures to mainly perform forward prediction. “List0” and “List1” are used for B-pictures to perform bidirectional prediction (or only forward or backward prediction). That is, “List0” holds pictures mainly for forward prediction, and “List1” holds pictures mainly for backward prediction.
FIG. 15 exemplifies a reference list used in encoding. This example assumes that the ratio of I-, P-, and B-pictures is a standard one. That is, I-pictures are arranged at an interval of 15 frames, P-pictures are arranged at an interval of three frames, and B-pictures between I- and P-pictures are arranged at an interval of two frames. In FIG. 15, image data 1001 is obtained by arranging pictures in the display order. Each square in the image data 1001 describes the type of picture and a number representing the display order. For example, a picture I15 is an I-picture whose display order is 15, and is used for only intra prediction. A picture P18 is a P-picture whose display order is 18, and is used for only forward prediction. A picture B16 is a B-picture whose display order is 16, and is used for bidirectional prediction.
The encoding order is different from the display order, and data are encoded in the prediction order. In FIG. 15, data are encoded in the order of “I15, P18, B16, B17, P21, B19, B20, . . . .”
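The relationship between the display order and the encoding order can be sketched as follows. This minimal Python example assumes that each B-picture is predicted from the I- or P-picture that follows it in display order, so that picture must be encoded first. The function name encoding_order and the string labels are hypothetical.

```python
def encoding_order(display_order):
    """Reorder picture labels such as 'I15' or 'B16' into encoding order."""
    encoded, waiting_b = [], []
    for picture in display_order:
        if picture.startswith("B"):
            # A B-picture waits until the next I- or P-picture is encoded.
            waiting_b.append(picture)
        else:
            encoded.append(picture)
            encoded.extend(waiting_b)
            waiting_b = []
    # B-pictures left here have no following I- or P-picture in the input.
    return encoded + waiting_b


display = ["I15", "B16", "B17", "P18", "B19", "B20", "P21"]
print(encoding_order(display))
# ['I15', 'P18', 'B16', 'B17', 'P21', 'B19', 'B20']
```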
In FIG. 15, a reference list (List0) 1002 holds temporarily encoded/decoded pictures. For example, inter prediction using a picture P21 (P-picture whose display order is 21) refers to pictures which have been encoded and decoded in the reference list (List0) 1002. In the example shown in FIG. 15, the reference list 1002 contains pictures P06, P09, P12, I15, and P18.
In inter prediction, a motion vector having an optimal predicted value is obtained for each macroblock from reference pictures in the reference list (List0) 1002, and encoded. Pictures in the reference list (List0) 1002 are sequentially given reference picture numbers, and discriminated (separately from numbers shown in FIG. 15).
After the end of encoding the picture P21, it is newly decoded and added to the reference list (List0) 1002. The oldest reference picture (in this case, the picture P06) is deleted from the reference list (List0) 1002. Encoding proceeds in the order of pictures B19, B20, and P24. FIG. 16 shows the state of the reference list (List0) 1002 at this time.
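The update of the reference list (List0) described above behaves like a fixed-length queue. The following Python sketch assumes the five-picture limit and the I/P-only policy of this example. The class name ReferenceList and its method are hypothetical and do not correspond to any H.264 syntax element.

```python
from collections import deque


class ReferenceList:
    """Sliding window of the most recently encoded reference pictures."""

    def __init__(self, pictures, capacity=5):
        # A deque with maxlen discards the oldest entry when a new one is added.
        self.pictures = deque(pictures, maxlen=capacity)

    def after_encoding(self, picture):
        # In this example only I- and P-pictures become reference pictures.
        if picture[0] in ("I", "P"):
            self.pictures.append(picture)
        return list(self.pictures)


list0 = ReferenceList(["P06", "P09", "P12", "I15", "P18"])
print(list0.after_encoding("P21"))  # ['P09', 'P12', 'I15', 'P18', 'P21']
print(list0.after_encoding("B19"))  # unchanged, B-pictures are not added
print(list0.after_encoding("P24"))  # ['P12', 'I15', 'P18', 'P21', 'P24']
```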
FIG. 17 shows a change of the reference list for each picture.
In FIG. 17, pictures are encoded sequentially from the top. FIG. 17 shows a picture during encoding and the contents of the reference lists List0 and List1 for it. When a P-picture (or I-picture) is encoded as shown in FIG. 17, the reference lists List0 and List1 are updated to delete the oldest pictures from the reference lists List0 and List1. In this example, the reference list List1 holds only one picture. This is because a larger number of pictures referred to for backward prediction increases the buffer amount till decoding. In other words, backward pictures excessively distant from a picture during encoding are not referred to.
In this example, I- and P-pictures are referred to, and all I- and P-pictures are sequentially added to the reference lists List0 and List1. Only P-pictures are used in the reference list List1 for backward prediction because this picture arrangement is considered to be the most popular one. However, the picture arrangement in the reference list is merely an example of the most popular one. H.264 itself has a high degree of freedom for the configuration of the reference list.
For example, not all I- and P-pictures need be added to the reference list, and B-pictures can be added to the reference list. Also, H.264 defines a long-term reference list of pictures which stay in the reference list until an explicit instruction is received. FIG. 18 shows a change of the reference list when adding B-pictures to the reference list. When adding B-pictures to the reference list, encoded pictures may be added to the reference list every time all B-pictures are encoded.
FIG. 19 shows partitioning of a macroblock of 16×16 pixels into smaller macroblocks in H.264 inter prediction. The partitioned macroblocks can refer to independent reference pictures to obtain a motion vector. A macroblock of 8×8 pixels can be partitioned into smaller sub-macroblocks. Although the sub-macroblocks refer to a single reference picture, their motion vectors are obtained independently. Note that Japanese Patent Laid-Open No. 2005-5844 also discloses an arrangement capable of changing the motion compensation block size, as shown in FIG. 27 of this reference.
A general playback method for a file recorded by H.264 will be explained.
As described above, the MP4 (MPEG-4) file format is used as a general-purpose format for recording MPEG (MPEG-2 or MPEG-4 format) image data obtained by a digital video camera, digital still camera, or the like. The MP4 file format ensures compatibility with other digital devices to, for example, play back image data recorded as an MP4 file.
As represented by a of FIG. 20, an MP4 file is basically formed from an mdat box which holds encoded stream image data, and a moov box which holds stream image data-related information.
The mdat box is formed from a plurality of chunks (chunk cN), as represented by b of FIG. 20. Each chunk is formed from a plurality of samples (sample sN), as represented by d of FIG. 20. For example, the respective samples sample s1, sample s2, sample s3, sample s4, . . . correspond to encoded MPEG image data I0, B−2, B−1, P3, . . . , as represented by e of FIG. 20.
I0, I1, I2, . . . , In represent intra-encoded (intra-frame-encoded) frame image data. B0, B1, B2, . . . , Bn represent frame image data encoded (inter-frame-encoded) by referring to reference image data bidirectionally. P0, P1, P2, . . . , Pn represent frame image data encoded (inter-frame-encoded) by referring to reference image data unidirectionally (forward direction). These frame image data are variable-length encoded data.
As represented by c of FIG. 20, the moov box is formed from an mvhd box which holds header information recording the creation date and time, and the like, and a trak box which holds information on stream image data stored in the mdat box.
Information stored in the trak box includes an stco box which stores information of an offset value for each chunk of the mdat box, as represented by h of FIG. 20, an stsc box which stores information of the number of samples in each chunk, as represented by g of FIG. 20, and an stsz box which stores information of the size of each sample, as represented by f of FIG. 20.
The amounts of data stored in the stco, stsc, and stsz boxes increase together with the recorded image data amount, i.e., the recording time. For example, when an image of 30 frames per sec is recorded as an MP4 file by storing every 15 frames in one chunk, the data amount reaches 1 Mbyte for 2 hours, requiring a moov box of a 1-Mbyte capacity.
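The order of magnitude of this figure can be checked with a short calculation. The Python sketch below assumes 32-bit (4-byte) entries in the stsz and stco boxes and 12-byte entries in the stsc box, as in the ISO base media file format. It further assumes one stsc entry per chunk and ignores box headers and all other boxes, so the result is only an estimate. The function name index_table_bytes is hypothetical.

```python
def index_table_bytes(fps, seconds, frames_per_chunk):
    """Estimate the size of the stsz, stco, and stsc tables in bytes."""
    samples = fps * seconds
    chunks = -(-samples // frames_per_chunk)   # ceiling division
    stsz = 4 * samples                         # one 32-bit size per sample
    stco = 4 * chunks                          # one 32-bit offset per chunk
    stsc = 12 * chunks                         # assumed: one entry per chunk
    return {"samples": samples, "chunks": chunks,
            "stsz": stsz, "stco": stco, "stsc": stsc,
            "total": stsz + stco + stsc}


estimate = index_table_bytes(fps=30, seconds=2 * 60 * 60, frames_per_chunk=15)
print(estimate["samples"], estimate["chunks"])  # 216000 14400
print(estimate["total"])                        # 1094400
```

Under these assumptions the tables occupy about 1.09 million bytes for a 2-hour recording, which is consistent with the figure of roughly 1 Mbyte.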
When playing back this MP4 file, the moov box of the MP4 file is read from the recording medium, and the stco, stsc, and stsz boxes in the moov box are analyzed. After that, each chunk in the mdat box can be accessed.
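How the three boxes together locate each sample can be sketched as follows. This Python example is a simplification: the chunk offsets, the number of samples per chunk, and the sample sizes are assumed to be already parsed into plain lists, with one samples-per-chunk value for every chunk. The function name sample_offsets and the numeric values are hypothetical.

```python
def sample_offsets(chunk_offsets, samples_per_chunk, sample_sizes):
    """Return the file offset of every sample in the mdat box."""
    offsets = []
    sizes = iter(sample_sizes)
    for chunk_start, count in zip(chunk_offsets, samples_per_chunk):
        position = chunk_start              # from the stco box
        for _ in range(count):              # from the stsc box
            offsets.append(position)
            position += next(sizes)         # from the stsz box
    return offsets


# Two chunks of two samples each.
print(sample_offsets([1000, 5000], [2, 2], [300, 100, 200, 50]))
# [1000, 1300, 5000, 5200]
```

Samples within one chunk are stored contiguously, so only the start of each chunk needs an explicit offset.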
When recording an image in the MP4 file format, the stream data increases over time to a very large size. Hence, the stream data needs to be written in the file even during recording.
However, the size of the moov box also increases in accordance with the recording time, as described above. The size of the MP4 header is not defined till the end of recording, so the write offset position of stream data in the file cannot be decided.
For this reason, recording by a general moving image processing apparatus adopts the following measures using the flexibility of the MP4 file format.
(1) The mdat box is arranged at the start of a file, and after the end of recording, the moov box is arranged next to the mdat box (a of FIG. 21).
(2) As proposed in Japanese Patent Laid-Open No. 2003-289495, the size of the moov box is determined in advance to decide the offset position of the mdat box, and then recording is done (b of FIG. 21). Even when the recording time is short and the header area does not become full, the area remains as a free box. When recording data over the header size, the data is recorded by properly decimating frame number information of I-pictures, maintaining a predetermined header size.
(3) A pair of moov and mdat boxes is divided into a plurality of pairs to arrange them (c of FIG. 21). In this case, the second and subsequent header areas are called moof boxes.
These are the structures of general MP4 files.
FIG. 22 is a block diagram exemplifying the basic arrangement of a moving image playback apparatus which plays back a moving image compression-encoded by H.264.
Referring to FIG. 22, the moving image playback apparatus includes a recording medium 801, a playback circuit 802 which plays back data from a recording medium, a buffer circuit 803, a variable-length decoding circuit 804, an inverse quantization circuit 805, an inverse DCT circuit 806, an addition circuit 807, a memory 808, a motion compensation circuit 809, a switching circuit 810, a rearrangement circuit 811, an output terminal 812, a header information analysis circuit 813, a playback control circuit 814, and a control signal input terminal 815.
The sequence of playback processing in the moving image playback apparatus shown in FIG. 22 will be explained.
Upon receiving an instruction from the playback control circuit 814, the playback circuit 802 plays back an MP4 file recorded on the recording medium 801, and starts supplying it to the buffer circuit 803. At the same time, the playback control circuit 814 controls the header information analysis circuit 813 to analyze an offset, chunk information, and sample information in the stco box, stsc box, and stsz box representing storage statuses in mdat in the moov box. The playback control circuit 814 controls the playback circuit 802 to start playing back stream image data in the mdat box from the recording medium 801.
The playback circuit 802 plays back, from the start address, the stream image data in the mdat box of the file recorded on the recording medium 801, and supplies it to the buffer circuit 803. Readout of the stream image data stored in the buffer circuit 803 starts in accordance with the occupancy of the buffer circuit 803 and the like. The stream image data is supplied to the variable-length decoding circuit 804. The variable-length decoding circuit 804 executes variable-length decoding for the played-back stream image data supplied from the buffer circuit 803, and supplies the decoded stream image data to the inverse quantization circuit 805.
The inverse quantization circuit 805 inversely quantizes the stream image data supplied from the variable-length decoding circuit 804 upon variable-length decoding. The inverse quantization circuit 805 supplies the inversely quantized stream image data to the inverse DCT circuit 806. The inverse DCT circuit 806 executes inverse DCT for the inversely quantized data supplied from the inverse quantization circuit 805, and supplies the inverse DCT data to the addition circuit 807. The addition circuit 807 adds the inverse DCT data supplied from the inverse DCT circuit 806, and data supplied from the switching circuit 810.
Of the stream image data played back from the recording medium 801, intra-frame-encoded data I0 of GOP0 (Group Of Pictures) is played back first, as shown in FIG. 23. The playback control circuit 814 controls the switching circuit 810 to select the terminal a. The switching circuit 810 supplies data “0” to the addition circuit 807. The addition circuit 807 adds data “0” supplied from the switching circuit 810, and inverse DCT data supplied from the inverse DCT circuit 806, and supplies the added data as a played-back frame F0 to the memory 808 and rearrangement circuit 811. The memory 808 stores the added data supplied from the addition circuit 807.
Bidirectionally predictive-encoded picture data B−2 and B−1 are played back next to the intra-frame-encoded data I0 of GOP0. The playback sequence up to the inverse DCT circuit 806 is the same as that described for the intra-frame-encoded data I0, and a description thereof will not be repeated.
The inverse DCT circuit 806 supplies bidirectionally predictive-encoded inverse DCT image data to the addition circuit 807. At this time, the playback control circuit 814 controls the switching circuit 810 so that the movable terminal c of the switching circuit 810 selects the fixed terminal b. The switching circuit 810 supplies data from the motion compensation circuit 809 to the addition circuit 807.
The motion compensation circuit 809 detects a motion vector which has been generated in encoding from played-back stream image data and recorded in the stream image data. The motion compensation circuit 809 reads out data of a reference block (in this case, only data from the played-back intra-frame-encoded data F0 because recording has just started) from the memory 808, and supplies it to the movable terminal c of the switching circuit 810.
The addition circuit 807 adds inverse DCT data supplied from the inverse DCT circuit 806, and motion-compensated data supplied from the switching circuit 810, supplying the added data as played-back frames F−2 and F−1 to the rearrangement circuit 811.
Then, unidirectionally predictive-encoded picture data P3 is played back. The playback sequence up to the inverse DCT circuit 806 is the same as that described for the intra-frame-encoded data I0, and a description thereof will not be repeated.
The inverse DCT circuit 806 supplies inverse DCT picture data to the addition circuit 807. At this time, the playback control circuit 814 controls the switching circuit 810 so that the movable terminal c of the switching circuit 810 selects the fixed terminal b. The switching circuit 810 supplies data from the motion compensation circuit 809 to the addition circuit 807.
The motion compensation circuit 809 detects a motion vector which has been generated in encoding from played-back stream image data and recorded in the stream image data. The motion compensation circuit 809 reads out data of a reference block (in this case, data from the played-back intra-frame-encoded data F0) from the memory 808, and supplies it to the movable terminal c of the switching circuit 810.
The addition circuit 807 adds inverse DCT data supplied from the inverse DCT circuit 806, and motion-compensated data supplied from the switching circuit 810, supplying the added data as a played-back frame F3 to the memory 808 and rearrangement circuit 811. The memory 808 stores the added data supplied from the addition circuit 807.
Then, pictures B1 and B2 are played back. These pictures are not frames at the start of recording, and thus are played back by the same sequence as that described for the pictures B−2 and B−1 except that they are played back from the frames F0 and F3 by bidirectional prediction. In the above-described way, P6, B4, B5, . . . are played back sequentially.
The rearrangement circuit 811 rearranges the sequentially played-back frames F0, F−2, F−1, F3, F1, F2, F6, F4, F5, . . . into F−2, F−1, F0, F1, F2, F3, F4, F5, F6, . . . , and outputs the rearranged frames to the output terminal 812.
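The rearrangement into display order can be sketched as follows. This Python example assumes the picture arrangement of this description, in which every B-picture precedes, in display order, the I- or P-picture decoded just before it. Under that assumption it is sufficient to hold back one I- or P-frame at a time. The function name rearrange and the tuple representation are hypothetical.

```python
def rearrange(decoded):
    """Yield display indices from (picture type, display index) pairs
    supplied in decoding order."""
    held = None
    for picture_type, index in decoded:
        if picture_type == "B":
            yield index                 # B-frames are output immediately
        else:
            if held is not None:
                yield held              # release the previous I- or P-frame
            held = index                # hold the new one back
    if held is not None:
        yield held                      # flush the last held frame


decoding_order = [("I", 0), ("B", -2), ("B", -1), ("P", 3), ("B", 1),
                  ("B", 2), ("P", 6), ("B", 4), ("B", 5)]
print(list(rearrange(decoding_order)))
# [-2, -1, 0, 1, 2, 3, 4, 5, 6]
```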
At the start of playing back the file, the header information analysis circuit 813 analyzes an offset, chunk information, and sample information from the stco box, stsc box, and stsz box representing storage statuses in mdat in the moov box of the MP4 file. The playback control circuit 814 operates to skip data till GOP1, and start playing back data from GOP1.
Compact digital cameras capable of recording and playing back a moving image based on these encoding methods have also been developed and commercialized. Users can easily view images with such a digital camera, a personal computer, a DVD player, and the like.
In this situation, a need has recently arisen for recording higher-resolution moving images with a larger number of pixels by lens-interchangeable digital cameras as well as compact digital cameras. However, the lens-interchangeable digital camera suffers from dust adhering to the surface of an optical element arranged in front of an image sensor owing to a variety of factors, as described above. If the lens-interchangeable digital camera records a moving image while dust adheres to the surface, the shadow of dust may always appear at the same position in a played-back moving image.
According to a conventional dust removal method for the lens-interchangeable digital camera, information (e.g., information on the position and size of dust) necessary for dust removal and image data are recorded. The image is loaded later into a personal computer or the like to remove the shadow of dust by image processing. That is, the recorded image data contains the shadow of dust. As for a still image, dust removal is executed for each still image. As for a moving image, dust removal must be done over the entire recording time, which is time-consuming.
In playback, dust correction processing is done using information necessary for dust removal. Even if dust removal processing takes a long time, playback of a still image can wait till the completion of the dust correction processing. In contrast, a moving image expresses the motion of an image by successively playing back a plurality of still images of 15 or 30 frames per sec. Thus, no natural moving image can be played back unless frames can be successively decoded within a predetermined time.