1. Field of the Invention
The present invention relates to a technique of suppressing deterioration of image quality caused by a foreign substance adhering to the surface of an optical low-pass filter or the like arranged in front of an image sensor in an image capturing apparatus using the image sensor such as a CCD sensor or CMOS sensor and, more particularly, to a technique of suppressing deterioration of image quality caused by a foreign substance in moving image shooting.
2. Description of the Related Art
Recently, demand has arisen for a technique of handling moving image information as digital data and encoding it at high compression rate with high quality for use in accumulation and transmission. For image information compression, methods such as MPEG have been proposed and become popular. MPEG compression-encodes image information by orthogonal transform (e.g., discrete cosine transform), motion prediction, and motion compensation using redundancy unique to moving image information.
Manufacturers have developed and commercialized image capturing apparatuses (e.g., a digital camera and digital video camera), DVD recorders, and the like capable of recording images using these encoding methods. Users can easily view images using these apparatuses, personal computers, DVD players, and the like.
These days, H.264 (MPEG4-Part10 AVC) is available as an encoding method aiming at higher compression rates and higher image qualities. It is known that H.264 requires larger calculation amounts for encoding and decoding than those in conventional encoding methods such as MPEG2 and MPEG4, but can achieve higher encoding efficiencies (see ISO/IEC 14496-10, “Advanced Video Coding”).
FIG. 1 is a block diagram showing the arrangement of an image processing apparatus which compresses image data by H.264.
Referring to FIG. 1, input image data is divided into macroblocks, which are sent to a subtracter 101. FIG. 2 is a schematic view showing input image data divided into macroblocks. FIG. 3 shows general macroblock partitions. According to H.264, the block size can be selected from 16×16 pixels, 16×8 pixels, 8×16 pixels, and 8×8 pixels. For 8×8 pixels, one of 8×8 pixels, 8×4 pixels, 4×8 pixels, and 4×4 pixels can be selected.
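The macroblock division described above can be illustrated with a minimal sketch. The function names, the 16-pixel default, and the (width, height) reading of "16×8 pixels" are assumptions made for illustration only; they are not taken from the H.264 specification text or from FIGS. 2 and 3.

```python
# Partition sizes listed in the text, read here as (width, height) in pixels.
MACROBLOCK_PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8)]
SUB_PARTITIONS_8X8 = [(8, 8), (8, 4), (4, 8), (4, 4)]


def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows) of macroblocks needed to cover a frame.

    Frames whose size is not a multiple of mb_size are rounded up,
    which is an assumption of this sketch.
    """
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


def partitions_per_macroblock(partition, mb_size=16):
    """Return how many partitions of the given size tile one macroblock."""
    part_w, part_h = partition
    return (mb_size // part_w) * (mb_size // part_h)
```

For example, under these assumptions a 1920×1080 frame is covered by 120 columns and 68 rows of 16×16 macroblocks, and one macroblock holds four 8×8 partitions.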
The subtracter 101 calculates the difference between image data and a predicted value, and outputs it to an integer DCT (Discrete Cosine Transform) transform unit 102. The integer DCT transform unit 102 executes integer DCT transform for the input data, and outputs the transformed data to a quantization unit 103. The quantization unit 103 quantizes the input data. The quantized data is sent as difference image data to an entropy encoder 115, while it is inversely quantized by an inverse quantization unit 104 and undergoes inverse integer DCT transform by an inverse integer DCT transform unit 105. An adder 106 adds a predicted value to the inversely DCT-transformed data, reconstructing an image.
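The subtract, quantize, inversely quantize, and add path can be sketched as follows. This is a simplified illustration, not the H.264 algorithm: the integer DCT and its inverse are omitted, the quantizer is a plain uniform division by a hypothetical step size `qstep`, and a block is modeled as a flat list of pixel values.

```python
def encode_block(block, predicted, qstep):
    """Subtract the prediction and quantize the residual.

    The transform stage is intentionally left out of this sketch.
    """
    return [round((pixel - pred) / qstep) for pixel, pred in zip(block, predicted)]


def reconstruct_block(levels, predicted, qstep):
    """Inversely quantize the levels and add the prediction back."""
    return [pred + level * qstep for level, pred in zip(levels, predicted)]


# Hypothetical sample values chosen for illustration.
block = [100, 104, 97, 120]
predicted = [100, 100, 100, 100]
levels = encode_block(block, predicted, qstep=4)
reconstructed = reconstruct_block(levels, predicted, qstep=4)
```

In this sketch the reconstructed value for the third pixel differs from the input because quantization is lossy. The reconstructed image, not the input image, is what the decoder will have available, which is why the reconstructed image is the one stored for later prediction.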
The reconstructed image is sent to a frame memory 107 for intra (intra-frame) prediction, while it undergoes deblocking filter processing by a deblocking filter 109 and then is sent to a frame memory 110 for inter (inter-frame) prediction. The image in the intra prediction frame memory 107 is used for intra prediction by an intra prediction unit 108. The intra prediction uses the value of a pixel adjacent to an encoded block as a predicted value.
The image in the inter prediction frame memory 110 is formed from a plurality of pictures, as will be described later. A plurality of pictures are classified into two lists “List0” and “List1”. A plurality of pictures classified into the two lists are used for inter prediction by an inter prediction unit 111. After the inter prediction, a memory controller 113 updates internal images. In the inter prediction by the inter prediction unit 111, a predicted image is determined using an optimal motion vector based on the result of motion detection between image data of different frames by a motion detection unit 112.
As a result of intra prediction and inter prediction, a selector 114 selects an optimal prediction result. The motion vector is sent to the entropy encoder 115, and encoded together with the difference image data, forming an output bit stream.
H.264 inter prediction will be explained in detail with reference to FIGS. 4 to 7.
The H.264 inter prediction can use a plurality of pictures for prediction. For this purpose, two lists (“List0” and “List1”) are prepared to specify a reference picture. A maximum of five reference pictures can be assigned to each list.
P-pictures use only “List0” to mainly perform forward prediction. B-pictures use “List0” and “List1” to perform bidirectional prediction (or only forward or backward prediction). That is, “List0” holds pictures mainly for forward prediction, and “List1” holds pictures mainly for backward prediction.
FIG. 4 shows an example of a reference list used in encoding. This example assumes that the ratio of I-, P-, and B-pictures is a standard one, that is, I-pictures are arranged at an interval of 15 frames, P-pictures are arranged at an interval of three frames, and B-pictures between I- and P-pictures are arranged at an interval of two frames. In FIG. 4, image data 401 is obtained by arranging pictures in the display order. Each square in the image data 401 describes the type of picture and a number representing the display order. For example, a picture I15 is an I-picture whose display order is 15, and is used for only intra prediction. A picture P18 is a P-picture whose display order is 18, and is used for only forward prediction. A picture B16 is a B-picture whose display order is 16, and is used for bidirectional prediction.
The encoding order is different from the display order, and data are encoded in the prediction order. In FIG. 4, data are encoded in the order of “I15, P18, B16, B17, P21, B19, B20, . . . .”
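The reordering from display order to encoding order can be sketched as below. The rule used here, namely that each B-picture is held back until the next I- or P-picture has been emitted, is an assumption that reproduces the order quoted for FIG. 4; the function name and the string labels are illustrative.

```python
def encoding_order(display_order):
    """Reorder picture labels from display order to encoding order.

    B-pictures are held until the following I- or P-picture, which they
    reference for backward prediction, has been placed in the output.
    """
    encoded = []
    pending_b = []
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)
        else:
            encoded.append(picture)
            encoded.extend(pending_b)
            pending_b = []
    # Trailing B-pictures with no following anchor are appended as-is.
    encoded.extend(pending_b)
    return encoded


display = ["I15", "B16", "B17", "P18", "B19", "B20", "P21"]
order = encoding_order(display)
```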
In FIG. 4, a reference list (List0) 402 holds temporarily encoded/decoded pictures. For example, inter prediction using a picture P21 (P-picture whose display order is 21) refers to pictures which have been encoded and decoded in the reference list (List0) 402. In the example shown in FIG. 4, the reference list 402 holds pictures P06, P09, P12, I15, and P18.
In inter prediction, a motion vector having an optimal predicted value is obtained for each macroblock from reference pictures in the reference list (List0) 402, and encoded. Pictures in the reference list (List0) 402 are discriminated by sequentially giving them reference picture numbers (different from numbers shown in FIG. 4).
After the end of encoding the picture P21, the picture P21 is newly decoded and added to the reference list (List0) 402. The oldest reference picture (in this case, the picture P06) is deleted from the reference list (List0) 402. Encoding proceeds in the order of pictures B19, B20, and P24. FIG. 5 shows the state of the reference list (List0) 402 at this time.
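The update of the reference list (List0) described above behaves like a fixed-length queue. The following sketch assumes the five-picture limit stated earlier and uses a hypothetical helper `add_reference`; it models only the simple oldest-first replacement of this example, not the full reference management of H.264.

```python
from collections import deque

MAX_REFS = 5  # Maximum number of reference pictures per list, as stated in the text.


def add_reference(ref_list, picture):
    """Append a newly decoded picture and return the picture that was dropped.

    Returns None if the list was not yet full.
    """
    evicted = ref_list[0] if len(ref_list) == ref_list.maxlen else None
    ref_list.append(picture)  # A full deque discards its oldest entry.
    return evicted


list0 = deque(["P06", "P09", "P12", "I15", "P18"], maxlen=MAX_REFS)
dropped = add_reference(list0, "P21")
```

After the call, `dropped` is the picture P06 and the list holds P09, P12, I15, P18, and P21.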
FIG. 6 shows a change of the reference list for each picture.
In FIG. 6, pictures are encoded sequentially from the top. FIG. 6 shows a picture during encoding and the contents of the reference lists (List0 and List1) for it. When a P-picture (or I-picture) is encoded as shown in FIG. 6, the reference lists (List0 and List1) are updated to delete the oldest pictures from the reference lists (List0 and List1). In this example, the reference list (List1) holds only one picture. This is because a larger number of pictures referred to for backward prediction require a larger buffer amount till decoding. In other words, backward pictures excessively distant from a picture during encoding are not referred to.
In this example, I- and P-pictures are referred to, and all I- and P-pictures are sequentially added to the reference lists (List0 and List1). Only P-pictures are used in the reference list (List1) for backward prediction because this picture arrangement is considered to be the most popular one. However, the picture arrangement in the reference list is merely an example of the most popular one, and H.264 itself has a high degree of freedom for the configuration of the reference list.
For example, not all I- and P-pictures need be added to the reference list, and B-pictures can also be added to the reference list. H.264 defines even a long-term reference list of pictures which stay in the reference list until an explicit instruction is received. FIG. 7 shows a change of the reference list when adding B-pictures to the reference list. When adding B-pictures to the reference list, encoded pictures may be added to the reference list every time all B-pictures are encoded.
A file format for recording moving image data compressed in this way will be explained.
As described above, the MP4 (MPEG4) file format is used as a general-purpose format for recording MPEG (MPEG2 or MPEG4 format) image data obtained by a digital video camera, digital still camera, or the like. The MP4 file format ensures compatibility with other digital devices to, for example, play back image data recorded as an MP4 file.
As represented by a of FIG. 8, an MP4 file is basically formed from an mdat box which holds encoded stream image data, and a moov box which holds stream image data-related information. The mdat box is formed from a plurality of chunks (chunk cN), as represented by b of FIG. 8. Each chunk is formed from a plurality of samples (sample sM), as represented by d of FIG. 8. For example, the respective samples sample s1, sample s2, sample s3, sample s4, . . . correspond to encoded MPEG image data I0, B−2, B−1, P3, . . . , as represented by e of FIG. 8.
I0, I1, I2, . . . , In represent intra-encoded (intra-frame-encoded) frame image data. B0, B1, B2, . . . , Bn represent frame image data encoded (inter-frame-encoded) by referring to reference image data bidirectionally. P0, P1, P2, . . . , Pn represent frame image data encoded (inter-frame-encoded) by referring to reference image data unidirectionally (forward direction). These frame image data are variable-length encoded data.
As represented by c of FIG. 8, the moov box is formed from an mvhd box which holds header information recording the creation date and time, and the like, and a trak box which holds information on stream image data stored in the mdat box. Information stored in the trak box includes an stco box which stores information of an offset value for each chunk of the mdat box, as represented by h of FIG. 8, an stsc box which stores information of the number of samples in each chunk, as represented by g of FIG. 8, and an stsz box which stores information of the size of each sample, as represented by f of FIG. 8.
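How the three tables locate a sample in the mdat box can be sketched as follows. This is a simplified model: the tables are passed as plain Python lists, and the number of samples is given explicitly for every chunk, whereas an actual stsc box stores this information in a more compact form. The function name and the sample values in the usage note are hypothetical.

```python
def sample_offsets(chunk_offsets, samples_per_chunk, sample_sizes):
    """Return the file offset of every sample.

    chunk_offsets     -- offset of each chunk (stco-like information)
    samples_per_chunk -- number of samples in each chunk (stsc-like information)
    sample_sizes      -- size of each sample in bytes (stsz-like information)
    """
    offsets = []
    sample_index = 0
    for chunk_offset, count in zip(chunk_offsets, samples_per_chunk):
        position = chunk_offset
        for _ in range(count):
            offsets.append(position)
            # Samples inside one chunk are assumed to be stored back to back.
            position += sample_sizes[sample_index]
            sample_index += 1
    return offsets
```

For example, with chunks at offsets 100 and 1000, two samples per chunk, and sample sizes of 10, 20, 30, and 40 bytes, the samples start at offsets 100, 110, 1000, and 1030.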
The amounts of data stored in the stco box, stsc box, and stsz box increase together with the recorded image data amount, that is, the recording time. For example, when an image of 30 frames per sec is recorded as an MP4 file by storing every 15 frames in one chunk, the data amount increases to 1 Mbyte for 2 h, requiring a moov box having a capacity of 1 Mbyte.
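The order of magnitude of this figure can be checked with a rough calculation. The sketch below assumes one 4-byte entry per sample for the size table and one 4-byte entry per chunk for the offset table, and it ignores box headers and the stsc box; these simplifications and the function name are assumptions for illustration.

```python
def estimate_table_bytes(fps, hours, frames_per_chunk, bytes_per_entry=4):
    """Estimate the combined size of the per-sample and per-chunk tables.

    Returns (total_samples, total_chunks, table_bytes).
    """
    total_samples = fps * hours * 3600
    total_chunks = total_samples // frames_per_chunk
    size_table_bytes = total_samples * bytes_per_entry   # one entry per sample
    offset_table_bytes = total_chunks * bytes_per_entry  # one entry per chunk
    return total_samples, total_chunks, size_table_bytes + offset_table_bytes


samples, chunks, table_bytes = estimate_table_bytes(fps=30, hours=2, frames_per_chunk=15)
```

Under these assumptions, 2 h of recording yields 216,000 samples in 14,400 chunks and about 920,000 bytes of table data, which is consistent with the figure of approximately 1 Mbyte.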
When playing back this MP4 file, the moov box of the MP4 file is read out from the recording medium, and the stco, stsc, and stsz boxes in the moov box are analyzed. After that, each chunk in the mdat box can be accessed.
When recording an image in the MP4 file format, the stream data increases over time. Since the size of stream data is very large, the stream data needs to be written in the file even during recording. However, the size of the moov box also increases in accordance with the recording time, as described above. The size of the MP4 header is not defined till the end of recording, so the write offset position of stream data in the file cannot be determined. For this reason, recording by a general moving image processing apparatus adopts the following measures using the flexibility of the MP4 file format.
(1) The mdat box is arranged at the start of a file, and after recording ends, the moov box is arranged next to the mdat box (a of FIG. 9).
(2) As proposed in Japanese Patent Laid-Open No. 2003-289495, the size of the moov box is determined in advance to determine the offset position of the mdat box, and then recording is done (b of FIG. 9). Even when the recording time is short and the header area does not become full, the area remains as a free box. When recording data over the header size, the data is recorded by properly decimating frame number information of I-pictures, maintaining the header size at a predetermined size.
(3) A pair of moov and mdat boxes is divided into a plurality of pairs to arrange them (c of FIG. 9). The second and subsequent header areas are called moof boxes.
These are the structures of general MP4 files.
A general playback method for the MP4 file will be described below.
FIG. 10 is a block diagram showing an example of the basic arrangement of a moving image playback apparatus which plays back a moving image compression-encoded by H.264.
In FIG. 10, the moving image playback apparatus includes a recording medium 1001, a playback circuit 1002 which plays back data from a recording medium, a buffer circuit 1003, a variable-length decoding circuit 1004, an inverse quantization circuit 1005, an inverse DCT circuit 1006, an addition circuit 1007, a memory 1008, a motion compensation circuit 1009, a switching circuit 1010, a rearrangement circuit 1011, an output terminal 1012, a header information analysis circuit 1013, a playback control circuit 1014, and a control signal input terminal 1015.
The sequence of playback processing in the moving image playback apparatus in FIG. 10 will be explained.
Upon receiving an instruction from the playback control circuit 1014, the playback circuit 1002 plays back an MP4 file recorded on the recording medium 1001, and starts supplying it to the buffer circuit 1003. At the same time, the playback control circuit 1014 controls the header information analysis circuit 1013 to analyze an offset, chunk information, and sample information in the stco box, stsc box, and stsz box representing storage statuses in mdat in the moov box. The playback control circuit 1014 controls the playback circuit 1002 to start playing back stream image data in the mdat box from the recording medium 1001.
The playback circuit 1002 plays back, from the start address, the stream image data in the mdat box of the file recorded on the recording medium 1001, and supplies it to the buffer circuit 1003. Read of the stream image data stored in the buffer circuit 1003 starts in accordance with the occupancy of the buffer circuit 1003 and the like. The stream image data is supplied to the variable-length decoding circuit 1004. The variable-length decoding circuit 1004 executes variable-length decoding of the played-back stream image data supplied from the buffer circuit 1003, and supplies the decoded stream image data to the inverse quantization circuit 1005.
The inverse quantization circuit 1005 inversely quantizes the stream image data which is supplied from the variable-length decoding circuit 1004 upon variable-length decoding. The inverse quantization circuit 1005 supplies the inversely quantized stream image data to the inverse DCT circuit 1006. The inverse DCT circuit 1006 executes inverse DCT for the inversely quantized data supplied from the inverse quantization circuit 1005, and supplies the inverse DCT data to the addition circuit 1007. The addition circuit 1007 adds the inverse DCT data supplied from the inverse DCT circuit 1006, and data supplied from the switching circuit 1010.
Of stream image data played back from the recording medium 1001, intra-frame-encoded data I0 of GOP0 (Group Of Pictures) is played back first, as shown in FIG. 11. The playback control circuit 1014 controls the switching circuit 1010 to select the terminal a, and the switching circuit 1010 supplies data “0” to the addition circuit 1007. The addition circuit 1007 adds data “0” supplied from the switching circuit 1010, and inverse DCT data supplied from the inverse DCT circuit 1006, and supplies the added data as a played-back frame F0 to the memory 1008 and rearrangement circuit 1011. The memory 1008 stores the added data supplied from the addition circuit 1007.
Bidirectionally predictive-encoded picture data B−2 and B−1 are played back next to the intra-frame-encoded data I0 of GOP0. The playback sequence up to the inverse DCT circuit 1006 is the same as that described for the intra-frame-encoded data I0, and a description thereof will not be repeated.
The inverse DCT circuit 1006 supplies bidirectionally predictive-encoded inverse DCT image data to the addition circuit 1007. At this time, the playback control circuit 1014 controls the switching circuit 1010 so that the movable terminal c of the switching circuit 1010 selects the fixed terminal b. Data from the motion compensation circuit 1009 is supplied to the addition circuit 1007.
The motion compensation circuit 1009 detects a motion vector which has been generated in encoding from played-back stream image data and recorded in the stream image data. The motion compensation circuit 1009 reads out data of a reference block (in this case, only data from the played-back intra-frame-encoded data F0 because recording has just started) from the memory 1008, and supplies it to the movable terminal c of the switching circuit 1010.
The addition circuit 1007 adds inverse DCT data supplied from the inverse DCT circuit 1006 and motion-compensated data supplied from the switching circuit 1010. The addition circuit 1007 supplies the added data as played-back frames F−2 and F−1 to the rearrangement circuit 1011.
Then, unidirectionally predictive-encoded picture data P3 is played back. The playback sequence up to the inverse DCT circuit 1006 is the same as that described for the intra-frame-encoded data I0, and a description thereof will not be repeated.
The inverse DCT circuit 1006 supplies unidirectionally predictive-encoded inverse DCT picture data to the addition circuit 1007. At this time, the playback control circuit 1014 controls the switching circuit 1010 so that the movable terminal c of the switching circuit 1010 selects the fixed terminal b. Data from the motion compensation circuit 1009 is supplied to the addition circuit 1007.
The motion compensation circuit 1009 detects a motion vector which has been generated in encoding from played-back stream image data and recorded in the stream image data. The motion compensation circuit 1009 reads out data of a reference block (data from the played-back intra-frame-encoded data F0) from the memory 1008, and supplies it to the movable terminal c of the switching circuit 1010.
The addition circuit 1007 adds inverse DCT data supplied from the inverse DCT circuit 1006, and motion-compensated data supplied from the switching circuit 1010. The addition circuit 1007 supplies the added data as a played-back frame F3 to the memory 1008 and rearrangement circuit 1011. The memory 1008 stores the added data supplied from the addition circuit 1007.
Then, pictures B1 and B2 are played back. These pictures are not frames at the start of recording, and thus are played back by the same sequence as that described for the above-mentioned pictures B−2 and B−1 except that they are played back from the frames F0 and F3 by bidirectional prediction. In the above-described way, P6, B4, B5, . . . are played back sequentially.
The rearrangement circuit 1011 rearranges the sequentially played-back frames F0, F−2, F−1, F3, F1, F2, F6, F4, F5, . . . into F−2, F−1, F0, F1, F2, F3, F4, F5, F6, . . . , and outputs the rearranged frames to the output terminal 1012.
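The rearrangement can be sketched as follows. The rule assumed here is that a B-frame is output as soon as it is played back, while an I- or P-frame is held until the next I- or P-frame arrives; this rule reproduces the order given above but is an illustrative assumption, not a description of the internal operation of the rearrangement circuit 1011.

```python
def rearrange(decoded):
    """Reorder (picture_type, frame_number) pairs from decoding order to display order."""
    output = []
    held_anchor = None  # Most recent I- or P-frame not yet output.
    for picture_type, frame_number in decoded:
        if picture_type == "B":
            output.append(frame_number)
        else:
            if held_anchor is not None:
                output.append(held_anchor)
            held_anchor = frame_number
    if held_anchor is not None:
        output.append(held_anchor)
    return output


decoded = [("I", 0), ("B", -2), ("B", -1), ("P", 3), ("B", 1),
           ("B", 2), ("P", 6), ("B", 4), ("B", 5)]
display = rearrange(decoded)
```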
At the start of playing back the file, the header information analysis circuit 1013 analyzes an offset, chunk information, and sample information from the stco box, stsc box, and stsz box representing storage statuses in mdat in the moov box of the MP4 file. Thus, the playback control circuit 1014 operates to skip data till GOP1 and start playing back data from GOP1.
In a lens-interchangeable digital camera, when the lens is detached from the camera body, motes floating in the air may enter the camera body. The camera incorporates various mechanical units, such as a shutter mechanism, which operate mechanically. When these mechanical units operate, dust such as metal powder may be generated in the camera body.
When a foreign substance such as dust or mote adheres to the surface of an optical low-pass filter or the like arranged in front of an image sensor which forms the image capturing unit of a digital camera, the shadow of the foreign substance is contained in a captured image, deteriorating the quality of the sensed image.
To solve this problem, the shadow of a foreign substance is corrected. As a technique applicable to the correction, for example, Japanese Patent Laid-Open No. 2003-289495 proposes an image defect correction method of correcting the pixel defect of an image sensor.
Japanese Patent Laid-Open No. 6-105241 proposes a method for simplifying setting of position information of a pixel defect. More specifically, the extension of an image file recorded in the dust acquisition mode is changed from that of a normal image, and the PC automatically discriminates a dust information image. By using this information, a target image is corrected. Some products record the dust information as photographing information in a recorded image file, and correct a target image using the information.
Japanese Patent Laid-Open No. 2004-242158 discloses a related technique.
However, the capacity of memory used to perform dust correction increases when playing back a moving image file like the above-described MP4 file while correcting a target image based on the dust information. In addition, the moving image playback quality deteriorates owing to low operating speed.
In still image playback, dust correction suffices to be executed once per image to play back a dust-corrected still image. Even if the dust correction processing time is long under the limitation of the memory or the like or the dust correction processing itself takes a long time, playback of a still image can wait till the completion of the dust correction processing.
However, in moving image playback, the motion of an image is expressed by continuously playing back a plurality of still images at a rate such as 15 or 30 frames per sec. In addition to general playback processing, processing to correct dust of one frame needs to be executed 15 times for 15 frames per sec or 30 times for 30 frames per sec. Further, within each frame, the correction processing is repeated once for each dust particle, that is, as many times as the dust count.
More specifically, for 15 frames, the dust correction processing count per sec is

dust correction processing count = 15 frames × dust count

For 30 frames,

dust correction processing count = 30 frames × dust count
No natural moving image can be played back unless the series of processes ends within the 1-sec time limit.
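The processing load can be illustrated with a short calculation. The function names and the dust count of 10 used in the usage note are hypothetical, and the time budget is an upper bound that assumes dust correction alone could use the entire second, which general playback processing does not allow in practice.

```python
def dust_corrections_per_second(frames_per_second, dust_count):
    """Return the dust correction processing count per sec (frames × dust count)."""
    return frames_per_second * dust_count


def time_budget_per_correction(frames_per_second, dust_count):
    """Return an upper bound, in seconds, on the time available per correction."""
    return 1.0 / dust_corrections_per_second(frames_per_second, dust_count)
```

For example, with a dust count of 10, 150 corrections per sec are needed at 15 frames per sec and 300 corrections per sec at 30 frames per sec, so the time available for each correction is halved when the frame rate doubles.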