1. Field of the Invention
The present invention relates to a technology, for use in an imaging apparatus which uses an image sensor such as a charge-coupled device (CCD) or a complementary metal-oxide semiconductor (CMOS) sensor, for reducing image degradation caused by a foreign substance adhering to a surface of an optical low pass filter or the like arranged on a front face of the image sensor. More particularly, the present invention relates to a technology for reducing image degradation caused by a foreign substance when capturing a moving image.
2. Description of the Related Art
Recently, there has been a need for a technology which can encode moving image information into digital data at a high compression ratio with a high image quality for use in processing, storage, and transmission thereof. For compression of image information, methods such as Moving Picture Experts Group (MPEG), which compress and encode image information by orthogonal transformation such as a discrete cosine transform and by movement prediction/movement compensation utilizing redundancy specific to moving image information, have been proposed and have come into widespread use.
Manufacturers develop and commercialize imaging apparatuses, such as digital cameras and digital video cameras, or Digital Versatile Disk (DVD) recorders, which can record images by utilizing these coding formats. Further, a user can easily view an image using these apparatuses, a personal computer, a DVD player, and the like.
Recently, MPEG-4 Part 10 AVC (H.264) has been developed as a coding format aiming for an even higher compression ratio and higher image quality. Compared with conventional coding formats such as MPEG-2 and MPEG-4, H.264 requires a greater amount of calculation for coding and decoding, but is known to achieve a higher coding efficiency.
FIG. 1 illustrates a configuration of an image processing apparatus that compresses image data by the H.264 format.
In FIG. 1, input image data is divided into macro blocks, which are sent to a subtraction unit 101. The subtraction unit 101 calculates a difference between the image data and a predicted value, and outputs this difference to an integer Discrete Cosine Transform (DCT) transformation unit 102. The integer DCT transformation unit 102 transforms the input data using the integer DCT, and outputs the transformed data to a quantization unit 103. The quantization unit 103 quantizes the input data.
One part of the quantized data is sent to an entropy coding unit 115 as difference image data. The other part of the quantized data is inverse-quantized by a dequantization unit 104, and then subjected to inverse integer DCT transformation by an inverse integer DCT transformation unit 105. The data subjected to inverse integer DCT transformation is added with a predicted value by an addition unit 106. As a result, the image is restored.
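The local decoding loop described above, in which the encoder reconstructs the same image that the decoder will restore, can be sketched as follows. This is a simplified illustration using a hypothetical scalar quantizer on a one-dimensional block; the actual H.264 path uses a 4x4 integer DCT and block-based prediction:

```python
# Simplified sketch of the encoder's local decoding loop.
# The scalar quantizer and 1-D data are illustrative stand-ins.

def encode_block(block, predicted, qstep=8):
    residual = [x - p for x, p in zip(block, predicted)]        # subtraction unit 101
    transformed = residual                                       # stand-in for integer DCT (unit 102)
    quantized = [round(t / qstep) for t in transformed]          # quantization unit 103
    # Local decode: the encoder restores exactly what the decoder will see,
    # so that later predictions are based on identical reference data.
    dequantized = [q * qstep for q in quantized]                 # dequantization unit 104
    restored = [d + p for d, p in zip(dequantized, predicted)]   # addition unit 106
    return quantized, restored

quantized, restored = encode_block([120, 124, 119], [118, 118, 118], qstep=8)
print(quantized, restored)  # [0, 1, 0] [118, 126, 118]
```

The restored values, not the original input, are what the frame memories hold for intra and inter prediction, which keeps the encoder and decoder in step despite the lossy quantization.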
One part of the restored data is sent to a frame memory 107 for intra (intra-frame) prediction. The other part of the restored data is subjected to deblocking filter processing by a deblocking filter 109, and then the resultant data is sent to a frame memory 110 for inter (inter-frame) prediction.
The image in the frame memory 107 for intra prediction is used for an intra prediction by an intra prediction unit 108. In this intra prediction, a value of an adjacent pixel of an already coded block in the same picture is used as the predicted value.
The image in the frame memory 110 for inter prediction is constituted by a plurality of pictures, as is described below. The plurality of pictures are divided into two lists, a “list 0” and a “list 1”. The plurality of pictures divided into the two lists are used in an inter prediction by an inter prediction unit 111.
After the inter prediction is performed, the image in the frame memory 110 is updated by a memory controller 113. In the inter prediction performed by the inter prediction unit 111, a predicted image is determined using an optimum movement vector based on a result of movement detection performed by a movement detection unit 112 on image data from a different frame.
As a result of the intra prediction and the inter prediction, the optimum prediction is selected by a selection unit 114. Further, the movement vector is sent to the entropy coding unit 115, and is coded along with the difference image data. As a result, an output bit stream is formed.
The H.264 format inter prediction will now be described in more detail using FIGS. 2 to 5.
In the H.264 format inter prediction, a plurality of pictures can be used for the prediction. Therefore, to specify a reference picture, two lists (the “List 0” and the “List 1”) are prepared. A maximum of five reference pictures can be assigned to each list.
For P pictures, forward direction prediction is mainly performed using only the “List 0”. For B pictures, bidirectional prediction (or, only forward direction or backward direction prediction) is performed using the “List 0” and the “List 1”. Namely, in the “List 0”, pictures mainly for forward direction prediction are assigned, and in the “List 1”, pictures mainly for backward direction prediction are assigned.
FIG. 2 illustrates an example of a reference list used during coding. This example describes a typical ratio of P pictures to B pictures: the I pictures have a 15-frame interval, the P pictures have a 3-frame interval, and there are two B-picture frames between the P pictures.
In FIG. 2, pieces of image data 201 are lined up in a display order. In the rectangles of image data 201, numerals representing the picture type and display order are written.
For example, picture I15 is an I picture which is fifteenth in the display order, and only intra prediction is performed. Picture P18 is a P picture, which is eighteenth in the display order, and only forward direction prediction is performed. Picture B16 is a B picture, which is sixteenth in the display order, and bidirectional prediction is performed.
The order in which coding is performed is different from the display order, and is the order in which prediction is performed. Namely, in FIG. 2, the order in which prediction is performed is “I15, P18, B16, B17, P21, B19, B20, . . . ”.
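The reordering from display order into this coding order can be sketched as follows. This is a simplified illustration that assumes each run of B pictures is coded immediately after the anchor I or P picture that follows it in display order; the function name is hypothetical:

```python
def display_to_coding_order(pictures):
    """Reorder 'I'/'P'/'B' pictures from display order to coding order.
    Anchor pictures (I, P) are coded first; the B pictures between two
    anchors follow the later anchor, since they reference it."""
    coding_order = []
    pending_b = []
    for pic in pictures:
        if pic[0] in ('I', 'P'):            # anchor picture: code it now
            coding_order.append(pic)
            coding_order.extend(pending_b)  # B pictures that needed this anchor
            pending_b = []
        else:                               # B picture: wait for the next anchor
            pending_b.append(pic)
    return coding_order

display = ['I15', 'B16', 'B17', 'P18', 'B19', 'B20', 'P21']
print(display_to_coding_order(display))
# ['I15', 'P18', 'B16', 'B17', 'P21', 'B19', 'B20']
```

The output matches the prediction order given above, since a B picture cannot be predicted until the later anchor it references has been coded.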
Further, in FIG. 2, restored pictures which have once undergone coding are contained in a reference list (List 0) 202. For example, when performing inter prediction with a picture P21 (twenty-first P picture in the display order), restored pictures for which coding is already finished are referenced in the reference list (List 0) 202. In the example illustrated in FIG. 2, pictures P06, P09, P12, I15, and P18 are contained in the reference list 202.
In the inter prediction, the movement vector having the optimum predicted value from among the reference pictures in this reference list (List 0) 202 is determined and coded for each macro block. The pictures in the reference list (List 0) 202 are provided with reference picture numbers in order to distinguish them (these numbers are different from the display-order numerals illustrated in FIG. 2).
When the coding of the picture P21 is finished, next the picture P21 is decoded and newly added to the reference list (List 0) 202. From the reference list (List 0) 202, the oldest reference picture (here, picture P06) is removed. Subsequently, the coding is performed on pictures B19 and B20, and continues on to picture P24. The state of the reference list (List 0) 202 at this stage is illustrated in FIG. 3.
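The sliding-window update of the reference list (List 0) described above can be sketched as follows (a simplified illustration; the picture names are the display-order labels used in FIG. 2):

```python
from collections import deque

MAX_REF_PICTURES = 5  # maximum reference pictures per list, as described above

def update_list0(list0, decoded_picture):
    """Sliding-window update of List 0: the newly decoded I/P picture is
    added and, when the list is full, the oldest reference is removed."""
    if len(list0) == MAX_REF_PICTURES:
        list0.popleft()               # drop the oldest reference (here, P06)
    list0.append(decoded_picture)

list0 = deque(['P06', 'P09', 'P12', 'I15', 'P18'])
update_list0(list0, 'P21')            # P21 has just been coded and decoded
print(list(list0))  # ['P09', 'P12', 'I15', 'P18', 'P21']
```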
FIG. 4 illustrates a changing state of the reference list for each picture. In FIG. 4, the pictures are coded in order from the top. FIG. 4 also illustrates the picture undergoing coding and the contents of the reference lists (List 0 and List 1) with respect to the picture undergoing coding.
As illustrated in FIG. 4, when the P picture (or I picture) is coded, the reference lists (List 0 and List 1) are updated, and the oldest picture in the reference lists (List 0 and List 1) is removed. In this example, the reference list (List 1) only has one picture.
This is because, if the number of pictures to be referenced for backward direction prediction is increased, the amount of buffering required until encoding is completed also increases. Namely, this avoids referencing a backward direction picture that is very far from the picture undergoing coding.
In this example, the pictures used in referencing are I pictures and P pictures. All of the I pictures and the P pictures are added in series to the reference lists (List 0 and List 1). Further, the pictures used in the reference list (List 1) for backward direction prediction are only the I pictures.
This is because it is the most commonly used picture configuration. However, the picture configuration in these reference lists is merely one example; H.264 itself allows a higher degree of freedom in the reference list configuration.
For example, it is not necessary to add all of the I pictures and the P pictures to the reference lists. The B pictures may also be added to the reference lists. Further, a long-term reference picture, which remains in the reference list until it is explicitly instructed to be removed, is also defined.
FIG. 5 illustrates a changing state of the reference list when B pictures are added to the reference lists. One common approach for adding B pictures to the reference lists is to add each B picture as soon as it has been coded.
Next, a file format for recording the thus-compressed moving image data will be described. As described above, the MP4 (MPEG-4) file format is used as a multipurpose format for recording MPEG (MPEG-2 or MPEG-4 format) image data captured by a digital video camera, a digital still camera, and the like. Recording as an MP4 file assures compatibility, such as the ability to play back the file on another digital device.
As illustrated in FIG. 6A, MP4 files basically include an “mdat box” which contains coded stream image data, and a “moov box” which contains information related to the stream image data. As illustrated in FIG. 6B, the “mdat box” includes a plurality of chunks (chunk cN). As illustrated in FIG. 6D, each chunk includes a plurality of samples (sample sM).
As illustrated in FIG. 6E, the respective samples are configured so that coded MPEG data of I0, B-2, B-1, P3 . . . corresponds to sample s1, sample s2, sample s3, sample s4 . . . .
Here, I0, I1, I2, . . . , In are pieces of frame image data which have been intra coded (intra-frame coded), B0, B1, B2, . . . , Bn are pieces of frame image data which have been inter coded (inter-frame coded) using bidirectional referencing, and P0, P1, P2, . . . , Pn are pieces of frame image data which have been referenced and coded (inter-frame coded) from a single direction (forward direction). All of these pieces of data are variable length coded data.
As illustrated in FIG. 6C, the “moov box” includes a “mvhd box” having header information in which the creation date and the like are recorded, and a “trak box” having information relating to the stream image data stored in the “mdat box”.
Examples of the information stored in the “trak box” include a “stco box” which stores information about an offset value for each chunk in the “mdat box” as illustrated in FIG. 6H, a “stsc box” which stores information about the sample number in each chunk as illustrated in FIG. 6G, and a “stsz box” which stores information about the size of each sample as illustrated in FIG. 6F.
Therefore, the data amount stored in the “stco box”, “stsc box”, and “stsz box” increases along with the amount of recorded image data, that is, the recording time.
For example, when an image of 30 frames per second is recorded as an MP4 file so that each of 15 frames is stored in one chunk, in two hours data of about 1 Mbyte is generated, which means that a “moov box” having a 1 Mbyte capacity is required.
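The arithmetic behind this estimate can be checked as follows, assuming 4-byte entries in the “stsz box” and “stco box” (the “stsc box” adds a further, usually much smaller, contribution):

```python
# Rough estimate of the sample-table size for a two-hour, 30 fps recording
# with 15 frames stored in each chunk.

FPS = 30
SECONDS = 2 * 60 * 60
SAMPLES_PER_CHUNK = 15

samples = FPS * SECONDS                  # total frames recorded
chunks = samples // SAMPLES_PER_CHUNK    # total chunks in the mdat box

stsz_bytes = 4 * samples                 # one 4-byte size entry per sample
stco_bytes = 4 * chunks                  # one 4-byte offset entry per chunk
total = stsz_bytes + stco_bytes
print(samples, chunks, total)            # 216000 14400 921600
```

The tables alone come to roughly 0.9 Mbyte, consistent with the statement that a “moov box” of about 1 Mbyte is required.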
When playing back this MP4 file, each of the chunks in the “mdat box” can be accessed by reading the “moov box” of the MP4 file from the recording medium, and analyzing the “stco box”, “stsc box”, and “stsz box” in that “moov box”.
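The chunk and sample lookup performed during this analysis can be sketched as follows. This simplified illustration assumes a constant number of samples per chunk, whereas the actual “stsc box” allows the count to vary from chunk to chunk; `locate_sample` and the example table contents are hypothetical:

```python
def locate_sample(sample_index, chunk_offsets, samples_per_chunk, sample_sizes):
    """Return the file offset of a sample from simplified stco/stsc/stsz data."""
    chunk = sample_index // samples_per_chunk     # which chunk holds it (stsc role)
    first_in_chunk = chunk * samples_per_chunk
    offset = chunk_offsets[chunk]                 # chunk start offset (stco role)
    for i in range(first_in_chunk, sample_index):
        offset += sample_sizes[i]                 # sizes of earlier samples (stsz role)
    return offset

chunk_offsets = [1024, 5000]                      # hypothetical stco contents
sample_sizes = [800, 300, 200, 900, 400, 150]     # hypothetical stsz contents
print(locate_sample(4, chunk_offsets, samples_per_chunk=3,
                    sample_sizes=sample_sizes))   # 5900
```

Sample 4 lies in the second chunk (offset 5000), after one earlier 900-byte sample, so it starts at offset 5900.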
When recording in the MP4 file format, the stream data increases with time. Further, since the stream data is so large, it needs to be written into the file even while recording is in progress.
However, as described above, since the size of the “moov box” also increases according to the recording time, and the size of the MP4 header is thus unknown until recording is finished, the offset position at which the stream data is to be written into the file cannot be determined in advance. Therefore, for recording in a typical moving image processing apparatus, the following procedures are performed taking advantage of the flexibility of the MP4 file format.
(1) The “mdat box” is arranged at the head of the file, and the “moov box” is arranged behind the mdat box when the recording is finished (FIG. 7A).
(2) As discussed in Japanese Patent Application Laid-Open No. 2003-289495, the size of the “moov box” is pre-determined, the “mdat box” offset position is determined accordingly, and recording is performed (FIG. 7B). When the recording time is short and the header region has free capacity, that region is left as a free box. When recording more data than the header size allows, the header size can be maintained at the pre-determined size by thinning out the frame number information of the I pictures as needed.
(3) A “moov box” and “mdat box” pair is divided up into a plurality of pairs (FIG. 7C). Here, the second and subsequent header regions are referred to as “moof boxes”. These are the common configurations of MP4 files.
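Procedure (2), in which the “moov box” size is pre-determined so that every stream offset is known during recording, can be sketched as follows. This is a simplified illustration; `record_with_fixed_header` and `build_moov` are hypothetical names, and real MP4 writing serializes the full box structures:

```python
import io

MOOV_RESERVATION = 1 * 1024 * 1024   # pre-determined header size (1 Mbyte)

def record_with_fixed_header(stream_chunks, build_moov):
    """Reserve a fixed header region, write the stream behind it, then fill
    the header in when recording ends. build_moov is a hypothetical callback
    that serializes the finished header from the recorded chunk offsets."""
    f = io.BytesIO()
    f.write(b'\x00' * MOOV_RESERVATION)   # reserve space for the moov box
    offsets = []
    for chunk in stream_chunks:
        offsets.append(f.tell())          # offset is known immediately
        f.write(chunk)
    moov = build_moov(offsets)
    assert len(moov) <= MOOV_RESERVATION  # leftover space stays as a free box
    f.seek(0)
    f.write(moov)                         # fill the header when recording ends
    return f.getvalue()

data = record_with_fixed_header([b'AA', b'BB'],
                                lambda offs: b'moov' + str(offs).encode())
```

Because the header region never grows, each chunk offset can be recorded the moment the chunk is written, which is what makes this layout suitable for recording while the stream is still in progress.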
A typical playback method of the MP4 files will now be described. FIG. 8 illustrates a basic example configuration of a moving image playback apparatus for playing back moving images that are compressed and coded by the H.264 format.
In FIG. 8, the moving image playback apparatus includes a recording medium 801, a playback circuit 802 for playing back data from the recording medium 801, a buffer circuit 803, a variable length decoding circuit 804, a dequantization circuit 805, an inverse DCT circuit 806, an addition circuit 807, a memory 808, a movement compensation circuit 809, a switch circuit 810, an arrangement circuit 811, an output terminal 812, a header information analysis circuit 813, a playback control circuit 814, and a control signal input terminal 815.
Next, a flow of the playback processing of the moving image playback apparatus in FIG. 8 will be described.
When the playback circuit 802 receives an instruction from the playback control circuit 814, the playback circuit 802 plays back the MP4 file recorded in the recording medium 801, and starts to supply the played back MP4 file to the buffer circuit 803. Simultaneously, the playback control circuit 814 controls the header information analysis circuit 813 so as to analyze the offset, chunk, and sample information in the “stco box”, “stsc box”, and “stsz box” of the “moov box”, which indicate the storage state in the “mdat box”.
Then, the playback control circuit 814 controls the playback circuit 802 so as to start playing back the stream image data in the mdat box from the recording medium 801.
The playback circuit 802 plays back the stream image data in the “mdat box” of the file recorded in the recording medium 801 from its head address, and supplies the stream image data to the buffer circuit 803. The playback control circuit 814 starts to read the stream image data stored in the buffer circuit 803 while looking at the occupancy state and the like in the buffer circuit 803, and supplies the stream image data to the variable length decoding circuit 804.
The variable length decoding circuit 804 performs variable length decoding on the played back stream image data supplied from the buffer circuit 803, and sends the resultant data to the dequantization circuit 805.
The dequantization circuit 805 dequantizes the variable-length-decoded stream image data supplied from the variable length decoding circuit 804, and supplies the resultant data to the inverse DCT circuit 806.
The inverse DCT circuit 806 performs an inverse DCT on the dequantized data supplied from the dequantization circuit 805, and supplies the resultant data to the addition circuit 807. The addition circuit 807 adds data supplied from the switch circuit 810 to the data supplied from the inverse DCT circuit 806.
Here, in the stream image data played back from the recording medium 801, as illustrated in FIG. 9, first, I0 intra-frame coded in the Group Of Pictures (GOP) 0 is played back. Therefore, the playback control circuit 814 performs a control to select a terminal “a” of the switch circuit 810, and the switch circuit 810 supplies a piece of data “0” to the addition circuit 807.
The addition circuit 807 adds the piece of “0” data supplied from the switch circuit 810 and the piece of inverse DCT data supplied from the inverse DCT circuit 806, and supplies the resultant data to the memory 808 and the arrangement circuit 811 as a played back frame F0. The memory 808 stores the added data supplied from the addition circuit 807.
Bidirectionally prediction coded picture data B-2 and B-1 will be played back after the intra-frame coded I0 data of the GOP 0. However, since the playback processing up to the inverse DCT circuit 806 is the same as the playback processing described above for the intra-frame coded I0 data, the description thereof is omitted.
The inverse DCT bidirectionally prediction coded image data from the inverse DCT circuit 806 is supplied to the addition circuit 807. At this stage, the playback control circuit 814 controls the switch circuit 810 so that a movable terminal c of the switch circuit 810 selects a fixed terminal b, and supplies data from the movement compensation circuit 809 to the addition circuit 807.
The movement compensation circuit 809 detects the movement vector, which is generated during coding and recorded in the stream image data. Further, the movement compensation circuit 809 reads the data of the reference block (in this case, since playback has just started, this is only the data of the played back intra-frame coded frame F0) from the memory 808, and supplies the read data to the movable terminal c of the switch circuit 810.
The addition circuit 807 adds the inverse DCT transformed data supplied from the inverse DCT circuit 806 and the movement compensated data supplied from the switch circuit 810, and supplies the resultant data to the arrangement circuit 811 as played back frames F-2 and F-1.
Unidirectionally prediction coded picture data P3 will be played back next. However, since the playback processing up to the inverse DCT circuit 806 is the same as the playback processing described above for the intra-frame coded I0 data, the description thereof is omitted.
The inverse DCT coded picture data from the inverse DCT circuit 806 is supplied to the addition circuit 807. At this stage, the playback control circuit 814 controls the switch circuit 810 so that the movable terminal c of the switch circuit 810 selects the fixed terminal b, and supplies the data from the movement compensation circuit 809 to the addition circuit 807.
The movement compensation circuit 809 detects the movement vector, which is generated during coding and recorded in the stream image data, from the played back stream image data. Further, the movement compensation circuit 809 reads the data of the reference block (the data of the played back intra-frame coded frame F0) from the memory 808, and supplies the read data to the movable terminal c of the switch circuit 810.
The addition circuit 807 adds the inverse DCT data supplied from the inverse DCT circuit 806 and the movement compensated data supplied from the switch circuit 810, and supplies the resultant data to the memory 808 and the arrangement circuit 811 as a played back frame F3. The memory 808 stores the added data supplied from the addition circuit 807.
Pictures B1 and B2 are played back next. Since these frames are not at the start of the stream, they are played back by the same processing as described above for B-2 and B-1, except that frames F0 and F3 are used for the bidirectional prediction. Thus, as described above, P6, B4, B5, . . . are successively played back.
The arrangement circuit 811 arranges the successively-read frames F0, F-2, F-1, F3, F1, F2, F6, F4, F5, . . . in the order of F-2, F-1, F0, F1, F3, F4, F5, F6, . . . and outputs the arranged frames to the output terminal 812.
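The reordering performed by the arrangement circuit 811 can be sketched as follows, using a small reorder buffer keyed by display index (a simplified illustration; `arrange_display_order` is a hypothetical name):

```python
def arrange_display_order(decoded):
    """Reorder (display_index, frame) pairs from decoding order into
    display order using a small reorder buffer."""
    buffer = {}
    next_index = min(i for i, _ in decoded)   # earliest display index
    output = []
    for index, frame in decoded:
        buffer[index] = frame
        while next_index in buffer:           # emit every frame that is now due
            output.append(buffer.pop(next_index))
            next_index += 1
    return output

# Decoding order F0, F-2, F-1, F3, F1, F2 with display indices 0, -2, -1, 3, 1, 2
decoded = [(0, 'F0'), (-2, 'F-2'), (-1, 'F-1'), (3, 'F3'), (1, 'F1'), (2, 'F2')]
print(arrange_display_order(decoded))
# ['F-2', 'F-1', 'F0', 'F1', 'F2', 'F3']
```

Note that a frame such as F3 must be held in the buffer until the B frames F1 and F2, which are decoded after it but displayed before it, have been emitted.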
When starting playback of the above-described file, the header information analysis circuit 813 analyzes the offset, chunk, and sample information obtained from the “stco box”, “stsc box”, and “stsz box” of the “moov box”, which indicate the storage state in the “mdat box” of the MP4 file. Using this information, the playback control circuit 814 skips to GOP 1, and starts the next playback from GOP 1.
For an interchangeable-lens digital camera, when removing the lens from the camera body, dust or the like floating in the air can intrude into the interior of the camera body. Further, various mechanical parts that are mechanically operated, such as a shutter mechanism, are provided in the camera interior. When these mechanical parts are operated, particles such as metal particles can be produced in the camera body.
If a foreign substance such as these particles or dust adheres to a surface of the image sensor provided in an imaging unit of the digital camera, the foreign substance can be photographed as a shadow in the captured image, thereby deteriorating the quality of the captured image.
To resolve such a problem, there is a correction method which uses signals output from the pixels surrounding the portion on which the shadow of the foreign substance is photographed. As a technology usable for correcting the shadow of the foreign substance, Japanese Patent Application Laid-Open No. 2003-289495 discusses an image defect correction method for correcting an image defect on the image sensor.
Further, in order to simplify the setting of position information of an image defect, Japanese Patent Application Laid-Open No. 6-105241 discusses a method for correcting a correction target image by giving an image file captured in a dust acquisition mode a file extension different from that of a normal image, automatically determining the dust information image on the PC side, and using that information to correct the correction target image.
Further, there are also products which perform correction of the correction target image by recording the above-described dust information in the captured image file as photographic information, and subsequently using that information.
However, playing back a moving image file like the above-described MP4 file while correcting the correction target image based on the above dust information causes problems, such as an increase in the amount of memory used and deterioration in the quality of the moving image playback due to a reduced operating speed.
For still image playback, to play back the still image after dust correction, only one dust correction is needed per image. Even if the dust correction processing takes time due to restrictions on memory and the like, since this is playback of a still image, the inconvenience of waiting until the dust correction processing is completed is small.
However, for moving image playback, movement is expressed by continuously playing back a plurality of still images at, for example, 15 or 30 frames per second. Therefore, in addition to the normal playback processing, the dust correction processing has to be performed 15 times per second at 15 frames per second, and 30 times per second at 30 frames per second, which means that natural moving image playback cannot be performed unless the processing is finished within this limited time.
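The time budget implied by this constraint can be computed directly (illustrative arithmetic only):

```python
# Per-frame time available when dust correction must run inside
# real-time moving image playback.

def frame_budget_ms(fps):
    """Total milliseconds available per frame, shared by normal
    playback processing and the dust correction processing."""
    return 1000.0 / fps

for fps in (15, 30):
    print(fps, round(frame_budget_ms(fps), 1))  # 15 66.7 / 30 33.3
```

At 30 frames per second, decoding and dust correction together must finish within about 33 ms per frame, whereas a still image imposes no such deadline.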