1. Field of the Invention
The present invention relates to an image encoding apparatus, a method of controlling this apparatus and a program for executing such control.
2. Description of the Related Art
Digital video cameras in which image data is encoded by inter-frame encoding and recorded on a recording medium such as tape, optical disk, or flash memory have proliferated in recent years. Conventionally, MPEG-2 and MPEG-4 are used as methods of inter-frame encoding of image data. Another method employed recently is the recording and/or playback of image data using H.264/AVC, which has a better encoding efficiency than MPEG-2 and MPEG-4.
Encoding schemes such as MPEG-2, MPEG-4, and H.264/AVC perform encoding on a per-block basis, where a block includes several pixels in one frame. As illustrated in FIG. 5, pixels in the horizontal and vertical directions are collected together in units of 8×8 pixels each (the unit is referred to as a “DCT block”), and a plurality of these DCT blocks are collected together to form a macroblock. In a macroblock, the luminance signal is composed of four DCT blocks, and the two color-difference signals are composed of one DCT block each, for a total of six DCT blocks. A plurality of macroblocks are collected together to form a slice. The MPEG-2 scheme performs encoding in units of the above-mentioned frame, slice, macroblock, and DCT block. FIGS. 4A and 4B illustrate encoding in MPEG-2 and H.264/AVC, respectively.
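The block hierarchy described above can be sketched as a simple count. The following is a minimal illustration only; the 16×16 luminance macroblock follows from four 8×8 DCT blocks, while the 4:2:0 color sampling, the function name, and the 720×480 frame size are assumptions made for the example and are not taken from FIG. 5.

```python
DCT_BLOCK = 8     # a DCT block is 8x8 pixels, as stated in the text
MACROBLOCK = 16   # assumed: four 8x8 luminance blocks form a 16x16 macroblock


def macroblock_layout(width, height):
    """Count macroblocks and DCT blocks in one frame (4:2:0 sampling assumed)."""
    if width % MACROBLOCK or height % MACROBLOCK:
        raise ValueError("this sketch only handles dimensions that are multiples of 16")
    luma_blocks = (MACROBLOCK // DCT_BLOCK) ** 2   # four luminance DCT blocks
    chroma_blocks = 2                               # one block per color-difference signal
    per_macroblock = luma_blocks + chroma_blocks    # six DCT blocks in total
    macroblocks = (width // MACROBLOCK) * (height // MACROBLOCK)
    return {
        "macroblocks": macroblocks,
        "dct_blocks_per_macroblock": per_macroblock,
        "dct_blocks": macroblocks * per_macroblock,
    }


layout = macroblock_layout(720, 480)   # hypothetical frame size
print(layout)
```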
First, FIG. 4A illustrates, along a time axis t, which type of encoding is applied to each image frame in MPEG-2. The horizontal axis is the time axis. For example, an image K0 represents a frame image at time t0.
The image K0 at time t0 is encoded by intra-frame encoding [I-frame (K0′)]. Next, an image K3 at time t3 is encoded by inter-frame encoding with reference to an image obtained by locally decoding the image K0′, which was encoded by intra-frame encoding [P-frame (K3′)].
Next, an image K1 at time t1 is encoded by inter-frame encoding with reference to the image obtained by locally decoding the image K0′, which was encoded by intra-frame encoding, and an image obtained by locally decoding the image K3′, which was encoded by inter-frame encoding [B-frame (K1′)]. Next, an image K2 at time t2 is encoded by inter-frame encoding with reference to the image obtained by locally decoding the image K0′, which was encoded by intra-frame encoding, and the image obtained by locally decoding the image K3′, which was encoded by inter-frame encoding [B-frame (K2′)].
In the same manner, frame images K6, K9, and K12 are prediction-encoded as P-frames using the immediately preceding I-frame or P-frame as a reference. That is, one-directional predictions are performed (K6′, K9′, K12′). Further, frame images K4, K5, K7, K8, K10, K11, K13, and K14 are prediction-encoded from the immediately preceding and immediately succeeding I-frames or P-frames, that is, from both the past and the future. In other words, bi-directional predictions are performed (K4′, K5′, K7′, K8′, K10′, K11′, K13′, K14′).
Further, according to MPEG-2 as illustrated in FIG. 4A, images K−2 to K12 are collected together and handled as a GOP (Group of Pictures) that includes at least one frame obtained by intra-frame encoding. The beginning of the encoded GOP is an I-frame.
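The frame types and the encoding order described for FIG. 4A can be sketched as follows. This is an illustration under stated assumptions: a GOP length of 15 frames (inferred from K−2 to K12) and an I- or P-frame every third frame are assumed, and the function names are hypothetical. Each I- or P-frame is emitted before the B-frames that precede it in display order, because those B-frames need it as a reference.

```python
def frame_type(k, gop_len=15, anchor_gap=3):
    """Frame type of image K<k>, assuming K0 is an I-frame (illustrative pattern)."""
    if k % gop_len == 0:
        return "I"
    if k % anchor_gap == 0:
        return "P"
    return "B"


def coding_order(first, last, gop_len=15, anchor_gap=3):
    """Return (index, type) pairs in encoding order for display range first..last."""
    order = []
    pending_b = []   # B-frames waiting for their future reference frame
    for k in range(first, last + 1):
        t = frame_type(k, gop_len, anchor_gap)
        if t == "B":
            pending_b.append(k)
        else:
            order.append((k, t))                        # reference frame first
            order.extend((b, "B") for b in pending_b)   # then the B-frames before it
            pending_b = []
    # B-frames at the end of the range that have no later I/P-frame are omitted.
    return order


print(coding_order(0, 6))
# K0 (I), K3 (P), K1 (B), K2 (B), K6 (P), K4 (B), K5 (B)
```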
In the H.264/AVC scheme, inter-frame encoding and intra-frame encoding are performed in slice units, as illustrated in FIG. 4B. A slice s−1 of an image F(n0) at time tn0 is encoded by bi-directional prediction encoding (B-slice). A B-frame in MPEG-2 takes as its reference frame an I- or P-frame immediately preceding or immediately succeeding it, whereas a B-slice may take its reference frame from any past or future frame, as illustrated in FIG. 4B.
The B-slice s−1 of image F(n0) in FIG. 4B is a mixture of two types of macroblocks. The first is a macroblock for which the reference images are images F(q), F(q−2) at past times tq and tq−2, respectively. The second is a macroblock for which the reference images are images F(q−1), F(s) at a past time tq−1 and future time ts, respectively. Further, two types of macroblocks are mixed also in P-slice s−2 of image F(n0) at time tn0. The first is a macroblock that is prediction-encoded with an image F(q−3) at a past time tq−3 serving as the reference image, and the second is a macroblock that is prediction-encoded with an image F(s+1) at a future time ts+1 serving as the reference image.
In a system in which generated encoded data is recorded on a medium, an IDR (Instantaneous Decoding Refresh) frame is inserted for the purpose of assuring playback in a case where an image is played back from some midpoint on the medium. The IDR frame is a frame capable of being decoded on its own, and a slice cannot refer to a slice of a frame that lies beyond the IDR frame.
In other words, each slice between IDR−0 and IDR−1 shown in FIG. 4B must not refer to a frame farther in the past than IDR−0 and must not refer to a slice farther in the future than IDR−1. Further, a slice between IDR−0 and IDR−1 is not referred to by a slice farther in the past than IDR−0, and a slice between IDR−0 and IDR−1 is not referred to by a slice farther in the future than IDR−1.
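The reference restriction imposed by IDR frames can be sketched as a simple range check. This is a simplified illustration of the rule as stated above, not of the H.264/AVC specification itself: frames are identified by hypothetical integer positions, and the function name and the example IDR positions are assumptions.

```python
from bisect import bisect_right


def reference_allowed(frame, ref, idr_positions):
    """Return True if a slice of `frame` may refer to `ref` under the IDR rule.

    `idr_positions` is a sorted list of the positions of IDR frames.
    """
    if frame in idr_positions:
        return False   # an IDR frame is decoded on its own and refers to nothing
    i = bisect_right(idr_positions, frame)
    prev_idr = idr_positions[i - 1] if i > 0 else None
    next_idr = idr_positions[i] if i < len(idr_positions) else None
    not_too_old = prev_idr is None or ref >= prev_idr   # not farther past than IDR-0
    not_too_new = next_idr is None or ref <= next_idr   # not farther future than IDR-1
    return not_too_old and not_too_new


idrs = [0, 30]   # hypothetical positions of IDR-0 and IDR-1
print(reference_allowed(10, 5, idrs))    # reference inside the IDR interval
print(reference_allowed(10, -1, idrs))   # reference farther in the past than IDR-0
```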
In a case where an attempt is made to display an image by searching for encoded data that has been recorded on a recording medium, an I-frame (an IDR frame if the scheme is H.264/AVC) whose playback on its own is assured is read from the recording medium, decoded, played back, and displayed. However, when only such I-frames are played back and displayed, the displayed frames are separated by gaps in time. Depending upon the search speed, therefore, the picture is not readily updated, and the search image appears odd.
In order to solve this problem, a technique has been proposed in which, in the case of MPEG-2, three I-frames or P-frames from the beginning of the GOP are acquired from a medium, decoded, and displayed (see the specification of Japanese Patent Application Laid-Open No. 10-322661).
Further, a technique has been proposed in which search data that is capable of being played back on its own is prepared with regard to multiple stages of fixed search magnifications, the data is recorded on a tape-like medium at trace positions for each of the respective magnifications, and the search data is played back when a search is conducted (see the specification of Japanese Patent Application Laid-Open No. 2002-33990).
With the technique proposed by Japanese Patent Application Laid-Open No. 10-322661, however, several P-frames are played back from the initial I-frame. As a result, the odd appearance of the search image still remains, i.e., the image initially moves a little, stops for a while and then moves a little.
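The uneven motion described above can be illustrated by listing which frames are displayed during a search and the time steps between them. This is a sketch under assumed values only: a GOP length of 15 frames and an I- or P-frame every third frame are assumptions, and the function names are hypothetical.

```python
def search_frames(num_gops, anchors_per_gop, gop_len=15, anchor_gap=3):
    """Display positions shown when the first I/P-frames of each GOP are played back."""
    shown = []
    for g in range(num_gops):
        start = g * gop_len   # position of the I-frame that begins the GOP
        shown.extend(start + i * anchor_gap for i in range(anchors_per_gop))
    return shown


def time_steps(shown):
    """Differences between consecutive displayed frame positions."""
    return [b - a for a, b in zip(shown, shown[1:])]


i_only = search_frames(3, 1)         # I-frames only
i_plus_p = search_frames(2, 3)       # three I/P-frames from the beginning of each GOP
print(i_only, time_steps(i_only))
print(i_plus_p, time_steps(i_plus_p))
```

With these assumed values, the I-frame-only search advances in uniform steps of 15 frames, whereas the search using three frames per GOP advances in steps of 3, 3, 9, 3, 3, which corresponds to the image moving a little, stopping, and moving again.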
Further, with the technique proposed by Japanese Patent Application Laid-Open No. 2002-33990, search data conforming to search magnification can be obtained. However, data for trickplay must be prepared and the amount of stored data increases. Furthermore, this proposal is a technique premised upon a tape-like medium and is not suited to a disk-like medium, etc.
Furthermore, in encoding according to H.264/AVC, there is no P-frame and the search image must rely upon the IDR frame, as described above.
Accordingly, the present invention provides an encoding technique in which, during search-mode playback, at least some frames other than an IDR frame can be decoded based on an IDR frame, so that the picture can be updated and, as a result, the quality of a search image can be improved.