Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a storage medium, and more particularly relates to a technique using a temporal hierarchy identifier.
Description of the Related Art
High Efficiency Video Coding (HEVC), an encoding method for compressing and recording moving images, includes scalable video encoding as an extension specification. In scalable video encoding, a moving image is hierarchically encoded from lower quality to higher quality. Scalable video encoding may be classified into spatial scalability, temporal scalability, and signal-to-noise ratio (SNR) scalability, based on the type of information that is hierarchized. Temporal scalability is a technique for hierarchizing image encoding in accordance with the temporal scale, that is, the number of frames per unit time (frame rate). The frame rate can be adjusted by extracting only part of the hierarchically structured data. More specifically, the moving image is encoded so that it can be reproduced at a plurality of frame rates, whereby the frame rate can be flexibly switched in accordance with constraints that differ depending on the environment, such as network transmission and reproduction (decoding) processing.
To achieve the hierarchical encoding corresponding to the temporal scalability described above, HEVC encodes each frame of a moving image with a temporal hierarchy identifier (temporal ID) set to that frame. The temporal ID is information for identifying the temporal hierarchy to which a frame belongs. A frame in each hierarchy is reproduced by referring only to frames whose temporal ID values are equal to or smaller than the temporal ID value set to that frame. Thus, reproduction (decoding and display) can be performed by selecting temporal hierarchies based on the temporal ID.
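The reference rule above is what makes it safe to drop the upper hierarchies. The following is a minimal sketch, assuming a hypothetical `Frame` record (with illustrative fields `poc`, `temporal_id`, and `refs`) and an assumed nine-frame, four-hierarchy structure; it is not taken from HEVC reference software. It checks that every frame refers only to frames with an equal or smaller temporal ID, and that keeping only the frames up to a chosen temporal ID never removes a needed reference.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    poc: int            # display-order index (hypothetical field name)
    temporal_id: int    # temporal hierarchy identifier (temporal ID)
    refs: list[int] = field(default_factory=list)  # POCs of the frames this frame refers to


def satisfies_temporal_rule(frames: list[Frame]) -> bool:
    """True if every frame refers only to frames whose temporal ID is <= its own."""
    by_poc = {f.poc: f for f in frames}
    return all(by_poc[r].temporal_id <= f.temporal_id for f in frames for r in f.refs)


def extract_sub_layers(frames: list[Frame], max_tid: int) -> list[Frame]:
    """Keep only the frames whose temporal ID does not exceed max_tid."""
    return [f for f in frames if f.temporal_id <= max_tid]


def is_decodable(frames: list[Frame]) -> bool:
    """True if every reference of every kept frame is itself kept."""
    kept = {f.poc for f in frames}
    return all(r in kept for f in frames for r in f.refs)


# Hypothetical four-hierarchy structure (dyadic pattern assumed for illustration only).
GOP = [
    Frame(0, 0), Frame(8, 0, [0]),
    Frame(4, 1, [0, 8]),
    Frame(2, 2, [0, 4]), Frame(6, 2, [4, 8]),
    Frame(1, 3, [0, 2]), Frame(3, 3, [2, 4]), Frame(5, 3, [4, 6]), Frame(7, 3, [6, 8]),
]

assert satisfies_temporal_rule(GOP)
for max_tid in range(4):
    # Every extraction by temporal ID yields a self-contained, decodable set of frames.
    print(max_tid, is_decodable(extract_sub_layers(GOP, max_tid)))
```

Because the rule forbids references to higher temporal IDs, the decodability check passes for every `max_tid`; a frame that referred upward would break this property.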
The relationship between the temporal ID and the frame rates at which a moving image can be selectively reproduced is described below with reference to FIG. 9A. In FIG. 9A, a sequence of frames including intra frames (I frames), predicted frames (P frames), and bi-directional predicted frames (B frames) is divided into four hierarchies. The frames in the highest to the lowest hierarchies are appended with temporal IDs=3, 2, 1, and 0, respectively. By selecting, at the time of transmission and reproduction, the frames encoded with these temporal IDs based on the temporal ID, moving images at four different frame rates can be formed in FIG. 9A. When only the temporal ID=0 (a frame group 904 in FIG. 9A) is selected, the frame rate is 7.5 frames per second (FPS). When the temporal IDs=0 and 1 (frame groups 903 and 904 in FIG. 9A) are selected, the frame rate is 15 FPS. When the temporal IDs=0, 1, and 2 (frame groups 902 to 904 in FIG. 9A) are selected, the frame rate is 30 FPS. When the hierarchies corresponding to all the temporal IDs=0 to 3 (frame groups 901 to 904 in FIG. 9A) are selected, the frame rate is 60 FPS. As described above, the reception side can select a frame rate based on the temporal ID when reproducing a moving image.
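The four frame rates in FIG. 9A follow from each added hierarchy doubling the number of frames. The following is a minimal sketch, assuming a dyadic pattern that repeats every 8 frames at a full rate of 60 FPS; the helper names `temporal_id` and `selected_frame_rate` are hypothetical, and the exact frame arrangement of FIG. 9A may differ.

```python
def temporal_id(n: int) -> int:
    """Temporal ID of the n-th frame in an assumed dyadic 8-frame pattern."""
    if n % 8 == 0:
        return 0
    if n % 4 == 0:
        return 1
    if n % 2 == 0:
        return 2
    return 3


def selected_frame_rate(max_tid: int, full_rate: float = 60.0, period: int = 8) -> float:
    """Frame rate obtained by keeping only frames with temporal ID <= max_tid."""
    kept = sum(1 for n in range(period) if temporal_id(n) <= max_tid)
    return full_rate * kept / period


for max_tid in range(4):
    print(max_tid, selected_frame_rate(max_tid))
# 0 7.5
# 1 15.0
# 2 30.0
# 3 60.0
```

Out of every 8 frames, 1, 2, 4, and 8 frames survive for maximum temporal IDs 0 to 3, which reproduces the 7.5, 15, 30, and 60 FPS figures.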
A technique is also available for controlling the frame rate on the transmission side, in which each frame in a moving image is appended with a processing priority relative to the other frames and transmission is performed based on that priority (Japanese Patent No. 3519722). In Japanese Patent No. 3519722, the processing priority of each frame is appended based on its prediction format (hereinafter referred to as a frame type), such as an intra-picture reference frame (hereinafter referred to as an I frame), an inter-picture reference frame (hereinafter referred to as a P frame), or a bidirectional inter-picture reference frame (hereinafter referred to as a B frame). The priority level is set based on the dependence relationships among frames used as predictive (reference) images. More specifically, since the I frame may be referred to by both the P and B frames, the I frame has the highest priority among the three frame types. On the other hand, the B frame is never used as a reference image and thus has the lowest priority. The P frame may be referred to by the B frame and thus has an intermediate priority, lower than that of the I frame and higher than that of the B frame.
In the technique discussed in Japanese Patent No. 3519722, bit rate control is performed in accordance with the transmission condition of a communication path by temporarily thinning out frames (reducing the frame rate) in accordance with the priority appended to each frame. More specifically, in accordance with the transmission condition (that is, the effective bit rate) of the communication path, frames with a priority lower than a threshold are thinned out, and frames with a priority higher than or equal to the threshold are transmitted. By setting the threshold in accordance with the transmission condition, the frames to be transmitted are selected as, for example, (1) all the frames, (2) only the frames with [priority: high] (I frames) and [priority: intermediate] (P frames), or (3) only the frames with [priority: high] (I frames).
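The priority-threshold thinning described above can be sketched as follows. This is a simplified illustration, not the implementation of Japanese Patent No. 3519722: the numeric priority values, the per-frame bit sizes, and the threshold-selection policy in the hypothetical `choose_threshold` (the lowest threshold whose surviving frames fit the available bits) are all assumptions.

```python
# Hypothetical numeric priorities: high (I), intermediate (P), low (B).
PRIORITY = {"I": 2, "P": 1, "B": 0}


def thin_frames(frame_types, threshold):
    """Indices of frames whose priority is >= threshold; the others are thinned out."""
    return [i for i, t in enumerate(frame_types) if PRIORITY[t] >= threshold]


def choose_threshold(frame_types, frame_bits, available_bits):
    """Lowest threshold whose surviving frames fit in available_bits (assumed policy)."""
    for threshold in sorted(set(PRIORITY.values())):
        kept = thin_frames(frame_types, threshold)
        if sum(frame_bits[i] for i in kept) <= available_bits:
            return threshold
    # Assumed fallback: keep only the highest-priority frames, as in case (3).
    return max(PRIORITY.values())


types = "IBBPBBPBB"
bits = [100 if t == "I" else 40 if t == "P" else 10 for t in types]  # illustrative sizes
threshold = choose_threshold(types, bits, available_bits=200)
print(threshold, thin_frames(types, threshold))  # 1 [0, 3, 6]
```

With these illustrative sizes, sending all frames needs 240 bits, which exceeds the 200 available, so the B frames are dropped and only the I and P frames are sent, corresponding to case (2).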
As described above, in Japanese Patent No. 3519722, the transmission frame rate is controlled by cutting off (thinning out) lower-priority frames, based on the priority appended to each frame type and the transmission condition of the communication path, when the effective transmission rate is likely to be exceeded. The number of available priority levels is therefore limited by the number of frame types.
Thus, the following problem arises when the transmission-side frame rate selection method discussed in Japanese Patent No. 3519722 is applied to moving image data that is to be reproduced based on the temporal ID. For example, suppose that the B frames are in the hierarchy corresponding to the temporal ID=1, and that priorities are set by frame type: [priority: high] for the I frame, [priority: intermediate] for the P frame, and [priority: low] for the B frame. In this case, the group of B frames in the hierarchy corresponding to the temporal ID=1 has a lower priority than the group of P frames in the hierarchy corresponding to the temporal ID=2, and may thus be preferentially thinned out when frames are transmitted by the method discussed in Japanese Patent No. 3519722. When the B frames with the temporal ID=1 are thinned out, a frame group 912 cannot be normally reproduced at 30 FPS, as illustrated in FIG. 9B.
Furthermore, frames 914 to 917 in a frame group 911 refer to the B frames in the thinned-out frame group 912, as illustrated in FIG. 9B, and thus cannot be reproduced either. In other words, because the frames with the temporal ID=2 refer to the thinned-out frames with the temporal ID=1, they cannot be reproduced, and the frames in the frame group 911 therefore cannot be normally reproduced at 60 FPS, as illustrated in FIG. 9B.
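The mismatch between priority-based thinning and temporal-ID-based reproduction can be reproduced with a small model. The following is a minimal sketch under stated assumptions: the `Frame` record, the frame arrangement (a B frame at temporal ID=1 that is referred to by a P frame at temporal ID=2), and the numeric priorities are hypothetical and only loosely modeled on the situation of FIG. 9B. The hypothetical `undecodable` helper walks the frames in decoding order and reports frames that lose a reference directly or indirectly.

```python
from dataclasses import dataclass, field

# Hypothetical numeric priorities per frame type: high (I), intermediate (P), low (B).
PRIORITY = {"I": 2, "P": 1, "B": 0}


@dataclass
class Frame:
    poc: int             # display-order index (hypothetical field name)
    frame_type: str      # "I", "P", or "B"
    temporal_id: int     # temporal hierarchy identifier
    refs: list[int] = field(default_factory=list)  # POCs of referenced frames


# Assumed structure, listed in decoding order.
GOP = [
    Frame(0, "I", 0),
    Frame(8, "P", 0, [0]),
    Frame(4, "B", 1, [0, 8]),   # B frame at temporal ID=1, used as a reference
    Frame(2, "P", 2, [0]),
    Frame(6, "P", 2, [4]),      # P frame at temporal ID=2 that refers to the B frame
    Frame(1, "B", 3, [0, 2]),
    Frame(3, "B", 3, [2, 4]),
    Frame(5, "B", 3, [4, 6]),
    Frame(7, "B", 3, [6, 8]),
]


def undecodable(kept):
    """POCs of kept frames that lose a reference directly or through another frame."""
    ok = set()
    for f in kept:  # frames are assumed to be in decoding order
        if all(r in ok for r in f.refs):
            ok.add(f.poc)
    return sorted(f.poc for f in kept if f.poc not in ok)


by_priority = [f for f in GOP if PRIORITY[f.frame_type] >= 1]  # thin out all B frames
by_temporal_id = [f for f in GOP if f.temporal_id <= 2]        # select temporal IDs 0 to 2
print(undecodable(by_priority))     # [6]
print(undecodable(by_temporal_id))  # []
```

Thinning by frame-type priority removes the B frame at temporal ID=1, so the P frame at temporal ID=2 that refers to it cannot be decoded, whereas selecting frames by temporal ID keeps every needed reference.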
As described above, when the method discussed in Japanese Patent No. 3519722 is used, it is difficult to control moving image data that has been encoded with temporal scalability based on the temporal ID so that it is transmitted at a desired bit rate or frame rate.