Field of the Invention
The present invention relates to the storage of image data, such as still images, bursts of still images, compositions or cropping of images or video data in a media container with descriptive metadata. Such metadata generally provides easy access to the image data and portions of the image data.
Description of the Related Art
Some of the approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, the approaches described in this section are not necessarily prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
The HEVC standard defines a profile for the encoding of still images and describes specific tools for compressing single still images or bursts of still images. An extension of the ISO Base Media File Format (ISOBMFF) used for this kind of image data has been proposed for inclusion into the ISO/IEC 23008 standard, in Part 12, under the name “Image File Format”. The standard covers two forms of storage corresponding to different use cases:
the storage of image sequences, with timing that is optionally used at the decoder, and in which the images may be dependent on other images, and
the storage of single images, and collections of independently coded images.
In the first case, the encapsulation is close to the encapsulation of the video tracks in the ISO Base Media File Format (see document «Information technology—Coding of audio-visual objects—Part 12: ISO base media file format», ISO/IEC 14496-12:2012, Fourth edition, September 2012), and the same tools and concepts are used, such as the ‘trak’ boxes and the sample grouping for description. The ‘trak’ box is a file format box that contains sub boxes for describing a track, that is to say, a timed sequence of related samples.
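The box structure referred to above can be illustrated with a minimal sketch, assuming the generic ISOBMFF box header (a 32-bit big-endian size followed by a four-character type, with size 1 announcing a 64-bit size and size 0 meaning that the box runs to the end of its container). The names iter_boxes and make_box are hypothetical helpers introduced only for this illustration, and the payloads are toy placeholders rather than valid box contents.

```python
import struct


def iter_boxes(buf, start=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the four-character type.
            (size,) = struct.unpack_from(">Q", buf, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing container.
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError("malformed box at offset %d" % pos)
        yield box_type.decode("ascii"), pos + header, pos + size
        pos += size


def make_box(box_type, payload=b""):
    """Build a toy box with a 32-bit size header (hypothetical helper)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Toy data: a 'moov' box holding one 'trak' box, itself holding a placeholder 'tkhd'.
trak = make_box("trak", make_box("tkhd", b"\x00" * 4))
data = make_box("ftyp", b"heic") + make_box("moov", trak)

top_level = [(t, e - s) for t, s, e in iter_boxes(data)]
_, moov_start, moov_end = [b for b in iter_boxes(data) if b[0] == "moov"][0]
moov_children = [t for t, _, _ in iter_boxes(data, moov_start, moov_end)]
```

Because every box carries its own size, a reader can skip boxes it does not understand and descend only into the containers of interest, here the ‘trak’ box nested in its parent.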
In the second case, a set of ISOBMFF boxes, the ‘meta’ boxes, is used. These boxes and their hierarchy offer fewer description tools than the ‘track’ boxes and relate to “information items” or “items” instead of related samples.
The image file format can be used for locally displaying multimedia files or for streaming multimedia presentations. HEVC Still Images have many applications which raise many issues.
Image bursts are one application. Image bursts are sequences of still pictures captured by a camera and stored as a single representation (many picture items referencing a block of data). Users may want to perform several types of actions on these pictures: select one as a thumbnail or cover, apply effects to these pictures, or the like.
There is thus a need for descriptive metadata for identifying the list of pictures with their corresponding bytes in the block of data.
Computational photography is another application. In computational photography, users have access to different resolutions of the same picture (different exposures, different focuses etc.). These different resolutions have to be stored as metadata so that one can be selected and the corresponding piece of data can be located and extracted for processing (rendering, editing, transmitting or the like).
With the increase of picture resolution in terms of size, there is thus a need for providing enough description so that only some spatial parts of these large pictures can be easily identified and extracted. Various arrangements of image spatial parts can then produce new images through composition and/or cropping.
Another kind of application is access to specific pictures from a video sequence, for instance for video summarization, proof images in video surveillance data or the like.
For this kind of application, there is a need for image metadata that makes it easy to access the key images, in addition to the compressed video data and the video track metadata.
In addition, professional cameras have reached high spatial resolutions. Videos or images with 4K2K resolution are now common. Even 8K4K videos or images are becoming common. In parallel, videos are increasingly played on mobile and connected devices with video streaming capabilities. Thus, splitting the videos into tiles becomes important if the user of a mobile device wants to display or focus on sub-parts of the video while keeping or even improving the quality. By using tiles, the user can therefore interactively request spatial sub-parts of the video.
There is thus a need for describing these spatial sub-parts of the video in a compact fashion in the file format, so that they are accessible without additional processing other than simply parsing metadata boxes. For images corresponding to the videos so described, it is also of interest for the user to access spatial sub-parts. Likewise, for images resulting from cropping and/or composition of these spatial sub-parts, it is also of interest for the user to access these pictures.
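The request for spatial sub-parts can be illustrated with a minimal sketch, assuming that a tiled picture is described by a list of tile column widths and a list of tile row heights in pixels. This layout, the function tiles_for_region and the picture dimensions are hypothetical and are not taken from any file format; the sketch only shows how a region of interest maps to the tiles that must be fetched.

```python
from itertools import accumulate


def tiles_for_region(col_widths, row_heights, x, y, w, h):
    """Return the (row, col) indices of the tiles overlapping rectangle (x, y, w, h)."""
    # Cumulative sums give the pixel position of every tile boundary.
    col_edges = [0] + list(accumulate(col_widths))
    row_edges = [0] + list(accumulate(row_heights))
    cols = [c for c in range(len(col_widths))
            if col_edges[c] < x + w and col_edges[c + 1] > x]
    rows = [r for r in range(len(row_heights))
            if row_edges[r] < y + h and row_edges[r + 1] > y]
    return [(r, c) for r in rows for c in cols]


# Hypothetical 3840x2160 picture split into a 2x2 grid of equal tiles.
selected = tiles_for_region([1920, 1920], [1080, 1080], x=1800, y=200, w=400, h=300)

# The widths and heights need not be equal, so a non-regular grid works as well.
irregular = tiles_for_region([1000, 2840], [500, 1660], x=0, y=0, w=100, h=100)
```

In the first call the region straddles the vertical tile boundary, so both tiles of the top row are selected; in the second, only the top-left tile is needed.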
Part 12 of the ISO/IEC 23008 standard covers two recently discussed ways of encapsulating still images in the file format.
One way is based on ‘track’ boxes and the notion of a timed sequence of related samples with associated description tools. The other is based on ‘meta’ boxes, which rely on information items instead of samples and provide fewer description tools, especially for region of interest description and tiling support.
There is thus a need for providing tiling support in the new Image File Format.
The use of tiles is commonly known in the prior art, especially at compression time. Concerning their indexing in the ISO Base Media File Format, tiling descriptors exist in drafts for amendment of Part 15 of the ISO/IEC 14496 standard “Carriage of NAL unit structured video in the ISO Base Media File Format”.
However, these descriptors rely on ‘track’ boxes and sample grouping tools and cannot be used in the Still Image File Format when using the ‘meta’ based approach. Without such descriptors, it becomes complicated to select and extract tiles from a coded picture stored in this file format.
FIG. 1 illustrates the description of a still image encoded with tiles in the ‘meta’ box (100) of ISO Base Media File Format, as disclosed in MPEG contribution m32254.
An information item is defined for the full picture (101) in addition to respective information items for each tile picture (102, 103, 104 and 105). The box (106), called ‘ItemReferenceBox’, from the ISOBMFF standard is used for indicating that a ‘tile’ relationship (107) exists between the information item of the full picture and the four information items corresponding to the tile pictures (108). Identifiers of each information item are used so that a box (109), called ‘ItemLocationBox’, provides the byte range(s) in the encoded data (110) that represent each information item. Another ‘ItemReferenceBox’ box (112) is used for associating EXIF metadata (111) with the information item for the full picture (101), and a corresponding data block (111) is created in the media data box (110). Also, an additional information item (113) is created for identifying the EXIF metadata.
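The item-based description of FIG. 1 can be sketched as a small in-memory model. This is an illustration only: the class names, item identifiers, byte offsets and media data below are hypothetical, and the model does not reproduce the binary syntax of the ‘ItemReferenceBox’ or ‘ItemLocationBox’.

```python
from dataclasses import dataclass, field


@dataclass
class ItemReference:
    ref_type: str    # relationship type, e.g. 'tile'
    from_item: int   # identifier of the referencing item
    to_items: list   # identifiers of the referenced items


@dataclass
class MetaModel:
    items: dict = field(default_factory=dict)       # item_id -> human-readable label
    references: list = field(default_factory=list)  # ItemReference entries
    locations: dict = field(default_factory=dict)   # item_id -> [(offset, length), ...]

    def referenced(self, from_item, ref_type):
        """Return the items that from_item points to with the given reference type."""
        return [i for r in self.references
                if r.from_item == from_item and r.ref_type == ref_type
                for i in r.to_items]

    def extract(self, item_id, media_data):
        """Concatenate the byte ranges (extents) that represent one item."""
        return b"".join(media_data[o:o + n] for o, n in self.locations[item_id])


# Hypothetical media data: four 4-byte tiles followed by a 4-byte EXIF block.
media_data = b"AAAABBBBCCCCDDDDEEEE"
meta = MetaModel(
    items={1: "full picture", 2: "tile", 3: "tile", 4: "tile", 5: "tile", 6: "EXIF"},
    references=[ItemReference("tile", 1, [2, 3, 4, 5])],
    locations={1: [(0, 16)], 2: [(0, 4)], 3: [(4, 4)], 4: [(8, 4)],
               5: [(12, 4)], 6: [(16, 4)]},
)

tile_ids = meta.referenced(1, "tile")
second_tile = meta.extract(3, media_data)
```

With such a description, a reader can list the tile items of the full picture and fetch the bytes of any one of them. Nothing in the model, however, records the position or size of a tile within the full picture.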
Even if the full picture and its tiles are introduced as information items, no tiling information is provided here. Moreover, when associating additional metadata with an information item (like EXIF), no data block referenced using an additional ‘ItemReferenceBox’ is created.
Reusing information on tiling from EXIF and reusing the mechanism defined in the Still Image File Format draft would not make it possible to describe a non-regular grid with existing EXIF tags.
Thus, there is still a need for improvements in the file format for still images, notably HEVC still images. In particular, there is a need for methods for extracting a region of interest in still images stored with this file format. The invention lies within the above context.