Digital video cameras and the processing of digital media data recorded by such cameras have become commonplace in recent years. The video data and often audio data recorded by such cameras are typically written to a digital media container, such as the AVI container format or MP4 container format. These container formats allow the video and audio data, and in many cases other data such as subtitles and still images, to be stored in a digital media file, and also allow the data to be broadcast live, i.e. streamed over the Internet.
Digital media containers are used to identify and interleave different data types, and comprise a plurality of portions including a payload (or data) portion and one or more metadata portions. The payload (or data) portion includes the media data, typically with each of the data types, e.g. video, audio, etc, in the container being interleaved (or multiplexed). The one or more metadata portions contain data about the container and the media data (or content) contained therein. For example, the one or more metadata portions can include data such as: the number of streams (or tracks), e.g. video, audio, etc; the format of each stream, e.g. the type of compression, if any, used to encode each stream; and the duration of the media data; all of which are required to read the data in the container and to subsequently provide the content. The one or more metadata portions can also include information about the content in the container, such as a title, an artist name, etc. Digital media containers typically have a hierarchical structure, with the one or more metadata portions often being positioned at the start of the container. This is not always the case, however, and in some instances one or more metadata portions can be positioned at the start of the container, and one or more other metadata portions can be positioned at the end of the container.
In the case of the MP4 container, each of the portions of the container is typically referred to as an ‘atom’. The payload portion of an MP4 container is called the mdat atom. The metadata portions include the moov atom, which acts as the index for the container and defines the timescale, duration and display characteristics of the media data in the container, together with information for each track in the container, and often one or more uuid atoms, or so-called user-defined atoms. The moov atom must be accessed before it becomes possible to play the media content in an MP4 container, and the position of the moov atom is therefore typically dependent on the manner in which the container is going to be delivered, e.g. progressive download, streaming or local playback. For local playback, the position of the moov atom in the container is not important, since the entire file is available immediately. Accordingly, the moov atom will typically be found at the end of the container, which can be beneficial since the content, and thus the size, of the moov atom is not known until the media data has been added to the container. However, for progressive download or streaming, if the moov atom were to be positioned at the end of the container, then the entire file would need to be downloaded before it could be played (or a second communication channel, separate from the communication channel used to stream the media content of the file, would be needed to obtain the moov atom). Accordingly, in such instances it is desirable for the moov atom to be positioned at the start of the container.
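By way of illustration only, the top-level atoms of an MP4 container can be listed, and the position of the moov atom relative to the mdat atom checked, along the following lines. This is a minimal sketch and assumes the usual MP4 atom header layout (a 4-byte big-endian size followed by a 4-byte type, with a size of 1 indicating that a 64-bit size follows and a size of 0 indicating that the atom runs to the end of the file). The function names `iter_top_level_atoms`, `moov_before_mdat` and `make_atom` are hypothetical and are introduced here purely for the example.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def iter_top_level_atoms(f: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level atom in the container."""
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(8)
        if len(header) < 8:
            return  # end of file (or a truncated header)
        size, kind = struct.unpack(">I4s", header)
        if size == 1:
            # A 64-bit size is stored immediately after the type field.
            extended = f.read(8)
            if len(extended) < 8:
                return
            size = struct.unpack(">Q", extended)[0]
        elif size == 0:
            # The atom extends to the end of the file.
            size = f.seek(0, io.SEEK_END) - offset
        if size < 8:
            return  # malformed atom; stop rather than loop forever
        yield kind.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(f: BinaryIO) -> bool:
    """Return True if the moov atom precedes the mdat atom."""
    order = [kind for kind, _, _ in iter_top_level_atoms(f)]
    return ("moov" in order and "mdat" in order
            and order.index("moov") < order.index("mdat"))


def make_atom(kind: bytes, payload: bytes) -> bytes:
    """Build a toy atom (illustrative helper, not a valid media atom)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload


# Two toy containers: one laid out for streaming, one for local playback.
ftyp = make_atom(b"ftyp", b"isom")
moov = make_atom(b"moov", b"\x00" * 4)
mdat = make_atom(b"mdat", b"\x01" * 16)
print(moov_before_mdat(io.BytesIO(ftyp + moov + mdat)))  # True
print(moov_before_mdat(io.BytesIO(ftyp + mdat + moov)))  # False
```

A container for which the check returns False would, as discussed above, need to be downloaded in full (or rewritten with the moov atom first) before progressive playback could begin.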
An overview of certain digital media processing techniques will now be described, with reference to FIGS. 1, 2 and 3.
A first technique is that of writing a digital media file, often generally called “encoding”, and is shown in FIG. 1. Uncompressed (or raw) media data, also known as streams, which can include video frames recorded by a video camera and audio packets recorded by a microphone, is obtained and encoded into a compressed format. Compression reduces the size of the data stream by removing redundant information, and can be lossless or lossy: in lossless compression the reconstructed data is identical to the original, whereas in lossy compression the reconstructed data is an approximation to the original, but not identical. For example, the video stream can be compressed using the H.264 compression format, and the audio stream can be compressed using the AAC compression format. Once the streams have been encoded, they are multiplexed, also referred to as “muxing”, whereby the streams are combined into a single stream. The multiplexed stream can then be written to the payload (or data) portion of a file, and after the recording has stopped the file is closed by updating and/or adding the relevant one or more metadata portions to the file. Alternatively, the multiplexed stream can be streamed over a network, rather than being written to a file.
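By way of illustration only, the multiplexing step can be sketched as an interleaving of already-encoded packets in order of presentation time. The `Packet` type and the `multiplex` function below are hypothetical names introduced for this example, and the timestamps are invented; real containers interleave according to their own rules, so this is a sketch of the general idea rather than of any particular format.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    pts: float    # presentation time, in seconds
    stream: str   # e.g. "video" or "audio"
    data: bytes   # already-encoded payload (e.g. H.264 or AAC)


def multiplex(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Combine several time-ordered streams into one time-ordered stream.

    Each input must already be sorted by presentation time. Packets with
    equal times are emitted in the order the streams were passed in.
    """
    return heapq.merge(*streams, key=lambda p: p.pts)


# Assumed example rates: video at 25 frames per second, audio packets
# roughly every 23 ms.
video = [Packet(0.00, "video", b"v0"),
         Packet(0.04, "video", b"v1"),
         Packet(0.08, "video", b"v2")]
audio = [Packet(0.000, "audio", b"a0"),
         Packet(0.023, "audio", b"a1"),
         Packet(0.046, "audio", b"a2"),
         Packet(0.069, "audio", b"a3")]

for packet in multiplex(video, audio):
    print(f"{packet.pts:.3f} {packet.stream}")
```

The resulting single stream is what would then be written to the payload portion of a file, or sent over a network.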
A second technique is that of reading a digital media file, often generally called “decoding”, and is shown in FIG. 2. This technique is essentially the reverse of the “encoding” shown in FIG. 1, and involves demultiplexing the streams that are contained in the file based on information in one or more metadata portions of the file. Each of the demultiplexed streams can then be decoded from its compressed format, again based on information in one or more metadata portions of the file, and the video frames, audio packets, etc can then be played.
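By way of illustration only, the demultiplexing step can be sketched as the separation of an interleaved sequence of packets into one list per track, using a list of track identifiers of the kind that would be read from the metadata portions. The function name `demultiplex`, the tuple layout and the track identifiers are assumptions made for this example and do not correspond to any particular container format.

```python
from typing import Dict, Iterable, List, Tuple

# An interleaved packet is modelled as (track_id, presentation_time, payload).
InterleavedPacket = Tuple[str, float, bytes]


def demultiplex(
    packets: Iterable[InterleavedPacket],
    track_ids: Iterable[str],
) -> Dict[str, List[Tuple[float, bytes]]]:
    """Split an interleaved stream into one time-ordered list per track.

    `track_ids` stands in for the track list held in the metadata; packets
    belonging to tracks that are not listed there are ignored.
    """
    tracks: Dict[str, List[Tuple[float, bytes]]] = {t: [] for t in track_ids}
    for track_id, pts, data in packets:
        if track_id in tracks:
            tracks[track_id].append((pts, data))
    return tracks


payload = [
    ("video", 0.000, b"v0"),
    ("audio", 0.000, b"a0"),
    ("audio", 0.023, b"a1"),
    ("video", 0.040, b"v1"),
]
tracks = demultiplex(payload, track_ids=["video", "audio"])
print(len(tracks["video"]), len(tracks["audio"]))  # 2 2
```

Each per-track list would then be passed to the decoder indicated for that track in the metadata.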
A third technique is that of transcoding, and is shown in FIG. 3. Transcoding is the process of demultiplexing and decoding the streams in a digital media file, and then re-encoding and re-multiplexing some or all of the data in the streams to generate a new digital media file. Transcoding is typically performed to convert a file from one type to another type, or to change the compression formats used to encode the media data in the file, or to change format parameters of the media data, such as frame rate or resolution.
Digital video cameras that use such digital media processing techniques, either on the camera itself or on associated editing software for use on computing devices, such as desktop or laptop computers, smartphones and the like, are increasingly being used in outdoor and sports settings. Such video cameras, which are often referred to as “action cameras”, are commonly attached to a user, sports equipment or a vehicle and are operated to capture video data, and typically also audio data, during a sports session with minimal user interaction.
It is also known to integrate a number of additional sensor devices into such action cameras. For example, WO 2011/047790 A1 discloses a video camera comprising some or all of an integrated GPS device, speed or acceleration measuring device, time measuring device, temperature measuring device, heart rate measuring device, barometric altitude measuring device and an electronic compass. These sensors can be integrated in the camera itself, or can be remote from the camera and operably connected to the camera using a wired or wireless connection. It is further described that the data from these additional sensor devices, i.e. sensor data, can be stored separately from the digital media file containing the recorded video and audio data, but also that the sensor data can be stored in the same digital media file as the recorded video and audio data, such as by storing the sensor data in the payload (or data) portion of the media file. In this latter case, the sensor data is multiplexed with the video and audio data, and can, for example, be stored in the subtitle track of the media file.
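By way of illustration only, sensor data might be rendered as timed text of the kind that could be carried in a subtitle track. The cited document does not specify a text format, so the sketch below assumes SubRip (SRT) style cues purely for the purposes of the example; the function names, field names and sample values are likewise hypothetical.

```python
from typing import Dict, Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def sensor_samples_to_srt(
    samples: Iterable[Tuple[float, Dict[str, float]]],
    interval: float = 1.0,
) -> str:
    """Turn (time, readings) samples into SRT cues, one cue per sample.

    Each cue is shown for `interval` seconds, which is assumed here to be
    the sampling period of the sensors.
    """
    cues = []
    for index, (t, readings) in enumerate(samples, start=1):
        text = " ".join(f"{name}={value}" for name, value in readings.items())
        start, end = srt_timestamp(t), srt_timestamp(t + interval)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Invented readings for two consecutive seconds of a recording.
samples = [
    (0.0, {"speed_kmh": 12.5, "altitude_m": 431.0}),
    (1.0, {"speed_kmh": 13.1, "altitude_m": 430.6}),
]
print(sensor_samples_to_srt(samples))
```

Because each cue carries its own start and end time, the sensor readings remain aligned with the video and audio data with which they are multiplexed.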
WO 2011/047790 further discloses that the sensor data can be added as a digital overlay over the video data when it is played and displayed on a display device, such that viewers can see, for example, the changing speed, acceleration, position, elevation, etc of the user or their equipment simultaneously with the video. It is also disclosed that such digital overlays can be integrated permanently into the video data through a transcoding process, such that the recorded video can then be uploaded to a video sharing site, such as YouTube®.
While such techniques are advantageous in their own right, the Applicants believe that there remains scope for improvements to techniques for processing video image data, and in particular to techniques for processing integrated video image and sensor data.