Digital video is today the most common form in which video is recorded. One drawback of digital video is that it requires large amounts of storage space on hard drives and a lot of network bandwidth. This is particularly the case for surveillance video, where large amounts of video are recorded over long periods of time. In connection with the present disclosure, digital video refers to digital motion video, which uses much more data than digital still images. In order to decrease the amount of data needed to represent a digital video, many compression schemes have been developed, e.g. H.262, H.264, H.265, and MPEG. However, further decreasing the amount of data used to represent digital video remains the subject of ongoing research.
Video recordings and video streams, compressed or uncompressed, are represented by sequences of image frames which are recorded and played back at a specified frequency. This frequency is often referred to as the frame rate or fps (frames per second). One development in the area of digital motion video compression has been the introduction of variable frame rate. Variable frame rate is to be understood as the frame rate of a digital video being varied during the duration of the recorded video. For example, during a time period with very little movement and/or action in the scene, e.g. small changes in captured images, the frame rate may be adjusted to a very low value, e.g. 1 fps, and then, when the camera identifies movement and/or action in the scene, e.g. large or rapid changes in captured images, the frame rate may be adjusted to a value suitable for capturing the movement, e.g. 30 fps. Hence, variable frame rate potentially saves a lot of bandwidth and/or storage space, in particular in monitoring or surveillance scenarios where many recordings do not include any movement at all.
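As a rough illustration of the potential saving, the following Python sketch picks a frame rate for each second of recording from a scene-change score. The score, the threshold MOTION_THRESHOLD and the helper names are hypothetical and serve only to illustrate the idea; the rates of 1 fps and 30 fps are taken from the example above. Real cameras typically use their own, more elaborate, motion detection.

```python
# Hypothetical parameters; the two rates match the example values in the text.
LOW_FPS = 1               # rate used while the scene is nearly static
HIGH_FPS = 30             # rate used while movement is detected
MOTION_THRESHOLD = 0.05   # assumed change score that counts as "movement"


def select_frame_rate(change_score: float) -> int:
    """Pick the frame rate for the next second from a change score in [0, 1]."""
    return HIGH_FPS if change_score >= MOTION_THRESHOLD else LOW_FPS


def frames_recorded(change_scores: list[float]) -> tuple[int, int]:
    """Return (frames recorded with variable frame rate, frames at fixed HIGH_FPS).

    Each entry in change_scores describes one second of recording.
    """
    variable = sum(select_frame_rate(score) for score in change_scores)
    fixed = HIGH_FPS * len(change_scores)
    return variable, fixed


if __name__ == "__main__":
    # One hour of a mostly static scene containing 60 seconds of activity.
    scores = [0.0] * 3540 + [0.2] * 60
    print(frames_recorded(scores))  # (5340, 108000)
```

For one hour of a mostly static scene with 60 seconds of activity, this model records 5,340 frames instead of 108,000 at a fixed 30 fps.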
Common encoding schemes, such as MPEG encoding of various types, H.264, H.265, etc., employ an encoding scheme in which some frames are spatially encoded, i.e. encoded based on information in the frame itself, and other frames are temporally encoded, i.e. encoded based on changes in image elements or objects in relation to a previous image frame or frames and/or a later image frame or frames. Spatially encoded image frames are referred to as intra frames and temporally encoded image frames are referred to as inter frames. Many of the encoding schemes refer to the intra frames as I-frames and to the inter frames as P- or B-frames. P-frames are related to a previous frame, i.e. in order to be decoded a P-frame relies on information of a previously decoded image frame. B-frames are related to both a previous frame and a future frame. An image stream or an image file may be described as a sequence of intra frames and inter frames. Hereinafter, both an image stream streamed to a device and a video file stored on and distributed from a storage device may be referred to as an image stream. The image stream has to start with an intra frame in order to have a complete image on which the inter frames can depend. The intra frame is then followed by one or a plurality of inter frames until another intra frame is present in the image stream. One intra frame and the following inter frames preceding the next intra frame are referred to as a GOP (Group of Pictures). Hence, the structure may be depicted as in the two structural examples shown below (I = intra frames, P & B = inter frames):
        IPPPPBPPPPIPPPPBPPPPIPPPPBPPPPIPPPPBPPPP . . .
        IPPPPPPPPPPPPPPPPPPIPPPPPPPPPPPPPPPPPP . . .
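The GOP structure can be made concrete with a short Python sketch that splits a sequence of frame types, written in the notation of the examples above, into GOPs. The function name split_into_gops is hypothetical, and a real decoder would read frame types from the bitstream rather than from a string.

```python
def split_into_gops(frame_types: str) -> list[str]:
    """Split a sequence of frame types (e.g. "IPPPPBPPPP...") into GOPs.

    A GOP starts at an intra frame ("I") and contains every following
    inter frame ("P" or "B") up to, but not including, the next intra frame.
    """
    if not frame_types.startswith("I"):
        raise ValueError("an image stream must start with an intra frame")
    gops: list[str] = []
    for frame in frame_types:
        if frame not in "IPB":
            raise ValueError(f"unknown frame type: {frame!r}")
        if frame == "I":
            gops.append(frame)  # an intra frame opens a new GOP
        else:
            gops[-1] += frame   # an inter frame joins the current GOP
    return gops


if __name__ == "__main__":
    print([len(g) for g in split_into_gops("IPPPPBPPPPIPPPPBPPPP")])  # [10, 10]
    print([len(g) for g in split_into_gops(("I" + "P" * 18) * 2)])    # [19, 19]
```

The lengths printed correspond to the GOP-lengths of the two structural examples.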
Another development in this area has been the introduction of a technique called variable GOP-length (Group of Pictures). The GOP-length is defined as one intra frame plus the number of inter frames until the next intra frame; in the examples above, the GOP-length is 10 and 19, respectively. Usually, only one of the two intra frames bounding a GOP is included in the GOP-length. The basic idea behind variable GOP-length is that an intra frame requires a lot more data than an inter frame, so the required storage space or bandwidth will be substantially reduced if intra frames are less frequent, i.e. if the GOP-length is longer. However, longer GOP-lengths have a drawback: the greater the number of inter frames, the more artefacts are introduced in the video, because the inter frames rely on information aggregated from previous inter frames and, thus, artefacts in these previous inter frames are also aggregated. One of the purposes of the intra frame is to reset these artefacts by providing an independent image frame that includes the entire image. Similar to the use of variable frame rate, a long GOP-length may be used during periods of low or no action in order to save storage space and/or bandwidth. Then, in order to avoid too many artefacts in the recorded video, the GOP-length is shortened when action and/or movement is present in the scene.
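The trade-off can be quantified with a simple model, assuming every intra frame has the same encoded size and every inter frame has the same, smaller, size; actual frame sizes vary with scene content. The Python sketch below computes the average size per frame for a given GOP-length and includes a hypothetical variable GOP-length policy. The sizes 100 and 10 (e.g. kB) and the GOP-lengths 10 and 60 are illustrative values, not taken from any particular encoder.

```python
def average_frame_size(gop_length: int, intra_size: float, inter_size: float) -> float:
    """Average encoded size per frame for a GOP of one intra frame
    followed by (gop_length - 1) inter frames."""
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    return (intra_size + (gop_length - 1) * inter_size) / gop_length


def next_gop_length(motion_detected: bool, short_gop: int = 10, long_gop: int = 60) -> int:
    """Hypothetical variable GOP-length policy: long GOPs for a quiet scene,
    short GOPs when movement is present to limit artefact accumulation."""
    return short_gop if motion_detected else long_gop


if __name__ == "__main__":
    # Assumed sizes: intra frame 100 kB, inter frame 10 kB.
    for length in (10, 19, 60):
        print(length, round(average_frame_size(length, 100, 10), 2))
    print(next_gop_length(motion_detected=False))  # 60
```

With these assumed sizes, a GOP-length of 10 gives 19 kB per frame, a GOP-length of 19 gives about 14.7 kB per frame, and a GOP-length of 60 gives 11.5 kB per frame, while the artefact-limiting benefit of frequent intra frames is not captured by this size-only model.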
When a user or an operator wants to access a recorded video at a specific temporal location, i.e. a specific time in the recording, the playback function should start playing from that temporal location. However, this temporal location is often represented by an inter frame, and because an inter frame relies on earlier frames in order to display correct image information, the playback function has to find an earlier intra frame and then decode all frames between the intra frame and the inter frame at the specific temporal location. This operation of starting playback at a random location in a recorded video is sometimes referred to as trick play. Other common trick play functionalities are fast forward and reverse playback. One way of implementing trick play functions like these is described in the international patent application WO00/22820. In this application, random access to a specified temporal location is implemented such that, after identifying the selected frame, the process finds the I-frame, which in this case corresponds to the previously discussed intra frame, using information stored in an auxiliary file that includes an offset of the I-frame. When the I-frame has been found, the I-frame and the subsequent P-frames, which in this case correspond to the previously discussed inter frames, are decoded but not displayed until the selected frame is decoded. Another way of implementing trick play, such as random access to a time point in a video recording, is described in the international application WO 97/30544. In this application, the target frame at the specified time point in the recorded video is identified using a video frame index, which is an array of offset numbers indicating at which byte each picture starts and whether the picture is an I-, P- or B-frame. Then the earlier frames on which the requested frame depends are parsed.
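The general idea of index-based random access can be sketched in Python as follows, assuming a frame index that lists, in display order, the byte offset and type of each frame, similar in spirit to the index described in WO 97/30544. This is not the implementation of either cited application. The sketch further assumes that a P-frame depends only on the nearest preceding I- or P-frame and that a B-frame depends on the nearest I- or P-frames on either side. The names IndexEntry and frames_to_decode are hypothetical.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    offset: int      # byte offset at which the frame starts in the file
    frame_type: str  # "I", "P" or "B"


def frames_to_decode(index: list[IndexEntry], target: int) -> list[int]:
    """Return, in decoding order, the indices of frames needed to show `target`.

    Assumes display order, P-frames referencing the nearest earlier I/P frame,
    and B-frames referencing the nearest I/P frames before and after them.
    """
    if not 0 <= target < len(index):
        raise IndexError("target frame out of range")
    # Walk backwards to the closest intra frame at or before the target.
    start = target
    while index[start].frame_type != "I":
        start -= 1
        if start < 0:
            raise ValueError("no intra frame precedes the target")
    # Every reference (non-B) frame from the intra frame up to the target.
    needed = [i for i in range(start, target + 1) if index[i].frame_type != "B"]
    if index[target].frame_type == "B":
        # A B-frame also needs the next reference frame after it.
        nxt = target + 1
        while nxt < len(index) and index[nxt].frame_type == "B":
            nxt += 1
        if nxt == len(index):
            raise ValueError("B-frame has no following reference frame")
        needed += [nxt, target]
    return needed


if __name__ == "__main__":
    # Made-up offsets; a real index would store the actual byte positions.
    index = [IndexEntry(1000 * i, t) for i, t in enumerate("IPPPPBPPPP")]
    plan = frames_to_decode(index, 7)
    print(plan)                     # [0, 1, 2, 3, 4, 6, 7]
    print(index[plan[0]].offset)    # 0, where reading starts
    print(frames_to_decode(index, 5))  # [0, 1, 2, 3, 4, 6, 5]
```

A player built on this sketch would seek to index[plan[0]].offset, decode every frame in plan, and display only the target frame, which mirrors the decode-but-not-display behaviour described above.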
These methods are complicated and require substantial changes to a basic playback function that does not already implement random access playback.