Video surveillance has become ubiquitous in modern life and is used for residential security, transportation security, hygiene security, societal security, and production security. To reduce the storage cost of large amounts of surveillance video, or to accommodate low-bit-rate transmission, a reduction in encoder bit rate is often sought while good video quality is maintained.
For conventional encoders, encoded frames often include three types of frames: I Frames, P Frames, and B Frames. FIG. 1 is an example diagram showing a reference relationship among the three frame types. An I Frame is an intra-encoded frame which has no dependency on other frames and may be decoded into a complete picture by itself, without reference to any other frames. An I Frame therefore serves as a key frame that provides base video data for decoding. A P Frame is a forward prediction frame, and often refers to a preceding frame. During encoding of a P Frame, either an intra-frame prediction mode can be used, or inter-frame prediction can be carried out by referring to a preceding P Frame or a preceding I Frame. For example, inter-frame compression takes movement characteristics into account and exploits redundant information in the time domain. A B Frame is a bidirectional prediction frame which can refer to a preceding frame and can also refer to a following frame. Therefore, a P Frame and a B Frame both need to refer to other frames, and have dependency on other frames. A P Frame or a B Frame cannot be decoded into a complete picture solely by itself. Usually, an I Frame together with the P Frames and B Frames related to it are jointly referred to as a Group of Pictures (GOP).
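The grouping of frames into GOPs can be illustrated with a minimal sketch. It assumes, purely for illustration, that the frame types are listed in decoding order as a string of the letters I, P, and B, and that every GOP starts at an I Frame; the function name `split_into_gops` is hypothetical and not part of any encoder interface.

```python
from typing import List


def split_into_gops(frame_types: str) -> List[str]:
    """Split a sequence of frame types into GOPs, each starting at an I Frame.

    Assumption: frame_types is in decoding order and contains only "I", "P", "B".
    """
    gops: List[str] = []
    for frame_type in frame_types:
        if frame_type not in ("I", "P", "B"):
            raise ValueError(f"unknown frame type: {frame_type!r}")
        if frame_type == "I":
            # An I Frame has no dependency on other frames, so it opens a new GOP.
            gops.append("I")
        elif gops:
            # P and B Frames depend on other frames and join the current GOP.
            gops[-1] += frame_type
        else:
            # A P or B Frame with no preceding I Frame cannot be decoded.
            raise ValueError("sequence must start with an I Frame")
    return gops
```

For instance, under these assumptions the sequence "IPBPIPP" contains two GOPs, "IPBP" and "IPP".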
In some circumstances, random access or video playback is needed for an encoded bit stream (e.g., a video recording file). If a desired access or playback start time is T and the frame corresponding to T is denoted as Frame F, then, in order to access or play back Frame F, the decoding process begins with the I Frame that is closest to and precedes Frame F. The time needed to decode from that I Frame to Frame F corresponds to a wait time. If the GOP length is overly long, the average wait time for random access or video playback may be long, which reduces random access efficiency. Therefore, the GOP length of conventional encoders is often relatively short (e.g., 25 frames, 50 frames, etc.) and usually does not exceed 100 frames.
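The relationship between GOP length and wait time can be made concrete with a small sketch. It rests on assumptions that the text does not state: a fixed GOP length, an I Frame at every index that is a multiple of the GOP length, every frame between the I Frame and Frame F having to be decoded, and a constant decoding rate. The function names and the decoding rate are hypothetical.

```python
def frames_to_decode(target_index: int, gop_length: int) -> int:
    """Frames decoded to reach the target frame, counting the I Frame and the target.

    Assumption: I Frames sit at indices 0, gop_length, 2 * gop_length, ...
    """
    if gop_length <= 0 or target_index < 0:
        raise ValueError("gop_length must be positive and target_index non-negative")
    return target_index % gop_length + 1


def average_wait_seconds(gop_length: int, decode_fps: float) -> float:
    """Mean wait time when the target is equally likely to be any frame in a GOP.

    The mean of 1..N frames is (N + 1) / 2, divided by the assumed decoding rate.
    """
    if gop_length <= 0 or decode_fps <= 0:
        raise ValueError("gop_length and decode_fps must be positive")
    return (gop_length + 1) / 2 / decode_fps
```

Under these assumptions the average wait grows roughly linearly with the GOP length, which is consistent with conventional encoders keeping GOPs short.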
Currently, the surveillance industry broadly utilizes digital video recorders (DVRs) as video storage equipment. A DVR is a type of surveillance equipment that can record video and audio data using a hard drive. Examples include a regular digital hard drive video recorder (e.g., interfacing with analog video cameras), a hybrid DVR, and a network video recorder (NVR). Surveillance video often needs to be stored for a relatively long period of time, and thus the storage equipment often has a relatively large storage capacity. To facilitate storage and access, a large capacity storage area is further divided into sub-storage areas of similar sizes (e.g., “fragments”). Video data is stored to different fragments. FIG. 2 is an example diagram showing a large capacity storage area being divided into “fragments.”
FIG. 3 is an example flow chart of a conventional video data storage process. As shown in FIG. 3, during the process of video data storage, the storage equipment opens a data buffer zone used for temporarily storing the latest segment of video encoding data. Upon receipt of a recording command, the storage equipment checks whether an I Frame exists in the current buffer zone. If no I Frame exists in the current buffer zone, the storage equipment waits for an I Frame to appear and then accumulates data in the buffer zone, beginning from the start position of that I Frame. Once the data in the buffer zone reaches a certain size, for example, 512 kilobytes, the data is written to the hard drive in a single write. Prior to actually saving the buffer zone video data to a current fragment, the storage equipment judges whether the sum of the data volume in the current fragment and the data volume in the buffer zone exceeds a threshold value (e.g., the size of the current fragment). If the sum does not exceed the threshold value, the video encoding data in the data buffer zone is stored to the current fragment. On the other hand, if the sum exceeds the threshold value, a new fragment is opened and the video encoding data in the data buffer zone is stored to the new fragment.
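The storage flow described above can be sketched in simplified form. The sketch is illustrative only and is not taken from FIG. 3: the class name `Recorder`, its fields, and the representation of a frame as a (type, size in bytes) pair are hypothetical, recording is assumed to start with the first frame pushed, and frames arriving before the first I Frame are simply skipped.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Frame = Tuple[str, int]  # (frame type "I", "P" or "B", size in bytes)


def total_size(frames: List[Frame]) -> int:
    """Total data volume of a list of frames, in bytes."""
    return sum(size for _, size in frames)


@dataclass
class Recorder:
    fragment_size: int               # threshold: capacity of one fragment
    flush_size: int = 512 * 1024     # buffer size that triggers a write
    buffer: List[Frame] = field(default_factory=list)
    fragments: List[List[Frame]] = field(default_factory=lambda: [[]])
    started: bool = False

    def push(self, frame: Frame) -> None:
        """Accept one encoded frame from the encoder."""
        if not self.started:
            if frame[0] != "I":
                return               # wait for an I Frame to appear
            self.started = True      # accumulate from the I Frame onward
        self.buffer.append(frame)
        if total_size(self.buffer) >= self.flush_size:
            self.flush()

    def flush(self) -> None:
        """Write the whole buffer once, opening a new fragment if needed."""
        current = self.fragments[-1]
        if total_size(current) + total_size(self.buffer) > self.fragment_size:
            self.fragments.append([])    # sum exceeds the threshold
            current = self.fragments[-1]
        current.extend(self.buffer)
        self.buffer = []
```

Note that nothing in this sketch ensures that the data written to a newly opened fragment begins with an I Frame; with small sizes such as `Recorder(fragment_size=1000, flush_size=300)`, the second fragment can easily start with a P Frame.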
To facilitate future random access and video playback of the video data, cross-fragment reference of video data often needs to be avoided, which means that reference relationships between encoded frames in different fragments are to be avoided. Thus, the data in each fragment usually begins with an I Frame. This way, the one or more P Frames and/or B Frames which follow the I Frame have reference relationships only with the I Frame in the current fragment and are unrelated to the video data in the previous fragment. However, when the data in the current buffer zone is to be stored to a new fragment, the current buffer zone may not contain an I Frame, or the data in the current buffer zone may not begin with an I Frame. Thus, storing the data in the current buffer zone to the new fragment at this juncture makes it hard to ensure that the new fragment begins with an I Frame. The conventional approach is to discard that portion of the bit stream (i.e., the data in the current buffer zone) and then send a command to the encoder to force-encode an I Frame. Data beginning with that I Frame is then stored to the new fragment. The conventional approach guarantees that the new fragment begins with an I Frame, but it causes bit stream loss for a period of time.
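The conventional decision at the moment a new fragment is opened can be sketched as follows. This is a simplified illustration, not an actual device interface: the names `RolloverDecision` and `conventional_rollover` are hypothetical, a frame is again represented as a (type, size in bytes) pair, and the command sent to the encoder is reduced to a boolean flag.

```python
from dataclasses import dataclass
from typing import List, Tuple

Frame = Tuple[str, int]  # (frame type "I", "P" or "B", size in bytes)


@dataclass
class RolloverDecision:
    frames_to_store: List[Frame]  # data written to the new fragment right away
    discarded_bytes: int          # bit stream lost by discarding the buffer
    force_i_frame: bool           # whether the encoder is asked for an I Frame


def conventional_rollover(buffer: List[Frame]) -> RolloverDecision:
    """Decide what to do with the buffer when a new fragment is opened."""
    if buffer and buffer[0][0] == "I":
        # The buffer already begins with an I Frame: store it unchanged.
        return RolloverDecision(list(buffer), 0, False)
    # Otherwise discard the buffered data and request a forced I Frame,
    # so that the new fragment begins with that I Frame.
    discarded = sum(size for _, size in buffer)
    return RolloverDecision([], discarded, True)
```

The `discarded_bytes` field makes the cost of the approach explicit: every byte counted there corresponds to video that is never written to storage.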
The above-noted conventional approach for video data storage may be suitable and effective for bit streams output by a traditional encoder, because the time intervals between I Frames in such bit streams are usually not very long and the number of frames in a GOP is not large. However, if the time intervals between I Frames in the bit streams output by an encoder are very long (e.g., several minutes or even several hours), the number of frames in a GOP is very large. This means that the data volume of the GOP is large, and the proportion of I Frames in the buffer zone is very low. If the data in the buffer zone needs to be stored to a new fragment and the buffer zone does not include an I Frame or does not begin with an I Frame, at least a portion of the data in the buffer zone may be discarded and force-encoding of an I Frame may be performed, which may disrupt the bit stream structure of long GOP encoding. In addition, not all encoders support force-encoding of I Frames. Therefore, the above-noted conventional approach for video surveillance data storage may be suitable in circumstances where the GOP length is relatively short, but may not be satisfactory when the GOP length is long.
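The scale of the difference can be illustrated with simple arithmetic. The sketch below assumes a constant frame rate and exactly one I Frame per GOP; the frame rate of 25 frames per second used in the note that follows is an assumption chosen for illustration, and the function names are hypothetical.

```python
def frames_per_gop(i_frame_interval_s: float, fps: float) -> int:
    """Number of frames in a GOP for a given I Frame interval and frame rate."""
    if i_frame_interval_s <= 0 or fps <= 0:
        raise ValueError("interval and frame rate must be positive")
    return round(i_frame_interval_s * fps)


def i_frame_share(gop_frames: int) -> float:
    """Fraction of frames that are I Frames, assuming one I Frame per GOP."""
    if gop_frames <= 0:
        raise ValueError("gop_frames must be positive")
    return 1 / gop_frames
```

At an assumed 25 frames per second, an I Frame interval of 2 seconds gives a GOP of 50 frames, whereas an interval of one hour gives a GOP of 90,000 frames, so the chance that a buffer of fixed size happens to begin with an I Frame becomes very small.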
Hence, it is highly desirable to improve techniques for video storage.