In digital video systems, such as network camera monitoring systems, video sequences are compressed before transmission using various video encoding methods. In many digital video encoding systems, two main modes are used for compressing the video frames of a sequence: intra mode and inter mode. In the intra mode, the luminance and chrominance channels are encoded by exploiting the spatial redundancy of the pixels in a given channel of a single frame via prediction, transform, and entropy coding. The encoded frames are called intra-frames, and may also be referred to as I-frames. The inter mode instead exploits the temporal redundancy between separate frames, and relies on a motion-compensation prediction technique that predicts parts of a frame from one or more previous frames by encoding the motion of pixels from one frame to another for selected blocks of pixels. The encoded frames are called inter-frames, and may be referred to as P-frames (forward-predicted frames), which can refer to previous frames in decoding order, or B-frames (bi-directionally predicted frames), which can refer to two or more previously decoded frames, where the frames used for the prediction can have any arbitrary display-order relationship. Further, the encoded frames are arranged in groups of pictures, or GOPs, where each group of pictures starts with an I-frame, and the following frames are P-frames or B-frames. The number of frames in a group of pictures is generally referred to as the GOP length. GOP lengths may vary from 1, meaning that there is just an intra-frame, and no inter-frames, in a group of pictures, to, e.g., 255, meaning that there is one intra-frame followed by 254 inter-frames in a group of pictures. Since intra-frames generally require more bits for representation of an image than inter-frames, motion video having longer GOP lengths will generally produce a lower output bit rate than motion video having shorter GOP lengths.
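The relationship between GOP length and output bit rate can be illustrated with a small calculation. The following Python sketch is only an illustration under assumed figures: the function name `average_bitrate_bps`, the frame sizes, and the frame rate are hypothetical and are not taken from any particular encoder. It also assumes, for simplicity, that every inter-frame has the same size.

```python
def average_bitrate_bps(gop_length, intra_bits, inter_bits, fps):
    """Average bit rate of a stream whose GOPs each hold one intra-frame
    followed by (gop_length - 1) inter-frames.

    All arguments are illustrative assumptions, not encoder parameters.
    """
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    # Total bits spent on one group of pictures.
    bits_per_gop = intra_bits + (gop_length - 1) * inter_bits
    # One GOP lasts gop_length / fps seconds.
    return bits_per_gop * fps / gop_length


# Assumed, purely illustrative figures: 400 kbit per intra-frame,
# 40 kbit per inter-frame, 30 frames per second.
for gop_length in (1, 30, 255):
    rate = average_bitrate_bps(gop_length, 400_000, 40_000, 30)
    print(f"GOP length {gop_length:3d}: {rate / 1e6:.2f} Mbit/s")
```

Under these assumptions, the average bit rate falls as the GOP length grows and approaches the rate of a stream consisting of inter-frames only, which is why the gain from lengthening an already long GOP is small.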
At the site of reception of the encoded video sequence, the encoded frames are decoded. A concern in network camera monitoring systems is the available bandwidth for transmission of encoded video. This is particularly true in systems employing a large number of cameras. Further, this concern is especially important in situations where available bandwidth is low, such as when the video sequence is to be transmitted to a mobile device, e.g., a mobile phone, a PDA, or a tablet computer. An analogous problem occurs regarding storage of images, for instance when storing images on an on-board SD card in the camera. A compromise has to be made, where available bandwidth or storage is balanced against the interest of high quality video images. A number of methods and systems have been used for controlling the encoding in order to reduce the bit rate of transmissions from the cameras. These known methods and systems generally apply a bit rate limit, and control the encoding such that the output bit rate from the cameras is always below the bit rate limit. In this way, it may be ensured that the available bandwidth is sufficient, such that all cameras in the system may transmit their video sequences to the site of reception, e.g., a control center, where an operator may monitor video from the cameras of the system, and where video may be recorded for later use. However, applying a bit rate limit to all cameras may lead to undesirably low image quality at times, since the bit rate limit may require severe compression of images containing a lot of details, regardless of what is happening in the monitored scene. Generally, images of a scene with motion are of higher interest to an operator than images of a static scene. Still, when applying a bit rate limit, images with motion may need to be heavily compressed in order not to exceed the limit, thereby leading to low image quality. As mentioned above, another way of reducing the output bit rate is to use longer GOP lengths. 
However, a longer GOP length means that intra-frames occur less frequently, so errors caused by the predictions employed when encoding the inter-frames may propagate further, leading to annoying encoding artifacts in the displayed image.
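The cost of a longer GOP length in terms of error propagation can be sketched in the same way. The Python example below is a simplified model, not a description of any specific encoder: it assumes that each inter-frame is predicted from the immediately preceding frame and that an intra-frame fully resets the prediction chain. The function names are hypothetical.

```python
def intra_refresh_interval_s(gop_length, fps):
    """Time in seconds between two consecutive intra-frames."""
    if gop_length < 1 or fps <= 0:
        raise ValueError("gop_length must be >= 1 and fps must be > 0")
    return gop_length / fps


def max_inter_frames_affected(gop_length):
    """Upper bound on the number of inter-frames a prediction error can
    carry through before the next intra-frame resets the chain.

    Assumes the error arises in the first inter-frame of the GOP and that
    each later inter-frame predicts from the frame before it.
    """
    if gop_length < 1:
        raise ValueError("gop_length must be at least 1")
    return gop_length - 1


# Assumed frame rate of 30 frames per second, for illustration only.
for gop_length in (1, 30, 255):
    seconds = intra_refresh_interval_s(gop_length, 30)
    frames = max_inter_frames_affected(gop_length)
    print(f"GOP length {gop_length:3d}: refresh every {seconds:.2f} s, "
          f"up to {frames} inter-frames affected")
```

In this model, a GOP length of 1 leaves no inter-frames in which an error can persist, whereas a GOP length of 255 allows an error to persist for several seconds at typical frame rates.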