Conventionally, surveillance has been conducted using surveillance cameras installed in buildings. For example, a warden watches monitor windows that display the videos sent from the surveillance cameras and, if a suspicious individual or the like appears, records video images of that individual.
In such surveillance, when the suspicious individual leaves a monitor window, i.e. leaves the shooting zone of a single surveillance camera, the warden manually designates the monitor windows one after another to keep recording video images of the suspicious individual. With such operations, only the video images displayed in the designated monitor window are recorded, so the suspicious individual can be tracked when the recording is played back.
Meanwhile, encoding techniques that use inter-frame prediction, which exploits the correlation among pictures, have been developed to reduce the data amount of video. In recent years, the videos sent from surveillance cameras have also been encoded using such techniques.
Examples of such prediction-based encoding techniques include the MPEG (Moving Picture Experts Group)-2 standard and H.264/MPEG-4 AVC (Advanced Video Coding), which aims at a higher compression rate. Because these techniques exploit the correlation among pictures, decoding a given picture generally requires another picture (hereinafter called a “reference picture”).
Therefore, if recording of the video data sent from a surveillance camera starts only when the warden designates the corresponding monitor window, the pictures immediately after the designation cannot be played back in some cases. That is, the reference pictures of those pictures may not have been recorded, and the pictures cannot be decoded without them.
The following explains, with reference to FIG. 3, why some pictures cannot be decoded immediately after the designation.
In this explanation, it is assumed that the user switches from camera unit 01 to camera unit 02 and that the compressed data is accumulated in an accumulation unit.
The rectangles represent pictures, and the characters “I”, “P” and “B” in the rectangles indicate the picture types “I picture”, “P picture” and “B picture”, respectively. In this example, the I pictures are encoded using intra-frame prediction, and the P pictures and B pictures are encoded using inter-frame prediction.
The rectangles drawn with solid lines represent pictures sent by camera unit 01, and the rectangles drawn with dotted lines represent pictures sent by camera unit 02. The shaded rectangles (e.g. picture 111) represent pictures that cannot be decoded properly.
The arrows represent the reference relationships between the pictures. For example, the P picture of camera unit 01 with display order “1” refers to the I picture with display order “0” (see arrow 100). The B picture 101 of camera unit 02 with display order “7” refers to the I picture 102 with display order “5” and to the P picture with display order “8”.
Here, assume that the switch to camera unit 02 takes place once the picture of camera unit 01 with display order “6” has been accumulated.
The accumulation unit shown at the bottom of the drawing stores the pictures from camera unit 01 up to display order “6”, and the pictures from camera unit 02 from display order “7” onward. When the accumulated pictures are played back, picture 111 cannot be decoded properly because its reference picture 102 was never accumulated (see arrow 110). Proper playback does not resume until the next I picture 112 arrives.
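The decoding failure described above can be reproduced with a short simulation. The sketch below is hypothetical: `Picture` and `decodable` are illustrative names, and apart from the relation stated in the text (the B picture of camera unit 02 with display order 7 refers to the I picture with display order 5 and the P picture with display order 8), the reference structures of both camera units, including an assumed next I picture at display order 14, are assumptions rather than a reproduction of FIG. 3. Pictures are keyed by display order, and transmission order is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    camera: str       # camera unit, e.g. "01" or "02"
    order: int        # display order
    ptype: str        # "I", "P" or "B"
    refs: tuple = ()  # display orders of the reference pictures (same camera)


def decodable(accumulated):
    """Return the (camera, order) keys that can be decoded from `accumulated`.

    A picture is decodable if every picture it refers to was accumulated
    and is itself decodable; an I picture has no references.
    """
    by_key = {(p.camera, p.order): p for p in accumulated}
    memo = {}

    def ok(key):
        if key not in by_key:
            return False  # the reference picture was never accumulated
        if key not in memo:
            pic = by_key[key]
            memo[key] = all(ok((pic.camera, r)) for r in pic.refs)
        return memo[key]

    return {key for key in by_key if ok(key)}


# Hypothetical stream for camera unit 01: I0 followed by P pictures,
# each referring to the previous picture.
CAMERA_01 = [Picture("01", 0, "I")] + [
    Picture("01", n, "P", (n - 1,)) for n in range(1, 7)
]

# Hypothetical GOP for camera unit 02. Only "B7 refers to I5 and P8" comes
# from the description; the other references and I14 are assumptions.
CAMERA_02 = [
    Picture("02", 5, "I"),
    Picture("02", 6, "B", (5, 8)),
    Picture("02", 7, "B", (5, 8)),
    Picture("02", 8, "P", (5,)),
    Picture("02", 9, "B", (8, 11)),
    Picture("02", 10, "B", (8, 11)),
    Picture("02", 11, "P", (8,)),
    Picture("02", 12, "B", (11, 14)),
    Picture("02", 13, "B", (11, 14)),
    Picture("02", 14, "I"),
]

# Switch after display order 6: keep camera 01 up to 6, camera 02 from 7 on.
accumulated = [p for p in CAMERA_01 if p.order <= 6] + [
    p for p in CAMERA_02 if p.order >= 7
]
ok_keys = decodable(accumulated)
undecodable = [p.order for p in accumulated if (p.camera, p.order) not in ok_keys]
print(undecodable)  # [7, 8, 9, 10, 11, 12, 13]
```

Because the undecodable state propagates through the reference chain, the single missing I picture at display order 5 invalidates every dependent picture of camera unit 02 until the next I picture, which is the behavior the FIG. 3 explanation describes.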
Several techniques have been proposed to avoid such interruptions in the video of the suspicious individual.
One such technique reduces the delay in switching among TV cameras (Patent Document 1). In brief, a TV camera that has detected the switching first generates encoded picture data using intra prediction, which requires no reference picture, and starts transmission with that picture.
If this technique is applied to surveillance cameras, a picture encoded using intra prediction is transmitted immediately after the switching is notified. A picture that requires no reference picture can therefore be received and recorded right after the switching, and playback can start from that picture and continue through the subsequent pictures. Note that it is assumed here that the pictures after the switching do not refer to the pictures before the switching.
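The effect of forcing an intra-coded picture on switching can be illustrated with a simple picture-type decision. This is a hypothetical sketch, not the method of Patent Document 1: `picture_types`, the fixed GOP length, and the use of I and P pictures only (each P picture assumed to refer to the preceding picture) are assumptions. Restarting the GOP at the switch satisfies the stated assumption that no picture after the switching refers to a picture before it.

```python
def picture_types(n_pictures, gop_length, switch_at=None):
    """Return a string of picture types ("I"/"P") for one camera.

    An I picture starts every GOP of `gop_length` pictures. If `switch_at`
    is given, the picture at that index is forced to be an I picture and a
    new GOP starts there, so (with each P picture referring to the preceding
    picture) no picture from `switch_at` on refers to an earlier picture.
    """
    types = []
    pos_in_gop = 0
    for i in range(n_pictures):
        if i == switch_at:
            pos_in_gop = 0  # switching notified: restart the GOP with an I picture
        types.append("I" if pos_in_gop == 0 else "P")
        pos_in_gop = (pos_in_gop + 1) % gop_length
    return "".join(types)


print(picture_types(10, 6))               # IPPPPPIPPP
print(picture_types(10, 6, switch_at=4))  # IPPPIPPPPP
```

Without the forced I picture, a recorder starting at index 4 would have to wait for index 6 before any picture could be decoded; with it, the first recorded picture is already decodable.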
Another such technique predicts the motion of the subject from the motion vectors generated when the video data is encoded (Patent Document 2). If this technique is applied to surveillance cameras, missing parts of the video of a moving subject can be avoided by recording both the video data from the surveillance camera currently shooting the subject and the video data from the surveillance camera expected to shoot the subject next.

Patent Document 1: Japanese Laid-open Patent Application Publication No. 7-327228
Patent Document 2: Japanese Laid-open Patent Application Publication No. 2000-32435
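The idea of recording both the current camera and the camera expected to shoot the subject next can be sketched as follows. This is a hypothetical illustration, not the method of Patent Document 2: the one-dimensional corridor layout in `ZONES`, the conversion of motion vectors into corridor metres, and the linear extrapolation from the mean motion vector are all assumptions, and `camera_for` and `cameras_to_record` are illustrative names.

```python
from statistics import mean

# Hypothetical layout: each camera covers one segment of a corridor (metres).
ZONES = {"01": (0.0, 10.0), "02": (10.0, 20.0), "03": (20.0, 30.0)}


def camera_for(x):
    """Return the camera whose shooting zone contains position x, or None."""
    for cam, (start, end) in ZONES.items():
        if start <= x < end:
            return cam
    return None


def cameras_to_record(x, motion_vectors, horizon):
    """Cameras whose video should be recorded for a subject at position x.

    `motion_vectors` are the subject's per-picture displacements, assumed to
    be already converted from picture coordinates into corridor metres;
    `horizon` is how many pictures ahead to predict.
    """
    velocity = mean(motion_vectors)     # metres per picture
    predicted = x + velocity * horizon  # linear extrapolation
    cams = {camera_for(x), camera_for(predicted)}
    cams.discard(None)                  # a position may fall outside all zones
    return sorted(cams)


print(cameras_to_record(8.0, [0.1, 0.12, 0.08], horizon=30))  # ['01', '02']
print(cameras_to_record(5.0, [0.0, 0.0], horizon=30))         # ['01']
```

A subject moving toward the boundary of zone 01 causes camera 02 to be recorded before the subject actually enters its shooting zone, while a stationary subject keeps only its current camera.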