Heretofore, video monitoring systems have been installed in facilities visited by an unspecified large number of people, such as hotels, buildings, convenience stores, financial institutions, dams, and roads, in order to deter crime, prevent accidents, and so on. In such a video monitoring system, a person who is a target of surveillance, for example, is shot using an imaging apparatus such as a camera, the shot picture is transmitted to a monitoring center such as a management office or a security office, and a permanently stationed guard monitors the picture and, depending on the purpose or as necessary, calls attention to it or records it.
In recording pictures in the video monitoring system, a random-access medium typified by a hard disk drive (HDD) is increasingly used as the recording medium instead of a conventional video tape medium. Moreover, in recent years, the capacity of recording media has continued to increase. This increase in capacity has dramatically increased the volume of recordable pictures, enabling recording at many points and recording over long periods. On the other hand, the burden of visually checking the recorded images is surfacing as a problem.
Against this background, video monitoring systems that include a retrieval function for finding a desired picture more easily are becoming popular. More specifically, in recent years, systems including a more sophisticated retrieval function have appeared, in which the occurrence of a specific event in the pictures is automatically detected in real time using an image recognition technique, the event is recorded together with the picture, and the event can be retrieved afterward. A typical example of such a system is a similar face image retrieval system, which includes a similar face image retrieval function.
The similar face image retrieval function treats the entry of a person into a surveillance picture as a retrievable event and, using the image feature value of the face, retrieves from among such events the appearances of a particular individual specified by a user.
FIG. 1 is a diagram of an exemplary configuration of a video monitoring system including a previously existing similar face image retrieval function. In the following, “the video monitoring system including the similar face image retrieval function” is referred to as “a similar image retrieval system”. In the system illustrated in FIG. 1, an imaging apparatus 101, a recording apparatus 102, a retrieval apparatus 103, and a terminal device 104 are connected to a network 150 in a state in which the apparatuses and the device can communicate with one another.
The network 150 is a communication line, such as a dedicated line, an intranet, the Internet, or a wireless LAN (Local Area Network), that connects the apparatuses and the device to one another for data communication.
The imaging apparatus 101 is a device such as a network camera or a surveillance camera that takes an image using an image sensor such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor, subjects the image to picture processing such as white balance adjustment to generate image data, and outputs the image data to the network 150.
The recording apparatus 102 is a device such as a network digital recorder that records the image data inputted through the network 150 on a recording medium such as an HDD and outputs the image data recorded on the recording medium to the network 150 in response to a request from an external device.
The retrieval apparatus 103 is a device such as a server or a PC (Personal Computer) that detects faces in the image data inputted through the network 150, records information about the faces on a recording medium such as an HDD, searches the information about the faces recorded on the recording medium in response to a request from an external device, and outputs the search result to the network 150.
The terminal device 104 is a device such as a desktop PC that displays the image data and the search results inputted through the network 150 on the screen of a monitor such as a liquid crystal display or a CRT (Cathode Ray Tube), includes a keyboard, a mouse, and the like, and provides a manipulation interface for operations such as reproducing recorded images and searching for a person.
An example of the configuration and processing operation of the previously existing retrieval apparatus 103 will be described with reference to FIG. 2. FIG. 2 is a diagram of an exemplary configuration of the previously existing retrieval apparatus 103. The retrieval apparatus 103 includes a face registration processing group 221, a face retrieval processing group 222, and a face feature value database 205.
The face registration processing group 221 includes an image input unit 201, a face detecting unit 202, a face feature value calculating unit 203, and a face feature value recording unit 204. The face retrieval processing group 222 includes an image input unit 211, a face detecting unit 212, a face feature value calculating unit 213, a face feature value searching unit 214, and a search result output unit 215.
In FIG. 2, in the processing of the face registration processing group 221, the image input unit 201 performs a process for receiving surveillance image data inputted from the imaging apparatus 101 and the recording apparatus 102. The surveillance image data, that is, the image data of search target images, is inputted either constantly or intermittently, at times when an instruction is issued or according to an established setting. The image input unit 201 outputs the inputted image data to the face detecting unit 202.
The face detecting unit 202 performs a process for detecting faces in the image data inputted from the image input unit 201 and outputting the face detection result. Here, the face detection result means information about the presence or absence of a face in the image. In the case where a face exists, the face detection result also includes the number of faces detected, the position coordinates of each face region in the image, a face image, and so on. A face is detected by an image recognition technique, for example, a method that searches the inside of the image using characteristics of a face such as the disposition of the main components of a face, including the eyes, nose, and mouth, and the difference in shade between the forehead and the eyes. Any method may be used in this example. The face image means an image cut out of the image data from the image input unit 201 in a rectangular shape, with a predetermined aspect ratio, that includes the face. Desirably, the background other than the face is filled in with a predetermined color. The face detecting unit 202 outputs the face detection result to the face feature value calculating unit 203.
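The shade-difference cue mentioned above (the eye band being darker than the forehead) is the basis of Haar-like rectangle features evaluated on an integral image, as in Viola-Jones-style detectors. The following is a minimal sketch of that idea only; the function names and the fixed rectangle layout are illustrative assumptions, not part of the system described here.

```python
# Sketch of a Haar-like "forehead vs. eyes" shade feature.
# Names and the two-rectangle layout are illustrative assumptions.

def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle (x, y, w, h), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def forehead_eye_feature(ii, x, y, w, h):
    """Two-rectangle feature: bright forehead (top half) minus darker
    eye band (bottom half). A large positive response suggests a
    face-like shade pattern at this window."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)
```

A detector would slide such a window over the image at multiple positions and scales and threshold the responses of many such features.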
The face feature value calculating unit 203 performs a process for calculating the feature value of the face using the face image included in the face detection result inputted from the face detecting unit 202 and outputting the calculated face feature value. Here, the face feature value is a vector, including, for example, the frequency distribution of the outlines or edge patterns of fragments of the face, the size and shape of the main components of the face such as the eyes, nose, and mouth, the positional relationship between the main components, the color distribution of the hair and skin, or a combination of these. Any type and any number of components may be used for the feature value.
For the calculation of the face feature value, the methods disclosed in Patent Literature 3 and Non-patent Literature 1 are used, for example. The face feature value calculating unit 203 repeats the processing according to the number of faces detected in the inputted face detection result, and outputs the calculated face feature values to the face feature value recording unit 204 together with the face detection result.
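As one concrete illustration of such a feature vector, the edge-pattern frequency distribution mentioned above can be sketched as a coarse histogram of gradient orientations over the face image. This is an assumed, simplified form for illustration only; the cited literature describes the actual methods, and real systems combine several descriptors.

```python
import math

def edge_orientation_histogram(img, bins=8):
    """Illustrative face feature vector: the frequency distribution of
    edge (gradient) orientations over a grayscale face image,
    magnitude-weighted and normalized to sum to 1."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            ang = math.atan2(gy, gx) % math.pi   # orientation in [0, pi)
            hist[min(int(ang / math.pi * bins), bins - 1)] += mag
    total = sum(hist) or 1.0
    return [v / total for v in hist]
```

Any such vector representation allows faces to be compared numerically, which is what the later search step relies on.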
The face feature value recording unit 204 performs a process for writing the image data, the face detection result, and the face feature values inputted from the face feature value calculating unit 203 to the face feature value database 205. This processing unit repeats the processing according to the number of faces detected in the inputted face detection result.
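The write process of the face feature value recording unit 204 can be sketched as follows, under the assumption of a relational store; the table name, columns, and serialization are illustrative assumptions, not taken from the document.

```python
import sqlite3
import json

# Illustrative schema for the face feature value database 205 (assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE face_features (
        face_id   INTEGER PRIMARY KEY,
        camera_id INTEGER,             -- which imaging apparatus
        captured  TEXT,                -- capture time of the frame
        x INTEGER, y INTEGER,          -- face-region position in the image
        w INTEGER, h INTEGER,          -- face-region size
        feature   TEXT                 -- feature vector serialized as JSON
    )""")

def record_faces(conn, camera_id, captured, detections):
    """Write one row per detected face, repeating according to the
    number of faces in the detection result (as unit 204 does)."""
    for det in detections:
        conn.execute(
            "INSERT INTO face_features (camera_id, captured, x, y, w, h, feature) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (camera_id, captured, det["x"], det["y"], det["w"], det["h"],
             json.dumps(det["feature"])))
    conn.commit()
```

Storing one row per detected face is what later makes per-face similarity search over the whole database possible.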
Subsequently, in the processing of the face retrieval processing group 222, the image input unit 211 receives search key image data inputted from the terminal device 104. The image input unit 211 receives the search key image data when a search request is made based on a search instruction manipulation performed by a user on the terminal device 104. The image input unit 211 outputs the inputted image data to the face detecting unit 212.
The face detecting unit 212 detects a face in the image data inputted from the image input unit 211. The face detection result calculated by the face detecting unit 212, and the method of calculating it, are the same as those of the face detecting unit 202: the result is information about the presence or absence of a face in the image. The face detecting unit 212 outputs the calculated face detection result to the face feature value calculating unit 213 together with the image data.
The face feature value calculating unit 213 performs a process for calculating the feature value of the face by the same method as the face feature value calculating unit 203, using the face image included in the face detection result inputted from the face detecting unit 212. The face feature value calculating unit 213 outputs the calculated face feature value to the face feature value searching unit 214 together with the image data and the face detection result.
The face feature value searching unit 214 checks the face feature value inputted from the face feature value calculating unit 213 against the face feature value database 205, makes a list of the faces with the highest face similarities, and outputs the list as the search result. The face similarity is a numeric value expressing the proximity between face feature values. In the case where a Euclidean distance in a multi-dimensional face feature value space is used for the face similarity, for example, a smaller value (a value closer to zero) indicates a greater similarity, which is expressed as “a high similarity”. The face feature value searching unit 214 outputs the search result to the search result output unit 215. The search result includes the face feature value of each found face, its similarity, the image data, and so on.
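The ranking by Euclidean distance described above can be sketched as follows; the function names and the dictionary-shaped database are illustrative assumptions, and a smaller distance means a higher similarity.

```python
import math

def euclidean_distance(a, b):
    """Distance between two face feature vectors; a value closer to
    zero means the faces are more similar ('a high similarity')."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_similar_faces(query, database, top_n=3):
    """Check the query feature value against every registered feature
    value and return the top_n faces with the highest similarities
    (smallest distances), as unit 214 does."""
    ranked = sorted(
        (euclidean_distance(query, feat), face_id)
        for face_id, feat in database.items())
    return [(face_id, dist) for dist, face_id in ranked[:top_n]]
```

A real system would scale this with an index structure rather than a linear scan, but the ranking criterion is the same.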
The search result output unit 215 outputs the search result inputted from the face feature value searching unit 214 to the terminal device 104.
In addition to Non-patent Literature 1 described above, techniques for calculating the face feature value are also described in Patent Literature 1 and Patent Literature 2, for example.