Video surveillance is widespread in security applications. Modern surveillance networks may operate in a diverse range of locations. Examples include public spaces (such as streets and parks), public and private buildings, infrastructure (such as railway stations), airports, factories, military bases and other locations.
It is well known to monitor video surveillance manually, for example with one or more operators monitoring a number of video terminals. It is difficult, however, for an operator to monitor many terminals at once, or even a single terminal, in order to obtain information or identify incidents that might require action. Various systems have therefore been proposed to automate some aspects of video surveillance in order to supplement or replace manual operators. These include automatic face recognition systems, systems for automatically identifying incidents (such as alarms) and responding to them (see PCT/AU2014/00098, the disclosure of which is incorporated herein by reference), and other automated or partially automated systems.
One requirement of video systems surveying people is the ability to make a count or estimate of the number of people present. For example, it may be necessary to count the people in a hotel lobby, railway station or airport in order to determine that the number of people does not exceed a specified limit. In crowded locations this is a very difficult (if not impossible) job for a manual operator, even where that operator is dedicated purely to the task of counting people.
Systems are known that use overhead 2-dimensional cameras to detect the heads of people in the field of view for the purpose of people counting. Prior-art systems use motion detection and “blob” extraction in each frame. The blobs are then searched for circles because, in a top-down view, heads resemble circles. Alternative methods look at body shape to recognise people. All such methods suffer from the usual 2-dimensional computer-vision challenges of carrying out detection in varying light conditions and in occluded scenes. Moreover, known circle-detection algorithms can misfire in scenes with patterned flooring (for example, tile or carpet patterns), and so such methods and systems are not robust.
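By way of illustration only, the general kind of 2-dimensional approach described above (motion detection, blob extraction, then a search for roughly circular blobs) might be sketched as follows. This is a minimal sketch and not the implementation of any particular known system. The function names, the frame-difference threshold, and the circularity test (a near-square bounding box filled to roughly π/4, as a disc would be) are all assumptions introduced for the example.

```python
import math
from collections import deque


def motion_mask(prev, curr, threshold=30):
    """Mark pixels whose grey level changed by more than `threshold` (assumed value)."""
    return [[abs(c - p) > threshold for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def extract_blobs(mask):
    """Group moving pixels into 4-connected blobs using breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, blob = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def looks_circular(blob, max_aspect=1.3, fill_tolerance=0.15):
    """Crude circle test (assumed heuristic): near-square bounding box, fill near pi/4."""
    ys = [p[0] for p in blob]
    xs = [p[1] for p in blob]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    aspect = max(height, width) / min(height, width)
    fill = len(blob) / (height * width)
    return aspect <= max_aspect and abs(fill - math.pi / 4) <= fill_tolerance


def count_heads(prev, curr):
    """Count blobs in the motion mask that pass the circularity test."""
    return sum(looks_circular(b) for b in extract_blobs(motion_mask(prev, curr)))
```

Because the test relies purely on 2-dimensional shape, any roughly circular region of change, such as a moving shadow over a patterned floor, would be counted in the same way as a head, which is consistent with the lack of robustness noted above.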
Depth sensors and 3-dimensional cameras are now available which can provide information on depth. These may be usable to facilitate the detection of heads. Accurate and resolvable head detection remains an issue, however, because other artefacts may intrude into the depth range being analysed in the 3D information. Examples include bags carried on shoulders, hands raised to or above head level, and other objects at that depth level, all of which interfere with the head count.
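The difficulty described above can be illustrated with a minimal sketch, assuming an overhead depth sensor that reports, for each pixel, the distance in metres to the nearest surface. The sensor mounting height, the head-level height band and the function names are hypothetical values and names chosen for the example only. Any surface falling within the band is treated as a head candidate, so a raised hand or a carried bag at that level yields a spurious candidate.

```python
def head_band_mask(depth_m, sensor_height_m=3.0, band_m=(1.4, 2.1)):
    """Mark pixels whose height above the floor lies in the assumed head-level band."""
    low, high = band_m
    return [[low <= sensor_height_m - d <= high for d in row] for row in depth_m]


def count_regions(mask):
    """Count 4-connected regions of marked pixels using an iterative flood fill."""
    h, w = len(mask), len(mask[0])
    seen = set()
    regions = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or (y, x) in seen:
                continue
            regions += 1
            seen.add((y, x))
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
    return regions
```

In this sketch, a depth map containing one head and, separately, one hand raised to head level produces two candidate regions although only one person is present.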
Other requirements for surveillance include monitoring the behaviour of people, objects and animals, and monitoring the behaviour of individuals within groups. Traditional methods using two-dimensional computer vision suffer from a number of challenges, such as difficulty in differentiating changes in light or thermal conditions from genuine changes in the scene.
In the case of non-motion detection of objects, people and the like (see Australian Patent 2002342393, the disclosure of which is incorporated herein by reference), light and thermal shadows can cause false alarms.