Automatically detecting the formation and presence of a crowd and understanding its behavior are important to many applications in military, security, and commercial environments. As stated in the Proceedings of the 2005 Conference on Behavior Representation in Modeling and Simulation, “All military operations, large or small, have a crowd control/crowd confusion factor. Crowds are one of the worst situations you can encounter. There is mass confusion; loss of control and communication with subordinates; potential for shooting innocent civilians, or being shot at by hostiles in the crowd; potential for an incident at the tactical level to influence operations and policy at the strategic level.”
While some investigations have been conducted into other sensors that are effective in estimating crowds (e.g., radar), traditional video-based surveillance combined with computer vision techniques is often used to address the problem. Video surveillance systems are currently the most promising and widely used technique to model and monitor crowds. Surveillance cameras are low in cost due to economies of scale in the security industry, and they are portable, passive, and require no physical contact with the subject being monitored.
Accurate crowd modeling is a challenging problem because environments and groups of people vary and are unpredictable. Many efforts have been made to detect crowds using video input. Those approaches have focused on modeling background scenes and extracting foreground objects in order to estimate crowd locations and density. However, background modeling has proven to be less than reliable under lighting, weather, and camera-related changes. Therefore, the type of foreground object of interest is usually limited, especially for crowd modeling, where humans are of interest. Human objects have unique characteristics regardless of environment. Information theory also states that the more information collected, the better the decision that can be made. A good example of this principle is found in the recent work of Hoiem and Efros: D. Hoiem, A. A. Efros, and M. Hebert, “Putting Objects in Perspective”, in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
With respect to human crowd monitoring, current state-of-the-art intelligent video surveillance systems concentrate primarily on crowd detection. A successful system that interprets crowd behavior has not been developed. Crowd behavior has, however, been studied in depth in psychology, and information about it can be found in books and articles. Examples include: Turner, Ralph, and Lewis M. Killian, Collective Behavior, 2d ed., Englewood Cliffs, N.J.: Prentice Hall, 1972; 3d ed., 1987; 4th ed., 1993; Rheingold, Howard, Smart Mobs: The Next Social Revolution, 2003; McPhail, Clark, The Myth of the Madding Crowd, New York: Aldine de Gruyter, 1991; and Canetti, Elias (1960), Crowds and Power, Viking Adult, ISBN 0670249998.
Musse and Thalmann propose a hierarchical crowd model. A description of this model is found in S. R. Musse and D. Thalmann, “Hierarchical model for real time simulation of virtual human crowds”, IEEE Transactions on Visualization and Computer Graphics, Vol. 7, No. 2, April-June 2001, pp. 152-164, which is incorporated by reference. According to their model, a crowd is made up of groups, and groups are made up of individuals. A crowd's behavior can be understood and anticipated by understanding the behavior of its groups and, in turn, the inter-group relationships. Nguyen and McKenzie set up a crowd behavior model by integrating a cognitive model and a physical model; see Q. H. Nguyen, F. D. McKenzie, and M. D. Petty, “Crowd Behavior Cognitive Model Architecture Design”, Proceedings of the 2005 Conference on Behavior Representation in Modeling and Simulation, Universal City, Calif., May 16-19, 2005. The cognitive model represents the “mind” of the crowd. It receives stimuli from the physical model, processes the stimuli, and selects a behavior to send back to the physical model to carry out. The physical model, in turn, acts as the “body” of the crowd and is responsible for accepting behaviors as input, carrying out those behaviors, and then outputting stimuli to the cognitive model. A crowd is characterized by its size, mood, leadership structure, demographics, and propensity for violence. Nguyen and McKenzie used a crowd aggression level to characterize the state of the crowd. FIG. 10 shows a simplified behavior model as described by Nguyen and McKenzie.
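By way of illustration only, the stimulus and behavior exchange between a cognitive model and a physical model, operating on a crowd composed of groups, may be sketched as follows. This is a minimal sketch and not the implementation of Nguyen and McKenzie or of Musse and Thalmann: the class names, the behavior labels ("disperse", "protest", "confront"), the aggression scale from 0 to 1, the thresholds, and the feedback values are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    """A group of individuals (hypothetical structure for this sketch)."""
    name: str
    members: int  # number of individuals in the group


@dataclass
class Crowd:
    """A crowd is a collection of groups, per the hierarchical view."""
    groups: List[Group] = field(default_factory=list)

    @property
    def size(self) -> int:
        return sum(g.members for g in self.groups)


class CognitiveModel:
    """The 'mind': processes stimuli and selects a behavior."""

    def __init__(self) -> None:
        self.aggression = 0.0  # assumed scale, clamped to [0, 1]

    def select_behavior(self, stimuli: List[float]) -> str:
        # Assumption: each stimulus is a signed nudge to the aggression level.
        self.aggression = min(1.0, max(0.0, self.aggression + sum(stimuli)))
        if self.aggression >= 0.7:  # illustrative threshold
            return "confront"
        if self.aggression >= 0.3:  # illustrative threshold
            return "protest"
        return "disperse"


class PhysicalModel:
    """The 'body': carries out a behavior and outputs new stimuli."""

    # Hypothetical stimulus produced by carrying out each behavior.
    FEEDBACK = {"confront": 0.25, "protest": 0.125, "disperse": -0.25}

    def __init__(self, crowd: Crowd) -> None:
        self.crowd = crowd
        self.history: List[str] = []

    def carry_out(self, behavior: str) -> List[float]:
        self.history.append(behavior)
        return [self.FEEDBACK[behavior]]


def run(crowd: Crowd, initial_stimuli: List[float], steps: int) -> List[str]:
    """Alternate between mind and body for a fixed number of steps."""
    mind, body = CognitiveModel(), PhysicalModel(crowd)
    stimuli = list(initial_stimuli)
    for _ in range(steps):
        behavior = mind.select_behavior(stimuli)
        stimuli = body.carry_out(behavior)
    return body.history
```

Under these assumed values, a crowd that starts with a single stimulus of 0.5 selects "protest" twice and then "confront" on the third step, since each protest feeds a further 0.125 back into the aggression level.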
While detecting and estimating the crowd is a difficult problem, understanding and detecting crowd behavior and crowd mood is similarly challenging. The user of a system usually does not want to be limited to a fixed set of behaviors, such as overcrowding detection, long waiting line detection, or loitering detection. Although this sounds straightforward, currently implemented systems are fixed to a certain behavior or set of behaviors. Each crowd behavior or crowd mood detector uses a different algorithm, which makes it impossible to combine existing behaviors into new behaviors. What is needed is a configurable system that allows a user to define the type of crowd behavior to be detected. The ability to let a user define behaviors at different levels of detail provides tremendous flexibility and accuracy in detection. Further, what is needed is a processing method that provides accurate and computationally efficient crowd behavior detection while being able to handle wide variations in crowds and lighting environments.