If sensor networks could interpret the movement and activities of people within buildings, then the buildings could be safer and more efficient. For example, a sensor network that provides current census data could improve safety by supporting adaptive evacuation plans. Security could be enhanced by systems that interpret daily activity patterns in buildings and flag unusual activities. Predicting the activities of inhabitants would enable greater energy efficiency in heating, lighting, and elevator scheduling.
One network of sensors is described by Wilson et al., “Simultaneous tracking & activity recognition (STAR) using many anonymous, binary sensors,” The Third International Conference on Pervasive Computing, 2005. That network targets a home where only a small number of people are present at any one time. Wilson et al. applied a classic ‘track-then-interpret’ methodology.
However, an environment with more people, such as an office building, school, or factory, requires exponentially more hypotheses to be evaluated for tracking people and interpreting higher-level activities. Therefore, that method is applicable only to low-census buildings, such as residential homes. Further, the exact placement of sensors in the home environment is essential to that method. That level of specialization is not economical in large buildings, or where usage patterns change dynamically.
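The growth in hypotheses can be illustrated with a deliberately simplified counting model, which is an assumption made here for illustration and is not taken from Wilson et al.: if each firing of an anonymous binary sensor must be attributed to exactly one of the people present, then the number of candidate attributions is the number of people raised to the number of firings. The function name `count_association_hypotheses` is hypothetical.

```python
def count_association_hypotheses(num_people: int, num_events: int) -> int:
    """Count the ways to attribute each anonymous sensor event to one person.

    Simplifying assumption (illustrative only): every event is caused by
    exactly one person, and any person could have caused any event.
    """
    if num_people < 1 or num_events < 0:
        raise ValueError("need at least one person and a non-negative event count")
    return num_people ** num_events


if __name__ == "__main__":
    # Compare a small household with larger occupancies over the same ten events.
    for people in (2, 5, 20):
        print(people, count_association_hypotheses(people, 10))
```

Under this model, the count grows exponentially with the number of sensor events. A more populated building produces both a larger base and more events, which is why an exhaustive track-then-interpret search becomes impractical outside low-census settings.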
The prior art also describes methods for interpreting human activities from images in a video; see Bobick, “Movement, activity and action: the role of knowledge in the perception of motion,” Philosophical Transactions: Biological Sciences, 352(1358): 1257-1265, 1997. Bobick described a framework for using time and context in video to interpret human behavior. He broke down behavior into a tripartite hierarchy consisting of movements, activities, and actions. The most basic elements were called movements. Movements relate to the spatial context and have no temporal structure. Short sequences of movements were combined with some temporal structure to form activities. The activities were interpreted within the larger context of the participants and the environment to recognize actions. However, Bobick's method requires cumbersome image analysis of video frames acquired by costly video cameras.
Other prior art describing the interpretation of human behavior in video includes Stauffer et al., “Learning patterns of activity using real-time tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):747-757, 2000; Johnson et al., “Learning the distribution of object trajectories for event recognition,” Image and Vision Computing, 14(8), 1996; Minnen et al., “Expectation grammars: Leveraging high-level expectations for activity recognition,” Workshop on Event Mining, Event Detection, and Recognition in Video, Computer Vision and Pattern Recognition, volume 2, page 626, IEEE, 2003; Cutler et al., “Real-time periodic motion detection, analysis and applications,” Conference on Computer Vision and Pattern Recognition, pages 326-331, Fort Collins, USA, 1999; and Moeslund et al., “A survey of computer vision based human motion capture,” Computer Vision and Image Understanding, 81:231-268, 2001.
A common thread in most of the prior art is that tracking objects is the first stage of processing. That limits the prior art to sensor modalities that can provide highly accurate tracking information in the absence of any high-level inference, i.e., video cameras.
The ambiguities inherent in using a motion detector network can introduce enough noise in the tracking results to render most of those approaches unusable. Therefore, there is a need for a method for recognizing activities using a sensor network that overcomes the problems of the prior art.