Driven by the need for more efficiency and agility in business and public transactions, digital data has become increasingly accessible through real-time, global computer networks. These heterogeneous data streams reflect many aspects of the behavior of groups of individuals in a population, including traffic flow, shopping and leisure activities, healthcare, and so forth.
In the context of such behavior, it has become increasingly difficult to automatically detect suspicious activity, since the patterns that expose such activity may exist on many disparate levels. Ideally, combinations of geographical movement of objects, financial flows, communications links, and so forth would be analyzed simultaneously. Currently, this is a highly labor-intensive task for an all-source analyst.
Active surveillance of population-level activities includes the detection and classification of spatio-temporal patterns across a large number of real-time data streams. Approaches that analyze data in a central computing facility tend to be overwhelmed by the volume of data that must be transferred and processed in a timely fashion. Centralized processing also raises proprietary and privacy concerns that may make many data sources inaccessible.
In the event of a large-scale bioterrorist attack on a civilian population, for example, triggering the emergency response system even at the first positive diagnosis of a disease caused by a CDC-class A bioterrorist pathogen (e.g., airborne anthrax) may still be too late to prevent thousands of deaths, a breakdown of the public health system, or civil disorder. Such a disaster can only be prevented if the emerging epidemic is caught while the symptoms of the infected people are still unspecific and very similar to those of common diseases (e.g., influenza).
New sensor and information technology may be used to detect an attack from the subtle changes in population behavior that usually precede the first medical diagnosis by a significant amount of time. Behavioral patterns in the community are likely to change as people fall ill. This change is reflected in many different population activity indicators (e.g., school absenteeism, traffic patterns) that are increasingly accessible in real time. A system that surveys multiple data sources in real time may be more successful in triggering an alert than a system that relies on any single data source.
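The idea that several weak indicators together may trigger an alert where no single indicator would can be illustrated with a minimal sketch. It is an illustration only, not a method described in this document: it assumes each indicator is compared against its own recent baseline using a z-score, and that the scores are combined with a Stouffer-style sum, which treats the indicators as independent. The indicator names, baseline values, and alert threshold are all hypothetical.

```python
import math
from statistics import mean, stdev


def z_score(history, current):
    """Deviation of the current reading from its baseline, in standard deviations."""
    sigma = stdev(history)
    if sigma == 0:
        return 0.0  # a flat baseline carries no scale information
    return (current - mean(history)) / sigma


def combine(z_scores):
    """Stouffer-style combination: sum of z-scores divided by sqrt(k).

    Assumes the indicators are independent, which real data may violate.
    """
    return sum(z_scores) / math.sqrt(len(z_scores))


# Hypothetical baselines (recent history) and today's reading per indicator.
indicators = {
    "school_absenteeism": ([10, 12, 10, 12, 10, 12, 10, 12], 13),
    "otc_remedy_sales": ([100, 120, 100, 120, 100, 120, 100, 120], 130),
    "clinic_inquiries": ([50, 60, 50, 60, 50, 60, 50, 60], 65),
}

ALERT_THRESHOLD = 3.0  # hypothetical; a real system would calibrate this

per_indicator = {name: z_score(h, c) for name, (h, c) in indicators.items()}
combined = combine(list(per_indicator.values()))

single_alerts = [name for name, z in per_indicator.items() if z > ALERT_THRESHOLD]
combined_alert = combined > ALERT_THRESHOLD

print("single-source alerts:", single_alerts)
print("combined alert:", combined_alert)
```

With these made-up numbers, each indicator is only moderately elevated and stays below the threshold on its own, while the combined score exceeds it. Correlated indicators would make a combination of this kind overconfident, so the independence assumption matters in practice.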
The detection and classification of subtle changes in population activity requires the integration of a wide variety of non-specific real-time data sources into the operation of the surveillance system. The providers of the data are often very sensitive to proprietary and privacy concerns. For instance, local sales figures of various over-the-counter remedies at individual pharmacies are an invaluable contribution to a biosurveillance system, but the owner of the data (the merchant) must be assured that this data does not reach its competitors. Data from the public healthcare system, such as the number of patients consulting their physicians about certain symptoms, is also important, but the surveillance system is only permitted to work with anonymized data.
The use of non-specific data sources for the early detection of an epidemic in a population requires the integration of many population activity indicators to achieve the required sensitivity and specificity. Furthermore, to guarantee the early detection of an outbreak, the system must operate on real-time data that is updated at least several times a day. As a result, there is an immense amount of data that needs to be processed in a timely fashion.
A biosurveillance system must be robust against cyber attacks and component failures, and it must be inexpensive and unobtrusive in its day-to-day operation. Such a system should also be intuitive in its reporting and designed for low-cost adaptability and scalability along various dimensions, including the spread and complexity of population patterns, the types and locations of data sources, and the symptoms, diseases, or attack patterns to be detected.
A need therefore remains for a new generation of active surveillance systems that integrate a large number of spatially distributed, heterogeneous data streams. Such a capability may be used in various applications, for instance, to protect a civilian population from bioterrorist attacks, to support real-time traffic coordination systems, to trace collaboration structures in terrorist networks, or to manage public healthcare efficiently.