The normal functioning of a community, such as an industrialized country, depends in large part on the quality of the essential services provided by critical infrastructures, such as those that supply power, telecommunications, transportation, and water distribution.
These critical infrastructures exhibit new vulnerabilities, notably due to their increasing complexity, their high degree of automated control, and their interdependence. The connections between infrastructures may act as physical paths along which an event occurring in one infrastructure spreads to another. A failure in one infrastructure may therefore lead to harmful cascading effects on other infrastructures and trigger further, more dangerous failures in some of them. Due to the heavily interconnected nature of these infrastructures, a disturbance, a failure, or even the destruction of part of an infrastructure may cause unacceptable, or even fatal, incidents.
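The cascading effect described above can be illustrated with a minimal sketch, assuming the interdependence is represented as a simple directed graph. The infrastructure names, the `DEPENDENTS` map, and the `cascade` function are hypothetical and serve only as an illustration; real infrastructures would not fail in such an all-or-nothing way.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration): each key is an
# infrastructure, each value lists the infrastructures that depend on it.
DEPENDENTS = {
    "power": ["telecom", "water"],
    "telecom": ["transport"],
    "water": [],
    "transport": [],
}


def cascade(initial_failure, dependents):
    """Return every infrastructure reachable from the initial failure,
    in breadth-first order, starting with the failed one."""
    affected = [initial_failure]
    seen = {initial_failure}
    queue = deque([initial_failure])
    while queue:
        current = queue.popleft()
        for downstream in dependents.get(current, []):
            if downstream not in seen:
                seen.add(downstream)
                affected.append(downstream)
                queue.append(downstream)
    return affected


print(cascade("power", DEPENDENTS))
# prints ['power', 'telecom', 'water', 'transport']
```

In this simplified model a single power failure reaches every other infrastructure, whereas a water failure stays local.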
In such cases, an insufficient understanding of the interdependence between critical infrastructures lengthens reaction time and delays the restoration of services, and sometimes makes it impossible to identify vulnerabilities and anticipate risks. In particular, coordination problems and errors often occur in crisis situations, because decisions must be made quickly, under stressful conditions, on the basis of a large volume of unreliable and inconsistent information. A poor understanding and insufficient mastery of an infrastructure, combined with human errors, may considerably affect decision-making in crisis situations.
Currently, traditional solutions for handling a crisis situation in an infrastructure mainly consist of laying down recommendations without any true formalism, drafting a Business Continuity Plan (BCP) and Service Level Agreement (SLA) contracts, using basic techniques for modeling and simulating the interdependence between infrastructures, and using conventional means of communication, such as the telephone, the fax machine, and electronic mail.
The business continuity plan defines a set of procedures and actions for an infrastructure, meant to restore acceptable functioning of a faulty essential service. The BCP requires that its content be maintained, deployed, and approved on a regular basis, and operating it requires numerous resources. Furthermore, the BCP is not always up to date with the latest threats.
A Service Level Agreement (SLA) is a legal instrument that sets a minimum quality of service that the infrastructure delivering the service must meet. Service Level Management (SLM) tools have no role other than to monitor the quality of the services delivered by the infrastructure. SLAs and SLM tools have the drawback of providing neither transparency nor effective communication between interdependent infrastructures. An infrastructure that depends on another infrastructure has neither the capability nor the relevant information to plan safeguards effectively when a service failure occurs in that other infrastructure. The dependent infrastructure receives only limited reports, based on the metrics set forth in the SLA and generated by the SLM tools.
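As an illustration of how limited such a report can be, here is a minimal sketch, assuming the SLA sets a single minimum availability metric. The `SlaReport` type, the `build_report` function, and the threshold values are hypothetical and are not taken from any actual SLM tool.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    service: str
    availability: float  # fraction of the reporting period the service was up
    compliant: bool      # True if the agreed minimum was met


def build_report(service, uptime_minutes, total_minutes, min_availability):
    """Summarize a past reporting period against the agreed minimum availability."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    availability = uptime_minutes / total_minutes
    return SlaReport(service, availability, availability >= min_availability)


# Hypothetical 30-day period (43,200 minutes) with 100 minutes of downtime
# and an agreed minimum availability of 99.9 %.
report = build_report("power", 43_100, 43_200, 0.999)
print(report.compliant)
# prints False
```

The report only states after the fact that the agreed level was missed. It says nothing about the cause of the failure, its expected duration, or which dependent services are at risk.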
Consequently, the BCP and the SLA only provide static responses, which are ineffective in critical situations that require a high degree of reactivity.
Other solutions consist of modeling the interdependence between critical infrastructures and developing simulation techniques in order to analyze the impact of interruptions in the services delivered by the infrastructures, and potentially to anticipate such interruptions. These solutions have the drawback of being very complex, due to the intrinsic complexity of each infrastructure and the presence of multiple physical and logical connections between the infrastructures, which makes the infrastructures' behavior very difficult to predict. Additionally, these models and simulations are mostly based on qualitative information, because a quantitative approach requires collecting data about the infrastructure. However, this data is not always accessible, for example when it is sensitive or confidential, so the models insufficiently represent the infrastructures' true vulnerability.
Consequently, there is no centralized system for supervising interdependent critical infrastructures, that is, critical infrastructures whose critical services depend on the functioning of one or more services provided by one or more other infrastructures. There is no system capable of continuously managing this interdependence in real time and of facilitating crisis management in the event of a failure in an infrastructure.