Network-based software and services (including websites, electronic communications, software-as-a-service (SaaS) offerings, and others) rely on an increasingly large and complex set of dependencies to operate. A failure or breach of any of these dependencies can cause service disruptions, outages, and other negative outcomes for the services that depend on them (directly or indirectly), resulting in loss of business continuity or other financial harm to the organizations that operate them.
There are many possible kinds of dependencies. One major category is service providers. These include (but are not limited to) hosting providers, Domain Name System (DNS) providers, content delivery networks (CDNs), cloud infrastructure providers, managed Web servers, email services, payment processors, certificate authorities, and analytics and monitoring services.
A second category includes components used to build and operate products and services. These include (but are not limited to) operating systems, application servers, code libraries, databases, networking systems, and hardware. A systematic fault in one of these components can simultaneously affect large numbers of services that use the component. For example, a bug in the Linux kernel related to leap seconds caused widespread disruption in 2012.
A third—and less obvious—category consists of software defects and malicious software. Although these are not intentional dependencies, they, too, can pose significant aggregate risk. For example, a wide-scale ransomware attack has the potential to disrupt large numbers of software services and businesses.
Dependency relationships are often not immediately apparent. For example, if website A is hosted on hosting provider B, and hosting provider B uses DNS provider C, a failure of C can lead to a failure of A, even though A and C have no direct business or technical relationship, only a transitive one through B.
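The A, B, C example can be sketched as a small directed graph in which each service points to the dependencies it uses directly. The following is a minimal illustration only: the map `DIRECT_DEPS` and the function `transitive_dependencies` are hypothetical names introduced here, and the graph contains nothing beyond the three parties named in the example.

```python
from collections import deque

# Hypothetical direct-dependency map mirroring the example:
# website A is hosted on provider B, which uses DNS provider C.
DIRECT_DEPS = {
    "A": {"B"},
    "B": {"C"},
    "C": set(),
}


def transitive_dependencies(service, direct_deps):
    """Return every dependency reachable from `service`, direct or transitive."""
    seen = set()
    queue = deque(direct_deps.get(service, ()))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also guards against cycles
        seen.add(dep)
        queue.extend(direct_deps.get(dep, ()))
    return seen


all_deps = transitive_dependencies("A", DIRECT_DEPS)
hidden = all_deps - DIRECT_DEPS["A"]  # dependencies A never chose directly
print(sorted(all_deps))  # ['B', 'C']
print(sorted(hidden))    # ['C']
```

The set `hidden` contains exactly the dependencies that a review of A's direct relationships alone would miss.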
Furthermore, large numbers of services (and businesses) may rely on a single dependency, whether directly or transitively. A failure of that dependency can thus cause surprisingly widespread disruptions. These dependencies therefore create aggregate risk (also known as correlated risk) from the point of view of a business operating multiple services or service instances, or from the point of view of an organization with a financial interest in a portfolio of businesses (e.g., an insurer or an investor).
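One way to picture this concentration is to invert the dependency graph and count, for each dependency, how many services in a portfolio reach it. The sketch below rests on stated assumptions: the service and provider names (`shop`, `host-1`, `dns-x`, and so on) are invented for illustration, and it assumes, pessimistically, that a failed dependency takes down every service that reaches it, with no redundancy or failover.

```python
from collections import defaultdict

# Hypothetical portfolio: each node mapped to its direct dependencies.
DIRECT_DEPS = {
    "shop": {"host-1", "payments"},
    "blog": {"host-1"},
    "mail": {"host-2"},
    "host-1": {"dns-x"},
    "host-2": {"dns-x"},
    "payments": set(),
    "dns-x": set(),
}
PORTFOLIO = ["shop", "blog", "mail"]


def reachable(node, direct_deps):
    """Collect all direct and transitive dependencies of `node`."""
    seen, stack = set(), list(direct_deps.get(node, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(direct_deps.get(dep, ()))
    return seen


def dependents_by_dependency(portfolio, direct_deps):
    """Map each dependency to the portfolio services that reach it."""
    affected = defaultdict(set)
    for service in portfolio:
        for dep in reachable(service, direct_deps):
            affected[dep].add(service)
    return dict(affected)


impact = dependents_by_dependency(PORTFOLIO, DIRECT_DEPS)
# List dependencies from most to least widely shared.
for dep, services in sorted(impact.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(dep, sorted(services))
```

In this toy portfolio, `dns-x` ranks first because all three services reach it, although none of them uses it directly.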
Many methods of reliability and risk analysis assume that failures are independent and therefore uncorrelated, because this greatly simplifies the analysis. However, because services in networked environments share direct and transitive dependencies, this assumption often yields inaccurate results and understates risk. There is therefore a need for methods of identifying dependencies and other risk factors that pose high levels of aggregate risk, and of quantifying that risk.
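A small calculation shows how far the independence assumption can understate risk. The model here is a deliberately simplified assumption, not a method taken from the text: each of `n` services fails on its own with probability `q_own`, all of them share one dependency that fails with probability `p_shared`, those events are mutually independent, and a failure of the shared dependency takes down every service. The function names and numeric values are illustrative only.

```python
def marginal_failure(p_shared, q_own):
    """Chance that one service fails: the shared dependency or its own fault."""
    return 1 - (1 - p_shared) * (1 - q_own)


def all_fail_assuming_independence(p_shared, q_own, n):
    """Joint failure probability if the n services are treated as independent."""
    return marginal_failure(p_shared, q_own) ** n


def all_fail_with_shared_dependency(p_shared, q_own, n):
    """Joint failure probability when all n services share one dependency."""
    # Either the shared dependency fails, or it survives and every
    # service suffers its own independent fault.
    return p_shared + (1 - p_shared) * q_own ** n


p, q, n = 0.01, 0.01, 10  # illustrative values only
naive = all_fail_assuming_independence(p, q, n)
actual = all_fail_with_shared_dependency(p, q, n)
print(f"independent model: {naive:.3e}")
print(f"shared dependency: {actual:.3e}")
```

Both models give each individual service the same failure probability, yet the chance that all ten fail together stays close to `p_shared` (about 0.01) under the shared-dependency model and becomes vanishingly small under the independence assumption.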