It is well known that electronic information can spread to many people in a very short time. This is good news for some people (spammers, bloggers), but can be rather bad news for those responsible for security. The battle against viruses, spam, and other forms of harmful or undesirable self-propagating information is never-ending.
In this invention disclosure, the problem of the spreading of information is approached from the direction of network analysis. The present invention includes both methods for helping information to spread more efficiently, and methods for hindering the spreading of unwanted information (e.g., viruses). Much of the background discussion in this disclosure is relevant to either purpose (helping desired information, or hindering unwanted information). In this document we will often use language (‘epidemic’, ‘infection’, etc.) which is normally appropriate for describing the spreading of unwanted information. Our convention, however, is that this epidemic-oriented language refers implicitly to both desired and undesired information, unless otherwise specified; it is used only for convenience.
There are many kinds of models for epidemic spreading. In perhaps the simplest class of such models, one assigns to each node only one of two possible states: ‘uninfected’ or ‘infected’. A node that is uninfected (‘susceptible’) is deemed liable to be infected by any infected neighbour. Correspondingly, a node that is infected remains so for the duration of the experiment, and remains capable of infecting any or all of its neighbours. Of course, on some appropriate time scale, nodes become ‘immune’ to the infection: a human develops antibodies, a machine gets antivirus software, the gossip becomes boring, or the innovation becomes outmoded. We focus on a shorter time scale here, so that we can ignore the state of acquired immunity. The technical name for our model of spreading is ‘SI’, since the nodes have only two states: Susceptible or Infected.
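The SI rule just described can be illustrated with a minimal simulation sketch. The discrete time step, the per-link transmission probability `beta`, and the function name `si_spread` are assumptions introduced here for illustration only; they are not taken from the disclosure or from Refs. [1] or [2].

```python
import random

def si_spread(adjacency, seeds, beta, steps, rng=None):
    """Simulate discrete-time SI spreading on an undirected graph.

    adjacency: dict mapping each node to a list of its neighbours
               (assumed symmetric, i.e. undirected links).
    seeds:     nodes that are infected at time 0.
    beta:      hypothetical per-link, per-step transmission probability.
    Returns a list of infected-node sets, one per time step (t = 0 included).
    """
    rng = rng or random.Random()
    infected = set(seeds)
    history = [set(infected)]
    for _ in range(steps):
        newly_infected = set()
        for node in infected:
            for neighbour in adjacency[node]:
                # Only susceptible neighbours can change state.
                if neighbour not in infected and rng.random() < beta:
                    newly_infected.add(neighbour)
        # Infected nodes never recover in the SI model, so the set only grows.
        infected |= newly_infected
        history.append(set(infected))
    return history

# Illustrative path graph 0 - 1 - 2 - 3, seeded at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
history = si_spread(path, seeds=[0], beta=1.0, steps=3)
```

With `beta=1.0` every link transmits at every step, so the infection advances exactly one hop per step along the path; smaller values of `beta` give a stochastic front.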
Since spreading takes place over the links of a network, it is clear that the topology of the network can have a profound influence on the spreading process. In particular, we believe that the best understanding of spreading will come from a perspective based on a view of the whole network, and on an understanding of that network's structure. In earlier work [1], we presented an approach to the analysis of network structure which is applicable to any network with symmetric (undirected) links. We also suggested that the analysis should be useful for understanding spreading over such a network. More recently [2], we developed a detailed semi-quantitative theory of how spreading takes place on such networks. The theory is based entirely on our structural analysis. The present invention addresses the question of active design or management of networks for the purpose of controlling (helping or hindering) spreading. Our analysis offers clear suggestions for how to control spreading in both of these senses.
Our approach departs from previous work in that we focus on both the temporal and the spatial progression of the epidemic spreading. We take a spatial resolution which is not microscopic, but rather at the level of ‘neighbourhoods’: connected sub-graphs with roughly the same spreading power. More traditional approaches (reviewed in [4]) start from the ‘well-mixed’ approximation, in which every node can infect every other with some probability, at all times. This approach may be said to have no network perspective; or, it may be said to postulate a graph with extremely good mixing, such as a random graph of high degree, or a complete graph. The review of Newman [4] also discusses more recent work involving a network perspective. All such work is based on whole-graph properties, such as the node degree distribution; also, these approaches have focused on obtaining whole-graph results, either over time [5,6], or focusing especially on the infected fraction at very long times [7]. This latter question is of course only interesting for models more complex than the SI model; and indeed most work is directed towards the behaviour of the SIS model (where nodes lose their infection after some time, and so become Susceptible again), or the SIR model (where nodes, after losing their infection, go through a refractory period). Finally, we note that work analysing only whole-graph properties cannot give the kinds of specific design improvements that are embodied in the present invention.
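For contrast, the well-mixed approximation for the SI model can be sketched as a single whole-population recursion with no network structure at all. The discrete-time logistic form used below, and the names `well_mixed_si`, `beta`, and `i0`, are standard textbook choices assumed here for illustration; they are not taken from the works cited above.

```python
def well_mixed_si(i0, beta, steps):
    """Infected fraction over time under the well-mixed SI approximation.

    i0:   initial infected fraction (0 < i0 <= 1).
    beta: hypothetical effective transmission rate per step (0 < beta <= 1).
    Each step, new infections are proportional to the product of the
    infected fraction i and the susceptible fraction (1 - i).
    """
    trajectory = [i0]
    i = i0
    for _ in range(steps):
        i = i + beta * i * (1.0 - i)
        trajectory.append(i)
    return trajectory

trajectory = well_mixed_si(i0=0.01, beta=0.5, steps=50)
```

The result is one number per time step for the whole graph: the recursion says nothing about where in the network the infection currently sits, which is the information a neighbourhood-level picture is meant to supply.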
Brauer [8] has examined the SI model for the case that the nodes (organisms, especially humans) are born and die. Because of the addition of these dynamic features, the steady infection rate is not necessarily 100%. This work uses the well-mixed approximation, which gives rise to coupled ordinary differential equations. Hence it too cannot suggest local, specific design improvements of the type included in the present invention.
The work perhaps closest to the present one is that of Wang et al. [9]. Their model is SIS, in that nodes can be “cured”; but it is based on a fully microscopic view of the network. In fact, their time evolution operator is the same as the one we develop in Ref. [2], with two differences. One is their addition of the “curing” term. This term is simply a multiple of the unit matrix, and so does not change the dominant eigenvector, which remains that of the adjacency matrix A. Because their model is SIS, the long-time infection fraction is not obvious, and must be solved for. The second difference in the time evolution operator of Wang et al. is that they neglect the cross terms, i.e. those arising from multiple transmissions to an infected node. This approximation is valid at low infection fraction, and (as we discuss below) it may remain good even as the infection fraction becomes large. Wang et al. report simulations which offer some support for this statement.
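The remark about the curing term rests on a general fact: if A v = λ v, then (A − c I) v = (λ − c) v, so subtracting a multiple of the unit matrix shifts every eigenvalue by c and leaves every eigenvector unchanged. The sketch below checks this numerically on a small example graph. The graph, the value of `c`, and the helper names are hypothetical and chosen only for illustration; the exact operator of Wang et al. [9] is not reproduced here.

```python
def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def dominant_eigenpair(matrix, iterations=500):
    """Estimate the dominant eigenvalue and eigenvector by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = matvec(matrix, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient of the normalised vector gives the eigenvalue.
    eigenvalue = sum(x * y for x, y in zip(v, matvec(matrix, v)))
    return eigenvalue, v

# Adjacency matrix of a triangle (nodes 0, 1, 2) with a tail node 3 on node 2.
# A non-bipartite graph is used so that plain power iteration converges.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
c = 0.3  # hypothetical curing rate multiplying the unit matrix
n = len(A)
shifted = [[A[i][j] - (c if i == j else 0.0) for j in range(n)] for i in range(n)]

lam, v = dominant_eigenpair(A)
# If v is also an eigenvector of (A - c I) with eigenvalue (lam - c),
# this residual is zero up to rounding error.
residual = max(abs(s - (lam - c) * x) for s, x in zip(matvec(shifted, v), v))
```

The residual vanishes to within rounding error, consistent with the statement that the dominant eigenvector remains that of A.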
We emphasize that our work, like that of Wang et al [9], uses the full adjacency matrix A in modelling the time evolution of the infection. Thus we start from a microscopic foundation. However, we will quickly appeal to a ‘mesoscopic’ picture, in which it is meaningful and useful to speak of neighbourhoods and their properties. As far as we know, our work is unique in this regard. This neighbourhood picture is the basis for the methods (for improving the design of networks) which constitute the present invention.