This invention relates generally to digital data processors and networks of intercommunicating digital data processors capable of automatically generating messages and, in particular, to methods and apparatus for preventing an occurrence of infinite loops of automatically generated messages within and among digital data processors.
Systems that are capable of automatically generating one message in response to another are intrinsically vulnerable to a phenomenon referred to herein as a "maelstrom". A maelstrom can be viewed as a chain reaction in which a single message can unintentionally trigger the generation of a large, rapidly growing, potentially infinite number of messages, quickly swamping and therefore incapacitating the communications network.
Recent years have seen a number of such chain reactions involving message passing in networks. Examples include the following.
(1) A chain reaction occurred in an Ethernet environment, as described by U. Manber, Chain reactions in networks, Computer, pages 57-63, October 1990. The chain reaction was caused by an inconsistency between two different versions of the Berkeley Unix operating system, which had incompatible conventions for specifying broadcast messages.
(2) A 'cycled users' phenomenon, familiar to email postmasters, occurs when a person having accounts on two or more systems directs each system to forward incoming email to the other. In this simple form it is easily detected, but more complex chains of forwarding involving more than two accounts can escape efficient detection. A variation of this type of forwarding loop can occur when there are delays in processing a user's instructions to change the final destination from one system to another. The user's mail may cycle if the systems' configurations are changed in the wrong order.
(3) Chain reactions resulting from erroneous network administration messages can be caused, for example, when a workstation broadcasts its own hostname on startup. Certain types of configuration error, such as an erroneous hostname, can generate an infinite sequence of low-level error messages.
(4) An Arpanet chain reaction occurred in 1980, in which a recurrent error at one host caused a loop of routing update messages. The entire network was ultimately brought down by the resulting flood of messages, as described by E. C. Rosen, Vulnerabilities of network control protocols: An example, Computer Comm. Review, pages 10-16, July 1981.
(5) Finally, a chain reaction was caused by the Internet "worm" of November 1988, as described by E. H. Spafford, The internet worm: Crisis and aftermath, Communications of the ACM, 32:678-688, June 1989.
The examples given above were triggered and/or propagated by a design flaw, hardware failure, or software failure at one or more elements in the network. With the advent of sophisticated message processing systems such as intelligent agents, however, a new variety of chain reaction is likely to occur. This new phenomenon, referred to herein as the maelstrom, is not due to any flaw or failure of any element, but is instead characterized by the collective behavior of many agents, each of which, considered in isolation, is working properly.
Throughout the following discussion, the term "message" refers to a body of data sent from one entity to another in a network. Messages may be generated by and/or intended for humans (as, for example, email messages); nonhuman agents (e.g., bids in an automated auction); or lower-level processes (e.g., TCP/IP signals). An "agent" is an entity in a network capable of receiving messages from, and generating messages for, other such entities. "Forwarding" refers to the act by an agent of generating one or more messages as a result of receiving a message. In general, the generated message(s) may differ from the received one. A "transmission step" is the transmission of a single message from one agent to another. A "maelstrom" is a self-sustaining chain reaction of forwarding events in which an agent receives message(s) that were ultimately triggered, through any number of intermediate transmission steps, by message(s) sent by that agent. In a typical maelstrom, the received message triggers a new sequence of forwarding events ultimately causing the agent to receive another message that triggers yet another sequence, and so forth, indefinitely.
As an example of how a maelstrom might naturally occur in a network of email forwarding agents, one may consider the following scenario. A typical computer user ("Fred") is one of a small group of friends who exchange jokes with one another via email. Fred decides to automate the distribution of jokes, and instructs his mail agent to forward to his friends any incoming mail with the word "Joke" in the subject line. This idea then occurs independently to some small fraction of other users, and soon jokes are being forwarded several times, from mailing list to mailing list. Eventually there are enough users forwarding jokes that one of the jokes that Fred's agent had forwarded comes back to Fred. Of course, it is automatically forwarded, and the cycle begins again. As the same joke keeps coming back to Fred, it is again forwarded, endlessly. Every time the joke goes around in this cycle, everyone who originally received the joke receives it again, and forwards it again. Furthermore, because both the original message and each copy are forwarded independently, the number of copies of the message grows rapidly with time. Before long, the network used for e-mail delivery is swamped, and cannot be used to transmit useful information to Fred or anyone else, even those users not involved in the mail loop.
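The rapid growth described in this scenario can be illustrated with a toy simulation. The sketch below is an illustration only: the function `simulate` and the three-agent friend graph `FRIENDS` are hypothetical, and the model assumes, as in the scenario, that every agent forwards every copy it receives, independently, to all of its friends.

```python
from collections import Counter

# Hypothetical forwarding rules: each agent forwards any "Joke" mail to these friends.
FRIENDS = {
    "fred": ["ann", "bob"],
    "ann": ["fred", "bob"],
    "bob": ["fred"],
}


def simulate(forward_to, origin, rounds):
    """Return the number of copies in transit after each forwarding round.

    Every copy an agent receives is forwarded independently to all of its
    friends, so copies are never merged or deduplicated.
    """
    inbox = Counter({origin: 1})  # the origin starts with the single original message
    history = []
    for _ in range(rounds):
        next_inbox = Counter()
        for agent, copies in inbox.items():
            for friend in forward_to.get(agent, []):
                next_inbox[friend] += copies
        history.append(sum(next_inbox.values()))
        inbox = next_inbox
    return history


print(simulate(FRIENDS, "fred", 5))  # [2, 3, 5, 8, 13]
```

In this tiny graph the number of copies per round follows the Fibonacci sequence, so traffic grows geometrically even though each individual forwarding rule looks harmless.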
It is important to note that this exemplary maelstrom consists entirely of actions that, as far as any single user knows, are perfectly safe. The maelstrom only occurs when forwarding steps are connected in a loop. In a distributed forwarding network, no single user has access to sufficient information about the network to detect a loop before the message is actually sent. Therefore, in principle, every automatic message-forwarding process is a potential cause of an unforeseen and devastating network breakdown.
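With a global view of every agent's forwarding rules, detecting such a loop would be a simple graph-reachability question, as the sketch below shows. The point of the paragraph above is precisely that no single agent has this view; the function `loops_back` and the rule table `RULES` are hypothetical and serve only to make the definition of a maelstrom concrete.

```python
def loops_back(forward_to, origin):
    """True if a message sent by `origin` can return to it via forwarding rules."""
    stack = list(forward_to.get(origin, []))  # agents the origin sends to directly
    visited = set()
    while stack:
        agent = stack.pop()
        if agent == origin:
            return True
        if agent in visited:
            continue
        visited.add(agent)
        stack.extend(forward_to.get(agent, []))
    return False


# Each rule is harmless on its own; together they form a loop.
RULES = {"fred": ["ann"], "ann": ["bob"], "bob": ["carol"], "carol": ["fred"]}
print(loops_back(RULES, "fred"))  # True
print(loops_back({**RULES, "carol": []}, "fred"))  # False
```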
The above described scenario is an example of the simplest type of maelstrom. One may further identify subclasses of maelstroms as follows.
1. Simple maelstroms. This type of maelstrom relates to automatic message forwarding in a distributed network, in which each agent forwards incoming messages verbatim to a set of other agents. The set of destinations is different for each sender agent, and may also depend on the header or content of the message, or on other factors such as time of day, etc.
2. Additive maelstroms. This type of maelstrom is a more complex form of automatic message forwarding, in which additional information is added to the message before forwarding. This additional information may be of any nature from the most insignificant, such as a blank line added to the bottom of the original message, to the most significant, such as a complete disavowal of the original message by its original author.
3. Combinatorial maelstroms. This type of maelstrom relates to several messages or parts of messages that are combined to form a single new message prior to forwarding. One example is an automatically generated, personalized newspaper that can be received by an agent and, in turn, used by the receiving agent in whole or in part as content for its own automatically generated newspaper.
4. Maelstroms with finitary transformation. As the message is forwarded from agent to agent, it can be transformed into a succession of variations of which there are a finite (usually small) number of types. Examples of this type of variation include conversion of the message to all capital letters or to all lower case letters, adding or removing a final blank line, or applying a simple character encoding such as one known as "rot13."
5. Maelstroms with arbitrary transformation. This type of maelstrom is the most general case, in which agents may transform incoming messages in arbitrary ways before forwarding.
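For the finitary case, one possible approach (shown here as an illustration, not as a prescription of the text above) is to map every variant of a message to a single canonical representative before comparing it with earlier messages. The sketch assumes ASCII text and handles exactly the three variations named in item 4: case changes, trailing blank lines, and rot13 via Python's standard `rot13` text codec. The function name `canonical` is hypothetical.

```python
import codecs


def canonical(text: str) -> str:
    """Map case, trailing-blank-line, and rot13 variants to one representative."""
    # Undo the case and trailing-whitespace variations.
    base = text.lower().rstrip()
    # For ASCII text, rot13 maps lower-case letters to lower-case letters and is
    # its own inverse, so a message and its rot13 encoding yield the same pair
    # {base, rotated}.
    rotated = codecs.encode(base, "rot13")
    # Picking the smaller of the two makes the result independent of whether
    # the message arrived encoded or not.
    return min(base, rotated)


joke = "Why did the chicken cross the road?"
variants = [joke, joke.upper(), joke + "\n\n", codecs.encode(joke, "rot13")]
print(len({canonical(v) for v in variants}))  # 1
```

The arbitrary-transformation case of item 5 cannot be handled by enumerating variants in this way, which is one motivation for the similarity-based recognition described later.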
One previous approach to preventing an occurrence of chain reactions in networks was based on inserting identifiers into header fields of messages. One such approach was proposed by U. Manber, Chain reactions in networks, Computer, pages 57-63, October 1990. In the Manber technique the system assigns a unique ID to each newly generated message in the network. This ID is inserted into the header of the message prior to forwarding. At each forwarding step the message, however transformed, retains its original ID. All agents maintain a list of all IDs of messages sent, against which every incoming message is checked. If the ID of an incoming message is found in the list, it is not forwarded. When this technique is strictly adhered to, no message is forwarded twice by any agent and, as a result, no maelstrom can occur.
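A minimal sketch of the single-ID technique described above follows. The `Message` and `IdCheckingAgent` names are hypothetical, and using a random UUID (`uuid.uuid4`) as the network-wide unique ID is an assumption made for illustration; the original proposal may assign IDs differently.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    # Network-wide unique ID, assigned once when the message is first created.
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class IdCheckingAgent:
    def __init__(self):
        self.sent_ids = set()  # IDs of every message this agent has sent

    def should_forward(self, msg: Message) -> bool:
        """Forward (and record as sent) only if this ID has never been sent before."""
        if msg.msg_id in self.sent_ids:
            return False
        self.sent_ids.add(msg.msg_id)
        return True

    @staticmethod
    def transform(msg: Message, new_body: str) -> Message:
        # However the body is changed, the original ID is carried along.
        return Message(new_body, msg.msg_id)


agent = IdCheckingAgent()
original = Message("Why did the chicken cross the road?")
print(agent.should_forward(original))  # True
edited = IdCheckingAgent.transform(original, "WHY DID THE CHICKEN CROSS THE ROAD?!")
print(agent.should_forward(edited))  # False
```

The second call shows the rigidity discussed later in this section: the edited copy is suppressed because it shares the original ID, whether or not the edit matters to this agent.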
A second approach that the inventors are aware of was proposed specifically for email messages. In this case, instead of assigning a single, unique ID to each message, each agent inserts its own unique ID into the header of each message that it sends. When it receives a message, the agent searches the message header for its own ID. If the ID is found, then it does not forward the message. This prescription also prevents maelstroms, but may fail to detect multiple copies of a single message that have reached an agent for the first time along distinct paths. The agent is incapable of recognizing that the incoming messages are duplicates. All it recognizes is that none of them contains its ID, and each copy is forwarded independently.
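The per-agent variant, and the duplicate-path weakness just described, can be sketched as follows. The `trail` header field and the `TrailAgent` class are hypothetical stand-ins for whatever header mechanism a real mail system would use.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    body: str
    # Hypothetical header field listing the IDs of agents that sent this message.
    trail: list = field(default_factory=list)


class TrailAgent:
    def __init__(self, agent_id: str):
        self.agent_id = agent_id

    def forward(self, msg: Message):
        """Return the copy to send, or None if our own ID is already in the header."""
        if self.agent_id in msg.trail:
            return None
        return Message(msg.body, msg.trail + [self.agent_id])


a, b, c, d = (TrailAgent(name) for name in "ABCD")
sent = a.forward(Message("joke"))
via_b, via_c = b.forward(sent), c.forward(sent)
# D sees two copies of the same joke, neither carrying D's ID, so both go out.
out1, out2 = d.forward(via_b), d.forward(via_c)
print(out1.trail, out2.trail)  # ['A', 'B', 'D'] ['A', 'C', 'D']
# A copy that makes it back to A is stopped.
print(a.forward(out1))  # None
```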
While ID-based approaches such as these can be effective in certain contexts, there are several important cases under which the general concept of deliberately inserting a unique identifier of some sort into a message is either inappropriate or ineffective.
A first case arises if the agent that is performing the forwarding operation is written as an add-on to an existing system. In this case the agent may not have write access to the messages (especially the header area). As such, it is incapable of inserting or manipulating IDs in the header area.
A second case arises if the message is transmitted to other domains that employ protocols other than the one used to encode identifiers. In such situations the message header containing the inserted ID may be lost in the translation. When the message is reinjected into the system that checks for the identifiers, it is treated as a new message, despite the fact that it is not, thereby reinitiating the maelstrom.
A third case arises if the agent modifies the message in a way that is important to some of the other agents in the network, but unimportant to others. In this case, only those agents to whom the modification is important should resend the modified version. The identifier method prevents this from occurring, or at best severely limits it.
Finally, a fourth case where the ID technique may be ineffective is where an agent wishes to ignore some types of modification and pay attention to others when it decides whether to forward a modified version of a message. As in the third case, the identifier method prevents this or severely limits it.
These and other maelstrom scenarios involving automatically generated messages are increasingly likely to plague networks as intelligent agents become more sophisticated and more widely used. Given the multiplicity of different networks, protocols, mail systems, etc., that interoperate in modern global communication networks, prior techniques for preventing maelstroms that impose the requirement that a unique message ID be associated with each message or each sender are unlikely to be effective.
The third and fourth cases given above illustrate an important reason why this is so. Most simply, the identifier method effectively prescribes a fixed convention to be applied to all messages modified in any way by any agent. For example, U. Manber's scheme requires that the forwarded message, however modified, have the same identifier as the original, or at most an identifier that is one of a predefined, strictly limited set of variants of the original identifier. One consequence of this is that it severely limits extensive "conversations" (i.e., series of automatically generated messages passing between two agents) without human intervention. The other technique referred to above, on the other hand, requires that the modified message always be treated as entirely new. This prevents the agents from ignoring trivial changes or exploiting useful ones in a message as it propagates through the network.
Furthermore, the decision of "same" or "different", which determines whether the original ID is preserved or a new ID is generated, is made once and for all by the sender agent prior to transmission. This prevents the sort of contextual, individualized, automated decision-making that is one of the central benefits of using intelligent agents in the first place.
It is thus an object and advantage of this invention to provide a general, robust procedure for preventing maelstroms in networks.
It is another object and advantage of this invention to provide an improved procedure for preventing maelstroms in networks that may employ a variety of message protocols, wherein at least some of the message protocols are well-established and unlikely to be modified to explicitly incorporate maelstrom prevention themselves.
It is a further object and advantage of this invention to provide a general, robust procedure for preventing maelstroms in networks in which messages can be augmented, transformed and/or combined.
The present invention is a general, robust method for preventing maelstroms in computers and computer networks populated by entities capable of generating messages in response to other messages.
The method includes equipping entities that automatically generate messages with a message recognition extractor procedure that, for each such message received or sent by the entity, automatically extracts information from the message that will allow a message of identical or similar content to be recognized in the event that it is subsequently received by that entity. The method further includes checking incoming messages against the stored recognition information to identify messages that are likely to be identical to or similar to a message that has previously been received or sent by that entity, and preventing the occurrence of automatic message forwarding if identity or sufficient similarity is detected, either by 'silently' preventing message generation and forwarding, or by giving the user or the user's agent a choice to permit or disallow the message generation.
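The overall flow, recording recognition information for every message sent or received and consulting it before forwarding, might look roughly like the sketch below. It is a simplification under stated assumptions: `ForwardingGuard`, `extract`, and the `confirm` callback are hypothetical names; the extractor is a placeholder exact-match hash of lower-cased, whitespace-collapsed text rather than any particular embodiment; and `confirm=None` models the 'silent' option while a callback models asking the user or the user's agent.

```python
import hashlib
from typing import Callable, Optional


def extract(message: str) -> str:
    """Placeholder extractor: hash of the lower-cased, whitespace-collapsed text."""
    normalized = " ".join(message.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ForwardingGuard:
    def __init__(self, confirm: Optional[Callable[[str], bool]] = None):
        self.seen = set()       # recognition information for messages sent or received
        self.confirm = confirm  # None: suppress silently; otherwise ask

    def record(self, message: str) -> None:
        """Store recognition information for a message this entity sends."""
        self.seen.add(extract(message))

    def may_forward(self, incoming: str) -> bool:
        """Check an incoming message, then remember it; True if forwarding may proceed."""
        key = extract(incoming)
        if key in self.seen:
            allowed = self.confirm(incoming) if self.confirm is not None else False
        else:
            allowed = True
        self.seen.add(key)
        return allowed


guard = ForwardingGuard()
guard.record("Knock knock")
print(guard.may_forward("knock   KNOCK"))  # False: recognized as previously sent
print(guard.may_forward("A brand new joke"))  # True
```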
In one embodiment of the invention, the message recognition extraction procedure extracts one or more signatures from the message, using an automatic signature extraction procedure. The extracted information may also include one or more checksums.
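The text does not fix a particular signature algorithm. As a stand-in, the sketch below uses a bottom-k sketch of hashed word windows (shingles) as the "signatures" and a CRC-32 checksum from Python's `zlib.crc32`; the function names, `k=4`, and `keep=8` are assumptions chosen for illustration. The checksum recognizes only exact repeats, while the shingle sketch tolerates small edits.

```python
import zlib


def checksum(text: str) -> int:
    """CRC-32 of the whole text: recognizes exact repeats only."""
    return zlib.crc32(text.encode("utf-8"))


def signatures(text: str, k: int = 4, keep: int = 8) -> set:
    """Bottom-k sketch: CRC-32 of every k-word window, keeping the `keep` smallest."""
    words = text.split()
    # At least one window, even for texts shorter than k words.
    windows = [" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))]
    hashes = sorted({zlib.crc32(w.encode("utf-8")) for w in windows})
    return set(hashes[:keep])


original = "the quick brown fox jumps over the lazy dog again and again"
edited = original + " PS"
shared = signatures(original) & signatures(edited)
print(len(shared), "of", len(signatures(original)), "retained hashes shared")
```

Because appending a short postscript adds only one new window, at most one retained hash can be displaced, which is what allows the near-duplicate to be recognized.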
In another embodiment of the invention, messages are filtered prior to automatic signature extraction in such a way that the most common insignificant variations among messages are removed. In this manner the message may be considered to be compressed by the removal of the insignificant variations. Specific types of filtering include, but are not limited to: removing all header data; removing multiple consecutive whitespace characters; removing all non-alphanumeric characters; and/or mapping all characters to upper case or to lower case.
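The filtering steps listed above can be sketched as a single normalization function. Two details are assumptions: the header is taken to be everything before the first blank line (the usual Internet mail layout, simplified here to `"\n\n"`), and runs of non-alphanumeric characters are replaced by a single space rather than deleted outright, so that adjacent words are not fused before whitespace is collapsed. `filter_message` is a hypothetical name.

```python
import re


def filter_message(raw: str) -> str:
    """Remove common insignificant variations before signature extraction."""
    # 1. Drop the header: assume it ends at the first blank line; if there is
    #    no blank line, treat the whole text as body.
    _, sep, body = raw.partition("\n\n")
    text = body if sep else raw
    # 2. Replace every run of non-alphanumeric (ASCII) characters with one space.
    text = re.sub(r"[^0-9A-Za-z]+", " ", text)
    # 3. Collapse whitespace and trim the ends.
    text = " ".join(text.split())
    # 4. Map everything to lower case.
    return text.lower()


a = "From: fred\nSubject: Joke\n\nWhy did the  chicken\ncross the road?!"
b = "From: ann\nSubject: Fwd: Joke\n\nWHY did the chicken cross the road"
print(filter_message(a) == filter_message(b))  # True
```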
In a further embodiment, messages may be filtered prior to automatic signature extraction in such a way that special data such as inclusions, attachments, or non-textual data is identified and/or treated specially, such as being treated as an indivisible unit.
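One way to treat special data as an indivisible unit is to tokenize ordinary text word by word but reduce each attachment or other non-textual block to a single opaque token derived from its full contents. The `Part` data model and the `tokens` function below are hypothetical; a real system would obtain the parts from its own message format.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Part:
    kind: str   # hypothetical label: "text" for body text, a MIME-like type otherwise
    data: bytes


def tokens(parts):
    """Split text parts into words; collapse each special block into one opaque token."""
    out = []
    for part in parts:
        if part.kind == "text":
            out.extend(part.data.decode("utf-8").lower().split())
        else:
            # Special block: hashed as a whole, never split or normalized.
            out.append("block:" + hashlib.sha256(part.data).hexdigest()[:16])
    return out


image = b"\x89PNG\r\n\x1a\n...image bytes..."
m1 = [Part("text", b"See  attached"), Part("image/png", image)]
m2 = [Part("text", b"see attached\n"), Part("image/png", image)]
print(tokens(m1) == tokens(m2))  # True
print(len(tokens(m1)))  # 3
```

Text normalization is applied only to text parts, so a one-byte change inside an attachment still yields a different block token.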
In a further embodiment, in which the messages are electronic mail (e-mail), the recognition step may include recognition of key phrases that are likely to indicate prior forwarding (e.g., header tags embedded in the body of the e-mail).
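As one possible realization of key-phrase recognition, the sketch below counts lines of the kind that mail clients commonly insert when forwarding or quoting a message. The pattern list `FORWARD_MARKERS` and the function `forwarding_evidence` are hypothetical and deliberately non-exhaustive; real clients vary in the exact wording and punctuation they use.

```python
import re

# Hypothetical, non-exhaustive markers of prior forwarding found in message bodies.
FORWARD_MARKERS = [
    re.compile(r"^-+\s*Forwarded message\s*-+$", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^-+\s*Original Message\s*-+$", re.IGNORECASE | re.MULTILINE),
    # Header tags embedded at the start of a line in the body.
    re.compile(r"^(From|Sent|To|Subject):\s", re.MULTILINE),
]


def forwarding_evidence(body: str) -> int:
    """Count marker occurrences in the body; zero means no sign of prior forwarding."""
    return sum(len(pattern.findall(body)) for pattern in FORWARD_MARKERS)


body = (
    "Ha!\n\n"
    "---------- Forwarded message ---------\n"
    "From: Fred\n"
    "Subject: Joke\n\n"
    "Why did the chicken cross the road?"
)
print(forwarding_evidence(body))  # 3
```

A nonzero count is probably best combined with signature matching as one piece of evidence, rather than used as a verdict on its own.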
This invention thus pertains to a digital data processing system, to methods executed by a digital data processing system, and pertains as well to a computer program that can be embodied on a computer-readable medium for providing a mechanism to prevent an occurrence of a maelstrom.
This invention provides an information extracting portion or step for extracting information from each message processed by an entity and that may potentially be forwarded to another entity, where the extracted information permits that message or a similar message to be recognized, as well as a storage portion or step for storing the extracted information in a database of extracted information. The database has the extracted information for each message stored in an entry associated with the message. The invention further provides a comparison portion or step for comparing each message received or possibly originated by the entity against the database entries stored by the storage portion and, if an entry is found to be sufficiently similar to the received message, for preventing the received message from triggering the generation and forwarding of a new message, thereby avoiding the creation of a network chain reaction or a maelstrom.
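The storage and comparison portions can be sketched as a small in-memory database keyed by message, with "sufficiently similar" decided by a threshold. Both the similarity measure (Jaccard similarity of word-trigram sets) and the threshold of 0.8 are assumptions substituted for illustration; `RecognitionDB` and `shingles` are hypothetical names.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of k-word windows of the lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


class RecognitionDB:
    def __init__(self, threshold: float = 0.8):
        self.entries = {}           # message key -> extracted shingle set
        self.threshold = threshold  # hypothetical similarity cut-off

    def store(self, key: str, text: str) -> None:
        self.entries[key] = shingles(text)

    def best_match(self, text: str) -> float:
        """Highest Jaccard similarity between `text` and any stored entry."""
        s = shingles(text)
        best = 0.0
        for stored in self.entries.values():
            union = s | stored
            if union:
                best = max(best, len(s & stored) / len(union))
        return best

    def should_suppress(self, text: str) -> bool:
        return self.best_match(text) >= self.threshold


db = RecognitionDB()
db.store("m1", "why did the chicken cross the road to get to the other side")
print(db.should_suppress("why did the chicken cross the road to get to the other side lol"))  # True
print(db.should_suppress("knock knock who is there"))  # False
```

The suppressed example is an additive variation in the sense of item 2 above: appending a word changes one trigram out of twelve, so the similarity (11/12) stays above the threshold.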
In one embodiment, transforming the message to an invariant form includes attempting to identify at least one of an inclusion, an attachment, or non-textual data within the message as a special block of the message. The special block is thereafter treated as an indivisible unit when the information is extracted.