1. Field of the Invention
The present invention relates to a network analysis technique, and more specifically to an information processing system, an information processing method, and a program for classifying users who access a network to send information thereto, in accordance with attributes of the users.
2. Description of Related Art
Recently, information communication over a network such as the Internet or a wide area network (WAN) has become popular with the increase in processing capabilities of computers, connecting devices and the like. Information communication over a network typically involves: computers (hereinafter referred to as “nodes”) connected to the network to function as Web clients; and a Web server device (hereinafter referred to as “server”) accepting and processing access requests from multiple nodes. The server uses a server application implemented as a common gateway interface (CGI) program, a servlet or the like to provide a service such as a mail send-and-receive service, a file send-and-receive service, a search service, a blog posting service based on a social network service (SNS), or a chat service.
As more kinds of information are sent and received and more kinds of services are provided over a network, the users who access the network have become increasingly diverse. For example, there are users who access a server to send non-malicious mail messages, users who only search for and download information managed by a server, and users who send information by posting blog entries and by posting bona fide messages on blogs managed by other users. Hereinafter, user nodes managed by such bona fide users will be referred to as “normal nodes.”
There are also users who send enormous amounts of unnecessary information without permission, and users who post malicious messages on chats or blogs. Hereinafter, user nodes managed by such users, who access a network out of malice, will be referred to as “spammer nodes.”
In some serious cases, the above activities of spammer nodes might critically interfere with network activities by, for example, bringing down services provided by normal nodes or forcing blogs to shut down.
In addition, although sending large volumes of undesired e-mails over a network may not critically damage network activities by bringing down services or forcing blogs to shut down, it wastes network bandwidth, adversely affects network activities of normal nodes, and can lead to the spread of computer viruses. Under these circumstances, there is a demand for detecting and dealing with such spammer nodes on a network to prevent them from adversely affecting network activities of normal nodes. Furthermore, spammer groups made up of multiple spammer nodes may work together to camouflage themselves as non-spammer nodes.
Various attempts have already been made to detect spammer nodes. For example, as a method for addressing spam mail messages, it is already known to create an answer set of allowable mail messages for identifying spam mail messages or the like, and to cause a device to learn the answer set by machine learning. In addition, a method is also known in which spam levels of certain nodes are determined on the basis of spam reports from multiple nodes, and in which an administrator checks the spam levels to execute access controls on some nodes.
Examples of spam prevention techniques to which the above policies are applied are disclosed in Japanese Patent Application Publication No. 2003-115925, Japanese Patent Application Publication No. 2004-118541, Japanese Patent Application Publication No. 2004-362559, Japanese Patent Application Publication No. 2006-178998 and Japanese Patent Application Publication No. 2003-348162. In Japanese Patent Application Publication No. 2003-115925 and Japanese Patent Application Publication No. 2004-118541, the number of spam mail transmissions is detected, and users who have frequently transmitted spam mail messages are determined to be spammer nodes. In Japanese Patent Application Publication No. 2004-362559 and Japanese Patent Application Publication No. 2006-178998, details of messages are analyzed so that features characterizing spammer nodes can be obtained, and an answer set is generated from the features. Then, sources of messages with these features are determined to be spammer nodes.
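The count-based policy described above can be illustrated with a minimal sketch. It is not the implementation disclosed in the cited publications; the function name `flag_frequent_senders`, the log format (one sender identifier per detected spam transmission), and the threshold value are all assumptions introduced for illustration.

```python
from collections import Counter


def flag_frequent_senders(spam_log, threshold):
    """Return the senders with at least `threshold` detected spam transmissions.

    `spam_log` is a hypothetical list holding one sender ID per detected
    spam transmission; `threshold` is an assumed, operator-chosen cutoff.
    """
    counts = Counter(spam_log)
    return {sender for sender, n in counts.items() if n >= threshold}


# Hypothetical log: node_a sent three spam messages, node_b two, node_c one.
log = ["node_a", "node_b", "node_a", "node_c", "node_a", "node_b"]
flagged = flag_frequent_senders(log, threshold=3)
print(flagged)
```

Such a scheme presupposes that each transmission has already been identified as spam, which is why it depends on a previously prepared answer set or on reports.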
Japanese Patent Application Publication No. 2003-348162 discloses a technique whereby, if a user terminal receives an unwanted e-mail, the user reports information on the e-mail to a network as unwanted e-mail information. In response, the unwanted e-mail is stored among the e-mails received by a mail receiving server, and the unwanted e-mail information from the user terminal is registered in a database. Then, if the received e-mails include any e-mail identical or closely similar to the e-mails indicated by the unwanted e-mail information registered in the database, the mail receiving server is programmed not to deliver that e-mail.
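The report-and-block behavior described above might be sketched as follows. This is an assumption-laden illustration rather than the disclosed implementation: the class name `UnwantedMailFilter`, the in-memory list standing in for the database, the use of the Python standard library's `difflib.SequenceMatcher` as the similarity measure, and the 0.9 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


class UnwantedMailFilter:
    """Blocks e-mails identical or closely similar to reported ones."""

    def __init__(self, similarity_threshold=0.9):
        # Assumed cutoff for "closely similar"; the source gives no value.
        self.similarity_threshold = similarity_threshold
        self.registered = []  # stands in for the database of reported mail

    def report(self, body):
        """Register the body of an e-mail that a user reported as unwanted."""
        self.registered.append(body)

    def should_deliver(self, body):
        """Return False if `body` matches any registered unwanted e-mail."""
        for unwanted in self.registered:
            ratio = SequenceMatcher(None, unwanted, body).ratio()
            if ratio >= self.similarity_threshold:
                return False
        return True


mail_filter = UnwantedMailFilter()
mail_filter.report("Buy cheap watches now!!!")

print(mail_filter.should_deliver("Buy cheap watches now!!!"))   # exact copy
print(mail_filter.should_deliver("Buy cheap watches now!!"))    # near copy
print(mail_filter.should_deliver("Meeting at 10am tomorrow"))   # unrelated
```

Note that the filter can only act on content that some user has already reported, so its coverage depends entirely on reports from nodes.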
As described above, each of the conventional techniques is either a technique of preparing in advance an answer set for distinguishing between spammer nodes and normal nodes on the basis of content forwarded over a network, or a technique of generating an answer set by receiving reports from nodes. The techniques based on such answer set generation are effectively applicable to the case where spammer nodes send data with content, such as mail messages. Even in this case, however, there is a problem in that such a learning method might not improve the server's efficiency in extracting spammer nodes enough to justify the overhead that the learning procedure imposes on the server.
Moreover, consider the case where spammer nodes send data with no content or a less-than-required amount of content, that is, for example, the case where spammer nodes take actions with unpredictable details, such as posting comments. In this case, it is difficult to obtain features characterizing spammer nodes in advance, and, as a result, it may not be possible to efficiently learn which nodes are spammer nodes interfering with blog posting or chatting.
In addition, techniques based on reports from nodes are also known to have problems. For example, it is known that a spammer node takes a so-called retaliatory action when the spammer node detects that a normal user has reported, as spam mail, a mail sent by the spammer node. Specifically, as a retaliatory action, the spammer node reports the reporter (a normal node) to a security center as a spammer. Such retaliatory actions not only make it more difficult to uniquely determine spammer nodes, but also cause normal nodes to avoid reporting spammers in many cases. Moreover, when spammer nodes form a spammer group, the spammer nodes become able to take adverse actions (such as increasing their evaluations) in addition to retaliatory reporting. Thus, such a technique also has a problem of allowing members of a spammer group to work together to decrease their spam levels.
Therefore, there has been a demand for a technique of distinguishing spammer nodes from the other, non-spammer nodes among multiple nodes connected to a network without preparing any answer set in advance. In addition, there has also been a demand for a general technique for classifying each node accessing an application server as either a normal node or a spammer node, according to its user attributes, by using an access log of the application server, regardless of direct reports from nodes.
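The weakness of report-based scoring to retaliatory reporting can be illustrated with a minimal sketch. The scoring rule used here (spam level equals the number of distinct reporters) and all node names are assumptions chosen for illustration; the source does not specify how spam levels are computed.

```python
from collections import defaultdict


def spam_levels(reports):
    """Compute a naive spam level per node: the count of distinct reporters.

    `reports` is a hypothetical list of (reporter, reported_node) pairs.
    """
    reporters = defaultdict(set)
    for reporter, target in reports:
        reporters[target].add(reporter)
    return {target: len(who) for target, who in reporters.items()}


# A normal node files one honest report against a spammer.
honest_reports = [("normal_1", "spammer_1")]

# The spammer and a second member of its group retaliate against the reporter.
retaliatory_reports = [("spammer_1", "normal_1"), ("spammer_2", "normal_1")]

levels = spam_levels(honest_reports + retaliatory_reports)
print(levels)
```

Under this naive rule the honest reporter ends up with a higher spam level than the spammer it reported, which is the kind of distortion that motivates classification from access logs rather than from reports.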