1. Field of the Invention
The present invention relates to computers and computer networks. More particularly, the invention relates to profiling an Internet endpoint associated with an Internet Protocol (IP) address.
2. Background of the Related Art
What users are doing on the Internet at a global scale, e.g., which applications and protocols they use, which sites they access, and whom they try to talk to, are intriguing and important questions for a number of reasons. For example, profiling results that answer these questions can reveal regional characteristics of cultural and behavioral patterns, important trends in usage patterns, potential exploitation of security vulnerabilities, early indications of user acceptance of a new product or service, etc. The profiling results can be used for various purposes such as strategic development, product/service marketing, network traffic engineering, security enhancement, etc.
The most common way to answer the above questions is to analyze network traces. However, limited access to network traces at a global scale, together with the processing power required to analyze network traces in large volume, makes state-of-the-art packet-level traffic classification tools inapplicable to this scenario.
The Internet is composed of machines (e.g., computers or other devices with Internet access) associated with IP addresses for identifying and communicating with each other on the Internet. The Internet and IP addresses are well known to those skilled in the art. These machines are called endpoints on the Internet. An Internet endpoint may act as a server, a client, or a peer in communication activity on the Internet. In the vast majority of scenarios, information about servers, such as the IP address, is publicly available for users to access. In peer-to-peer (p2p) based communication, in which all endpoints can act as both clients and servers, the association between an endpoint and the p2p application becomes publicly visible. Even in the classical client-server communication scenario, information about clients, such as website user access logs, forums, proxy logs, etc., also stays publicly available. Given that many forms of communication and various endpoint behaviors do get captured and archived, an enormous amount of information valuable for profiling or characterizing endpoint behavior at a global scale is publicly available but has not been systematically utilized for that purpose.