1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and apparatus for detecting the monitoring of user requests in a network. Still more particularly, the present invention provides a method and apparatus for identifying rewriting of uniform resource locators in content requested by a user.
2. Description of Related Art
The Internet, also referred to as an “internetwork”, is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from the sending network to the protocols used by the receiving network (with packets if necessary). When capitalized, the term “Internet” refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Internet sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Providing informational guides and/or searchable databases of online public records may reduce operating costs. Further, the Internet is becoming increasingly popular as a medium for commercial transactions.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply “the Web”. Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). The information in various data files is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify “links” to other Web resources identified by a Uniform Resource Locator (URL). A URL is a special syntax identifier defining a communications path to specific information. Each logical block of information accessible to a client, called a “page” or a “Web page”, is identified by a URL. The URL provides a universal, consistent method for finding and accessing this information, not necessarily for the user, but mostly for the user's Web “browser”. A browser is a program capable of submitting a request for information identified by an identifier, such as a URL. A user may enter a domain name through a graphical user interface (GUI) for the browser to access a source of content. The domain name is automatically converted to an Internet Protocol (IP) address by a domain name system (DNS), which is a service that translates the symbolic name entered by the user into an IP address by looking up the domain name in a database.
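By way of illustration only, the following minimal Python sketch shows how a URL may be split into the parts a browser uses to locate a page, and how the domain name portion may be handed to DNS for translation into an IP address. The function names describe_url and resolve_host are hypothetical, the "http" scheme is assumed, and the lookup function is defined but not called here because it requires network access.

```python
import socket
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the scheme, domain name, and path of the page."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


def resolve_host(host: str) -> str:
    """Ask the system's DNS resolver for an IPv4 address (needs a network)."""
    return socket.gethostbyname(host)


# The host and page name are taken from the example site used in this
# description; the "http" scheme is an assumption.
info = describe_url("http://www.news.com/index.html")
print(info)
```

The domain name returned in the "host" field is the value that a browser would submit to DNS before opening a connection to the server.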
The Internet also is widely used to transfer applications to users using browsers. With respect to commerce on the Web, individual consumers and businesses use the Web to purchase various goods and services. In offering goods and services, some companies offer goods and services solely on the Web, while others use the Web to extend their reach.
With this wide use of the Internet, businesses have become interested in the behavior of users on the Internet. Information on the behavior of users on the Internet is useful in targeting users for advertising and for businesses trying to identify who visits their Web sites. With respect to tracking user behavior, privacy has become an important issue for many users. The tracking of the behavior of a user is often considered a violation of the user's privacy. One common mechanism used to track the browsing habits of a user is the cookie. A cookie is data created by a Web server that is stored on a user's computer. The cookie provides a way for the Web site to keep track of a user's patterns and preferences and, with the cooperation of the Web browser, to store them on the user's own hard disk. Browsers, however, allow the user an option to refuse cookies or to selectively monitor the acceptance of cookies.
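As a non-limiting illustration of the cookie mechanism described above, the following Python sketch shows a server creating a cookie and a browser either storing or refusing it. The cookie name visitor_id, the function make_set_cookie_header, and the Browser class are hypothetical and are used only for this example.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(visitor_id: str) -> str:
    """Server side: build a Set-Cookie header carrying a visitor identifier."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id
    cookie["visitor_id"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


class Browser:
    """Client side: stores cookies unless the user has chosen to refuse them."""

    def __init__(self, accept_cookies: bool = True):
        self.accept_cookies = accept_cookies
        self.jar = {}

    def receive(self, set_cookie_header: str) -> None:
        if not self.accept_cookies:
            return  # the user refused cookies, so nothing is stored
        # Drop the "Set-Cookie:" prefix and parse the remaining value.
        _, _, value = set_cookie_header.partition(":")
        cookie = SimpleCookie()
        cookie.load(value.strip())
        for name, morsel in cookie.items():
            self.jar[name] = morsel.value


header = make_set_cookie_header("001")
accepting = Browser(accept_cookies=True)
refusing = Browser(accept_cookies=False)
accepting.receive(header)
refusing.receive(header)
```

In this sketch, the accepting browser retains the identifier and could return it on later requests, while the refusing browser retains nothing, which is why cookie-based tracking can be defeated by a browser setting.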
Mechanisms other than cookies also exist for tracking user behavior. One example is the rewriting of URLs by a Web server. In such a case, different users visiting the same site will receive the same pages, but the pages will contain URLs that are dynamically generated for each particular user when that user accesses a particular page. For example, a home page for a Web site, such as www.news.com, may include a hyperlink to a sports site. This hyperlink may be dynamically generated in a manner that can be used to track the behavior of users. When a first user downloads the home page for the URL, www.news.com/index.html, on Jan. 10, 2000, at 3:35 p.m., the home page includes the following URL for the hyperlink to the sports site: www.news.com/sports/user#001month_01_10_00_time_3_35_pm. When a second user downloads this home page at 3:36 p.m. on the same day, the following hyperlink to the sports site is generated for the home page: www.news.com/sports/user#002month_01_10_00_time_3_36_pm. For each user, all of the hyperlinks contain a user field, a date field, and a time field. With this type of hyperlink, it is easy for a Web server to send the same page on sports by interpreting the hyperlink selected by the user and at the same time to track the user. With this information, the time taken to read a Web page also may be identified. One solution for this type of tracking is to employ privacy trust labels generated by sites that review Web sites and certify that Web sites do not track user behavior without permission. Such a system, however, is expensive and prone to fraud. Also, user intervention is needed to determine whether to visit the site.
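For purposes of illustration only, the following Python sketch shows one way a Web server might generate and later interpret per-user hyperlinks of the kind described above. The function names rewrite_link and parse_link and the exact field layout are assumptions made for this example: the layout is loosely modeled on the example hyperlinks, but it omits the "#" character (which in a real URL begins a fragment that the browser does not send to the server) and separates every field with an underscore.

```python
import re
from datetime import datetime

# Assumed layout: /user<NNN>_month_<MM>_<DD>_<YY>_time_<H>_<MM>_<am|pm>
LINK_PATTERN = re.compile(
    r"/user(?P<user>\d+)"
    r"_month_(?P<month>\d{2})_(?P<day>\d{2})_(?P<year>\d{2})"
    r"_time_(?P<hour>\d{1,2})_(?P<minute>\d{2})_(?P<meridiem>am|pm)$"
)


def rewrite_link(base: str, user_number: int, when: datetime) -> str:
    """Append a user field, a date field, and a time field to a hyperlink."""
    hour12 = when.hour % 12 or 12
    meridiem = "am" if when.hour < 12 else "pm"
    return (
        f"{base}/user{user_number:03d}"
        f"_month_{when:%m_%d_%y}"
        f"_time_{hour12}_{when.minute:02d}_{meridiem}"
    )


def parse_link(url: str):
    """Recover the tracking fields from a rewritten link, or None if absent."""
    match = LINK_PATTERN.search(url)
    return match.groupdict() if match else None


# Two users download the same home page one minute apart.
first = rewrite_link("www.news.com/sports", 1, datetime(2000, 1, 10, 15, 35))
second = rewrite_link("www.news.com/sports", 2, datetime(2000, 1, 10, 15, 36))
print(first)
print(second)
print(parse_link(first))
```

Both links lead to the same sports page, yet they differ in the user and time fields, so the server can attribute a later request for either link to a particular user and download time.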
Therefore, it would be advantageous to have an improved method and apparatus for identifying monitoring or tracking of user behavior.