The present invention relates generally to a method and apparatus for organizing Internet data in a format meaningful to management and business operation.
With the development of information technology and networking infrastructure, more and more business transactions are being conducted electronically over the Internet. Using the Internet to conduct business transactions has become so popular that it is now well known to industry and the public as electronic commerce (or Internet commerce). It is fair to predict that electronic commerce will have an enormous impact on the way businesses are conducted and managed in the future. Thus, there is great interest in studying and understanding consumers' behavior and decision-making processes in the electronic commerce environment.
Traditionally, business transactions have been conducted at business premises, and there exist methods and techniques for studying consumers' behavior and decision processes in a traditional business environment. For example, a retailer can display its goods on store shelves arranged in accordance with the changes of the four seasons. By observing consumers' reactions to the arrangement, the retailer can adjust the layout of the shelves to facilitate sales of its goods.
In the electronic commerce environment, a retailer or service provider typically displays information about its goods or services on a web site (which includes at least one server) via the Internet. Specifically, the server for the web site can store the information in a set of web page files, such as HTML (Hypertext Markup Language) files. In addition to containing text content, an HTML file may also contain links to files of other types, such as graphic or audio files, for displaying pictures and icons and playing audio messages. An HTML file may further contain links to other web page files. The files of other types can also be stored on the server. By using a web browser, a customer (or a potential customer) can remotely navigate through the web site, obtaining information about the goods and services, or ordering selected goods or services. Unfortunately, unlike in the traditional business environment, there is at present no reliable method in the electronic commerce environment to measure the effectiveness of the layout of a web site. This is due to the difficulties in observing consumers' behavior and analyzing consumers' decision processes over the Internet.
Historically, the Internet was designed as an open structure whose main purpose was to exchange information freely without restriction. To obtain a web page file (such as an HTML file) from a web site, a web browser first sends a request to the server for that web site. Upon receiving the request, the server retrieves the requested HTML file and sends it to the web browser. Upon receiving the HTML file, the web browser displays it as a web page. If the HTML file also contains links to files of other types (such as graphic or audio files), the browser subsequently sends requests to the server for these files. Upon receiving the requests, the server retrieves these files and sends them to the web browser. Upon receiving these files, the browser displays pictures and icons on the web page, or executes an application to play audio files embedded in the web page. If the HTML file also contains a link to another HTML file, then when the link is clicked (or activated), the browser sends a further request to the server for that HTML file. Upon receiving the further request, the server retrieves the HTML file and sends it to the web browser. It should be noted that browsers interact with web sites in a stateless fashion. On the Internet, a particular web site can be accessed by thousands of browsers in a random fashion. While a browser is sending a sequence of requests to a web site, it does not maintain a constant connection to that web site between any two consecutive requests. A server has no control over the sequence of requests it receives: a subsequent request may not have any logical relationship with the previous one; a sequence of requests may come from different web browsers; and a request may be generated from a link embedded in an HTML file. Consequently, it is difficult to continuously observe customers' activities and behavior in the electronic commerce environment over the Internet.
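By way of illustration only, the following minimal sketch shows how a single web page can give rise to several separate requests. It assumes a hypothetical page and hypothetical file paths, and uses the HTML parser from the Python standard library to separate the embedded files, which a browser would request automatically, from the hyperlinks, which are requested only when activated. The names LinkCollector, PAGE, and requests_for_page are introduced here for illustration and do not come from the description above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the files an HTML page embeds and the pages it links to."""

    def __init__(self):
        super().__init__()
        self.embedded = []    # requested automatically when the page is shown
        self.hyperlinks = []  # requested only when the user activates the link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "audio", "embed") and attrs.get("src"):
            self.embedded.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self.hyperlinks.append(attrs["href"])


# Hypothetical page content, for illustration only.
PAGE = """
<html><body>
  <h1>Gadgets</h1>
  <img src="/images/logo.gif">
  <img src="/images/gadget.jpg">
  <embed src="/audio/welcome.wav">
  <a href="/gadgets/prices.html">Prices</a>
</body></html>
"""


def requests_for_page(page_path, html):
    """Return (requests issued to render the page, requests issued on click)."""
    collector = LinkCollector()
    collector.feed(html)
    return [page_path] + collector.embedded, collector.hyperlinks


automatic, on_click = requests_for_page("/gadgets/index.html", PAGE)
print(automatic)  # one page view, four independent requests
print(on_click)   # a further request, sent only if the link is activated
```

In this sketch, one page view produces four independent requests, each of which a server would receive and record separately.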
Current technology provides mechanisms to record access status data (or Internet data) for web page files and files of other types while a sequence of requests is being received and processed by a server. However, the Internet data are kept as a set of separate and non-correlated data records that are chronologically arranged according to the times at which the requests were received and processed. Consequently, Internet data, without further processing, are not meaningful to management and business operation. In addition, since Internet data are recorded mainly for the purpose of administering web sites, they may contain redundant and erroneous data that are of no use to management and business operation analysis. When Internet data are further processed by other applications (such as data warehouse applications), these redundant and erroneous data are undesirable because they wastefully occupy storage space and may cause errors in reports or during analysis.
Moreover, Internet data may be generated by different types of servers that may use different formats to record the Internet data. In other words, Internet data generated by different types of servers are not compatible in format. This incompatibility makes Internet data still more difficult to use.
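The format incompatibility can be illustrated with a minimal sketch that converts records from two servers into one common record type. Both log layouts shown here are hypothetical and were invented for this example; they are not the formats of any particular server. The names AccessRecord, parse_space_format, parse_csv_format, and normalize are likewise assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessRecord:
    """Common record type shared by all server types in this sketch."""
    timestamp: datetime
    ip: str
    path: str
    status: int


def parse_space_format(line):
    # Hypothetical layout: "<ISO timestamp> <ip> <path> <status>"
    ts, ip, path, status = line.split()
    return AccessRecord(datetime.fromisoformat(ts), ip, path, int(status))


def parse_csv_format(line):
    # Hypothetical layout: "<ip>,<path>,<status>,<dd/mm/YYYY HH:MM:SS>"
    ip, path, status, ts = line.split(",")
    when = datetime.strptime(ts, "%d/%m/%Y %H:%M:%S")
    return AccessRecord(when, ip, path, int(status))


# One parser per (hypothetical) server type.
PARSERS = {"server_a": parse_space_format, "server_b": parse_csv_format}


def normalize(logs):
    """Merge logs from different server types into one chronological list."""
    records = [
        PARSERS[kind](line)
        for kind, lines in logs.items()
        for line in lines
    ]
    return sorted(records, key=lambda r: r.timestamp)


logs = {
    "server_a": ["1999-03-01T10:00:05 10.0.0.1 /index.html 200"],
    "server_b": ["10.0.0.2,/prices.html,200,01/03/1999 10:00:01"],
}
normalized = normalize(logs)
for record in normalized:
    print(record)
```

Once every record has the same fields, later processing no longer needs to know which type of server produced it.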
One difficulty in meaningfully presenting Internet data is how to relate Internet data to individual users. In sending requests to servers, browsers can attach IP (Internet Protocol) addresses to the requests. Conventionally, IP addresses have been used to identify users. However, one user can use different computers, or two users can use the same computer, to access a web site. In either case, an IP address cannot accurately identify a user. Furthermore, where a user accesses web sites through an ISP (Internet service provider), IP addresses are dynamically assigned to users when they connect to the ISP via modem calls. In this situation, different IP addresses may be assigned to the same user in different modem call connections.
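The following sketch illustrates why an IP address alone is an unreliable user identifier. The records are hypothetical, and the user field is an assumption standing in for whatever independent identification of the user might be available. The name group_by is introduced for illustration.

```python
from collections import defaultdict

# Hypothetical records: (ip_address, user, path).
# The "user" field is assumed to come from an independent source.
records = [
    ("10.0.0.1", "alice", "/index.html"),
    ("10.0.0.1", "bob", "/prices.html"),   # two users share one computer
    ("10.0.0.7", "alice", "/order.html"),  # same user, newly assigned address
]


def group_by(records, key_index):
    """Group the requested paths by the field at key_index."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return dict(groups)


by_ip = group_by(records, 0)
by_user = group_by(records, 1)
print(by_ip)    # merges alice with bob, and splits alice across two addresses
print(by_user)  # the grouping an analyst actually wants
```

Grouping by address both merges the activity of two users and splits the activity of one, which is the inaccuracy described above.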
Another difficulty in meaningfully presenting Internet data is how to organize the data in accordance with transaction events. By way of example, assume that a consumer wants to order a gadget from a web site. In browsing through the web pages in the web site, the consumer may perform the following activities in a purchase event: (1) searching for general information about the gadget, (2) searching for specific information about a particular type of gadget made by several manufacturers, (3) searching for information about the prices of the particular type of gadget from the several manufacturers, and (4) ordering a gadget made by a particular manufacturer. As described above, the Internet data recording the activities in the purchase event are kept as a set of separate and non-correlated data records, which may be mingled with other data records.
Therefore, there is a need for a method and apparatus to present Internet data in a format that is meaningful to management and business operation.
There is another need for a method and apparatus to correlate Internet data with users.
There is still another need for a method and apparatus to correlate Internet data with transaction events.
The present invention meets these needs.
The present invention provides a novel method and associated apparatus for processing Internet data.
Currently, a web site is able to store Internet data indicating access status for the files that have been accessed in response to requests from web browsers. Unfortunately, the Internet data are kept as a set of separate and non-correlated data records that are chronologically arranged according to the times at which the requests were received and processed. Typically, a web page is associated with a web page file, which can further embed files of other types. However, the data indicating access status for a web page file and for the files embedded in it can be scattered among multiple data records. Consequently, the Internet data are not arranged in a manner meaningful to management and business operation.
One difficulty in meaningfully presenting Internet data is relating Internet data records to individual users, because IP addresses alone are unable to accurately identify users. Another difficulty in meaningfully presenting Internet data is relating Internet data records to user sessions during which users perform their transaction events over the Internet.
The present invention presents the Internet data in a format meaningful to management and business operation. In particular, the present invention can correlate data records with individual users. The present invention can also correlate the data records with user sessions during which users perform their transaction events.
In one aspect, the invention provides a method for use with a first set of logs containing data indicating the files that have been accessed and a second set of logs containing data indicating the users that have accessed the files. The method comprises the steps of:
receiving data from the first and second sets of logs;
identifying a plurality of users;
identifying data for files that have been accessed by the users; and
correlating the data for the files with respective users.
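The four steps above can be sketched as follows. This is a minimal illustration rather than the claimed implementation. It assumes that each entry in both sets of logs carries a shared request_id field by which the two can be joined; the text above does not specify how the logs are related, so that field, the other field names, and the function name correlate_by_user are all hypothetical.

```python
def correlate_by_user(file_log, user_log):
    """Correlate file access data with users, following the steps above."""
    # Step 1: the data from both sets of logs arrive as the two arguments.

    # Step 2: identify the users (assumed join key: "request_id").
    user_of_request = {e["request_id"]: e["user_id"] for e in user_log}
    files_by_user = {user: [] for user in sorted(set(user_of_request.values()))}

    # Steps 3 and 4: find the file data belonging to each user and
    # attach it to that user. Entries with no known user are skipped.
    for entry in file_log:
        user = user_of_request.get(entry["request_id"])
        if user is not None:
            files_by_user[user].append(entry["path"])
    return files_by_user


# Hypothetical log contents, for illustration only.
file_log = [
    {"request_id": 1, "path": "/index.html"},
    {"request_id": 2, "path": "/images/logo.gif"},
    {"request_id": 3, "path": "/prices.html"},
    {"request_id": 4, "path": "/unknown.html"},  # no matching user entry
]
user_log = [
    {"request_id": 1, "user_id": "alice"},
    {"request_id": 2, "user_id": "alice"},
    {"request_id": 3, "user_id": "bob"},
]

result = correlate_by_user(file_log, user_log)
print(result)
```

The sketch drops file records that cannot be matched to a user. Whether such records should instead be retained under a separate heading is a design choice that the text above leaves open.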
In another aspect, the present invention provides a method for use with a first set of logs containing data indicating the files that have been accessed and a second set of logs containing data indicating the users that have accessed the files. The method comprises the steps of:
receiving data from the first and second sets of logs;
identifying a plurality of users;
identifying sessions for the users;
identifying data for files that have been accessed by the users in the sessions; and
correlating the data for the files with respective users and respective sessions.
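The session steps above can be sketched as follows, starting from records that have already been related to users. The text above does not state how sessions are identified. This sketch assumes, purely for illustration, that a new session begins whenever a user has been inactive for more than 30 minutes. That threshold, the record layout, and the names SESSION_GAP and correlate_by_session are all assumptions.

```python
from datetime import datetime, timedelta

# Assumed inactivity threshold; not taken from the description above.
SESSION_GAP = timedelta(minutes=30)


def correlate_by_session(records, gap=SESSION_GAP):
    """Map (user, session number) to the paths accessed in that session.

    Each record is a (user, timestamp, path) tuple.
    """
    sessions = {}
    last_seen = {}   # user -> timestamp of that user's previous request
    session_no = {}  # user -> number of that user's current session
    for user, ts, path in sorted(records, key=lambda r: (r[0], r[1])):
        # Start a new session on first sight or after a long silence.
        if user not in last_seen or ts - last_seen[user] > gap:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        sessions.setdefault((user, session_no[user]), []).append(path)
    return sessions


# Hypothetical records, for illustration only.
records = [
    ("alice", datetime(1999, 3, 1, 10, 0), "/index.html"),
    ("bob", datetime(1999, 3, 1, 10, 1), "/index.html"),
    ("alice", datetime(1999, 3, 1, 10, 5), "/prices.html"),
    ("alice", datetime(1999, 3, 1, 12, 0), "/order.html"),
]

sessions = correlate_by_session(records)
for key, paths in sessions.items():
    print(key, paths)
```

With these records, the first user's two early requests fall into one session and the later request into a second, while the second user's single request forms its own session.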