In recent years, log analysis has been widely used to obtain information such as how users access a particular web page. Log analysis provides the information needed to determine a website's state, such as the number of visitors, the number of page views, and cookie values.
Log analysis is classified into (1) a log file inserting method that directly inserts a log file into a web server that manages a particular web page, records signals, such as accesses, in the log file, forms data based on the signals, and analyzes the data, (2) a code inserting method that inserts scripts and/or codes into a website, forms data based on the code values, and analyzes the data, and (3) an indirect log analysis method that analyzes external statistical data.
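As a rough illustration of the log file inserting method, the following Python sketch forms data from access records and derives page views and a visitor count. It assumes the records follow the Common Log Format and that each distinct client host counts as one visitor; the sample records, the `summarize` function, and the `LOG_PATTERN` name are hypothetical and are not taken from any particular analysis product.

```python
import re
from collections import Counter

# Assumed record layout (Common Log Format):
# host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def summarize(lines):
    """Return (page views per path, number of distinct client hosts)."""
    views = Counter()
    hosts = set()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like access records
        views[match["path"]] += 1
        hosts.add(match["host"])
    return views, len(hosts)


# Hypothetical sample records, made up for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:57:12 -0700] "GET /news.html HTTP/1.0" 200 1045',
]

page_views, visitor_count = summarize(SAMPLE_LOG)
```

Counting hosts is only a crude stand-in for counting visitors, since several users may share one address; a real analysis would likely also use cookie values.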
The log file inserting method is disadvantageous in that, when the number of users who access the server increases, the amount of log file data becomes so large that the analysis must be entrusted to a specialized analysis organization. The code inserting method is suitable for a website accessed by a relatively small number of visitors, but it is disadvantageous in that, once more than a predetermined number of visitors access the website, the volume of work becomes large compared with the log file inserting method.
Since log analysis requires codes and/or log files to be inserted into a website, a website manager can use it to check the use state of only the website he/she is managing. That is, the conventional log analysis can perform only a limited, manager-centered analysis.
When a manager wants to strategically determine the use states of competitors' websites as well as his/her own website, to detect which websites are popular, which websites and web pages attract the most user interest, and which websites are sponsored by advertisers, and to analyze CRM through users' web surfing cycles, the manager must determine access states to websites other than the manager's own website. However, the conventional log analysis, which provides manager-centered analysis, does not obtain information about the use state of other websites, information about customers' preferred websites, etc.
Although web marketing has developed rapidly, the conventional log analysis enables a website manager to determine the access state of only the website he/she is managing, and does not allow the manager to establish marketing strategies that are more advanced than those of competitors. Furthermore, the conventional log analysis does not propose a method for rapidly handling users' requests. Therefore, a new method is required to analyze websites.
In response to this need, a method has been proposed, for example, to determine web surfing states and web surfing paths from the viewpoint of users rather than of a website manager. That is, the method can extract information about website access based on a particular group of users.
In order to determine whether a specific user accesses a particular website and to determine the user's web surfing path, the following preceding processes must be performed: the structure of the particular web page of the website accessed by the user must be analyzed; an access signal matching the analyzed web page structure must be generated; and all of the signals must be processed.
In general, a web page is designed as a single page or as a complex page that uses frameset tags and/or iframe tags.
A single page is a type of web page linked to a single web server using only one URL. That is, a single page has the most general structure and is an HTML page that does not use tags such as frameset and iframe. A complex page is a type of web page linked to one web server and/or a plurality of web servers using different URLs. A complex page is a web page that uses tags such as frameset and iframe. A complex page contains a main page and subpages. The main page distinguishes page navigation and refers to the page corresponding to the URL in the address bar. The subpages are created by frameset and/or iframe tags in the main page.
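The distinction between a single page and a complex page can be sketched in Python as follows. This is a minimal illustration that assumes the main page's HTML text is already available and that subpage URLs appear in the `src` attribute of `frame` and `iframe` tags; the names `FrameFinder` and `classify_page` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class FrameFinder(HTMLParser):
    """Records whether a page uses frameset/frame/iframe tags."""

    def __init__(self):
        super().__init__()
        self.uses_frame_tags = False
        self.subpage_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in ("frameset", "frame", "iframe"):
            self.uses_frame_tags = True
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.subpage_urls.append(src)


def classify_page(html_text):
    """Return ("single" or "complex", list of subpage URLs)."""
    finder = FrameFinder()
    finder.feed(html_text)
    finder.close()
    kind = "complex" if finder.uses_frame_tags else "single"
    return kind, finder.subpage_urls


# Hypothetical sample pages, made up for illustration only.
SINGLE_PAGE = "<html><body><p>Hello</p></body></html>"
COMPLEX_PAGE = (
    '<html><frameset cols="20%,80%">'
    '<frame src="menu.html">'
    '<frame src="http://example.com/body.html">'
    "</frameset></html>"
)

single_result = classify_page(SINGLE_PAGE)
complex_result = classify_page(COMPLEX_PAGE)
```

The sketch inspects only the static markup of the main page; frames created later by scripts would not be seen this way.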
Internet websites are all composed of single web pages and/or complex web pages. These web pages are linked to respective websites so that users can move between the websites. Users can move from one web page to another while a web page is being downloaded. Users can also move from one web page to another when the download of a web page is interrupted before it completes.
In order to analyze a web page structure, the conventional method must first resolve the following problems:
1) Web page structures must be precisely analyzed according to the types of web pages, since web pages are designed as single web pages and/or complex web pages and users move between these pages repeatedly while surfing the web;
2) Subpages in a complex page must be recognized, since a complex page contains subpages and the page as a whole is completely loaded only when all of its subpages are completely loaded;
3) When a web page is refreshed rather than moved away from, only the contents of the page are changed. Therefore, it must be determined whether the contents after the refresh are identical to the previous contents;
4) When only frames in a web page are changed, it must be determined whether the frames were arbitrarily selected and changed by a user or whether they were changed by a periodic operation;
5) Since a single page does not have any subpages, a method other than checking whether a subpage exists must be found to determine whether a single page has been refreshed.
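One possible way to approach problems 3) and 5) is sketched below in Python. It is an assumption for illustration, not the method of any conventional system: two consecutive loads with the same URL are treated as a refresh, and a SHA-256 digest of the page contents is compared to decide whether the contents changed. The names `PageLoad`, `content_digest`, and `classify_transition` are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageLoad:
    url: str      # URL shown in the address bar
    content: str  # downloaded page contents


def content_digest(content: str) -> str:
    """Return a SHA-256 hex digest of the page contents."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def classify_transition(previous: Optional[PageLoad], current: PageLoad) -> str:
    """Label the step from the previous load to the current one."""
    if previous is None or previous.url != current.url:
        return "navigation"  # the user moved to a different page
    if content_digest(previous.content) == content_digest(current.content):
        return "refresh-same-content"
    return "refresh-changed-content"


# Hypothetical loads, made up for illustration only.
first = PageLoad("http://example.com/news", "<p>Headline A</p>")
second = PageLoad("http://example.com/news", "<p>Headline A</p>")
third = PageLoad("http://example.com/news", "<p>Headline B</p>")
fourth = PageLoad("http://example.com/sports", "<p>Scores</p>")

labels = [
    classify_transition(None, first),
    classify_transition(first, second),
    classify_transition(second, third),
    classify_transition(third, fourth),
]
```

Because this check does not depend on subpages, it applies to single pages as well. Pages that embed timestamps or rotating advertisements would register as changed on every refresh, so a practical comparison would probably need to ignore such regions.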
Therefore, a manager may wish to check how users move between web pages, from a user-centered viewpoint, in order to determine the use states of a variety of websites that the manager does not operate. In that case, in order to determine a user's movement between web pages more precisely, a method is required that precisely analyzes the web page structure and determines a variety of movement patterns, such as downloading all of the web documents corresponding to the web pages that a user accessed, refreshing web pages, irregular movement, etc.