1. Field of the Invention
The present invention relates to a technique for improving those portions of a hypertext system configured on a network whose configuration is problematic.
2. Description of the Prior Art
In a hypertext system configured on a network (e.g., the World Wide Web, hereinafter referred to simply as "the Web"), the access history of each user (visitor) can be recorded in the server that stores the hypertext. The access history usually includes an identifier of the computer used by the accessing user (an IP address when the Internet is used), the access time, and an identifier of the accessed page (file) on the server (a URL on the Web).
A known technique for analyzing both the access history and the hyperlink structure to acquire knowledge for judging whether the configuration of a hypertext system (e.g., a Web site) is superior or inferior is [Perkowitz and Etzioni, 98] (Perkowitz and Etzioni, Adaptive Web Sites: Automatically Synthesizing Web Pages, in Proc. of AAAI-98).
According to the technique of [Perkowitz and Etzioni, 98], first, the co-occurrence frequency of access by the same user is calculated for every page set on a site; page sets exceeding a predetermined threshold value are retained, and the others are discarded. Next, among the remaining page sets, those actually connected by hyperlinks are discarded. The page sets that still remain are then regarded as a graph whose arcs connect the paired pages, and the graph is analyzed to extract cliques (complete subgraphs in which every pair of nodes is connected by an arc). Because the pages constituting such a clique are strongly correlated with each other in access yet are not connected by hyperlinks, the clique indicates an inferior portion of the site. In other words, this technique can find a page group that many users tend to access in the same session but that has no hyperlinks between its pages, so that each user must expend considerable effort moving from page to page.
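For illustration only, the prior-art procedure described above can be sketched in Python. The session lists, the hyperlink list, and the function names are hypothetical; co-occurrence is counted once per session, and cliques are enumerated with the basic Bron-Kerbosch algorithm, which is one standard way to find maximal cliques and is not necessarily the method used by Perkowitz and Etzioni.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count, for each unordered page pair, the sessions in which both pages were accessed."""
    counts = defaultdict(int)
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            counts[(a, b)] += 1
    return counts


def unlinked_cliques(sessions, links, threshold, min_size=2):
    """Maximal cliques of pages that co-occur more than `threshold` times but are not hyperlinked."""
    linked = {tuple(sorted(edge)) for edge in links}  # link direction is ignored here
    graph = defaultdict(set)
    for pair, n in cooccurrence_counts(sessions).items():
        if n > threshold and pair not in linked:
            a, b = pair
            graph[a].add(b)
            graph[b].add(a)

    cliques = []

    def bron_kerbosch(r, p, x):
        # r: current clique, p: candidate vertices, x: already-processed vertices
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(graph), set())
    return sorted(cliques)
```

Each returned clique is a page group that users often visit together but that the site does not interlink.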
The prior art can explicitly identify a page group whose configuration is inferior, but it cannot analyze the cause of the problem in that page group. The administrator of the hypertext has therefore had to find a way to remedy the page group by trial and error: modifying some portion, collecting access histories for a while, measuring the effect of the improvement, and repeating these operations. During this process the configuration of the hypertext system becomes unstable, which confuses users who access it repeatedly. Furthermore, because the appropriate way to improve a page group in a hypertext system depends on the purpose, scale, functions, and layout of the system, as well as on its topics and types of users, the features of the particular hypertext system must be understood.
The present invention has been accomplished in view of the above-mentioned circumstances and provides a hypertext analyzing method that calculates in advance, for arbitrary page sets in the hypertext system concerned, the correlation between various attributes extracted from page contents and the page-to-page transition frequency, and that, for a portion of the hypertext system having a configuration problem, shows which attribute should be modified and how it should be modified to improve the configuration.
To implement the above method, according to the first aspect of the present invention there is provided a hypertext analyzing system including a hyperlink transition frequency acquiring unit for analyzing access history information on the access to a hypertext system, also analyzing a hyperlink structure, and calculating a hyperlink transition frequency between pages (e.g., all pages) linked together by a hyperlink, an attribute extracting unit for extracting (one or more) attributes from the contents of a page set linked by a hyperlink, a correlation analyzing unit for calculating a correlation between the hyperlink transition frequency and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of a designated (e.g., one) page set with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the hyperlink transition frequency between the pages of the page set is to be changed (generally for the purpose of increase), and a display unit for displaying the result obtained by the attribute analyzing unit.
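A minimal sketch of how the hyperlink transition frequency acquiring unit might derive frequencies from an access history follows. It assumes a log of (user identifier, time, URL) records and a set of (source, destination) hyperlinks, treats consecutive requests by the same user as a transition, and normalizes by the number of views of the source page. The record format, the absence of a session timeout, and the normalization are assumptions, not details taken from this specification.

```python
from collections import Counter, defaultdict


def hyperlink_transition_frequency(log, links):
    """log: iterable of (user_id, timestamp, url); links: set of (src, dst) hyperlinks.
    Returns {(src, dst): transitions along the link / views of src} for every link."""
    by_user = defaultdict(list)
    for user, timestamp, url in log:
        by_user[user].append((timestamp, url))

    views = Counter()        # page views per URL
    transitions = Counter()  # consecutive (src, dst) requests that follow a hyperlink
    for hits in by_user.values():
        hits.sort()  # chronological order per user; no session timeout is applied
        urls = [url for _, url in hits]
        views.update(urls)
        for src, dst in zip(urls, urls[1:]):
            if (src, dst) in links:
                transitions[(src, dst)] += 1

    return {
        (src, dst): transitions[(src, dst)] / views[src] if views[src] else 0.0
        for src, dst in links
    }
```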
In this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages that the designer has linked with a hyperlink in the expectation that users will follow it. An administrator of a hypertext system, e.g., a Web system, can build and maintain the desired Web system by changing the attributes of Web pages on the basis of the information thus presented.
According to the second aspect of the present invention there is provided a hypertext analyzing system including a hyperlink transition frequency acquiring unit for analyzing access history information on the access to a hypertext system, also analyzing a hyperlink structure, and calculating a hyperlink transition frequency between pages (e.g., all pages) linked together by a hyperlink, an attribute extracting unit for extracting (one or more) attributes from the contents of a page set linked by a hyperlink, a correlation analyzing unit for calculating a correlation between the hyperlink transition frequency and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, a to-be-analyzed page set acquiring unit for calculating, using the hyperlink transition frequency acquiring unit, a hyperlink transition frequency between pages (e.g., all pages) linked together by a hyperlink in a designated page group and acquiring (e.g., one or more) page sets having a small hyperlink transition frequency, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of the page set acquired by the to-be-analyzed page set acquiring unit with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the hyperlink transition frequency between the pages of the page set is to be changed (generally for the purpose of increase), and a display unit for displaying the result obtained by the attribute analyzing unit.
Also in this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages linked with each other by a hyperlink, and it is possible to build and maintain a desired hypertext system, e.g., a Web system. In addition, pages that have a small transition frequency despite being linked by a hyperlink can be picked out automatically as objects of modification.
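The selection performed by the to-be-analyzed page set acquiring unit of the second aspect might look like the following sketch. It assumes that transition frequencies are held in a dictionary keyed by (source, destination) and that "small" means below a fixed threshold chosen by the administrator; both are assumptions.

```python
def low_transition_links(freq, group, threshold):
    """Hyperlinks with both ends inside `group` and frequency below `threshold`, lowest first."""
    picked = [
        (link, f)
        for link, f in freq.items()
        if link[0] in group and link[1] in group and f < threshold
    ]
    return sorted(picked, key=lambda item: item[1])
```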
According to the third aspect of the present invention there is provided a hypertext analyzing system including a hyperlink transition frequency acquiring unit for analyzing access history information on the access to a hypertext system, also analyzing a hyperlink structure, and calculating a hyperlink transition frequency between pages (e.g., all pages) linked together by a hyperlink, an attribute extracting unit for extracting (one or more) attributes from the contents of a page set linked by a hyperlink, a correlation analyzing unit for calculating a correlation between the hyperlink transition frequency and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, a to-be-analyzed page set acquiring unit for calculating, using the hyperlink transition frequency acquiring unit, a hyperlink transition frequency between pages (e.g., all pages) linked together by a hyperlink in a designated page group, further calculating a contents similarity between the pages with use of the attribute extracting unit, and on the basis of a ratio between the hyperlink transition frequency and the contents similarity, acquiring (e.g., one or more) page sets which are small in the hyperlink transition frequency despite being similar in contents, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of the page set acquired by the to-be-analyzed page set acquiring unit with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the hyperlink transition frequency between the pages of the page set is to be changed (generally for the purpose of increase), and a display unit for displaying the result obtained by the attribute analyzing unit.
In this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages linked with each other by a hyperlink, and it is possible to build and maintain a desired hypertext system, e.g., a Web system. In addition, pages that are linked by a hyperlink and have related contents but a small transition frequency can be picked out automatically.
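For the third aspect, one simple way to combine the two measures is sketched below. Contents similarity is taken here to be the cosine similarity of word-count vectors, and hyperlinked page sets are ranked by the ratio of transition frequency to similarity so that the smallest ratios (similar but rarely followed) come first. The tokenizer, the similarity measure, and the `top_k` parameter are illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Word-count vector of lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def similar_but_rarely_followed(freq, texts, top_k=5):
    """Rank hyperlinks by transition frequency / contents similarity, smallest ratio first."""
    scored = []
    for (src, dst), f in freq.items():
        sim = cosine(term_vector(texts[src]), term_vector(texts[dst]))
        if sim > 0:  # pages sharing no words are not treated as similar
            scored.append(((src, dst), f / sim))
    return sorted(scored, key=lambda item: item[1])[:top_k]
```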
According to the fourth aspect of the present invention there is provided a hypertext analyzing system including an access similarity analyzing unit for analyzing access history information on each of (e.g., all) page sets which constitute a hypertext system and thereby calculating a page-to-page access similarity which represents the degree of access made to both pages of the page set concerned by (many) users, an attribute extracting unit for extracting (one or more) attributes from the contents of the page set or from a hypertext which contains the page set, a correlation analyzing unit for calculating a correlation between the page-to-page access similarity and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of a designated (e.g., one) page set or from a hypertext structure containing the page set with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the page-to-page access similarity in the page set is to be changed (generally for the purpose of increase), and a display unit for displaying the result obtained by the attribute analyzing unit.
In this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages that are not directly linked by a hyperlink, and it is possible to build and maintain a desired hypertext system, e.g., a Web system.
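The page-to-page access similarity of the fourth aspect is not tied to a particular formula in this description. As one possible choice, the sketch below uses the Jaccard coefficient of the sets of users who accessed each page, assuming an access history of (user identifier, URL) pairs.

```python
from collections import defaultdict
from itertools import combinations


def access_similarity(log):
    """log: iterable of (user_id, url).
    Returns {(page_a, page_b): |users of a and b| / |users of a or b|} for every page pair."""
    users = defaultdict(set)
    for user, url in log:
        users[url].add(user)
    sims = {}
    for a, b in combinations(sorted(users), 2):
        union = users[a] | users[b]  # never empty: every page has at least one user
        sims[(a, b)] = len(users[a] & users[b]) / len(union)
    return sims
```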
According to the fifth aspect of the present invention there is provided a hypertext analyzing system including an access similarity analyzing unit for analyzing access history information on each of (e.g., all) page sets which constitute a hypertext system and thereby calculating a page-to-page access similarity which represents the degree of access made to both pages of the page set concerned by (many) users, an attribute extracting unit for extracting (one or more) attributes from the contents of the page set or from a hypertext which contains the page set, a correlation analyzing unit for calculating a correlation between the page-to-page access similarity and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, a to-be-analyzed page set acquiring unit for calculating, using the access similarity analyzing unit, a page-to-page access similarity between arbitrary pages in a designated page group and acquiring a page set (one or more) small in the page-to-page access similarity, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of the page set acquired by the to-be-analyzed page set acquiring unit or from a hypertext structure which contains the page set with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the page-to-page access similarity between the pages of the page set is to be changed (generally for the purpose of increase), and a display unit for displaying the result obtained by the attribute analyzing unit.
Also in this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages that are not directly linked by a hyperlink, and it is possible to build and maintain a desired hypertext system, e.g., a Web system. Further, pages with little co-occurrence of access by users can be picked out automatically as objects of modification.
According to the sixth aspect of the present invention there is provided a hypertext analyzing system including an access similarity analyzing unit for analyzing access history information on each of (e.g., all) page sets which constitute a hypertext system and thereby calculating a page-to-page access similarity which represents the degree of access made to both pages of the page set concerned by (many) users, an attribute extracting unit for extracting (one or more) attributes from the contents of the page set or from a hypertext which contains the page set, a correlation analyzing unit for calculating a correlation between the page-to-page access similarity and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, a to-be-analyzed page set acquiring unit for calculating, using the access similarity analyzing unit, a page-to-page access similarity between arbitrary pages in a designated page group, further calculating a contents similarity between the pages with use of the attribute extracting unit, and on the basis of a ratio between the page-to-page access similarity and the contents similarity, acquiring (one or more) page sets which are small in the page-to-page access similarity despite being similar in contents, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of the page set acquired by the to-be-analyzed page set acquiring unit or from a hypertext structure which contains the page set with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the page-to-page access similarity in the page set is to be changed (generally for the purpose of increase), and a display unit for displaying the result obtained by the attribute analyzing unit.
Also in this configuration, it is possible to show which attribute should be changed to obtain a desired transition frequency between pages that are not directly linked by a hyperlink, and it is possible to build and maintain a desired hypertext system, e.g., a Web system. Further, pages with little co-occurrence of user access despite having related contents can be picked out automatically as objects of modification.
In the first to sixth aspects of the present invention, the attribute extracting unit may extract, as one of the attributes, at least the position of a hyperlink in the page contents, the number of hyperlinks in the page contents, the type of a hyperlink in the page contents, the size of a hyperlink in the page contents, the type of characters representing a hyperlink in the page contents, or the size of a page.
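A rough sketch of extracting some of these attributes from an HTML page with Python's standard `html.parser` module follows. The attribute names, the treatment of `<b>`, `<strong>`, `<em>`, and heading tags as the "type of characters" of a link, and the use of the amount of preceding visible text as the link position are assumptions made for illustration.

```python
from html.parser import HTMLParser

EMPHASIS_TAGS = {"b", "strong", "em", "h1", "h2", "h3"}  # assumed "emphasized" styles


class LinkAttributeParser(HTMLParser):
    """Collects simple attributes of the first hyperlink pointing to `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.link_count = 0            # number of <a href=...> elements on the page
        self.target_rank = None        # 1-based order of the first link to target
        self.target_offset = None      # visible characters preceding that link
        self.target_emphasized = False
        self.text_chars = 0
        self.emphasis_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in EMPHASIS_TAGS:
            self.emphasis_depth += 1
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.link_count += 1
            if href == self.target and self.target_rank is None:
                self.target_rank = self.link_count
                self.target_offset = self.text_chars
                self.target_emphasized = self.emphasis_depth > 0

    def handle_endtag(self, tag):
        if tag in EMPHASIS_TAGS and self.emphasis_depth > 0:
            self.emphasis_depth -= 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def extract_attributes(html, target):
    """Return a dict of illustrative link and page attributes for the link to `target`."""
    parser = LinkAttributeParser(target)
    parser.feed(html)
    parser.close()
    return {
        "link_count": parser.link_count,
        "target_rank": parser.target_rank,
        "target_offset": parser.target_offset,
        "target_emphasized": parser.target_emphasized,
        "page_bytes": len(html.encode("utf-8")),
    }
```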
In the first, second, fourth, or fifth aspect of the present invention, the attribute extracting unit may extract, as one of attributes, at least the contents similarity between pages of a page set.
In the first to sixth aspects of the present invention, the attribute extracting unit may extract, as one of the attributes, at least the position of a page or at least the date and time at which the page was updated.
In the fourth to sixth aspects of the present invention, the attribute extracting unit may extract the number of hyperlink transitions between pages as one of attributes.
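The description does not define how the number of hyperlink transitions between pages is counted. If it is read as the minimum number of hyperlinks a user must follow to get from one page to the other, it could be computed by breadth-first search, as in this sketch; that reading and the `link_hops` name are assumptions.

```python
from collections import deque


def link_hops(links, start, goal):
    """Minimum number of hyperlinks to follow from `start` to `goal`, or None if unreachable."""
    outgoing = {}
    for src, dst in links:
        outgoing.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, hops = queue.popleft()
        if page == goal:
            return hops
        for nxt in outgoing.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```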
In the first to third aspects of the present invention, the hyperlink transition frequency acquiring unit may identify a search robot, which comprehensively accesses hypertexts and collects information automatically, and exclude the access information originating from the search robot from the analysis.
In the fourth to sixth aspects of the present invention, the access similarity analyzing unit may identify a search robot, which comprehensively accesses hypertexts and collects information automatically, and exclude the access information originating from the search robot from the analysis.
In the first to third aspects of the present invention, the hyperlink transition frequency acquiring unit may identify a proxy server and exclude the access information originating from the proxy server from the analysis.
In the fourth to sixth aspects of the present invention, the access similarity analyzing unit may identify a proxy server and exclude the access information originating from the proxy server from the analysis.
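This description does not specify how search robots and proxy servers are identified. The sketch below applies two common heuristics, both of which are assumptions: a host is treated as a robot if it requests `/robots.txt` or sends a User-Agent containing "bot", "crawler", or "spider", and as a proxy if it sends more distinct User-Agent strings than a chosen limit. The record fields (`host`, `agent`, `url`) are hypothetical.

```python
from collections import defaultdict

ROBOT_WORDS = ("bot", "crawler", "spider")  # assumed User-Agent markers


def filter_log(records, max_agents_per_host=5):
    """records: list of dicts with 'host', 'agent', and 'url' keys.
    Drops every record from hosts judged to be search robots or proxy servers."""
    robots = {
        r["host"]
        for r in records
        if r["url"] == "/robots.txt" or any(w in r["agent"].lower() for w in ROBOT_WORDS)
    }
    agents = defaultdict(set)
    for r in records:
        agents[r["host"]].add(r["agent"])
    # Many distinct browsers behind one address suggests a shared proxy.
    proxies = {host for host, seen in agents.items() if len(seen) > max_agents_per_host}
    excluded = robots | proxies
    return [r for r in records if r["host"] not in excluded]
```

Such heuristics can misclassify hosts in both directions, so in practice the limits would need tuning for the site concerned.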
In the first to third aspects of the present invention, the hyperlink transition frequency acquiring unit may exclude from the analysis any hyperlink transition whose destination is a page that is the target of a large number of links.
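Excluding transitions into heavily linked pages (for example, a top page that every page links to) might be sketched as follows; counting in-links from the hyperlink set and the `max_inlinks` cutoff are assumed choices.

```python
from collections import Counter


def drop_hub_destinations(freq, links, max_inlinks):
    """Drop transitions whose destination is the target of more than `max_inlinks` hyperlinks."""
    inlinks = Counter(dst for _, dst in links)
    return {link: f for link, f in freq.items() if inlinks[link[1]] <= max_inlinks}
```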
According to the seventh aspect of the present invention there is provided a hypertext analyzing system including a hyperlink transition frequency acquiring unit for analyzing access history information on the access to a hypertext system, also analyzing a hyperlink structure, and calculating a hyperlink transition frequency between pages (e.g., all pages) linked by a hyperlink, an attribute extracting unit for extracting (one or more) attributes from the contents of a page set linked by a hyperlink, a correlation analyzing unit for calculating a correlation between the hyperlink transition frequency and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of a designated (e.g., one) page set with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the page-to-page hyperlink transition frequency is to be changed (generally for the purpose of increase), a display unit for displaying the result obtained by the attribute analyzing unit, an editing unit for modifying the contents of the designated page set while making reference to the displayed result, and a contents modification effect analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents modified by the editing unit with the correlation data stored in the correlation data storing unit, thereby predicting the page-to-page hyperlink transition frequency between the pages of the page set, and calculating a modification effect, the result obtained by the contents modification effect analyzing unit being displayed in the display unit.
In this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages associated with each other by a hyperlink. In addition, attributes can be adjusted reliably on the basis of the result predicted for the adjusted attributes, so that a desired hypertext system, e.g., a Web system, can be built and maintained.
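The contents modification effect analyzing unit could, for example, compare predictions for the page set before and after editing. The sketch below assumes a linear model whose weights and intercept have already been fitted from the correlation data; the model form and the function names are assumptions, not the method prescribed here.

```python
def predict(weights, intercept, attrs):
    """Predicted transition frequency under an assumed, already-fitted linear model."""
    return intercept + sum(w * attrs[name] for name, w in weights.items())


def modification_effect(weights, intercept, before, after):
    """Return (predicted before, predicted after, change) for an edited page set."""
    old = predict(weights, intercept, before)
    new = predict(weights, intercept, after)
    return old, new, new - old
```

The change value is what the display unit would report to the administrator as the expected effect of the edit.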
According to the eighth aspect of the present invention there is provided a hypertext analyzing system including an access similarity analyzing unit for analyzing access history information on each of (e.g., all) page sets which constitute a hypertext system and thereby calculating a page-to-page access similarity which represents the degree of access made to both pages of the page set concerned by (many) users, an attribute extracting unit for extracting (one or more) attributes from the contents of the page set or from a hypertext which contains the page set, a correlation analyzing unit for calculating a correlation between the page-to-page access similarity and the attributes, a correlation data storing unit for storing data obtained by the correlation analyzing unit, an attribute analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents of a designated (e.g., one) page set or from a hypertext structure containing the page set with the correlation data stored in the correlation data storing unit and thereby acquiring information on which attribute should be modified, to what degree it should be modified, and what effect will be obtained thereby, on the assumption that the page-to-page access similarity in the page set is to be changed (generally for the purpose of increase), a display unit for displaying the result obtained by the attribute analyzing unit, an editing unit for modifying the contents of the designated page set while making reference to the displayed result, and a contents modification effect analyzing unit for comparing attributes which have been extracted, using the attribute extracting unit, from the contents modified by the editing unit with the correlation data stored in the correlation data storing unit, thereby predicting the page-to-page access similarity, and calculating a modification effect, the result obtained by the contents modification effect analyzing unit being displayed in the display unit.
In this configuration, it is possible to show easily which attribute should be changed to obtain a desired transition frequency between pages that are not directly linked by a hyperlink. In addition, attributes can be adjusted reliably on the basis of the result predicted for the adjusted attributes, so that a desired hypertext system, e.g., a Web system, can be built and maintained.
In the first to sixth aspects of the present invention, the correlation analyzing unit may have a function of identifying attributes that are not considered effective when calculating the correlation between the hyperlink transition frequency and the attributes (or between the access similarity and the attributes), and such attributes may be ignored in subsequent processing.
In the first to sixth aspects of the present invention, there may be used a to-be-analyzed object designating unit for designating (e.g., via a network) a hypertext system to be analyzed.
In the first to sixth aspects of the present invention, there may be used a to-be-analyzed object designating unit whereby plural users designate (e.g., via a network) a hypertext system to be analyzed.
In the first to sixth aspects of the present invention, there may be used a contents transmitting unit for delivery and receipt (e.g., via a network) of the contents of a hypertext system to be analyzed.
In the first to sixth aspects of the present invention, there may be used a contents transmitting unit for delivery and receipt (e.g., via a network) of the contents of a hypertext system to be analyzed to and from plural users.
In the first to sixth aspects of the present invention, there may be used an access history information transmitting unit for delivery and receipt (e.g., via a network) of access history information of a hypertext system to be analyzed.
In the first to sixth aspects of the present invention, there may be used an access history information transmitting unit for delivery and receipt (e.g., via a network) of access history information of a hypertext system to be analyzed to and from plural users.
In the first to sixth aspects of the present invention, there may be used an attribute designating unit for designating (e.g., via a network) a tuple of attributes extracted by the attribute extracting unit.
In the first to sixth aspects of the present invention, there may be used an attribute designating unit whereby plural users designate (e.g., via a network) a tuple of attributes extracted by the attribute extracting unit.
In the seventh aspect of the present invention, the editing unit may be configured so that it can be operated via a network or can be operated by plural users via a network.
In the first to sixth aspects of the present invention, the display unit may be configured so that it can transmit display contents via a network or can transmit display contents to plural users via a network.
The present invention can also be realized in the form of a hypertext analyzing method. In this case, the method consists of procedures corresponding to each of the units that constitute the hypertext analyzing system. At least part of the configuration of the present invention may be implemented as computer software. The present invention can also be implemented as a recording medium on which such computer software is recorded.